[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error just like invalid DECIMAL_V2 casts, while failed casts to other types do no lead to warning or error): - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 24: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 24 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 13:33:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 24: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4052/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 24 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 08:16:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 24: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 24 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 08:16:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: public boolean compare(Function other, CompareMode mode) { > Thanks for putting this together! I checked this and Attila's points make sense to me. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 20 Apr 2019 00:26:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 23: Code-Review+2 Todd is out right now, but I took a look at the changes and believe you addressed his last round of feedback. Let's get this merged and we can do any fixups later. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 23 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 20 Apr 2019 00:24:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 23: Code-Review+1 (1 comment) Looked at the delta, good catch by csaba http://gerrit.cloudera.org:8080/#/c/12481/19/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/19/be/src/exprs/cast-functions-ir.cc@312 PS19, Line 312: if (UNLIKELY(!dv.IsValid())) { : ctx->SetError("String to Date parse failed."); : return DateVal::null(); : } > Casts from STRING to most types don't return an error and don't emit a warn Yeah this was a good point that I missed. We've tried to be stricter with new functionality, e.g. DECIMAL_V2 errors in more cases than before. I think in principle we should avoid different behaviour between constants and variables, i.e. f(1) and f(x) where x=1 should have exactly the same behaviour. This is an invariant that most programmers would use to reason about their programs. So the current patch's behaviour is good. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 23 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 18 Apr 2019 00:31:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 23: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 23 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 18:09:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 23: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2821/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 23 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 14:23:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error just like invalid DECIMAL_V2 casts, while failed casts to other types do no lead to warning or error): - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. -
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 23: > Uploaded patch set 23. Change has been rebased -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 23 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 13:42:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 22: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 22 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 09:58:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2815/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 22 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 09:07:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error just like invalid DECIMAL_V2 casts, while failed casts to other types do no lead to warning or error): - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. -
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 21: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/21//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/21//COMMIT_MSG@21 PS21, Line 21: casts will fail with an error, just like invalid DECIMAL_V2 casts): It could be added that "while failed casts to other types do no lead to warning or error". -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 21 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 16 Apr 2019 14:04:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2740/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 21 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Apr 2019 21:09:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 21: Fixed a couple failing e2e tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 21 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Apr 2019 20:26:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error, just like invalid DECIMAL_V2 casts): - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2736/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 20 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Apr 2019 15:31:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error, just like invalid DECIMAL_V2 casts): - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 19: (4 comments) http://gerrit.cloudera.org:8080/#/c/12481/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/19//COMMIT_MSG@20 PS19, Line 20: Explicit casting between DATE and other types: Please mention that invalid conversion result in a warning, which is different than the other casts work. http://gerrit.cloudera.org:8080/#/c/12481/19//COMMIT_MSG@24 PS19, Line 24: truncted typo: truncated http://gerrit.cloudera.org:8080/#/c/12481/19/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/19/be/src/exprs/cast-functions-ir.cc@312 PS19, Line 312: if (UNLIKELY(!dv.IsValid())) { : ctx->AddWarning("String to Date parse failed."); : return DateVal::null(); : } I totally prefer returning an error if we are casting a constant, but I am not sure about values fetched during the query. What will happen is that warnings will be added to the list of general errors, and when the number of errors reach MAX_ERRORS (query option, 100 by default in my minicluster), the query will fail. If we want to fail more consistently, SetError() could be called instead of AddWarning(). Postgres seems to do this, it fails at the first row it cannot cast. On the other side, the "return NULL without warning" seems like a useful thing when the values are from a table, as it makes it easy for the user to handle the case when not every string is a valid date. So I would prefer to return error only for constant casts. FunctionContext has a IsArgConstant() functions, but I don't know whether it works during constant folding. If we want to postpone this decision, an error could be returned for now, and we could change Impala to be more forgiving in the future and have some special logic for constant casts. http://gerrit.cloudera.org:8080/#/c/12481/19/be/src/exprs/cast-functions-ir.cc@323 PS19, Line 323: ctx->AddWarning("Timestamp to Date conversion failed. " : "Timestamp has no date component."); Note that "dateless timestamps" are nearly useless, see IMPALA-5942. I regret not dropping them among the breaking changes in Impala 3.0. A comment could be added to related functions/tests to avoid giving the impression that these can be actually used for things. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 19 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 10 Apr 2019 14:35:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 19: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 19 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 08 Apr 2019 22:00:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2658/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 19 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 05 Apr 2019 14:26:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 19: (2 comments) http://gerrit.cloudera.org:8080/#/c/12481/18/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/18/be/src/exprs/cast-functions-ir.cc@301 PS18, Line 301: "The valid date range for the Timestamp type is 1400-01-01..-12-31."); > Maybe reword a bit: Done http://gerrit.cloudera.org:8080/#/c/12481/18/testdata/workloads/functional-query/queries/QueryTest/date.test File testdata/workloads/functional-query/queries/QueryTest/date.test: http://gerrit.cloudera.org:8080/#/c/12481/18/testdata/workloads/functional-query/queries/QueryTest/date.test@564 PS18, Line 564: # Impala returns the same results as PostgreSQL. > I don't know if it matters but it looks like hive 3 also returns the same f The main issue with the first and second expression in this select is that they return TIMESTAMPs in Impala and STRINGs in Hive3. This is one of the subtle differences between how implicit casts work in Impala and Hive3. Impala implicitly converts all three parameters to TIMESTAMP, whereas Hive3 converts all three to STRING. Consequently the return types are different too. Unfortunately changing this behavior in Impala would break backward compatibility, as that would change TIMESTAMP behavior too, not just DATE behavior. You are correct about the third expression: it returns a TIMESTAMP both in Impala and Hive3. Hive3's string representation of the TIMESTAMP contains an extra ".0" padding, but I don't think that matters. I've moved this expression to the previous section. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 19 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 05 Apr 2019 13:41:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncted silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 18: (2 comments) Thanks for the changes. http://gerrit.cloudera.org:8080/#/c/12481/18/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/18/be/src/exprs/cast-functions-ir.cc@301 PS18, Line 301: "The valid date range in Timestamps is 1400-01-01..-12-31"); Maybe reword a bit: "The valid date range for the Timestamp type is..." http://gerrit.cloudera.org:8080/#/c/12481/18/testdata/workloads/functional-query/queries/QueryTest/date.test File testdata/workloads/functional-query/queries/QueryTest/date.test: http://gerrit.cloudera.org:8080/#/c/12481/18/testdata/workloads/functional-query/queries/QueryTest/date.test@564 PS18, Line 564: select coalesce('2012-01-01', cast('2012-02-02' as timestamp), cast('2012-02-02' as timestamp)), I don't know if it matters but it looks like hive 3 also returns the same for this line and the next except for an extra digit of padding: 2012-01-01 0:00:00 I.e. there are cases where it matches both that seem to have arbitrarily ended up here versus the above query. Again, I don't know if it matters. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 18 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 05 Apr 2019 00:33:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2652/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 18 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Apr 2019 20:29:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2651/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 17 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Apr 2019 20:14:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncted silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/17/be/src/runtime/date-test.cc File be/src/runtime/date-test.cc: http://gerrit.cloudera.org:8080/#/c/12481/17/be/src/runtime/date-test.cc@61 PS17, Line 61: const DateValue v4 = ParseValidateDate("1990-10-20 23:59:59.9", true, 1990, 10, 20); line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 17 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Apr 2019 19:49:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 17: (3 comments) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@20 PS11, Line 20: - Explicit casting between DATE and other types: : - from STRING > This seems to be stricter than CAST() in postgres, mysql, and hive (see htt Fixed it to allow and silently truncate the time component. http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@38 PS11, Line 38: > DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP : implicit conversions are now all possible, the existing function : overload resolution logic is not adequate anymore. : For example, it resolves the > Should we be emitting a WARNING for any out-of-range values? it seems surpr I've added warnings to indicate that DATE->TIMESTAMP, STRING->DATE, TIMESTAMP->DATE conversions failed. Adding warning to STRING->TIMESTAMP conversions needs more work. Since these conversions are not related to the DATE type I'd prefer to fix them separately. http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: // Compares this to 'other' for 'mode'. > I'm afraid about introducing new semantics here which we'll need to break a Thanks for putting this together! - Impala already works as Hive 3.1 in rows 8-11, 13-14, 17, 20 and 22. - I've changed STRING->DATE conversions to accept (and silently truncate) the time component. This fixes rows 1-7, 12 and 21. - Row 15, 16 and 19 happens because Impala and PostgreSQL convert STRING, DATE, TIMESTAMP parameters to TIMESTAMP, while Hive converts them to STRING. Fixing these might not be possible w/o breaking backward compatibility. - Row 18 returns an error in Hive 2.1, and probably fails in Hive 3.1 as well (haven't tried it though). Impala and PostgreSQL convert DATE and TIMESTAMP parameters to TIMESTAMP. - Impala's behavior in row 23 is expected as valid timestamps in Impala start with year 1400. Adding YEAR(DATE) won't fix this problem either as we will still have to resolve YEAR('0009-02-15') to YEAR(TIMESTAMP) instead of YEAR(DATE) not to break backward compatibility. Users will have to be more implicit to avoid this issue and call YEAR(DATE '0009-02-15'). The best we can do is to issue a warning (which I haven't done here as it requires quite bit of work and technically it is a TIMESTAMP issue not a DATE issue. I'd prefer to do it in a separate patch.) -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 17 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Apr 2019 19:48:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncted silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@20 PS11, Line 20: - from STRING to DATE. The string value must be formatted as : -MM-dd. This seems to be stricter than CAST() in postgres, mysql, and hive (see https://docs.google.com/spreadsheets/d/1IkceWy8lBSkF5NaL-Jls2y86zosyD9pwxtO7qKpSxZw/edit#gid=0 row 31) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@38 PS11, Line 38: E.g: year('2019-02-15') must resolve to :year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is :not implemented yet, so this is not an issue at the moment but :it will be in the future. > Not necessarily. The valid range for TIMESTAMP is from 1400-01-01 to -1 Should we be emitting a WARNING for any out-of-range values? it seems surprising to silently emit NULL for out-of-range casts (I had a customer complain about this last week wrt timestamp) http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: public boolean compare(Function other, CompareMode mode) { > I agree that the function overload resolution algorithm needs to be redesig I'm afraid about introducing new semantics here which we'll need to break again later. I did some testing on your patch relative to Hive 2, Hive 3, postgres, and MariaDB, and the results were pretty interesting: https://docs.google.com/spreadsheets/d/1IkceWy8lBSkF5NaL-Jls2y86zosyD9pwxtO7qKpSxZw/edit#gid=0 It seems like at least in the cases where Hive3 and Postgres agree, we should probably be giving the same results as well? It certainly seems like Impala is giving NULL in some places I wouldn't have expected. Maybe we can add a few more queries here, and a column for Impala (prior to the patch) to make sure we aren't introducing any backward-incompatible behavior? In other words, I think the priorities should be: 1) don't break existing queries 2) for queries where postgres and Hive3 agree, we should give the same result 3) for queries where postgres and Hive differ, we should probably go with Hive3 or collaborate with the Hive team to see if they think this is a bug In the case that this would require a big restructuring, I'd prefer if in the meantime we result these queries in an error like "ambiguous types detected, insert casts to resolve". Later, if we can figure out correct implicit casting rules, we can loosen it. But I'm against introducing new behavior which we already know that we'll have to break. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 02 Apr 2019 19:35:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 16: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/conditional-functions-ir.cc File be/src/exprs/conditional-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/conditional-functions-ir.cc@37 PS11, Line 37: IS_NULL_COMPUTE_ > Done Sorry for the churn here induced by my original comment. I do think there's something worth revisiting later just as a maintainability tihng - there would be some value in infrastructure to "stamp out" these more mechanical aspects of a data type, but it requires some thought. E.g. for something like a numeric type, where there's a set of common operations and functions, it seems like we should be able to define a new numeric type in one place and have the required definitions added automatically elsewhere, rather than having to chase down all the locations in the codebase. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 16 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 02 Apr 2019 17:49:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2571/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 16 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 28 Mar 2019 13:42:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@351 PS14, Line 351: return "AVG requires a numeric, timestamp or date parameter: " + toSql(); > Needs update Done -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 15 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 28 Mar 2019 13:00:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@351 PS14, Line 351: return "AVG requires a numeric, timestamp or date parameter: " + toSql(); Needs update http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java File fe/src/main/java/org/apache/impala/util/FunctionUtils.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java@243 PS11, Line 243: return max_func; > That's correct, currently we never raise "ambiguous resolution" error. I don't think this behaviour was a well thought-out decision. The earliest versions of this that I saw was very hacky and actually returned an arbitrary function when there were multiple valid overloads - whichever was listed first in the catalog. I changed that to be a bit more intelligent about choosing functions that don't result in loss of information and also and checking functions in a deterministic order. It's not the way I would have designed it but it was the least disruptive way to evolve it. Introducing an error where there wasn't one before is pretty risky because it will probably break existing queries. Your example seems highly likely to be used in practice. There are some cases where it might not be a breaking change to raise an error, because they didn't occur before, e.g. if one of the inputs is a DATE. But not sure how interesting those cases are. I tend to think that something like this is probably the least of the evils, but the complexity of the behaviour is not ideal. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 27 Mar 2019 17:59:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 15: (5 comments) http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java: http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@52 PS14, Line 52: public PartitionSpec(List partitionSpec) { > Do we ever expect this to get called prior to analysis? Done http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@156 PS14, Line 156: } > same (Precondition on analysis if possible) Done http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: // Compares this to 'other' for 'mode'. > hm, I spent some time researching this across multiple databases: I agree that the function overload resolution algorithm needs to be redesigned but I don't think we should do it now as it would break backward compatibility. We can reconsider implicit casting rules and re-implement the resolution logic (preferably modelled after what Hive does) in a separate change when we are ready for a breaking change. What do you think? http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@236 PS11, Line 236: private int calcSuperTypeOfMatchScore(Function other, boolean strict) { > I think we could simplify this code a lot by adding some utility function l Done http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java File fe/src/main/java/org/apache/impala/util/FunctionUtils.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java@243 PS11, Line 243: return max_func; > it's interesting that our function resolution returns the first fit, from t That's correct, currently we never raise "ambiguous resolution" error. Adding an error to that effect sounds reasonable but I'm worried that it will introduce a breaking change. E.g. year('2011-01-01') resolves to year(TIMESTAMP) right now, but after introducing year(DATE) users will get an error. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 15 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 27 Mar 2019 17:42:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2563/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 15 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 27 Mar 2019 17:32:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 14: (5 comments) Looking pretty close to good. I just have some questions digging into the function resolution stuff. I don't have a really specific example of something I think is wrong, but I want to make sure we're pretty clear on what our function resolution policy is and that we think it won't have any unexpected user behavior. I'll spend some more time on that tomorrow http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java: http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@52 PS14, Line 52: public List getPartitionSpecKeyValues() { return partitionSpec_; } Do we ever expect this to get called prior to analysis? Maybe it makes sense to add a bool analyzed_ and Preconditions.checkState(analyzed_); here to ensure that we don't ever return the pre-analysis KVs? This makes sure that we don't expose out the partitionSpec_ before it becomes immutable. http://gerrit.cloudera.org:8080/#/c/12481/14/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@156 PS14, Line 156: public List toThrift() { same (Precondition on analysis if possible) http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@191 PS11, Line 191: public boolean compare(Function other, CompareMode mode) { > I took a quick look at the SQL ANSI standard, but couldn't find anything ab hm, I spent some time researching this across multiple databases: DB2's logic is very clearly described: https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_determinbestfitfunction.html (it seems to go left-to-right) Postgres's is also pretty well described and seems relatively similar to what's done here: https://www.postgresql.org/docs/10/typeconv-func.html Hive's also seems to bail earlier with an "ambiguous" error when doing method resolution for UDFs: "Closest match is defined as the one that requires the least number of arguments to be converted. In case more than one matches are found, the method throws an ambiguous method exception" Comparing Postgres to what's implemented here, it seems like our two gaps are: 1) they have a well-defined concept of "preferred type". I checked the pg catalog and for the 'date/time' category, timestamptz is the 'preferred type': => select oid,typname, typispreferred from pg_type where typcategory = 'D'; oid | typname | typispreferred ---+-+ 702 | abstime | f 1082 | date| f 1083 | time| f 1114 | timestamp | f 1184 | timestamptz | t 1266 | timetz | f 12401 | time_stamp | f (7 rows) It seems that, because it only allows implicit conversion to "preferred" types, it's somewhat more strict than what we do today, though. 2) they have a concept of 'unknown' type to represent string literals in queries as being distinct from an expression with a defined type (eg any other expression than a string literal). I wonder if we should be treating StringLiteral casts differently than other exprs? http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/catalog/Function.java@236 PS11, Line 236: // Check trailing varargs. I think we could simplify this code a lot by adding some utility function like: Type[] tryExtendArgsToLength(int numArgs) { if (!hasVarArgs_ || argTypes_.length <= numArgs) return argTypes_; Type[] ret = Arrays.copyOf(argTypes_, numArgs); for (int i = argTypes_.length; i < numArgs; i++) { ret[i] = getVarArgsType(); } return ret; } then in these comparison functions, we can extend the varargs to match the user-provided call types and do the rest of the logic on fixed-length argument lists, without having to duplicate code between L227-234 and L239-247. http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java File fe/src/main/java/org/apache/impala/util/FunctionUtils.java: http://gerrit.cloudera.org:8080/#/c/12481/11/fe/src/main/java/org/apache/impala/util/FunctionUtils.java@243 PS11, Line 243: return max_func; it's interesting that our function resolution returns the first fit, from the order returned by the catalog. Best I can tell, we never will fail with an "ambiguous resolution" error. Now that we have both DATE and TIMESTAMP (and assumedly we'll add DATETIME at some point?) is there really no case that we want to force an explicit match by
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2521/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 14 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 22 Mar 2019 16:25:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2506/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 21 Mar 2019 20:34:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 13: > I don't think we should bundle a wholesale reformatting change into > this patch. Just suggested fixing the name of the method for the > ones you added. If we want to reformat, let's do that in a separate > gerrit. otherwise this is going to be a pain to review/rebase/etc. > > I'll take a look at your changes today or tomorrow. Okay, I'll do that. Please ignore patch-set 13. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 21 Mar 2019 20:18:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 13: I don't think we should bundle a wholesale reformatting change into this patch. Just suggested fixing the name of the method for the ones you added. If we want to reformat, let's do that in a separate gerrit. otherwise this is going to be a pain to review/rebase/etc. I'll take a look at your changes today or tomorrow. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 21 Mar 2019 20:03:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 13: > Uploaded patch set 13. This patch-set contains the lowercase method name changes in Java files. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 21 Mar 2019 19:54:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2504/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 12 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 21 Mar 2019 19:32:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 12: (32 comments) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@29 PS11, Line 29: : - Implicit casting between DATE and other typ > it's interesting that this differs from the behavior of TIMESTAMP today, at Implicit casting from STRING to TIMESTAMP works in Impala in most cases (parameters to function calls/operators, values in insert/select statements, etc). - Hive does these kind of implicit conversions too, - PostgreSQL on the other hand doesn't. - Couldn't find anything about STRING->TIMESTAMP implicit conversions in the SQL ANSI standard. Timestamp arithmetic expressions in Impala are an exception: in TimestampArithmeticExpr.java implicit conversions are explicitly forbidden. - Neither Hive nor PostgreSQL allow implicit conversions in arithmetic expressions. http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@38 PS11, Line 38: ution logic is not adequate anymore. : For example, it resolves the : if(false, '2011-01-01', DATE '1499-02-02') function call to the : if(BOOLEAN, TIMESTAMP, TIMESTA > wouldn't the result be the same either way for this example? Not necessarily. The valid range for TIMESTAMP is from 1400-01-01 to -12-31. The valid range for DATE is from -01-01 to -12-31. Therefore: year(cast('1399-12-31' as TIMESTAMP)) returns NULL, whereas year(cast('1399-12-31' as DATE)) will return 1399. In Hive the valid range for TIMESTAMP and DATE is the same, so this is not an issue. http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@42 PS11, Line 42: DATE, DATE > how is this defined, specifically? I've changed the wording to describe the function resolution algorithm and related issues in more detail. http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@70 PS11, Line 70: iders overloaded function > It seems easier to leave it enabled so long as the current support is corre I agree with Tim, introducing a feature flag for DATE support would take a considerable effort since this patch touches lots of files. Testing whether turning off the feature flag works as expected (i.e. we get the same behavior and error messages as before) can also be complicated. Also, the patch includes a sql-parser change. Moving the parser change behind a feature flag is probably not that straightforward either. http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/codegen/codegen-anyval.cc@68 PS11, Line 68: return cg->i64_type(); > why do we lower to i64 instead of i32 given the slot size is only 4 bytes? The highest 4 bytes store the actual value, the lowest byte is used for the "is_null" flag. Similarly, TYPE_INT represents an i32 value but its lowered type is i64. http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exec/hdfs-table-sink.cc@500 PS11, Line 500: (type.type == TYPE_DATE) { : ColumnDescriptor col_desc = table_desc_->col_descs()[num_clustering_cols + i]; : stringstream error_msg; > nit: why not just pass that string directly? Done http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc@381 PS11, Line 381: s should not have a negative perfor > Yea, my feeling was just that one fewer feature to test and bug-fix and sup Removed avg(DATE) to avoid semantic problems. http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/conditional-functions-ir.cc File be/src/exprs/conditional-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/conditional-functions-ir.cc@37 PS11, Line 37: IS_NULL_COMPUTE_ > per comment elsewhere, I found it clearer in the old version since I didn't Done http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/expr-test.cc@3386 PS11, Line 3386: : TEST_F(ExprTest, CastDateExprs) { > add some tests for invalid dates like feb 30 Done http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/expr-test.cc@3475 PS11, Line 3475: TestError("cast(cast(null as date) as double)"); > can you test a before-the-epoch timestamp that includes a time portion? eg Done http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/runtime/date-value.h File be/src/runtime/date-value.h:
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc@381 PS11, Line 381: val_struct->sum / val_struct->count > I agree avg(date) is likely not important. Kind-of nice to have for consist Yea, my feeling was just that one fewer feature to test and bug-fix and support is good if it's not an important use case. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 18 Mar 2019 16:36:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@70 PS11, Line 70: complete DATE type implementation > should we consider turning it off by default and only enabling it once we'r It seems easier to leave it enabled so long as the current support is correct. I think we should convince ourselves that the testing is adequate before merging this though. http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@72 PS11, Line 72: - Add date support to the random query generator. > should we transition TPC-DS test tables over to 'date' instead of 'string' Seems reasonable to do once the support is complete, I don't think it's a requirement to call DATE support "complete" though. http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc@381 PS11, Line 381: val_struct->sum / val_struct->count > Is this correct behavior when the dates are negative? Dividing as signed in I agree avg(date) is likely not important. Kind-of nice to have for consistency but I guess if there's any concerns about the semantics we can just avoid the issues by not having it. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 18 Mar 2019 16:13:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/10/tests/query_test/test_date_queries.py File tests/query_test/test_date_queries.py: http://gerrit.cloudera.org:8080/#/c/12481/10/tests/query_test/test_date_queries.py@107 PS10, Line 107: def test_impala_shell(self): We don't need to test this for multiple file formats, right? -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 10 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 18 Mar 2019 16:08:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@29 PS11, Line 29: - from STRING to DATE if the source string value is used in a : context where a DATE value is expected. > it's interesting that this differs from the behavior of TIMESTAMP today, at The interval implementation in Impala is kind-of weird because it's implemented as a special operator instead of as an INTERVAL data type. I don't think this is related to the (lack of) implicit casting but it does relate to the standard support. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 18 Mar 2019 16:02:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: (32 comments) nice work, pretty impressive test coverage. Mostly small items here. http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@29 PS11, Line 29: - from STRING to DATE if the source string value is used in a : context where a DATE value is expected. it's interesting that this differs from the behavior of TIMESTAMP today, at least for timestamp arithmetic expressions: SELECT '2019-01-01 12;12:12' + interval 1 day does not do an implicit conversion. (interestingly it does in MySQL). Does the ANSI standard give rules for this? http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@38 PS11, Line 38: E.g: year('2019-02-15') must resolve to :year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is :not implemented yet, so this is not an issue at the moment but :it will be in the future. wouldn't the result be the same either way for this example? http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@42 PS11, Line 42: better fit how is this defined, specifically? http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@70 PS11, Line 70: complete DATE type implementation should we consider turning it off by default and only enabling it once we're sure it's stable and correct? http://gerrit.cloudera.org:8080/#/c/12481/11//COMMIT_MSG@72 PS11, Line 72: - Add date support to the random query generator. should we transition TPC-DS test tables over to 'date' instead of 'string' for the appropriate columns? http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/codegen/codegen-anyval.cc@68 PS11, Line 68: return cg->i64_type(); why do we lower to i64 instead of i32 given the slot size is only 4 bytes? seems like we're using int32 below http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exec/hdfs-table-sink.cc@500 PS11, Line 500: stringstream error_msg; : error_msg << "Cannot write DATE column to a PARQUET table."; : return Status(error_msg.str()); nit: why not just pass that string directly? Could we make this better by including the column name from the table desc? http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/aggregate-functions-ir.cc@381 PS11, Line 381: val_struct->sum / val_struct->count Is this correct behavior when the dates are negative? Dividing as signed integers means we'll "truncate towards zero" which means that average of (1969-01-01, 1969-01-02) would be 1969-01-02 whereas average of 1971-01-01 and 1971-01-02 would be 1971-01-01. I think that's surprising because it exposes the epoch reference point in the semantics of the operation. Does the ANSI standard have anything to say here? Checking briefly on sqlfiddle: - postgres 9.6 doesn't support AVG(date) at all. Neither does Oracle 11g R2 nor SQL Server 2017. - mysql 5.6 implicitly casts the dates to an DECIMIAL and return something weird like 19690101.5. Not sure what's going on under the covers. - hive 3.1 doesn't support AVG(date) either Maybe we should remove this functionality for now and consider adding it back based on user demand? http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/conditional-functions-ir.cc File be/src/exprs/conditional-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/conditional-functions-ir.cc@37 PS11, Line 37: ITERATE_OVER_SEQ per comment elsewhere, I found it clearer in the old version since I didn't need to understand boost magic http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/expr-test.cc@3386 PS11, Line 3386: : TEST_F(ExprTest, CastDateExprs) { add some tests for invalid dates like feb 30 http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/exprs/expr-test.cc@3475 PS11, Line 3475: TestDateValue("cast(cast('1400-01-01 00:00:00' as timestamp) as date)", can you test a before-the-epoch timestamp that includes a time portion? eg '1960-01-01 23:59' and make sure it truncates "down" and not truncate "towards the epoch"? http://gerrit.cloudera.org:8080/#/c/12481/11/be/src/runtime/date-value.h File be/src/runtime/date-value.h:
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2423/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Mar 2019 13:53:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 11: > Uploaded patch set 11. Rebased it. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Mar 2019 13:07:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2419/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 10 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Mar 2019 19:47:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 10: (1 comment) Added DATE error and HBASE tests. http://gerrit.cloudera.org:8080/#/c/12481/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/9//COMMIT_MSG@74 PS9, Line 74: - Add Parquet support. > Need to update this. Done -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 10 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Mar 2019 19:07:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 9: I'll leave that up to you. I think it depends whether we're trying to get this into the upstream release or not - that's probably something to figure out between you and Gabor. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 9 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 18:06:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 9: > (1 comment) > > Also, is the HBase data representation just a string? Is it > compatible with Hive's representation. I've done some simple ad-hoc testing with HBASE DATE tables, seems to be working fine. The HBASE date representation is string-based and compatible with Hive. On second thought (since we will have an upstream release soon), maybe it makes sense to spend couple more days on adding HBASE tests to this change and then commit everything in one patch. Splitting it into two parts would further complicate things. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 9 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 17:29:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 9: (1 comment) Also, is the HBase data representation just a string? Is it compatible with Hive's representation. http://gerrit.cloudera.org:8080/#/c/12481/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/9//COMMIT_MSG@74 PS9, Line 74: - Add HBase and Kudu support. Need to update this. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 9 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 16:44:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 9: I'm ok with deferring that testing so long as you do a couple of sanity checks to make sure it's not broken. We should file a blocker JIRA to add the testing before the release that this will land in to make sure that it gets done. Otherwise we would need to revert this. In principle we shouldn't add functionality without comprehensive tests but it's worth making an exception here because of the heroic nature of this code change. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 9 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 16:43:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 6: > (2 comments) > > Do you mean that, with these code changes, you can scan HBase date > tables? Or that we should have negative tests for HBase? > > I think we should have test coverage but I'm open to splitting that > out into a separate patch if that helps this get in faster. Yes, with these code changes Impala can scan/write HBase DATE tables. At the minimum, DATE-specific tests should be added to hbase-inserts.test, hbase-scan-node.test, hbase-compute-stats-incremental.test and hbase-compute-stats.test. This would also involve adding extra HBASE tables to functional_hbase. I'd rather do this in a separate change. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 6 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 16:01:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2378/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 9 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 02:14:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 9: (2 comments) Do you mean that, with these code changes, you can scan HBase date tables? Or that we should have negative tests for HBase? I think we should have test coverage but I'm open to splitting that out into a separate patch if that helps this get in faster. http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java File fe/src/main/java/org/apache/impala/analysis/DateLiteral.java: http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java@127 PS6, Line 127: return daysSinceEpoch_ - other.daysSinceEpoch_; > Done. At the very beginning 'daysSinceEpoch_' was not a member and later I Makes sense! http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test File testdata/workloads/functional-query/queries/QueryTest/date.test: http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test@36 PS6, Line 36: where '2017-11-28' in (date_col) > To be honest, I didn't give this too much thought. I just copied the timest Ah ok, that makes sense. I don't think there are likely to be bugs here but it's good to be sure. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 9 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Mar 2019 00:55:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add HBase and Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/timestamp-functions.h M be/src/exprs/utility-functions-ir.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 8: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/2374/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 8 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Mar 2019 17:23:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/2373/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 7 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Mar 2019 17:20:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 8: I've also realized that besides TEXT, DATE should work for HBASE as well. Do you think, I should add HBASE tests to this patch-set, or it should be a separate change? -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 8 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Mar 2019 16:42:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add HBase and Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/timestamp-functions.h M be/src/exprs/utility-functions-ir.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/7/tests/query_test/test_date_queries.py File tests/query_test/test_date_queries.py: http://gerrit.cloudera.org:8080/#/c/12481/7/tests/query_test/test_date_queries.py@20 PS7, Line 20: import pytest flake8: F401 'pytest' imported but unused -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 7 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Mar 2019 16:38:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add HBase and Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/timestamp-functions.h M be/src/exprs/utility-functions-ir.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (8 comments) I did a pass over the remaining bits of the patch. Overall it looks really good. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/runtime/date-value.h File be/src/runtime/date-value.h: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/runtime/date-value.h@59 PS6, Line 59: DateValue(int32_t days_since_epoch) : days_since_epoch_(INVALID_DAYS_SINCE_EPOCH) { Can you make these constructors explicit? Just to avoid any potential unexpected conversions. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/service/hs2-util.cc File be/src/service/hs2-util.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/service/hs2-util.cc@539 PS6, Line 539: RawValue::PrintValue(value, TYPE_DATE, -1, &(hs2_col_val->stringVal.value)); > Can we just use StringValue::ToString() instead? Seems like a lot of indire I still think it would be good to fix this, just so that the new code is efficient. http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java File fe/src/main/java/org/apache/impala/analysis/DateLiteral.java: http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java@40 PS6, Line 40: strValue_ = strValue; Mention that it's the same representation as DateValue in backend? http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java@127 PS6, Line 127: Why not compare daysSinceEpoch_? http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java: http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@81 PS6, Line 81: alysis whether http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test: http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test@4 PS6, Line 4: create table date_tbl like functional.date_tbl; Test might be slightly clearer if this had a different name from the functional version, e.g. tmp_date_tbl or date_tbl_copy. http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test File testdata/workloads/functional-query/queries/QueryTest/date.test: http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test@35 PS6, Line 35: select count(*) from date_tbl We should test the IN predicate with both long and short lists to exercise both the set lookup and iterate code. I don't expect there to be bugs here but it's just interesting enough that it would be good to have the coverage. // Threshold based on InPredicateBenchmark results int setLookupThreshold = children_.get(0).getType().isStringType() ? 2 : 6; if (children_.size() - 1 < setLookupThreshold) useSetLookup = false; http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test@36 PS6, Line 36: where '2017-11-28' in (date_col) Is it deliberate that all this IN/NOT IN/BETWEEN predicates can be rewritten to simple comparisons by the planner? I don't understand focusing on this rather than variants that can't be simplified (testing edge cases is good, we just need to test the regular cases too). -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 05 Mar 2019 02:43:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 6: (10 comments) I did a pass over the backend code. I think it's looking pretty good. Most comments I had were actually about existing issues in the codebase. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/aggregate-functions-ir.cc@386 PS6, Line 386: if (UNLIKELY(src.is_null)) return DateVal::null(); It's weird to handle is_null here and not in GetValue(). I think it's unnecessary. Seems to be carried over from the other functions. I'm ok if you don't want to handle it right now, but maybe file a JIRA if you agree with me that it's wrong? http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/cast-functions-ir.cc@179 PS6, Line 179: lexical_cast It would be better to use ToString() directly rather than involving the indirection of lexical_cast and operator<<. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/conditional-functions-ir.cc File be/src/exprs/conditional-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/conditional-functions-ir.cc@35 PS6, Line 35: IS_NULL_COMPUTE_FUNCTION(BooleanVal); I know we'd talked about how to make it easier to add more data types in future. This is one case where we could have some macros to stamp out functions like this for each *Val type. Not saying we need to do it here, just thinking. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/expr-test.cc@3466 PS6, Line 3466: a are http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/exprs/slot-ref.cc@451 PS6, Line 451: TimestampVal SlotRef::GetTimestampVal( These functions look regular enough that we could maybe stamp them out automatically from all types with a macro, except for the parameterised types like DecimalVal. Don't need to fix now, just thinking. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/runtime/date-value.h File be/src/runtime/date-value.h: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/runtime/date-value.h@96 PS6, Line 96: underlaying underlying http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/runtime/date-value.cc File be/src/runtime/date-value.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/runtime/date-value.cc@68 PS6, Line 68: (void)DateParser::Parse(str, len, ); I think you'll need to use discard_result() here. I can't remember exactly but either GCC7 or Clang doesn't consider this to be actually discarding the return value. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/service/fe-support.cc@639 PS6, Line 639: std::s nit: std:: prefix not needed since it's imported into the global namespace by common/names.h http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/service/hs2-util.cc File be/src/service/hs2-util.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/service/hs2-util.cc@539 PS6, Line 539: RawValue::PrintValue(value, TYPE_DATE, -1, &(hs2_col_val->stringVal.value)); Can we just use StringValue::ToString() instead? Seems like a lot of indirection to use this. Oh I guess TIMESTAMP does this below, oh well. http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/util/symbols-util.cc File be/src/util/symbols-util.cc: http://gerrit.cloudera.org:8080/#/c/12481/6/be/src/util/symbols-util.cc@135 PS6, Line 135: case TYPE_DATE: Another example where it's just stamped out for each *Val type. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 6 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Feb 2019 01:58:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2218/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 6 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Feb 2019 21:39:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc@644 PS3, Line 644: // TODO: Remove this block once DATE type is supported accross all file formats. : if (has_materialized_date_slot_ && partition->file_format() != THdfsFileFormat::TEXT) { : context->ClearStreams(); : return Status("DATE type is only supported with TEXT file format."); : } > I think it might be better to reject it in the frontend even if it means re Actually I think if you did it in the frontend it depends whether you check before/after static partition pruning. I was thinking that you could check for DATE columns after partition pruning. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Feb 2019 21:15:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc@644 PS3, Line 644: // TODO: Remove this block once DATE type is supported accross all file formats. : if (has_materialized_date_slot_ && partition->file_format() != THdfsFileFormat::TEXT) { : context->ClearStreams(); : return Status("DATE type is only supported with TEXT file format."); : } > I think it would be much more difficult (or impossible) to do this in the F I think it might be better to reject it in the frontend even if it means rejecting some potentially runnable queries. Queries like the one you mentioned will be flaky, because they will succeed only if the filters arrive early enough to filter out non-text partitions. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Feb 2019 21:13:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add HBase and Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/2216/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 5 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Feb 2019 19:13:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add HBase and Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/12481/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/12481/4/be/src/exprs/expr-test.cc@1141 PS4, Line 1141: void ExprTest::TestCast(const string& stmt, const char* val, bool timestamp_out_of_range) { line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/12481/4/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java: http://gerrit.cloudera.org:8080/#/c/12481/4/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java@76 PS4, Line 76: // 1. Validate each partition key/value specified, ensuring a matching partition column line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/12481/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: http://gerrit.cloudera.org:8080/#/c/12481/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@3310 PS4, Line 3310: "Expected: ALL, CASE, CAST, DATE, DEFAULT, DISTINCT, EXISTS, FALSE, IF, INTERVAL, " + line too long (93 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 4 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Feb 2019 18:25:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions: - If both implicit conversions are applicable equally well, STRING -> TIMESTAMP is preferred for backward compatibility reasons. E.g: year('2019-02-15') must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. - If one implicit conversion is a better fit over the other, we must choose the better one. E.g: if(false, '2011-01-01', DATE '1499-02-02') must resolve to if(BOOLEAN, DATE, DATE) instead of if(BOOLEAN, TIMESTAMP, TIMESTAMP). - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add HBase and Kudu support. - Optionally support Avro and ORC. For further details, see IMPALA-6169. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (19 comments) Thanks for the reviews! While I was going through the comments, I realized that partitioning by DATE wasn't properly implemented. I fixed those issues as well and added some extra tests to cover partitioning functionality. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@11 PS3, Line 11: of values supported for the DATE type is -01-01 to -12-31. > Can you mention the literal syntax supported for date values? Done http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@22 PS3, Line 22: - from BIGINT, INT, SMALLINT, TINYINT to DATE. The source value is : interpreted as a number of days since the epoch. : - from DATE to BIGINT, INT, SMALLINT, TINYINT. The resulting : integer is the number of days since epoch. : - from DOUBLE, FLOAT, DECIMAL to DATE. The source value's : fractional part is ignored, the integer part is interpreted as a : number of days since epoch. : - from DATE to DOUBLE, FLOAT, DECIMAL. The resulting value is the : number of days since epoch. > Hive does not support these casts (I checked with Hive 2.1.1), are you sure I introduced them as all these conversions are supported for timestamps. On the other hand if Hive doesn't support them, probably we shouldn't either. Removed them. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@35 PS3, Line 35: - Implicit casting between DATE and other types: > Do you plan to support the DATE '-mm-dd' literal syntax? It looks like Good idea, added support for that. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@40 PS3, Line 40: - Since both STRING -> DATE and STRING -> TIMESTAMP implicit > Do you have an example of where this is required? Added more information with some examples. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@63 PS3, Line 63: tests/query_test/test_date_queries.py. > Can we add a simple test that runs a query returning date through impala-sh Added an extra test to test_date_queries.py. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@64 PS3, Line 64: > I think we should plan on adding DATE support to the random query generator Added an extra section to the commit msg to describe future plans. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc@644 PS3, Line 644: // TODO: Remove this block once DATE type is supported accross all file formats. : if (has_materialized_date_slot_ && partition->file_format() != THdfsFileFormat::TEXT) { : context->ClearStreams(); : return Status("DATE type is only supported with TEXT file format."); : } > I think that this logic should be implemented in frontend, and the backend I think it would be much more difficult (or impossible) to do this in the FE. Imagine joining a table to a partitioned table on a partition column, where each partition has a different file format. I don't think it is possible to figure out in the analyze/plan phase whether non-text partitions will be scanned or not. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc@310 PS3, Line 310: int year, month, day; : if (UNLIKELY(!dv.ToYearMonthDay(, , ))) return TimestampVal::null(); : if (year < 1400 || year > ) return TimestampVal::null(); : : const boost::gregorian::date d(year, month, day); : const boost::posix_time::time_duration t(0, 0, 0, 0); : TimestampValue tv(d, t); > performance: this could be done much faster, as both DateVal and TimestampV Done http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc@330 PS3, Line 330: if (val.is_null) return DateVal::null(); : TimestampValue tv = TimestampValue::FromTimestampVal(val); : if (UNLIKELY(!tv.HasDate())) return DateVal::null(); : : const boost::gregorian::date d = tv.date(); : const DateValue dv(d.year(), d.month(), d.day()); : return dv.ToDateVal(); > same as in line 310: this could be done with a simple addition. There is a Done http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc:
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@22 PS3, Line 22: - from BIGINT, INT, SMALLINT, TINYINT to DATE. The source value is : interpreted as a number of days since the epoch. : - from DATE to BIGINT, INT, SMALLINT, TINYINT. The resulting : integer is the number of days since epoch. : - from DOUBLE, FLOAT, DECIMAL to DATE. The source value's : fractional part is ignored, the integer part is interpreted as a : number of days since epoch. : - from DATE to DOUBLE, FLOAT, DECIMAL. The resulting value is the : number of days since epoch. Hive does not support these casts (I checked with Hive 2.1.1), are you sure that it is useful to support them in Impala? My first impression is that these casts are more confusing than useful. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/udf/udf.h File be/src/udf/udf.h: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/udf/udf.h@586 PS3, Line 586: /// Date value represented as days since epoch 1970-01-01. The valid range and the fact that the value is interpreted as proleptic Gregorian could be also mentioned here. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/util/string-parser-test.cc File be/src/util/string-parser-test.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/util/string-parser-test.cc@526 PS3, Line 526: TestDateValue("-01-01", DateValue(0, 1, 1), StringParser::PARSE_SUCCESS); : TestDateValue("-12-31", DateValue(, 12, 31), StringParser::PARSE_SUCCESS); Please add the "wrong side" of the edge values: -0001-12-31 (Hive had a bug with these kinds of values) 1-01-01 http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/util/string-parser-test.cc@531 PS3, Line 531: TestDateValue("2-11-10", invalid_date, StringParser::PARSE_FAILURE); Please add tests where month/day have the correct number of characters but the value is invalid, e.g 2018-00-10, 2018-13-10, 2018-01-0, 2018-01-32, 2019-02-39 http://gerrit.cloudera.org:8080/#/c/12481/3/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: http://gerrit.cloudera.org:8080/#/c/12481/3/common/thrift/Exprs.thrift@56 PS3, Line 56: // string representation of date formatted as -MM-dd. : 1: required string value; Why do you use string instead of i32 to represent the date? TTimestampLiteral uses binary representation instead of string. http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java File fe/src/main/java/org/apache/impala/analysis/DateLiteral.java: http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java@32 PS3, Line 32: yyy Isn't it ? http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@155 PS3, Line 155: isDateType By implementing the Date type the name of this function became very misleading. Can you rename it to something like isDateOrTimeType()? http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@194 PS3, Line 194: AnalyzesOk("select cast (cast ('1970-10-10 10:00:00.123' as timestamp) as date)"); What will happen if some writes select cast('1970-10-10 10:00:00.123' as date) ? Will it return NULL without warning? http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/12481/3/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@368 PS3, Line 368: assertToSqlWithImplicitCasts(ctx, "select * from functional.alltypes, " : + "functional.date_tbl where timestamp_col = date_col", : "SELECT * FROM functional.alltypes, functional.date_tbl " : + "WHERE timestamp_col = CAST(date_col AS TIMESTAMP)"); I saw some weirdness around this in Hive, it seems to convert everything to string: select cast("1970-01-01" as timestamp) > cast("1970-01-01" as date); return True What happened is that the appended 00:00:00 part made the timestamp version
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (4 comments) I ran through the non-test cpp code. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc@644 PS3, Line 644: // TODO: Remove this block once DATE type is supported accross all file formats. : if (has_materialized_date_slot_ && partition->file_format() != THdfsFileFormat::TEXT) { : context->ClearStreams(); : return Status("DATE type is only supported with TEXT file format."); : } I think that this logic should be implemented in frontend, and the backend should only contains DCHECKs. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc@310 PS3, Line 310: int year, month, day; : if (UNLIKELY(!dv.ToYearMonthDay(, , ))) return TimestampVal::null(); : if (year < 1400 || year > ) return TimestampVal::null(); : : const boost::gregorian::date d(year, month, day); : const boost::posix_time::time_duration t(0, 0, 0, 0); : TimestampValue tv(d, t); performance: this could be done much faster, as both DateVal and TimestampValue's date_ are "number of days since some epoch", so the conversion should be simply checking a range + add the difference between the to epoch. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc@330 PS3, Line 330: if (val.is_null) return DateVal::null(); : TimestampValue tv = TimestampValue::FromTimestampVal(val); : if (UNLIKELY(!tv.HasDate())) return DateVal::null(); : : const boost::gregorian::date d = tv.date(); : const DateValue dv(d.year(), d.month(), d.day()); : return dv.ToDateVal(); same as in line 310: this could be done with a simple addition. There is a function that does exactly this conversion in https://gerrit.cloudera.org/#/c/12247/8/be/src/runtime/timestamp-value.inline.h ( int64_t TimestampValue::DaysSinceUnixEpoch() ) http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/slot-ref.cc@485 PS3, Line 485: const DateValue dv(*reinterpret_cast(t->GetSlot(slot_offset_)) Why don't we reinterpret_cast to DateValue* directly? This would avoid the unnecessary range check in DateValue's constructor. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 15 Feb 2019 13:29:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: (6 comments) I had some high-level questions/comments. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@11 PS3, Line 11: of values supported for the DATE type is -01-01 to -12-31. Can you mention the literal syntax supported for date values? http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@35 PS3, Line 35: - Implicit casting between DATE and other types: Do you plan to support the DATE '-mm-dd' literal syntax? It looks like Hive does. I wonder if that would make it easier to write queries that depend less on implicit casting. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@40 PS3, Line 40: - Since both STRING -> DATE and STRING -> TIMESTAMP implicit Do you have an example of where this is required? http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@63 PS3, Line 63: tests/query_test/test_date_queries.py. Can we add a simple test that runs a query returning date through impala-shell, just to make sure that the client-side handling of DATE there works. http://gerrit.cloudera.org:8080/#/c/12481/3//COMMIT_MSG@64 PS3, Line 64: I think we should plan on adding DATE support to the random query generator (tests/comparison), otherwise we'd be leaving an arbitrary gap in the support there. I believe the semantics of DATE should be the same for all databases so hopefully that is relatively straightforward. The coverage you have looks quite complete, and we should make efforts to understand if there are any gaps, but the extra line of defense is useful. It's also helpful if you add more standard date functions later on. http://gerrit.cloudera.org:8080/#/c/12481/3/tests/query_test/test_date_queries.py File tests/query_test/test_date_queries.py: http://gerrit.cloudera.org:8080/#/c/12481/3/tests/query_test/test_date_queries.py@18 PS3, Line 18: decimal date -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Feb 2019 19:45:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2126/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 14 Feb 2019 17:20:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2125/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 14 Feb 2019 17:16:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2124/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 14 Feb 2019 16:50:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from BIGINT, INT, SMALLINT, TINYINT to DATE. The source value is interpreted as a number of days since the epoch. - from DATE to BIGINT, INT, SMALLINT, TINYINT. The resulting integer is the number of days since epoch. - from DOUBLE, FLOAT, DECIMAL to DATE. The source value's fractional part is ignored, the integer part is interpreted as a number of days since epoch. - from DATE to DOUBLE, FLOAT, DECIMAL. The resulting value is the number of days since epoch. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/timestamp-functions.h M be/src/exprs/utility-functions-ir.cc M be/src/exprs/utility-functions.h M be/src/rpc/thrift-util.cc M be/src/runtime/date-test.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/fe-support.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from BIGINT, INT, SMALLINT, TINYINT to DATE. The source value is interpreted as a number of days since the epoch. - from DATE to BIGINT, INT, SMALLINT, TINYINT. The resulting integer is the number of days since epoch. - from DOUBLE, DECIMAL to DATE. The source value's fractional part is ignored, the integer part is interpreted as a number of days since epoch. - from DATE to DOUBLE, DECIMAL. The resulting value is the number of days since epoch. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/timestamp-functions.h M be/src/exprs/utility-functions-ir.cc M be/src/exprs/utility-functions.h M be/src/rpc/thrift-util.cc M be/src/runtime/date-test.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/fe-support.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Attila Jeges has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12481 Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT fileformat only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Explicit casting between DATE and other types: - from STRING to DATE. The string value must be formatted as -MM-dd. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from BIGINT, INT, SMALLINT, TINYINT to DATE. The source value is interpreted as a number of days since the epoch. - from DATE to BIGINT, INT, SMALLINT, TINYINT. The resulting integer is the number of days since epoch. - from DOUBLE, DECIMAL to DATE. The source value's fractional part is ignored, the integer part is interpreted as a number of days since epoch. - from DATE to DOUBLE, DECIMAL. The resulting value is the number of days since epoch. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since both STRING -> DATE and STRING -> TIMESTAMP implicit conversions are possible, the function resolution logic was changed to select the right version of overloaded functions. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT fileformat only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/aggregator.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/cast-functions.h M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/in-predicate-ir.cc M be/src/exprs/in-predicate.h M be/src/exprs/is-null-predicate-ir.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/operators-ir.cc M be/src/exprs/operators.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr-ir.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/timestamp-functions.h M be/src/exprs/utility-functions-ir.cc M be/src/exprs/utility-functions.h M be/src/rpc/thrift-util.cc M be/src/runtime/date-test.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/fe-support.cc M
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/12481/1/be/src/exprs/utility-functions-ir.cc File be/src/exprs/utility-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/1/be/src/exprs/utility-functions-ir.cc@218 PS1, Line 218: template StringVal UtilityFunctions::TypeOf(FunctionContext* ctx, const DateVal& input_val); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/12481/1/tests/hs2/test_fetch.py File tests/hs2/test_fetch.py: http://gerrit.cloudera.org:8080/#/c/12481/1/tests/hs2/test_fetch.py@180 PS1, Line 180: ; flake8: E703 statement ends with a semicolon http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py File tests/query_test/test_date_queries.py: http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@20 PS1, Line 20: from copy import copy flake8: F401 'copy.copy' imported but unused http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@25 PS1, Line 25: from tests.common.test_vector import ImpalaTestDimension flake8: F401 'tests.common.test_vector.ImpalaTestDimension' imported but unused http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@27 PS1, Line 27: class TestDateQueries(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@37 PS1, Line 37: flake8: E203 whitespace before ':' http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@38 PS1, Line 38: flake8: E203 whitespace before ':' http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@39 PS1, Line 39: flake8: E203 whitespace before ':' http://gerrit.cloudera.org:8080/#/c/12481/1/tests/query_test/test_date_queries.py@41 PS1, Line 41: \ flake8: E502 the backslash is redundant between brackets -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 14 Feb 2019 16:09:04 + Gerrit-HasComments: Yes