[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 18: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 18
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 13 Mar 2018 22:10:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  [H]H:[m]m:[s]s[.S]

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Added end-to-end tests into exprs.test and
select-lazy-timestamp.test to exercise the new function within
the context of a query.

Added tests to exercise the leading and trailing white space trimming
behaviour in default and lazy date/time string format (IMPALA-6630).

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Reviewed-on: http://gerrit.cloudera.org:8080/7009
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
A testdata/data/lazy_timestamp.csv
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A 
testdata/workloads/functional-query/queries/QueryTest/select-lazy-timestamp.test
M tests/query_test/test_scanners.py
7 files changed, 398 insertions(+), 7 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 19
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2092/


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 18
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 13 Mar 2018 18:22:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 18: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 18
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 13 Mar 2018 18:22:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 17: Code-Review+1

Did anyone else want to take a look? I'm happy with the code and I think the 
test coverage is good.


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 17
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 13 Mar 2018 00:48:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-12 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#17).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  [H]H:[m]m:[s]s[.S]

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Added end-to-end tests into exprs.test and
select-lazy-timestamp.test to exercise the new function within
the context of a query.

Added tests to exercise the leading and trailing white space trimming
behaviour in default and lazy date/time string format (IMPALA-6630).

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
A testdata/data/lazy_timestamp.csv
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A 
testdata/workloads/functional-query/queries/QueryTest/select-lazy-timestamp.test
M tests/query_test/test_scanners.py
7 files changed, 398 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/17
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 17
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-12 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 16:

(4 comments)

Changed the name of the e2e test to be more appropriate for its purpose.

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc@28
PS16, Line 28: #include 
> Standard C++ headers go before the 3rd-party library headers. I.e. line 19
Done


http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test
File 
testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test:

http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@3
PS16, Line 3: select cast(dt_str as timestamp) from lazy_dt_str
> should not need a cast here, we want to test the text scanner parsing the d
Done


http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@4
PS16, Line 4:  RESULTS
> Let's make this RESULTS: VERIFY_IS_EQUAL_SORTED
Done


http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py@874
PS16, Line 874: STRING
> This should be a timestamp column, so we can make sure that the text scanne
Done



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 16
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 13 Mar 2018 00:23:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc@6176
PS16, Line 6176:   TestStringValue("monthname(cast('2011-02-18 09:10:11.00' 
as timestamp))", "February");
> nit: long lines. It might be easier to run the patch through clang-format t
NM, these were pulled in with a rebase



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 16
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Mon, 12 Mar 2018 23:57:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 16:

I started a dry run of the merge just in case there were any test failures or 
clang-tidy warnings: https://jenkins.impala.io/job/gerrit-verify-dryrun/2086/


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 16
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Mon, 12 Mar 2018 21:16:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 16:

(5 comments)

I think this should be the last round of comments.

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc@6176
PS16, Line 6176:   TestStringValue("monthname(cast('2011-02-18 09:10:11.00' 
as timestamp))", "February");
nit: long lines. It might be easier to run the patch through clang-format to 
fix some of these whitespace and include ordering issues:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536


http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc@28
PS16, Line 28: #include 
Standard C++ headers go before the 3rd-party library headers. I.e. line 19 
above.


http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test
File 
testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test:

http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@3
PS16, Line 3: select cast(dt_str as timestamp) from lazy_dt_str
should not need a cast here, we want to test the text scanner parsing the 
datetime column directly


http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@4
PS16, Line 4:  RESULTS
Let's make this RESULTS: VERIFY_IS_EQUAL_SORTED

We don't guarantee the order of returned rows without an order by clause 
(although in practice it will be in the file order for this query plan).


http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py@874
PS16, Line 874: STRING
This should be a timestamp column, so we can make sure that the text scanner 
does the conversion correctly.



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 16
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Mon, 12 Mar 2018 21:15:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-10 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#16).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  [H]H:[m]m:[s]s[.S]

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Added end-to-end tests into exprs.test and
cast-lazy-datetime-string.test to exercise the new function within
the context of a query.

Added tests to exercise the leading and trailing white space trimming
behaviour in default and lazy date/time string format (IMPALA-6630).

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
A testdata/data/lazy_timestamp.csv
A 
testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/query_test/test_scanners.py
7 files changed, 400 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/16
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 16
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-10 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#15).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  [H]H:[m]m:[s]s[.S]

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Added end-to-end tests into exprs.test and
cast-lazy-datetime-string.test to exercise the new function within
the context of a query.

Added tests to exercise the leading and trailing white space trimming
behaviour in default and lazy date/time string format (IMPALA-6630).

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
A testdata/data/lazy_timestamp.csv
A 
testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M tests/query_test/test_scanners.py
7 files changed, 399 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/15
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 15
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-10 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 14:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG@25
PS14, Line 25: in expr-test.cc to be inline with existing tests for CAST() 
function.
> I think we still need some end-to-end tests that exercise the new function
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/exprs/expr-test.cc@3086
PS14, Line 3086:   TestIsNull("cast('1909-10-2 12:32:1.111.111.2' as 
timestamp)", TYPE_TIMESTAMP);
> I spent some time trying to break the code. No luck yet. I realised we shou
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h@182
PS14, Line 182: parse
> nit: "to be parsed" here and below.
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@217
PS14, Line 217: const char* TimestampParser::ParseDigitToken(const char* str, 
const char* str_end){
> nit: needs space
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@229
PS14, Line 229:   while(tok_end < str_end) {
> nit: needs space
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@236
PS14, Line 236: bool 
TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){
> nit: needs space
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@239
PS14, Line 239:   DCHECK(dt_ctx->fmt_len > 0);
> DCHECK_GT(dt_ctx->fmt_len, 0);
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@240
PS14, Line 240: size
> DCHECK_EQ(dt_ctx->toks.size(), 0);
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@249
PS14, Line 249: dt_ctx->toks.push_back(DateTimeFormatToken(YEAR,str - 
str_begin, tok_end - str,
> nit: needs space
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@250
PS14, Line 250: str));
> nit: replace tab with spaces
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@263
PS14, Line 263: dt_ctx->toks.push_back(DateTimeFormatToken(MONTH_IN_YEAR, 
str - str_begin, tok_end - str,
> nit: > 90 chars
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@283
PS14, Line 283:  dt_ctx->has_date_toks
> We can just return true here, right?
Indeed. Fixed.


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@340
PS14, Line 340: factional
> typo: fractional
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@346
PS14, Line 346: if (tok_end - str > 9) {
> This is really equivalent to the following, right?
That is definitely more succinct. Fixed.


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@357
PS14, Line 357: dt_ctx->has_time_toks
> this can just be 'true', right?
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@457
PS14, Line 457: NULL
> nit: nullptr
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@458
PS14, Line 458:   lazy_ctx.Reset(str, lazy_len);
> It's annoying that we have to pay the cost of initializing lazy_ctx when we
Done


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@460
PS14, Line 460:   dt_ctx = _ctx;
> nit: indentation
Done



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 14
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Sat, 10 Mar 2018 23:18:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 14:

(17 comments)

It looks like we're getting close.

http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG@25
PS14, Line 25: in expr-test.cc to be inline with existing tests for CAST() 
function.
I think we still need some end-to-end tests that exercise the new function in 
the context of a query, and probably also one that scans a text table with this 
format. Let me know if I can help with setting those up.


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h@182
PS14, Line 182: parse
nit: "to be parsed" here and below.


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@217
PS14, Line 217: const char* TimestampParser::ParseDigitToken(const char* str, 
const char* str_end){
nit: needs space


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@229
PS14, Line 229:   while(tok_end < str_end) {
nit: needs space


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@236
PS14, Line 236: bool 
TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){
nit: needs space


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@239
PS14, Line 239:   DCHECK(dt_ctx->fmt_len > 0);
DCHECK_GT(dt_ctx->fmt_len, 0);


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@240
PS14, Line 240: size
DCHECK_EQ(dt_ctx->toks.size(), 0);


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@249
PS14, Line 249: dt_ctx->toks.push_back(DateTimeFormatToken(YEAR,str - 
str_begin, tok_end - str,
nit: needs space


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@250
PS14, Line 250: str));
nit: replace tab with spaces


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@263
PS14, Line 263: dt_ctx->toks.push_back(DateTimeFormatToken(MONTH_IN_YEAR, 
str - str_begin, tok_end - str,
nit: > 90 chars


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@283
PS14, Line 283:  dt_ctx->has_date_toks
We can just return true here, right?


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@340
PS14, Line 340: factional
typo: fractional


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@346
PS14, Line 346: if (tok_end - str > 9) {
This is really equivalent to the following, right?

  int num_digits = min(9, tok_end - str);
  dt_ctx->toks.push_back(DateTimeFormatToken(FRACTION, str - str_begin, 
num_digits, str));


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@357
PS14, Line 357: dt_ctx->has_time_toks
this can just be 'true', right?


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@457
PS14, Line 457: NULL
nit: nullptr


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@458
PS14, Line 458:   lazy_ctx.Reset(str, lazy_len);
It's annoying that we have to pay the cost of initializing lazy_ctx when we 
don't use it. It's probably not that bad, but I'd rather that the cost of this 
extra functionality was as close to zero as possible. Can we avoid that by 
restructuring this as:

  if (dt_ctx != nullptr) {
return Parse(str, len, *dt_ctx, d, t);
  } else {
DateTimeFormatContext lazy_ctx;
if (ParseFormatTokensByStr(_ctx)) {
  // Use lazy_ctx
} else {

  *d = boost::gregorian::date();
  *t = boost::posix_time::time_duration(boost::posix_time::not_a_date_time);
  return false;
}
  }


http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@460
PS14, Line 460:   dt_ctx = _ctx;
nit: indentation



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 14
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Thu, 08 Mar 2018 01:42:29 +

[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-03-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 14:

I've been neglecting reviews - need to get back to this one.


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 14
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Sat, 03 Mar 2018 01:20:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-02-10 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#14).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  [H]H:[m]m:[s]s[.S]

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 224 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/14
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 14
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-02-10 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc@3010
PS12, Line 3010:   TimestampValue::Parse("2001-01-21 
01:05:1.11211"));
> There's some attempt in the code to handle trailing timezone offsets. Can w
Removed timezone offset support as it was not part of the request from the Jira.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@179
PS12, Line 179:   /// Parse a default date/time string. The default timestamp 
format is:
> This comment doesn't seem to match the function.
Looks like I inserted my function header between a comment and its function.
Fixed


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@191
PS12, Line 191:   static bool ParseFormatTokensByStr(DateTimeFormatContext* 
dt_ctx);
> Nit: add a space before the next function for consistency.
Done


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@218
PS12, Line 218: bool 
TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){
> I'm really struggling to understand this function, particularly in what cas
Done.
Changed to serial parsing.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@229
PS12, Line 229: if ((*str == 'T') || (*str == 'Z') || (!isalnum(*str))) {
> Do we really need to handle arbitrary separators? It seems like this will a
No. Changed to only accept '-' for date separator, ':' for time separator and 
'.' for the fractional second separator.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@232
PS12, Line 232: dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, 
str - str_begin,
> Do we need to support timezone offsets here?
I guess we don't have to - based on the Jira's description.
Removed.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@282
PS12, Line 282: if (tok_len == 1) ++dt_ctx->fmt_out_len;
> Do we need to compute fmt_out_len for this case?
No. Removed


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@292
PS12, Line 292:   // lazy_ctx is made local so that we do not run into races 
where
> We can remove this comment (it's interesting but won't have much future val
Done


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@395
PS12, Line 395:
> nit: trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@461
PS12, Line 461: date TimestampParser::RealignYear(const DateTimeParseResult& 
dt_result,
> nit: can you move this back to the previous location. It's annoying that th
Done



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 12
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Sat, 10 Feb 2018 23:59:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-02-10 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#13).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  [H]H:[m]m:[s]s[.S]

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 223 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/13
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 13
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2018-01-24 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#12).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date/time
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  -[M]M-[d]dT[H]H:[m]m:[s]s[.S]

We will also take a variety of separators (including "-",
"/", "T", "Z", ":")

We will incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput) when the string is in one of these
lazy date/time format.

Testing:
Benchmarked the performance consequence by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc to be inline with existing tests for CAST() function.

Finally, this change also moves the following two functions to be
adjacent to each other for readability:

Parse(const char* str, int len, boost::gregorian::date* d,
  boost::posix_time::time_duration* t);

Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx,
  boost::gregorian::date* d, boost::posix_time::time_duration* t);

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 156 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/12
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 12
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-17 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG@39
PS10, Line 39: Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
> I think we need some additional tests for bad data and some query tests tha
Sorry about that. Fixed obvious crashes due to perpetual retry and wrong result 
bugs (in the fractional component) due to a persistent CACHED_CTX after 
previous failed parse().

Added bad data tests.

When you say "query tests that scan tables with mixed-format" do you mean a 
private test on my console, or actually creating a mixed-format table in one of 
the test databases as part of the patch?


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h@267
PS10, Line 267:   static DateTimeFormatContext LAZY_CTX;
> Having this as a static variable doesn't seem right, since it will be share
This is an interesting challenge that I did not take into account.

I have made lazy_ctx local and CACHED_CTX global.
We accept that CACHED_CTX can be modified by multiple threads. In which case 
thread-local lazy_ctx will need to be re-parsed.

1) In most scenarios, all values in a data/time string column should share the 
same format, CACHED_CTX will unlikely to be modified.

2) In scenarios where values in date/time string columns are of mixed format, 
CAHCED_CTX will be a best-effort optimization. In the worst case, CACHED_CTX 
will always fail to match the current format so we will take the same perf. hit 
as the path without this optimization.


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@378
PS10, Line 378: LIKELY
> LIKELY doesn't seem right here - I think we can just remove it.
Done


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@381
PS10, Line 381: if (LAZY_CTX.fmt != NULL) dt_ctx = _CTX;
> nit: we use braces on both branches if there is an else.
Done


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@383
PS10, Line 383: LAZY_CTX.Reset(str, len);
> nit: missing indentation.
Done



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 10
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Mon, 18 Dec 2017 01:51:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-17 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#11).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  -[M]M-[d]dT[H]H:[m]m:[s]s[.S]

We will also take a variety of separators (including "-",
"/", "T", "Z", ":")

The parse logic will:
1) Try to use the default template timestamp formats first since it is
more performant.
2) Try to reuse the previously generated lazy
timestamp format (if exists) to maintain comparable SCAN performance if
all values in column have the same timestamp format.
3) Discover the new template format if both prior paths fail.

In the worst case, where we have to generate a new timestamp format for
each value in a column (columns with variable timestamp format), we will
incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput).

Finally, this change also moves the following two functions to be
adjacent to each other for readability:

Parse(const char* str, int len, boost::gregorian::date* d,
  boost::posix_time::time_duration* t);

Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx,
  boost::gregorian::date* d, boost::posix_time::time_duration* t);
Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 205 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/11
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 11
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG@39
PS10, Line 39: Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
I think we need some additional tests for bad data and some query tests that 
scan tables with mixed-format timestamps to test that it correctly switches to 
a different context.

I was able to crash things with the current patch fairly easily. Repro:

drop table if exists timestamp_strs;
create table timestamp_strs as select '2001-1-2' as str;
insert into timestamp_strs select '2001-10-2';
insert into timestamp_strs select * from timestamp_strs;
insert into timestamp_strs select * from timestamp_strs;
insert into timestamp_strs select * from timestamp_strs;
insert into timestamp_strs select * from timestamp_strs;
insert into timestamp_strs select '2001-foo-bar';
select str, cast(str as timestamp) from timestamp_strs;


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h@267
PS10, Line 267:   static DateTimeFormatContext LAZY_CTX;
Having this as a static variable doesn't seem right, since it will be shared 
between multiple threads, which will updated it. I don't see a particularly 
easy place to put the last context so that it's local to the thread/column 
that's executing Parse().

I'm wondering if it's just easiest to have lazy_ctx be a local variable that we 
create on the fly - we'd take a 50% perf hit for the lazy formats according to 
your benchmark but it's not clear to me that the optimisation is worth the 
extra complexity of having to manage LAZY_CTX.


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@378
PS10, Line 378: LIKELY
LIKELY doesn't seem right here - I think we can just remove it.


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@381
PS10, Line 381: if (LAZY_CTX.fmt != NULL) dt_ctx = _CTX;
nit: we use braces on both branches if there is an else.


http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@383
PS10, Line 383: LAZY_CTX.Reset(str, len);
nit: missing indentation.



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 10
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Thu, 14 Dec 2017 23:21:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-13 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#10).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  -[M]M-[d]dT[H]H:[m]m:[s]s[.S]

We will also take a variety of separators (including "-",
"/", "T", "Z", ":")

The parse logic will:
1) Try to use the default template timestamp formats first since it is
more performant.
2) Try to reuse the previously generated lazy
timestamp format (if exists) to maintain comparable SCAN performance if
all values in column have the same timestamp format.
3) Discover the new template format if both prior paths fail.

In the worst case, where we have to generate a new timestamp format for
each value in a column (columns with variable timestamp format), we will
incur a SCAN performance penalty (approximately 1/2
TotalReadThroughput).

Finally, this change also moves the following two functions to be
adjacent to each other for readability:

Parse(const char* str, int len, boost::gregorian::date* d,
  boost::posix_time::time_duration* t);

Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx,
  boost::gregorian::date* d, boost::posix_time::time_duration* t);
Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 152 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/10
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 10
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-13 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#8).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  -[M]M-[d]dT[H]H:[m]m:[s]s[.S]

We will also take a variety of separators (including "-",
"/", "T", "Z", ":")

The parse logic will:
1) Try to use the default template timestamp formats first since it is
more performant.
2) Try to reuse the previously generated lazy
timestamp format (if exists) to maintain comparable SCAN performance if
all values in column have the same timestamp format.
3) Discover the new template format if both prior paths fail.

In the worst case, where we have to generate a new timestamp format for
each value in a column (collumns withvariable timestamp format), we
will incur a SCAN performance decrease (approximately 1/2
TotalReadThroughput).

Finally, this change also moves the following two functions to be
adjacent to each other for readability:

Parse(const char* str, int len, boost::gregorian::date* d,
  boost::posix_time::time_duration* t);

Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx,
  boost::gregorian::date* d, boost::posix_time::time_duration* t);

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 199 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/8
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 8
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-13 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#7).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  -[M]M-[d]dT[H]H:[m]m:[s]s[.S]

We will also take a variety of separators (including "-",
"/", "T", "Z", ":")

The parse logic will:
1) Try to use the default template timestamp formats
first since it is more performant.
2) Try to reuse the previously generated lazy
timestamp format (if exists) to maintain comparable
SCAN performance if all values in column have the
same timestamp format.
3) Discover the new template format if both prior paths
fail.

In the worst case, where we have to generate a new
timestamp format for each value in a column (collumns with
variable timestamp format), we will incur a SCAN performance
decrease (apprimately 1/2 TotalReadThroughput).

Finally, this change also moves the following
two functions to be adjacent to each other:

Parse(const char* str, int len, boost::gregorian::date* d,
  boost::posix_time::time_duration* t);

Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx,
  boost::gregorian::date* d, boost::posix_time::time_duration* t);

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 199 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/7
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 7
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-13 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 6:

The alternative here is that we only take this perf hit when non-default format 
is used and still maintain the previous default templates.


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 6
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Wed, 13 Dec 2017 16:50:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-13 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 6:

I incorporated the same heuristic into ParseFormatTokens() to eliminate the 
usage of default format strings. We observe some marginal gain of 16% in 
TotalReadThroughput:

- TotalReadThroughput: 11.69 MB/sec


@Tim, since you wrote the original ParseFormatTokens(), do you see any 
additional places where we can achieve more marginal gain?
Also, do you prefer the rewrite of the existing function or creating a new one 
such as in the current patch set?


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 6
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Wed, 13 Dec 2017 16:40:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-13 Thread Vincent Tran (Code Review)
Hello Lars Volker, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7009

to look at the new patch set (#6).

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..

IMPALA-5315: Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date
format to timestamp. The supported lazy date formats are:
  -[M]M-[d]d
  -[M]M-[d]d [H]H:[m]m:[s]s[.S]
  -[M]M-[d]dT[H]H:[m]m:[s]s[.S]

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 137 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/6
--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 6
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-12-11 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 5:

Looks like the slow down is due to the 2x slower TotalReadThroughput from the 
SCAN node due to LAZY_CTX.Reset() being called for each value vs. called once 
for each DateTimeFormatContext in TimestampParser::Init() when the format 
string is stamped out.

New path
- TotalReadThroughput: 10.03 MB/sec
Old path
- TotalReadThroughput: 20.75 MB/sec

Will look for alternatives.


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 5
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 12 Dec 2017 02:20:59 +
Gerrit-HasComments: No