[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1451/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 28 Nov 2018 07:13:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11793 )

Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1450/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb
Gerrit-Change-Number: 11793
Gerrit-PatchSet: 3
Gerrit-Owner: Greg Rahn 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Nov 2018 06:56:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3502/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 28 Nov 2018 06:27:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 28 Nov 2018 06:27:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 4: Code-Review+2

Trivial test fix for for non-TTY stdin.


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 28 Nov 2018 06:26:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..

IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

This patch fixes the issue with Ctrl+C handling for cancelling a
non-running query to behave similar to Linux shell.

Before (pressing Ctrl+C does not do anything):
[localhost:21000] default> select

After (pressing Ctrl+C cancels the query and starts a new prompt):
[localhost:21000] default> select^C
[localhost:21000] default>

Testing:
- Added a new cancellation test
- Ran all shell E2E tests

Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
---
M shell/impala_shell.py
M tests/custom_cluster/test_client_ssl.py
M tests/shell/test_shell_interactive.py
3 files changed, 8 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/11990/4
--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11793 )

Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11793/3/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/11793/3/be/src/exprs/string-functions-ir.cc@1127
PS3, Line 1127:   // 
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C++
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/11793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb
Gerrit-Change-Number: 11793
Gerrit-PatchSet: 3
Gerrit-Owner: Greg Rahn 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Nov 2018 06:21:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function

2018-11-27 Thread Greg Rahn (Code Review)
Greg Rahn has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/11793 )

Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function
..

IMPALA-7759: Add Levenshtein edit distance built-in function

This patch adds new built-in functions to calculate Levenshtein edit
distance. Implemented as levenshtein() to match PostgreSQL in
both functionality and name and also added le_dst() alias for Netezza,
compatibility, but note that levenshtein() differs in functionality in
that if either value is NULL or both values are NULL, levenshtein()
returns NULL, where Netezza's le_dst() returns the length of the not
NULL value or 0 if both values are NULL.

Testing:
- Added unit tests to expr-test.cc
- Manual test on 966289 string pairs and results match PostgreSQL
- Added changes to qgen tests for PostgreSQL comparison

Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
A tests/comparison/compat.py
M tests/comparison/discrepancy_searcher.py
M tests/comparison/funcs.py
7 files changed, 128 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/11793/3
--
To view, visit http://gerrit.cloudera.org:8080/11793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb
Gerrit-Change-Number: 11793
Gerrit-PatchSet: 3
Gerrit-Owner: Greg Rahn 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11997 )

Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog
..

IMPALA-7870: Increase the timeout in test_v1_catalog

TestAutomaticCatalogInvalidation.test_v1_catalog need to wait for a
predefined time for the invalidation to take effect. The test is flaky
recently because of it. This patch increates the timeout by 2.5x.

Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Reviewed-on: http://gerrit.cloudera.org:8080/11997
Reviewed-by: Todd Lipcon 
Tested-by: Impala Public Jenkins 
---
M tests/custom_cluster/test_automatic_invalidation.py
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Gerrit-Change-Number: 11997
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11997 )

Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Gerrit-Change-Number: 11997
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 28 Nov 2018 04:25:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11851 )

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Wed, 28 Nov 2018 02:48:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11851 )

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..

IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false

Since IMPALA-6812, we've run many of our tests against Kudu at the
READ_AT_SNAPSHOT scan level, which ensures consistent results. This
scan level is only supported if Kudu is run with the flag
--use_hybrid_clock=true (which is the default).

This patch uses the Kudu master webui to detect when use_hybrid_clock
is false and skips these tests.

Follow up work will address allowing these tests to run regardless of
the value of the flag.

Testing:
- Ran a full exhaustive build with use_hybrid_clock=false set in the
  minicluster.

Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Reviewed-on: http://gerrit.cloudera.org:8080/11851
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M bin/impala-config.sh
M tests/common/kudu_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_kudu.py
M tests/metadata/test_ddl.py
M tests/query_test/test_kudu.py
6 files changed, 52 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file

2018-11-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11227 )

Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per 
file
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@392
PS6, Line 392: OpType.GET_FILE_BLOCK_LOCATIONS.getSymbol()));
> is this test working as a proper regression test now that you switched to t
Can't quite explain yet what the test is doing. Did trace though the code and 
verified the calls are working as you intended. So, this code does exercise the 
incremental refresh path, which is good.

The mystery is that, before I fixed that flipped if/then statement, the test 
also passed. So, I need to figure out why the metrics come out correct either 
way. But, didn't want to hold up the customer patch because of that bit of 
curiosity.



--
To view, visit http://gerrit.cloudera.org:8080/11227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
Gerrit-Change-Number: 11227
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 28 Nov 2018 02:38:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates

2018-11-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12000 )

Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc@502
PS1, Line 502: ToStringFromUnixMillis(last_report_time_ms_)
An alternate approach would be to just store 'last_report_time_ms_' as string 
and convert it using ToStringFromUnixMillis() when exporting the profile as 
Thrift object or when we are pretty printing it.

That said, it's unclear whether the complication is worth it given this is not 
in the critical path of returning query results.



--
To view, visit http://gerrit.cloudera.org:8080/12000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa
Gerrit-Change-Number: 12000
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 28 Nov 2018 01:24:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes

2018-11-27 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11992 )

Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes
..


Patch Set 7:

I also noted that a lot of Open() and GetNext() implementations have similar 
preambles, e.g.

  SCOPED_TIMER(runtime_profile_->total_time_counter());
  ScopedGetNextEventAdder ea(this, eos);
  DCHECK(row_batch != NULL);
  RETURN_IF_ERROR(ExecDebugAction(TExecNodePhase::GETNEXT, state));
  RETURN_IF_CANCELLED(state);
  RETURN_IF_ERROR(QueryMaintenance(state));

Some leave parts of those out, though it's not clear whether that happens on 
purpose. Should we file a follow-up Jira to unify the preambles, possibly with 
some generic macro?


--
To view, visit http://gerrit.cloudera.org:8080/11992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653
Gerrit-Change-Number: 11992
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 28 Nov 2018 01:41:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12000 )

Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1449/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa
Gerrit-Change-Number: 12000
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 28 Nov 2018 01:46:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes

2018-11-27 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11992 )

Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node-util.h
File be/src/exec/exec-node-util.h:

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node-util.h@57
PS7, Line 57: Fetched
nit: maybe reword to "First Batch Requested" to indicate that it is still in 
progress?


http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h@260
PS7, Line 260: were
nit: was


http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/hbase-scan-node.cc@153
PS7, Line 153:   // For GetNext, most of the time is spent in 
HBaseTableScanner::ResultScanner_next,
 :   // but there's still some considerable time inside here.
I don't think this comment helps much and would be good with dropping it 
altogether.


http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc@96
PS7, Line 96:   SCOPED_TIMER(materialize_tuple_timer());
Can you add a brief comment why this timer gets initialized after the preamble?


http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/singular-row-src-node.cc
File be/src/exec/singular-row-src-node.cc:

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/singular-row-src-node.cc@33
PS7, Line 33:   ScopedOpenEventAdder ea(this);
>From the FE comment in SingularRowSrcNode is seems that this will always run 
>in a subplan:

 A SingularRowSrcNode returns the current row that is being processed by its
 containing SubplanNode. A SingularRowSrcNode can only appear in the plan tree
 of a SubplanNode. A SingularRowSrcNode returns its parent's smap such that
 substitutions are appropriately applied within the SubplanNode's second child.

In that case, do we need to instrument this method at all?


http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/unnest-node.cc
File be/src/exec/unnest-node.cc:

http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/unnest-node.cc@142
PS7, Line 142:   // Avoid expensive query maintenance overhead for small 
collections.
Did you omit the instrumentation here because it is expensive?



--
To view, visit http://gerrit.cloudera.org:8080/11992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653
Gerrit-Change-Number: 11992
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 28 Nov 2018 01:39:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11987 )

Change subject: IMPALA-7823: Clean up unused Java imports
..

IMPALA-7823: Clean up unused Java imports

Cleans up unused Java imports.

Testing: This is only a source-level change. Verified that the FE builds
correctly. Ran FE tests.

Change-Id: Ic1820a7f9e55dc168510987520949026169f445e
Reviewed-on: http://gerrit.cloudera.org:8080/11987
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M fe/src/main/java/org/apache/impala/analysis/PlanHint.java
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
M fe/src/main/java/org/apache/impala/authorization/SentryAuthProvider.java
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/NativeLogger.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M 
fe/src/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/planner/S3PlannerTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
M fe/src/test/java/org/apache/impala/util/JniUtilTest.java
M fe/src/test/java/org/apache/impala/util/TestTopNCache.java
39 files changed, 6 insertions(+), 87 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e
Gerrit-Change-Number: 11987
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11987 )

Change subject: IMPALA-7823: Clean up unused Java imports
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e
Gerrit-Change-Number: 11987
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 28 Nov 2018 01:17:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12000 )

Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@112
PS1, Line 112: r
flake8: F841 local variable 'results' is assigned to but never used



-- 
To view, visit http://gerrit.cloudera.org:8080/12000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa
Gerrit-Change-Number: 12000
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 28 Nov 2018 01:07:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates

2018-11-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12000


Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates
..

IMPALA-6741: Add timestamp of fragment instance's status updates

Currently, the profile of a running query doesn't contain any
timestamps for the last updates from the fragment instances.
This makes it hard to differentiate between when a fragment
instance failed to send status reports to the coordinator
for various reasons (e.g. IMPALA-2990) or a truly stuck
fragment instance.

This change adds a timestamp to a fragment instance's profile
to record the time when the coordinator last received a status
update from it. Note that it's possible that there is delay
between when the status was created on the executor and when
it arrived at the coordinator. Given that the clocks are not
necessarily synchronized across all executors, the receiving
time of the update at the coordinator seems easier to make sense of.

Sample output:

Fragment F01:
  Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, 
non-child: 263.951us, % non-child: 1.75%)
Last report time: 2018-11-27 16:57:30.014
Hdfs split stats (:<# splits>/): 0:1/1.58 KB
Fragment Instance Lifecycle Event Timeline: 15.622ms
   - Prepare Finished: 1.026ms (1.026ms)
   - Open Finished: 1.137ms (110.297us)
   - First Batch Produced: 15.010ms (13.873ms)
   - First Batch Sent: 15.080ms (70.715us)
   - ExecInternal Finished: 15.622ms (541.181us)

Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M tests/query_test/test_observability.py
5 files changed, 85 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/1
--
To view, visit http://gerrit.cloudera.org:8080/12000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa
Gerrit-Change-Number: 12000
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11967 )

Change subject: IMPALA-1048: show sinks in exec summary
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1448/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544
Gerrit-Change-Number: 11967
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:37:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6924: Add child queries to profile in compute stats

2018-11-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11977 )

Change subject: IMPALA-6924: Add child queries to profile in compute stats
..


Patch Set 2:

(1 comment)

This will be very useful. I had one high level question about the approach. 
Otherwise the code looked good.

http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc
File be/src/service/child-query.cc:

http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc@104
PS2, Line 104:   profile_->AddInfoString("Profile", get_profile_resp.profile);
Did you think about splicing the profiles together with AddChild(). It feels a 
bit wrong putting the text version of the profile in the machine-readable 
profile since it mixes the two formats and prevents tools from working properly.

Or is the idea that a profile analysis tool should be able to link up the 
queries itself if needed based on the query ids? And this is just for human 
convenience?

If we're going down that path maybe it would make sense to add machine-readable 
metadata to the thrift struct with the child query ids - it would be good to 
have a clear story about how a tool should use this.



--
To view, visit http://gerrit.cloudera.org:8080/11977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5006c3b366d381eed4687e550cdfc463be3d1350
Gerrit-Change-Number: 11977
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:34:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11997 )

Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3501/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Gerrit-Change-Number: 11997
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:34:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11227 )

Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per 
file
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1447/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
Gerrit-Change-Number: 11227
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:17:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog

2018-11-27 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11997 )

Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Gerrit-Change-Number: 11997
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:06:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11967 )

Change subject: IMPALA-1048: show sinks in exec summary
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11967/6/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/11967/6/shell/impala_client.py@211
PS6, Line 211:
flake8: E203 whitespace before ','



--
To view, visit http://gerrit.cloudera.org:8080/11967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544
Gerrit-Change-Number: 11967
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:04:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary

2018-11-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11967 )

Change subject: IMPALA-1048: show sinks in exec summary
..


Patch Set 6:

Ok I think this is ready for review.


--
To view, visit http://gerrit.cloudera.org:8080/11967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544
Gerrit-Change-Number: 11967
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:03:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary

2018-11-27 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Lars Volker, Philip Zeyliger,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11967

to look at the new patch set (#6).

Change subject: IMPALA-1048: show sinks in exec summary
..

IMPALA-1048: show sinks in exec summary

The exec summary now includes the total time taken and memory
consumed by the data sink at the root of each fragment. Previously
the exec summary could hide where time and memory went while
executing a query.

The high-level changes are:
* Generalising logic in the exec summary and runtime profile to
  handle data sinks, not just plan nodes, including adding richer
  metadata to runtime profile nodes.
* Threading through metadata about the data sinks, like names and
  estimates, so that it can appear in the exec summary.

The major potential downside is that the new timings reported for
data stream sender can overlap with the receiver's time and
potentially cause confusion.

[localhost:21000] default> select count(distinct l_comment) from 
tpch_parquet.lineitem; summary;
Query: select count(distinct l_comment) from tpch_parquet.lineitem
Query submitted at: 2018-11-20 16:47:03 (Coordinator: 
http://tarmstrong-box:25000)
Query progress can be monitored at: 
http://tarmstrong-box:25000/query_plan?query_id=f5464383a3bb6878:54b5252b
+---+
| count(distinct l_comment) |
+---+
| 4580667   |
+---+
Fetched 1 row(s) in 4.13s
+-++--+--+---++---+---+---+
| Operator| #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | 
Peak Mem  | Est. Peak Mem | Detail|
+-++--+--+---++---+---+---+
| F02:ROOT| 1  | 59.11ms  | 59.11ms  |   || 0 B 
  | 0 B   |   |
| 06:AGGREGATE| 1  | 274.24us | 274.24us | 1 | 1  | 
16.00 KB  | 10.00 MB  | FINALIZE  |
| 05:EXCHANGE | 1  | 75.16us  | 75.16us  | 3 | 1  | 
32.00 KB  | 16.00 KB  | UNPARTITIONED |
| F01:EXCHANGE SENDER | 3  | 119.53us | 146.28us |   || 
16.00 KB  | 0 B   |   |
| 02:AGGREGATE| 3  | 19.26ms  | 19.89ms  | 3 | 1  | 
16.00 KB  | 10.00 MB  |   |
| 04:AGGREGATE| 3  | 1.06s| 1.07s| 4.58M | 4.65M  | 
96.02 MB  | 62.63 MB  |   |
| 03:EXCHANGE | 3  | 243.91ms | 246.44ms | 5.01M | 4.65M  | 
464.00 KB | 10.12 MB  | HASH(l_comment)   |
| F00:EXCHANGE SENDER | 3  | 2.41s| 2.55s|   || 
337.53 KB | 0 B   |   |
| 01:AGGREGATE| 3  | 1.05s| 1.14s| 5.01M | 4.65M  | 
97.20 MB  | 121.17 MB | STREAMING |
| 00:SCAN HDFS| 3  | 37.88ms  | 41.28ms  | 6.00M | 6.00M  | 
27.88 MB  | 80.00 MB  | tpch_parquet.lineitem |
+-++--+--+---++---+---+---+

Testing:
Added a basic observability test.

Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/summary-util.cc
M common/thrift/DataSinks.thrift
M common/thrift/ExecStats.thrift
M common/thrift/RuntimeProfile.thrift
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M shell/impala_client.py
M tests/beeswax/impala_beeswax.py
M tests/query_test/test_observability.py
36 files changed, 321 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera

[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file

2018-11-27 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11227 )

Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per 
file
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@392
PS6, Line 392: OpType.GET_FILE_BLOCK_LOCATIONS.getSymbol()));
is this test working as a proper regression test now that you switched to the 
constants?



--
To view, visit http://gerrit.cloudera.org:8080/11227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
Gerrit-Change-Number: 11227
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 28 Nov 2018 00:00:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file

2018-11-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11227 )

Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per 
file
..


Patch Set 6: Code-Review+1

Reran the unit tests, stepped through the code and made sure that we followed 
the refresh path.

Changed the unit test to use HDFS-provided constants rather than string 
literals.

As far as I can tell, the fix works as intended. Would prefer to see this 
tested in a dynamic cluster to gain a bit more confidence.


-- 
To view, visit http://gerrit.cloudera.org:8080/11227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
Gerrit-Change-Number: 11227
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 27 Nov 2018 23:42:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file

2018-11-27 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#6) to the change originally created 
by Todd Lipcon. ( http://gerrit.cloudera.org:8080/11227 )

Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per 
file
..

IMPALA-7047. Refreshing partitions should not make an RPC per file

The code to handle REFRESH of a single partition was incorrectly
ignoring the previously-known file descriptors. This meant that, instead
of only calling 'getFileBlockLocations' on the files that had changed
since the prior load, it was instead calling it on every file.

In addition to refresh of single partitions this also affected refresh
of unpartitioned tables (which is implemented as a refresh of its single
"default" partition).

This patch fixes the behavior by copying over the existing file
descriptor list into the re-created partition before refreshing it. A
new unit test uses FS statistics to verify the change. The new
assertions act as a regression test and fail if I comment out the fix.

Additionally, this fixes the case where the old partition had no files
to use the optimized 'listLocatedStatus' call. This is triggered when
'REFRESH' picks up a new partition from the HMS added by an external
system.

I also tested this by pointing my dev box at a remote filesystem that
was approximately 60ms away. The initial load of an unpartitioned table
with approximately 45000 files takes around 23 seconds in this setup.
Without the patch in place, REFRESH was taking upwards of 35 minutes (I
got tired and gave up at this point). Multiplying the 60ms round trip by
45000 files estimates 45 minutes. With the fix in place, REFRESH of the
same table took around 4.5 seconds.

Clearly, in typical setups where catalogd and HDFS are on a shared local
network, the gains won't be so dramatic. But, even with a 1ms round trip
(plausible when including fixed RPC overhead and potentially congested
datacenter networks) this would save 45 seconds on this example table
with 45000 files.

Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
2 files changed, 73 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11227/6
--
To view, visit http://gerrit.cloudera.org:8080/11227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
Gerrit-Change-Number: 11227
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test

2018-11-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11993 )

Change subject: IMPALA-7895: Incorrect expected results for 
spillable-buffer-sizing.test
..


Patch Set 2:

Looks like test_show_create_table failed with a NPE


--
To view, visit http://gerrit.cloudera.org:8080/11993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a
Gerrit-Change-Number: 11993
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Nov 2018 23:26:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6964: Track stats about column and page sizes in Parquet reader

2018-11-27 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11575 )

Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet 
reader
..


Patch Set 12:

(6 comments)

I had a comment about the lock upgrading, and some nits, but I think it's 
getting there.

http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc@904
PS12, Line 904:   bytes_read_per_col_lock_);
Please wrap the lock in a scope block to release it after the for() statement 
ends.


http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc@962
PS12, Line 962: bytes_read_per_col_guard_read_lock.unlock();
There's a race between unlocking and re-locking here. You can use 
boost:upgrade_lock to achieve the same effect atomically.


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py
File tests/infra/test_utils.py:

http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@36
PS12, Line 36: def test_get_bytes_summary_stats_counter():
Can you double-check that this actually gets executed? :)


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@37
PS12, Line 37:
nit: remove this whitespace


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@40
PS12, Line 40:   runtime_profile = "- ExampleCounter: (Avg: 8.00 KB (8192) ; " \
maybe pick an example that's not all the same values?


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/util/parse_util.py
File tests/util/parse_util.py:

http://gerrit.cloudera.org:8080/#/c/11575/12/tests/util/parse_util.py@156
PS12, Line 156:   re.X)
I think re.VERBOSE is easier to understand



--
To view, visit http://gerrit.cloudera.org:8080/11575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817
Gerrit-Change-Number: 11575
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Tue, 27 Nov 2018 23:18:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11997 )

Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1446/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Gerrit-Change-Number: 11997
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 27 Nov 2018 23:12:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11851 )

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3500/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 27 Nov 2018 22:52:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11851 )

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 27 Nov 2018 22:52:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog

2018-11-27 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11997


Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog
..

IMPALA-7870: Increase the timeout in test_v1_catalog

TestAutomaticCatalogInvalidation.test_v1_catalog need to wait for a
predefined time for the invalidation to take effect. The test is flaky
recently because of it. This patch increates the timeout by 2.5x.

Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
---
M tests/custom_cluster/test_automatic_invalidation.py
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/11997/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b
Gerrit-Change-Number: 11997
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3497/


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 22:36:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11993 )

Change subject: IMPALA-7895: Incorrect expected results for 
spillable-buffer-sizing.test
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3496/


--
To view, visit http://gerrit.cloudera.org:8080/11993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a
Gerrit-Change-Number: 11993
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Nov 2018 22:19:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11987 )

Change subject: IMPALA-7823: Clean up unused Java imports
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e
Gerrit-Change-Number: 11987
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 21:11:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11987 )

Change subject: IMPALA-7823: Clean up unused Java imports
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3499/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e
Gerrit-Change-Number: 11987
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 21:11:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports

2018-11-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11987 )

Change subject: IMPALA-7823: Clean up unused Java imports
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e
Gerrit-Change-Number: 11987
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 20:59:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes

2018-11-27 Thread Tim Armstrong (Code Review)
Hello Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11992

to look at the new patch set (#7).

Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes
..

IMPALA-2343: Add lifecycle timeline to plan nodes

Track the time when various significant events in the lifecycle
of an ExecNode occur the timeline of fragment execution. The events
tracked are: 'Open Started', 'Open Finished', 'First Batch Fetched',
'First Batch Returned', 'Last Batch Returned', 'Closed'.

This uses the existing EventSequence infrastructure so that time will
correspond to the fragment instance lifecycle timelines. It's
implemented mostly using scoped objects that add the events when
entering or exiting Open() and GetNext().

These times are not set inside subplans because it would be mostly
redundant with the counters in the containing subplan node.

Testing:
Added a basic test to verify that the event sequence is present.

Perf:
Ran TPC-H 10 locally. There was no significant perf change.

Ran TPC-H Nested locally. There was no significant perf change.

Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653
---
M be/src/exec/aggregation-node.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/cardinality-check-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/empty-set-node.cc
M be/src/exec/empty-set-node.h
M be/src/exec/exchange-node.cc
A be/src/exec/exec-node-util.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/select-node.cc
M be/src/exec/singular-row-src-node.cc
M be/src/exec/singular-row-src-node.h
M be/src/exec/sort-node.cc
M be/src/exec/streaming-aggregation-node.cc
M be/src/exec/subplan-node.cc
M be/src/exec/topn-node.cc
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M tests/query_test/test_observability.py
26 files changed, 229 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/11992/7
--
To view, visit http://gerrit.cloudera.org:8080/11992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653
Gerrit-Change-Number: 11992
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes

2018-11-27 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11992

to look at the new patch set (#6).

Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes
..

IMPALA-2343: Add lifecycle timeline to plan nodes

Track the time when Open() and GetNext() are first and last called,
to make it easier to piece together the timeline for fragment execution.

This uses the existing EventSequence infrastructure so that time will
correspond to the fragment instance lifecycle timelines.

These times are not set inside subplans because it would be mostly
redundant with the counters in the containing subplan node.

Testing:
Added a basic test to verify that the event sequence is present.

Perf:
Ran TPC-H 10 locally. There was no significant perf change.

Ran TPC-H Nested locally. There was no significant perf change.

Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653
---
M be/src/exec/aggregation-node.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/cardinality-check-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/empty-set-node.cc
M be/src/exec/empty-set-node.h
M be/src/exec/exchange-node.cc
A be/src/exec/exec-node-util.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/select-node.cc
M be/src/exec/singular-row-src-node.cc
M be/src/exec/singular-row-src-node.h
M be/src/exec/sort-node.cc
M be/src/exec/streaming-aggregation-node.cc
M be/src/exec/subplan-node.cc
M be/src/exec/topn-node.cc
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M tests/query_test/test_observability.py
26 files changed, 229 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/11992/6
--
To view, visit http://gerrit.cloudera.org:8080/11992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653
Gerrit-Change-Number: 11992
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11851 )

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1445/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 27 Nov 2018 19:47:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file

2018-11-27 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11227 )

Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per 
file
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11227/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11227/4//COMMIT_MSG@12
PS4, Line 12: since the prior load, it was instead calling it on every file.
> General question. In HDFS, we can tell if a file was added or removed. Once
In terms of files "changing", we do have to worry about a file being replaced 
with the same name as a pre-existing file. I think that's why we compare mtimes.

In terms of balancing blocks, we detect on the backend if we did a remote read 
of a block, and if so, we issue a SQL warning which tells the user to issue an 
INVALIDATE METADATA for that table.

Personally I think this sucks -- what we should probably do is have the backend 
notify the catalogd that the block locations for that file need to be 
refreshed, and do so automatically.

That said, since we're eventually hoping to move more towards a fine-grained 
caching with TTL, any such incorrect locality info would fall out of the cache 
within a bounded period of time, so maybe not a big deal anyway.


http://gerrit.cloudera.org:8080/#/c/11227/4//COMMIT_MSG@28
PS4, Line 28: I also tested this by pointing my dev box at a remote filesystem 
that
it's interesitng that there was a "reverse logic" bug in my patch, but I did 
test it and saw the desired effects. We should make sure to circle back and 
re-run a similar test to make sure I saw what I thought I was seeing.


http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@568
PS4, Line 568: stats = refreshFileMetadata(partDir, 
Collections.singletonList(partition));
> Question about the implementation of refresh (that code is not in this patc
I think Java supports arrays up to 2B elements, but HDFS only supports really 
up to a few hundred million files. Even so, I think many other parts of Impala 
are likely to fall over way before this (probably at 100k files in a partition 
lots of things start to break).

I seem to recall that, in terms of the underlying HDFS RPC itself, it already 
splits it into smaller critical sections of the NS lock on the NN, and maybe 
even into multiple RPCs.

All that to say, I'm not too concerned about this point.


http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@371
PS4, Line 371: assertEquals(0L, 
(long)opsCounts.getLong("op_get_file_block_locations"));
> Something is odd. This test did not fail for the "reverse polarity" bug tha
This part of the test is for "REFRESH" on the table level -- does that trigger 
the code path where there was the bug?


http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@379
PS4, Line 379: assertEquals(0L, 
(long)opsCounts.getLong("op_get_file_block_locations"));
this is the one that I'm surprised didn't fail with the bug. Perhaps we can add 
some log statements or use the debugger to step through the test and understand 
better what's happening.



--
To view, visit http://gerrit.cloudera.org:8080/11227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956
Gerrit-Change-Number: 11227
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 27 Nov 2018 19:29:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11851 )

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..


Patch Set 4: Code-Review+2

(2 comments)

carrying forward

http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/kudu_test_suite.py@49
PS3, Line 49: raise NotImplementedError("Multi-master not supported yet")
> Nit: might be marginally nicer to raise a NotImplementedError
Done


http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/skip.py
File tests/common/skip.py:

http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/skip.py@106
PS3, Line 106: relies
> Nit: misspelling
Done



--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 27 Nov 2018 19:24:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�

2018-11-27 Thread Thomas Marshall (Code Review)
Hello Michael Brown, David Knupp, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11851

to look at the new patch set (#4).

Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false
..

IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false

Since IMPALA-6812, we've run many of our tests against Kudu at the
READ_AT_SNAPSHOT scan level, which ensures consistent results. This
scan level is only supported if Kudu is run with the flag
--use_hybrid_clock=true (which is the default).

This patch uses the Kudu master webui to detect when use_hybrid_clock
is false and skips these tests.

Follow up work will address allowing these tests to run regardless of
the value of the flag.

Testing:
- Ran a full exhaustive build with use_hybrid_clock=false set in the
  minicluster.

Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
---
M bin/impala-config.sh
M tests/common/kudu_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_kudu.py
M tests/metadata/test_ddl.py
M tests/query_test/test_kudu.py
6 files changed, 52 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/11851/4
--
To view, visit http://gerrit.cloudera.org:8080/11851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2
Gerrit-Change-Number: 11851
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 3:

Thanks.


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 19:17:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5973: Provide query plan in JSON format

2018-11-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11974 )

Change subject: IMPALA-5973: Provide query plan in JSON format
..


Patch Set 3:

It's a little obscure, but, for running queries, we already have a JSON 
representation of a plan that can be queried using the Impala web server. It's 
interesting in that it's a representation not just of the plan, but also of 
some of the execution statistics of said plan. The code for it is here:

// Helper for PlanToJson(), processes a single list of plan nodes which are the
// DFS-flattened representation of a single plan fragment. Called recursively, 
the
// iterator parameter is updated in place so that when a recursive call 
returns, the
// caller is pointing at the next of its children.
void PlanToJsonHelper(const map& summaries,
const vector& nodes,

I think it'd be pretty reasonable to always spit this JSON output into the 
profile at query completion.


--
To view, visit http://gerrit.cloudera.org:8080/11974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f07e2223be58b7323e1ea2fa44b62626cdf4944
Gerrit-Change-Number: 11974
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh (320)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 19:16:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py@1736
PS3, Line 1736: print_to_stderr('^C')
> I've not had the opportunity to run this, but does this diff mean that the
No, original intro = '\n' never works anyway since at L1765 in the finally 
block we always set it to intro = ''



--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 19:01:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py@1736
PS3, Line 1736: print_to_stderr('^C')
I've not had the opportunity to run this, but does this diff mean that the 
"intro" is re-printed at every Ctrl-C?



--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 27 Nov 2018 18:56:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3497/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 27 Nov 2018 18:40:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 27 Nov 2018 18:40:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11993 )

Change subject: IMPALA-7895: Incorrect expected results for 
spillable-buffer-sizing.test
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a
Gerrit-Change-Number: 11993
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Nov 2018 18:19:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11993 )

Change subject: IMPALA-7895: Incorrect expected results for 
spillable-buffer-sizing.test
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3496/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a
Gerrit-Change-Number: 11993
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Nov 2018 18:19:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test

2018-11-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11993 )

Change subject: IMPALA-7895: Incorrect expected results for 
spillable-buffer-sizing.test
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a
Gerrit-Change-Number: 11993
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Nov 2018 18:19:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11995 )

Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1444/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a
Gerrit-Change-Number: 11995
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 27 Nov 2018 17:54:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11995 )

Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11995/1/fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java:

http://gerrit.cloudera.org:8080/#/c/11995/1/fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java@133
PS1, Line 133:   protected Function createFunction(FunctionName fnName, 
List argTypes, Type retType,
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/11995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a
Gerrit-Change-Number: 11995
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 27 Nov 2018 17:17:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer

2018-11-27 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11995


Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
..

IMPALA-7867 (Part 2): ArrayList cleanup in analyzer

Follow-on to prior patch for this JIRA. Replaces all use of ArrayList in
variable and method declarations with the interface List. Retains the
use of ArrayList for list implementations (i.e. "new" statements.)

Also replaces "List.newArrayList()" with the more modern
"new ArrayList<>()". Cleaned up a few Map and Set declarations and added
a few missing @Override annotations found during this process.

Tests: This is purely a source-level change; no functional change. Ran
all client unit tests.

Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateUdaStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/FunctionArgs.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/PlanHint.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
35 files changed, 253 insertions(+), 258 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/11995/1
--
To view, visit http://gerrit.cloudera.org:8080/11995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a
Gerrit-Change-Number: 11995
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

2018-11-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11984 )

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from 
Parquet
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG@12
PS2, Line 12: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
> Maybe you could mention that it stores the number of nanoseconds since the
Done


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@479
PS2, Line 479: LimitedRange
> Dows this "LimitedRange" part of the name add extra value? Am I right that
There is already a function with nanosecond precision, and it can create the 
full range of timestamps:
FromUnixTimeNanos(time_t unix_time, int64_t nanos,
  const Timezone& local_tz);

That function could have been used here (after splitting nanoseconds to seconds 
and subseconds), but I think that it would slower than the specialized solution.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@507
PS2, Line 507:   /// conversion is necessary. Returns true if the schema 
describes a valid timestamp
> Could you please comment what the parameters are for, pls?
Done


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@509
PS2, Line 509: ProcessSchema
> I feel that something like GetTimestampInfoFromSchema might be a bit more s
Done


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc
File be/src/exec/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@102
PS2, Line 102:   // Timestamps can be only encoded as INT64 or INT96.
> nit: You might want to adjust this comment to cover the case when this func
I am not sure if I have done what you meant. This function should be only 
called for TIMESTAMP SQL columns, and should return false if the Parquet column 
is not timestamp.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@106
PS2, Line 106: Newer solution,
> nit: No need for this part. In addition, it's not guaranteed that reading t
Some explanation about "new": LogicalType is often called "new logical type", 
as converted type was also some kind of logical type. I agree that in 2 years 
this will not be meaningful.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@122
PS2, Line 122: Older solution,
> No need for "Older solution" part of the comment.
Done


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@123
PS2, Line 123: are/were never
> nit: I think we shouldn't refer to what happened before this change as we s
This code has to handle files written by older versions of Impala/some other 
tools we try to be compatible with, so I think that referring to he past makes 
sense here.

Not specifying whether utc->local conversion is necessary was a gap in Parquet 
specification and was interpreted by different software differently. This will 
not be an issue with the new logical type, but we have to make assumptions with 
older files.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@148
PS2, Line 148:   if (e.type == parquet::Type::INT96 && 
convert_int96_timestamps) needs_conversion = true;
> Shouldn't this line be also included in ProcessSchema? Since you introduced
I thought that doing it this way is the lesser evil - convert_int96_timestamps 
is not readily available in validation logic (it depends on a flag and the 
writer of the file).


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc@71
PS2, Line 71:   return ParquetTimestampDecoder::ProcessSchema(element, 
precision, needs_conversion);
> Just thinking out loud here: This ProcessSchema() is now used for 1) checki
Before this change the two functions were separate, but with the addition of 
logical types the code duplication became too much for me and I tried to merge 
the bulk of the two functions in ProcessSchema.

I think that the ideal way to do this would be to do the whole processing only 
once per column chunk (currently it is validated once, then processed again for 
stat filtering if needed, then once again to initialize the column reader). 
Changing this to use the same processed metadata would be a non-trivial 
refactor, as most of the code assumes everything needed for decoding can be 
simply deduced from the physical + SQL types, so handling of more complex types 
is a bit hacky.


http://gerrit.cloudera.org:8080/#/c/11984/2/fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
File fe/src/main/java/org/apache

[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

2018-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11984 )

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from 
Parquet
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1443/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 27 Nov 2018 16:20:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

2018-11-27 Thread Csaba Ringhofer (Code Review)
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11984

to look at the new patch set (#3).

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from 
Parquet
..

IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

PARQUET-1387 added int64 timestamps with nanosecond precision that
stores timestamps as nanoseconds since the Unix epoch.
As 64 bits are not enough to represent the whole 1400.. range
of Impala timestamps, this new type works with a limited range:
1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC

The benefit of the reduced range is that no validation is necessary
during scanning, as every possible 64 bit value represents a valid
timestamp in Impala. This may mean that this has the potential be
the fastest way to store timestamps in Impala + Parquet.

Another way NANO differs from MICRO and MILLI is that NANO can
be only described with new logical types in Parquet, it has no
converted type equivalent. This made implementing CREATE TABLE
LIKE PARQUET less trivial than it was for MICRO/MILLI: the type
conversion logic in ParquetHelper.java had to be rewritten to
use LogicalTypeAnnotation instead of ConvertedType.

The changes on Java side also made bumping CDH_BUILD_NUMBER
necessary.

Testing:
- added a new testfile with int64 nano timestamps
- ran core tests

Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
---
M be/src/exec/parquet-common.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M bin/impala-config.sh
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M testdata/data/README
A testdata/data/int64_timestamps_nano.parquet
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
M tests/query_test/test_scanners.py
13 files changed, 174 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/11984/3
--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

2018-11-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11984 )

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from 
Parquet
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG@12
PS2, Line 12: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
Maybe you could mention that it stores the number of nanoseconds since the unix 
epoch. That's why it represents this strange time-range.


http://gerrit.cloudera.org:8080/#/c/11984/2/fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
File fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java:

http://gerrit.cloudera.org:8080/#/c/11984/2/fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java@248
PS2, Line 248: LogicalTypeAnnotation logicalType = 
parquetType.getLogicalTypeAnnotation();
What happens when there is no "logical type" in a Parquet file, but only 
"converted type"?



--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 27 Nov 2018 15:10:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query

2018-11-27 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11990 )

Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a 
non-running query
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d
Gerrit-Change-Number: 11990
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 27 Nov 2018 14:32:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

2018-11-27 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11984 )

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from 
Parquet
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@479
PS2, Line 479: LimitedRange
Dows this "LimitedRange" part of the name add extra value? Am I right that we 
don't have any other kind of "Timestamp with Nano precision"? Then we can refer 
to this implementation simply as UnixTimeNanos in my opinion.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@507
PS2, Line 507:   /// conversion is necessary. Returns true if the schema 
describes a valid timestamp
Could you please comment what the parameters are for, pls?


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@509
PS2, Line 509: ProcessSchema
I feel that something like GetTimestampInfoFromSchema might be a bit more 
self-descriptive. There might be even better names :)


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc
File be/src/exec/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@102
PS2, Line 102:   // Timestamps can be only encoded as INT64 or INT96.
nit: You might want to adjust this comment to cover the case when this function 
is called on non-timestamp related types e.g. string.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@106
PS2, Line 106: Newer solution,
nit: No need for this part. In addition, it's not guaranteed that reading this 
comment 2 years from now would reveal what "Newer solution" is referred to.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@122
PS2, Line 122: Older solution,
No need for "Older solution" part of the comment.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@123
PS2, Line 123: are/were never
nit: I think we shouldn't refer to what happened before this change as we 
should only talk about the current code not the previous code.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@148
PS2, Line 148:   if (e.type == parquet::Type::INT96 && 
convert_int96_timestamps) needs_conversion = true;
Shouldn't this line be also included in ProcessSchema? Since you introduced a 
function to set 'needs_conversion' I think it should tackle all the cases not 
just a subset of them.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc@71
PS2, Line 71:   return ParquetTimestampDecoder::ProcessSchema(element, 
precision, needs_conversion);
Just thinking out loud here: This ProcessSchema() is now used for 1) checking 
if the timestamp in the referred element is a valid timestamp and 2) to gather 
information such as 'populate' and 'precision'. Have you considered splitting 
these 2 functionalities?



--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 27 Nov 2018 14:17:17 +
Gerrit-HasComments: Yes