[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C

2017-11-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8549 )

Change subject: IMPALA-1144: Fix exception when cancelling query in 
Impala-shell with CTRL-C
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@25
PS2, Line 25: Tested manually starting with a select from a random table that 
has few
> I haven't figured out a way to test this automatically.
So, um, if you want a query that returns a lot of rows:

 with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, 
v v5, v v6, v v7, v v8, v v9

will do it. (Customize as necessary.)

In my very quick experiment, that query is long enough that Ctrl-C worked (but 
got the query handle error), and, um, takes 20 seconds to go to /dev/null, so 
you've got time to try the cancellation.

$impala-shell.sh --query ' with v as (values (1 as x), (2), (3), (4)) select * 
from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9' -o /dev/null
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.11.0-SNAPSHOT DEBUG (build 
454a4aefa426725ab8806cfc814fb8e9df3ab8e5)
Query: with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, 
v v4, v v5, v v6, v v7, v v8, v v9
Fetched 262144 row(s) in 21.50s

$impala-shell.sh --query ' with v as (values (1 as x), (2), (3), (4)) select * 
from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9' -o /dev/null
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.11.0-SNAPSHOT DEBUG (build 
454a4aefa426725ab8806cfc814fb8e9df3ab8e5)
Query: with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, 
v v4, v v5, v v6, v v7, v v8, v v9
^C Cancelling Query
ERROR: Invalid query handle: 864b2a5cd241853b:c45bb8fe

Could not execute command: with v as (values (1 as x), (2), (3), (4)) select * 
from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9


Note that I didn't get "Invalid query handle" every time; depends a little bit 
on the timing of Ctrl-C.



--
To view, visit http://gerrit.cloudera.org:8080/8549
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2
Gerrit-Change-Number: 8549
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Nov 2017 05:03:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1144: Fix exception when CTRL+C on running query in Impala-shell

2017-11-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8549 )

Change subject: IMPALA-1144: Fix exception when CTRL+C on running query in 
Impala-shell
..


Patch Set 2:

(5 comments)

The Python looks fine to me (after David's semicolon comment). I think it'd be 
good to try to add a regular test for cancellation in the shell.

http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-1144: Fix exception when CTRL+C on running query in 
Impala-shell
Perhaps: Fix exception when cancelling query in impala-shell with Ctrl-C

"when CTRL+C" reads oddly to me.


http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@11
PS2, Line 11: 'Invalid query handle'.
nit: reformat this as one paragraph?


http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@22
PS2, Line 22: was issued automatically. This happened to historical reasons but 
not needed
nit: "but is not needed" (insert "is")


http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@25
PS2, Line 25: Tested manually starting with a select from a random table that 
has few
Can we write a normal test for this? I don't know if it's true, but "select 
sleep(6) from functional.whatever" may actually work with cancellation. Or 
it may not: I haven't tried it!


http://gerrit.cloudera.org:8080/#/c/8549/2/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8549/2/shell/impala_shell.py@485
PS2, Line 485: new_imp_client = self._new_impala_client()
does new_imp_client need to be closed? or old_imp_client?



--
To view, visit http://gerrit.cloudera.org:8080/8549
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2
Gerrit-Change-Number: 8549
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Nov 2017 00:52:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-11-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 5: Code-Review+1

Thanks. Tim: I'm good with this; you'll have to +2.


--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Nov 2017 00:12:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.

2017-11-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8529 )

Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web 
UI.
..


Patch Set 1:

(3 comments)

I took a very brief look. I was worried when you said "lightweight framework", 
but it turns out you're using codahale/yammer metrics, which is the right thing 
to do.

You know this, but TopNCache doesn't have a unit test.

I didn't look at all of this.

http://gerrit.cloudera.org:8080/#/c/8529/1/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/8529/1/common/thrift/JniCatalog.thrift@602
PS1, Line 602:   1: required list large_tables
Is the idea that len(large_tables)==len(memory_estimates), and likewise 
len(frequent_table)=len(num_metadata_operations)? I think it'd be more honest 
for this to be "list large_tables", with LargeTable being a struct 
that has a tablename and details about that table that you want to share.


http://gerrit.cloudera.org:8080/#/c/8529/1/fe/pom.xml
File fe/pom.xml:

http://gerrit.cloudera.org:8080/#/c/8529/1/fe/pom.xml@365
PS1, Line 365:   metrics-core
Thanks. This is the right thing to use in my experience.


http://gerrit.cloudera.org:8080/#/c/8529/1/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java
File fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java:

http://gerrit.cloudera.org:8080/#/c/8529/1/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java@36
PS1, Line 36:   // TODO: Consider making it a configurable parameter.
A somewhat cheap way to do this is to use:

Integer.getInteger("org.apache.impala.catalog.CatalogUsageMonitor.NUM_TABLES_TRACKED",
 25)

here.

That gets the number from system properties. It's pretty weak configuration, 
but I've used it to nice effect for constants that really should be fine as is.



--
To view, visit http://gerrit.cloudera.org:8080/8529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Gerrit-Change-Number: 8529
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 14 Nov 2017 20:00:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

2017-11-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() 
should not fork a process
..


Patch Set 2:

Based on my reading of "man readdir", it's not threadsafe. I think the only 
usage here is if a user hits "http://impalad:.../; to see the web UI. It's 
possible that there's a lock somewhere preventing concurrent use, but given 
that this is already reasonably expensive, I'd recommend using the reentrant 
readdir_r instead.

I looked around, and it looks like C++17 and boost have filesystem 
abstractions, but I think readdir() is simple enough.


--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 14 Nov 2017 19:09:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-11-14 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 4: Code-Review+1

(1 comment)

Thanks for the commit message. Please see my comment about the sleep you added. 
If you think it won't work, I'm fine with this as is.

http://gerrit.cloudera.org:8080/#/c/8368/4/tests/custom_cluster/test_shell_interactive_reconnect.py
File tests/custom_cluster/test_shell_interactive_reconnect.py:

http://gerrit.cloudera.org:8080/#/c/8368/4/tests/custom_cluster/test_shell_interactive_reconnect.py@60
PS4, Line 60: time.sleep(1)
Would "p.send_cmd("show tables"); p.get_result()" make this stable, without 
doing a sleep?

I always worry a bit about sleeps in tests, as they have a tendency to show up 
at annoying times as problems.



--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 14 Nov 2017 16:58:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.

2017-11-13 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8500 )

Change subject: Pin gen_build_version's git handling to typical git dir.
..


Patch Set 1:

Ping?

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/8500
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd
Gerrit-Change-Number: 8500
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Mon, 13 Nov 2017 21:03:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-11-13 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 3: Code-Review+1

(3 comments)

Thanks for the updates. This looks fine to me, with 2 minor issues:

* Please re-work the first line of the commit message; it doesn't make sense to 
me. (Or maybe I'm missing something, in which case just let me know.)

* If you're using NamedTemporaryFile to just get a filename, I suggested a 
shorter solution. There are two cases in this diff where you could take 
advantage of that.

I'm fine with both of those changes going in without further review; they're 
very minor.

http://gerrit.cloudera.org:8080/#/c/8368/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8368/3//COMMIT_MSG@9
PS3, Line 9: When precmd tested the connection it didn't validate that if we are
This sentence didn't parse for me. Specifically, I'm not sure what "that" in 
"it didn't validate that" is referring to.


http://gerrit.cloudera.org:8080/#/c/8368/3/tests/custom_cluster/test_shell_interactive_reconnect.py
File tests/custom_cluster/test_shell_interactive_reconnect.py:

http://gerrit.cloudera.org:8080/#/c/8368/3/tests/custom_cluster/test_shell_interactive_reconnect.py@36
PS3, Line 36: tempfile = NamedTemporaryFile(delete=False)
: tempfile.close()
: cls.tempfile_name = tempfile.name
Minor: I think this is just:

cls.tempfile_name = tempfile.mktemp()

Or did you use NamedTemporaryFile to create the file too, and that's important 
somehow?


http://gerrit.cloudera.org:8080/#/c/8368/3/tests/shell/util.py
File tests/shell/util.py:

http://gerrit.cloudera.org:8080/#/c/8368/3/tests/shell/util.py@121
PS3, Line 121:   """ Moves back history file from given filepath """
mention that this is a no-op if filepath doesn't exist.



--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 13 Nov 2017 19:14:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-09 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8456 )

Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..


Patch Set 6:

Well, that's what I get for being clever with bash. I've removed the magical 
"unset" for loop and am now unsetting every URL explicitly.


--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 09 Nov 2017 18:18:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-09 Thread Philip Zeyliger (Code Review)
Hello David Knupp, Joe McDonnell, Zach Amsden, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8456

to look at the new patch set (#6).

Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..

IMPALA-6148: Specifying thirdparty deps as URLs

If the environment variable $IMPALA__URL is configured in
impala-config-branch.sh or impala-config-local, for a thirdparty
dependency, use that to download it instead of the s3://native-toolchain
bucket. This makes testing against arbitrary versions of the
dependencies easier.

I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.

I tested this by both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).

Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 138 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/6
--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] Remove unused/defunct Maven repositories.

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8497 )

Change subject: Remove unused/defunct Maven repositories.
..


Patch Set 1:

https://jenkins.impala.io/view/Utility/job/pre-review-test/74/ ran happily with 
this change. The "-from-scratch" aspect of it probably suggests that it ran 
with a clean .m2 cache (though I haven't explicitly verified that).


--
To view, visit http://gerrit.cloudera.org:8080/8497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79eb6c483561726c7cbaf86874001f1979128720
Gerrit-Change-Number: 8497
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 08 Nov 2017 23:00:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] Remove unused/defunct Maven repositories.

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8497


Change subject: Remove unused/defunct Maven repositories.
..

Remove unused/defunct Maven repositories.

Removes three Maven repositories. davidtrott and codehaus both don't
exist any more, so they're not doing anyone any good. (We had previously
cleaned up Codehaus in IMPALA-5224, but a reference was resurrected.)
The libphonenumber repo was simply misconfigured: the library exists in
Maven central in the "normal" place, and a subdirectory repo is
unnecessary.

To test this, I ran "buildall" after removing ~/.m2/ on my machine.

Change-Id: I79eb6c483561726c7cbaf86874001f1979128720
---
M common/yarn-extras/pom.xml
M fe/pom.xml
M tests/test-hive-udfs/pom.xml
3 files changed, 0 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/8497/1
--
To view, visit http://gerrit.cloudera.org:8080/8497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I79eb6c483561726c7cbaf86874001f1979128720
Gerrit-Change-Number: 8497
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] Expose $IMPALA MAVEN OPTIONS for configuring Maven.

2017-11-08 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8496

to look at the new patch set (#3).

Change subject: Expose $IMPALA_MAVEN_OPTIONS for configuring Maven.
..

Expose $IMPALA_MAVEN_OPTIONS for configuring Maven.

With this commit, $IMPALA_MAVEN_OPTIONS is used by bin/mvn-quiet.sh
to configure Maven slightly. The default is no extra options.

This is handy for giving Maven a settings file with the "-s" flag, to
control, for example, repositories and their mirrors. In fact, I
considered exposing IMPALA_MAVEN_SETTINGS_FILE explicitly, but decided
that the generic option would be as good.

It's useful to customize how Maven works, especially
to provide a settings file with repository mirrors.

Change-Id: I2c62185476fd2388c7cda8884276b79a77370127
---
M bin/impala-config.sh
M bin/mvn-quiet.sh
M testdata/bin/generate-block-ids.sh
M testdata/bin/generate-load-nested.sh
M testdata/bin/split-hbase.sh
5 files changed, 14 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8496/3
--
To view, visit http://gerrit.cloudera.org:8080/8496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c62185476fd2388c7cda8884276b79a77370127
Gerrit-Change-Number: 8496
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6084: Avoid using of global namespace for llvm

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8489 )

Change subject: IMPALA-6084: Avoid using of global namespace for llvm
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8489/3/be/src/codegen/instruction-counter.cc
File be/src/codegen/instruction-counter.cc:

http://gerrit.cloudera.org:8080/#/c/8489/3/be/src/codegen/instruction-counter.cc@28
PS3, Line 28: const char* InstructionCounter::TOTAL_INSTS = "Total 
llvm::Instructions";
This was probably not intentional?



--
To view, visit http://gerrit.cloudera.org:8080/8489
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7098baff5f746bcc2965583ca99f6188997b733a
Gerrit-Change-Number: 8489
Gerrit-PatchSet: 3
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 08 Nov 2017 22:41:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8456 )

Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@112
PS2, Line 112: if re.search(k, release):
> This is probably an obtuse question (you can count on me for those): for an
The 57 times is why we're adding the cache. It was spamming the log, mostly. 
(It also saves a second, which is sort of nice on a warm build.)

But release can be passed in as an argument to get_platform_release_label(), 
which is done if Kudu isn't supported; line 240.



--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 08 Nov 2017 21:09:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 9:

(1 comment)

I'm still a little weirded out that we want "SET" to behave differently in the 
impala-shell and in HS2. Tim/Dan: you ok with it?

http://gerrit.cloudera.org:8080/#/c/8447/9/common/thrift/beeswax.thrift
File common/thrift/beeswax.thrift:

http://gerrit.cloudera.org:8080/#/c/8447/9/common/thrift/beeswax.thrift@103
PS9, Line 103:   4: optional byte level
Why is this a byte? Shouldn't this be an enum?



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Nov 2017 20:56:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8500


Change subject: Pin gen_build_version's git handling to typical git dir.
..

Pin gen_build_version's git handling to typical git dir.

gen_build_version.py now specifies --git-dir explicitly, to avoid
fetching the revision of a containing git directory that's not an Impala
checkout.

This came up when building Impala without a git directory, but within a
build system that happens to have a git directory higher up in the
directory tree.

I tested this by running the script manually and observing
it works identically in the normal case.

Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd
---
M bin/gen_build_version.py
1 file changed, 5 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8500/1
--
To view, visit http://gerrit.cloudera.org:8080/8500
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd
Gerrit-Change-Number: 8500
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8456 )

Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..


Patch Set 4:

(6 comments)

Thanks for the reviews!

I think I addressed/responded all the comments.

http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@36
PS2, Line 36: # python bootstrap_toolchain.py
> Not a blocker, but would it make sense to find/replace print -> logging thr
Done


http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@82
PS2, Line 82:   if not self.version:
> Nit: Probably a bit pedantic, but catching a specific exception only to rai
I fixed this by getting rid of catching KeyError entirely.


http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@112
PS2, Line 112: if re.search(k, release):
> Also, would it be appropriate to cache the value as an attribute of Package
You can't trivially cache v because it depends on the argument 'release'.

I'm happy to revert this function to its previous incarnation.

The overall logic here is complicated enough that I don't want to do a further 
refactor of moving the URL-creation stuff into Package in the same commit.


http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@373
PS2, Line 373:
> Nit: trailing white space
Done


http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@399
PS2, Line 399:   logging.basicConfig(level=logging.INFO,
> Nice.
Done


http://gerrit.cloudera.org:8080/#/c/8456/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8456/3/bin/impala-config.sh@138
PS3, Line 138: # Download URLs for toolchain dependencies can be overridden by
 : # IMPALA__URLs in impala-config-*.sh. We unset them 
here first:
 : for var in "${!IMPALA_@}"; do
 :   if [[ "$var" == IMPALA_*_URL ]]; then
 : unset $var
 :   fi
 : done
> If I understand this correctly, setting these in the environment won't work
I updated commit message and some comments to reflect this.

I think it's appropriate for IMPALA_HIVE_URL and IMPALA_HIVE_VERSION to have 
the same lifecycle, which in this case is impala-config*.sh.



--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 08 Nov 2017 18:10:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Expose $IMPALA MAVEN OPTIONS for configuring Maven.

2017-11-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8496


Change subject: Expose $IMPALA_MAVEN_OPTIONS for configuring Maven.
..

Expose $IMPALA_MAVEN_OPTIONS for configuring Maven.

With this commit, $IMPALA_MAVEN_OPTIONS is used by bin/mvn-quiet.sh
to configure Maven slightly. The default is no extra options.

This is handy for giving Maven a settings file with the "-s" flag, to
control, for example, repositories and their mirrors. In fact, I
considered exposing IMPALA_MAVEN_SETTINGS_FILE explicitly, but decided
that the generic option would be as good.

It's useful to customize how Maven works, especially
to provide a settings file with repository mirrors.

Change-Id: I2c62185476fd2388c7cda8884276b79a77370127
---
M bin/impala-config.sh
M bin/mvn-quiet.sh
2 files changed, 8 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8496/1
--
To view, visit http://gerrit.cloudera.org:8080/8496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2c62185476fd2388c7cda8884276b79a77370127
Gerrit-Change-Number: 8496
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-08 Thread Philip Zeyliger (Code Review)
Hello David Knupp, Joe McDonnell, Zach Amsden,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8456

to look at the new patch set (#4).

Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..

IMPALA-6148: Specifying thirdparty deps as URLs

If the environment variable $IMPALA__URL is configured in
impala-config-branch.sh or impala-config-local, for a thirdparty
dependency, use that to download it instead of the s3://native-toolchain
bucket. This makes testing against arbitrary versions of the
dependencies easier.

I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.

I tested this by both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).

Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 101 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/4
--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-06 Thread Philip Zeyliger (Code Review)
Hello David Knupp, Joe McDonnell, Zach Amsden,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8456

to look at the new patch set (#3).

Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..

IMPALA-6148: Specifying thirdparty deps as URLs

If the environment variable $IMPALA__URL is configured for
a thirdparty dependency, use that to download it instead of
the s3://native-toolchain bucket. This makes testing against
arbitrary versions of the dependencies easier.

I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.

I tested this by both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).

Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 85 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/3
--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1:

Looking through 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/574/artifact/Impala/logs_static/logs/,
 I think the following is the relevant issue.

I'm hesitant to believe that a change to bin/clean.sh could have caused this.

F1103 20:52:33.350702 67042 scalar-expr-evaluator.cc:70] Check failed: 
!initialized_ || closed_


   I1103 20:52:33.415714 67008 status.cc:124] ImpalaRuntimeException: 
Unable to find class.   


   CAUSED BY: ClassNotFoundException: org.apache.impala.TestUdf 



 @  0x15a0bef  impala::Status::Status() 



   @  0x19a42c8  impala::JniUtil::GetJniExceptionMsg()  



 @  0x1d28b85  impala::HiveUdfCall::OpenEvaluator() 



   @  0x1d2b76d  impala::ScalarExpr::OpenEvaluator()



 @  0x1d6f55d  impala::ScalarFnCall::OpenEvaluator()



   @  0x1d31d94  impala::ScalarExprEvaluator::Clone()   



 @  0x1d31fa6  impala::ScalarExprEvaluator::Clone() 


   
@  0x1ad088c  impala::HdfsScanner::Open()   


  @ 
 0x1b10f80  impala::HdfsTextScanner::Open() 


@   
   0x1ab4fdf  impala::HdfsScanNodeBase::CreateAndOpenScanner()  


  @ 
 0x1aa9f99  impala::HdfsScanNode::ProcessSplit()


@  

[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8459


Change subject: Fix errant, newline-including log directory.
..

Fix errant, newline-including log directory.

We've seen cases where a directory named "cluster\n  " sneaks
into the logs directory. The intention is that $IMPALA_ALL_LOGS_DIRS
is a space-separated list, but clean.sh (as opposed to buildall.sh)
treats it as a single argument.

I tested this manually:

Before:

  $ls logs/
  be_tests/  custom_cluster_tests/  data_loading/  ee_tests/  fe_tests/

Bad command:

  $mkdir -p "${IMPALA_ALL_LOGS_DIRS}"

After:

  $ls logs/
  be_tests/  cluster?  /  custom_cluster_tests/  data_loading/  ee_tests/
  fe_tests/

Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
---
M bin/clean.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8459/1
--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8456


Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..

IMPALA-6148: Specifying thirdparty deps as URLs

If the environment variable $IMPALA__URL is configured for
a thirdparty dependency, use that to download it instead of
the s3://native-toolchain bucket. This makes testing against
arbitrary versions of the dependencies easier.

I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.

I tested this by both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).

Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
---
M bin/bootstrap_toolchain.py
1 file changed, 77 insertions(+), 44 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/2
--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

(1 comment)

> Should I create tests for condition-variable.h? We don't seem to have unit 
> tests yet.

I think ideally we'd have tests for helper code like this. I'm ok if we think 
it's not worth it here given the low complexity.

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h
File be/src/util/condition-variable.h:

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@38
PS1, Line 38:   void Wait(boost::unique_lock& lock) {
> About the 'struct' keyword: it is something that C requires, but in C++ it
Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:38:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 5: Code-Review-1

(3 comments)

I took a quick look through. I'm uncomfortable with this change as it stands, 
because I feel it abuses the Thrift structures too much. Specifically, keeping 
the map of query option levels in TQueryOptions seems wrong: you're really 
describing "Query Option Metadata" (which might include name, description/help, 
and level), rather than the query options themselves. And, for sending this 
stuff to impala-shell, abusing the 'description' field seems wrong. (In fact, 
impala-shell should actually show useful help for these options. Things like 
"MT_DOP" are completely opaque.

Implementation wise, I think we also need to worry about what happens for the 
HiveServer2 side of things. Specifically, we respond to "SET" queries over HS2 
at 
https://github.com/apache/incubator-impala/blob/1803b403e38b3f952afc4ff3fd3e5f4d14c088f8/be/src/service/client-request-state.cc#L194.
 I think we want to stay away from adding information into the Beeswax API that 
isn't also exposed in HS2. One option is to let "SET ALL" be meaningful as a 
query for HS2, though I don't quite know what that means in terms of the schema 
returned: how would you divide the regular/advanced/deprecated options over 
that API?

http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@65
PS5, Line 65: // Levels to use when displaying query options from Impala shell
Let's leave a pointer here indicating where the mapping from the options in 
TQueryOptions to these levels is defined.


http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@297
PS5, Line 297:   // Map for query option name to option level mapping. 
Populated by ImpalaServer
 :   // on startup
 :   61: required map 
query_option_levels;
This worries me a bit, for a couple of reasons.

First, 'required' is forever. (See 
https://diwakergupta.github.io/thrift-missing-guide/#_defining_structs.) I 
think it means that a Thrift client will refuse to serialize a struct if this 
isn't set. For some uses, like TClientRequest, it makes no sense for us to 
carry this around. It's possible it works out because it can be set to the 
empty map often, but it's still odd.

Second, this isn't a query option. It's certainly describing query options, but 
it's very very different from, say, "mem_limit", which applies to the query.


http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py@100
PS5, Line 100:   def get_query_option_level_from_description(self, description):
Have we decided we can't change ConfigVariable to include a level? It seems 
like this is very much not a description, and we're abusing it.

/** Represents a Hadoop-style configuration variable. */
struct ConfigVariable {
  1: string key,
  2: string value,
  3: string description
}



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:33:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-02 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(1 comment)

Thanks for the reviews!

I'll report back about cancellations and how this looks in tools like CM that 
look at profiles. Cancellations isn't something I considered yet, so thanks for 
the tip!

http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h@342
PS1, Line 342: is_planning_
> maybe make that is_planning_finished_ (or similar), and initialize it to fa
I started out with this being planning_finished_ (or some such). I ran into the 
fact that some calls (like GetSchemas()) avoid the ExecuteInternal() path 
entirely: there's no planning to be done, but yet they also have a 
ClientRequestState, and they use get_result_metadata() and so on. In my case, 
the JDBC tests exposed this very clearly, but many other tests would have as 
well.

So, I decided to have state around only the query paths that actually do 
planning, so as to avoid saying that "GetSchemas()" has finished planning, 
since that makes less sense. I could also think about this as a query state, 
but changing the query state enums seems much more delicate than this approach.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 02 Nov 2017 22:32:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Correct log line in start-impala-cluster.py.

2017-11-02 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8432 )

Change subject: Correct log line in start-impala-cluster.py.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py@398
PS1, Line 398: executors = options.cluster_size - options.num_coordinators
> Could executors be negative due to the same bug?
No. I think it's handled by line 343-344.

$bin/start-impala-cluster.py  
--impalad_args="--stress_metadata_loading_pause_injection_ms=5000" 
--log_level=2 --cluster_size=1 --num_coordinators=1 --use_exclusive_coordinators
Cannot start an Impala cluster with no executors



--
To view, visit http://gerrit.cloudera.org:8080/8432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e
Gerrit-Change-Number: 8432
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 02 Nov 2017 22:08:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc@303
PS2, Line 303:   // record.end_time_us might still be zero if the query is not 
yet done
nit: Could you update the following comment in client-request-state.h and 
impala-server.h (once for ClientRequestState, once for QueryStateRecord) to say 
something about how 0 is a special value.

  /// Start/end time of the query, in Unix microseconds.
  int64_t start_time_us_, end_time_us_;



--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:02:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed)
Looking over 
https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-server.cc a bit 
more clearly, this is replacing "Query plan is not ready" with the actual 
profile. The profile is useful at this point: it has been populated with the 
query string, the time that planning started, and so on.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 15:36:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> After IMPALA-1972, it doesn't block, it just returns an empty profile.
I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed). 
You can go to the web UI and see the queries, but if you click on "Profile", 
you'll get stuck getting the lock in ImpalaServer::GetRuntimeProfileStr() 
below. Am I missing something?

Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id,
const string& user, bool base64_encoded, stringstream* output) {
  DCHECK(output != nullptr);
  // Search for the query id in the active query map
  {
shared_ptr request_state = 
GetClientRequestState(query_id);
if (request_state.get() != nullptr) {
  lock_guard l(*request_state->lock());


http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26
PS1, Line 26: know the query id
: that's necessary to call GetResultSetMetadata()
> It looks to me like this is against IMPALA-2568. I guess we eventually want
Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as the 
first subtask of that JIRA. It's easier to argue that this one is compatible.

As for this one needing to be undone, it's hard to say. I think you solve 
IMPALA-2568 by having more fine-grained and clear query states/lifecycles. I 
think this is a step in that direction.

What would you suggest?



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 15:32:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

Hi Dan,

This is a draft of the change to let query profiles show up during planning. 
Feedback appreciated!

Thanks!

-- Philip


--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 31 Oct 2017 22:46:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8434


Change subject: IMPALA-5564: Release lock during planning. (wip)
..

IMPALA-5564: Release lock during planning. (wip)

** I'm looking for feedback on the approach here; if folks
think this is the right direction, I'll work on chasing
down why Thrift Exception messages aren't propagated and
adding tests for any Thrift method that takes a query_id,
across Beeswax and HS2 APIs. **

This commit changes the query path to release the client request state
lock during planning. As a result, the plan becomes available to the web
UI, whereas the web UI would previously block on the lock.

This introduces a new boolean, is_planning_, to capture that that we're
in planning. This is done largely to be able to assert that result
metadata is not queried during planning. The common workflow here is:
client, using Thrift, calls ImpalaServer::ExecuteStatement() (in
impala-hs2-server.cc), which calls ImpalaServer::Execute(), which calls
ImpalaServer::ExecuteInternal(), where the plannig happens. Because the
first time the client can cleanly get the query id is the return of
ExecuteStatement(), there shouldn't be a way to know the query id
that's necessary to call GetResultSetMetadata(). If someone reaches
around (e.g., via the web UI), they will get an error message.

This happens to addess one of the points in IMPALA-3882, which
talks about releasing this very lock.

The core tests pass. One exhaustive test had to be modified,
and I saw exhaustive failures that are unrelated.
I added a new test for one Thrift Beeswax API whose behavior
has changed. If folks think this approach is fine, I'll
work through the other APIs.

Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M tests/common/impala_connection.py
A tests/custom_cluster/test_profile_during_planning.py
M tests/custom_cluster/test_query_concurrency.py
8 files changed, 147 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8434/1
--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] Correct log line in start-impala-cluster.py.

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8432


Change subject: Correct log line in start-impala-cluster.py.
..

Correct log line in start-impala-cluster.py.

Updated logging in start-impala-cluster to accurately specify how many
impala nodes were started, and how many of these were coordinators or
executors or both. The new logging looks like:

  Impala Cluster Running with 1 nodes (1 coordinators, 1 executors).

Previously, when invoking this script with --cluster_size=1, it would
report "1 nodes and 3 coordinators" which was wrong (because there was
only 1 coordinator) and confusing (because it seemed like a coordinator
was a separate thing from a node).

I also removed an unused import.

I have run core and exhaustive tests with these change, as part of
testing other changes. Nothing untoward happened.

Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e
---
M bin/start-impala-cluster.py
1 file changed, 8 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/8432/1
--
To view, visit http://gerrit.cloudera.org:8080/8432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e
Gerrit-Change-Number: 8432
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] Install OpenJDK-dbg for development environments.

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8431


Change subject: Install OpenJDK-dbg for development environments.
..

Install OpenJDK-dbg for development environments.

Updates the boostrap script to install the OpenJDK debug symbols.

OpenJDK comes with a gdb stack unwinder that errors out if the OpenJDK
debugging symbols aren't installed. This fixes the following error
message in gdb:

  Installing openjdk unwinder
  Traceback (most recent call last):
File 
"/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py",
 line 52, in 
  class Types(object):
File 
"/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py",
 line 66, in Types
  nmethodp_t = gdb.lookup_type('nmethod').pointer()
  gdb.error: No type named nmethod.

I tested this by installing the package manually on Ubuntu16.04 and
using gdb a bit.

Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0
---
M bin/bootstrap_system.sh
1 file changed, 2 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/8431/1
--
To view, visit http://gerrit.cloudera.org:8080/8431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0
Gerrit-Change-Number: 8431
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 3:

> Patch Set 3:
>
> I'm still testing this change. With this change we seem to be more prone to 
> hitting the Hive/mapred local executor race (MAPREDUCE-6441 MAPREDUCE-6992). 
> Will sync with PhilZ to see how to proceed.

https://gerrit.cloudera.org/c/8405/ is ready to commit. We were holding off 
because Lars said to hold off until the build stabilized.

I don't know where you're running into MAPREDUCE-6441. My change only addresses 
it for data load, since it's an option you have to pass to beeline.

c/8405 is splittable into two parts (beeline change for the option; 
parallelization), but I don't see much benefit into splitting it.


--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 31 Oct 2017 20:38:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

(1 comment)

Most of this looks mechanical, and it looked fine. You changed some 1-line if's 
into 3-line ifs, which may be against house style.

We don't seem to have any tests for condition-variable.h. How did you test this?

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h
File be/src/util/condition-variable.h:

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@38
PS1, Line 38:   void Wait(boost::unique_lock& lock) {
I'm not yet familiar with this part of C++; why is 'inline' getting removed 
here? And 'struct' in line 47?



--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Oct 2017 17:23:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-10-31 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8428/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8428/1//COMMIT_MSG@14
PS1, Line 14: This commit substitues every boost::condtion_variable in
nit: condtion_variable -> condition_variable.



--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Oct 2017 17:15:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.

2017-10-30 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8327 )

Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH.
..


Patch Set 1: Code-Review-1

> Patch Set 1:
>
> > I think this check seems not to be enough. Let me leave a detailed
>  > comment on the JIRA ticket.
>
> It is OK to leave detailed comments here in the code review tool. In fact, it 
> is common.

On JIRA, Kim Jin Chul said:

> CLASSPATH should be set after loading impala-config.sh but the variable may 
> not have sufficient libraries.

I'm -1'ing this for a bit; I think the trick is to figure out if we can get rid 
of impala-config.sh setting this at all. I'll try it out.


--
To view, visit http://gerrit.cloudera.org:8080/8327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
Gerrit-Change-Number: 8327
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 30 Oct 2017 21:06:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py@114
PS2, Line 114: # When HiveServer2 is configured to use "local" mode (i.e., MR 
jobs are run
> Let's mention the HADOOP JIRA in case it's ever fixed and we can remove the
Good idea. Done.



--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Oct 2017 03:48:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Hello Joe McDonnell, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8405

to look at the new patch set (#3).

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoid
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 59 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/3
--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Hello Joe McDonnell, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8405

to look at the new patch set (#2).

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoid
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 58 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/2
--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8405


Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoid
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 61 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/1
--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..


Patch Set 8:

> Patch Set 8:
>
> > Oof. GVO caught a (correct) clang-tidy issue. Alas, clang-tidy is
>  > not in pre-review-test, so this was my first experience with that.
>
> You can also use 
> https://jenkins.impala.io/view/Utility/job/gerrit-verify-dryrun-external/ , 
> which does run clang-tidy.

Thanks for the pointer. I'm giving it a shot at 
https://jenkins.impala.io/job/gerrit-verify-dryrun-external/25/

My local clang-tidy build was clean:

$ bin/run_clang_tidy.sh | tee clang_tidy.txt
$ cat clang_tidy.txt | grep ']' 
   $


--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 21:39:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..


Patch Set 8:

Oof. GVO caught a (correct) clang-tidy issue. Alas, clang-tidy is not in 
pre-review-test, so this was my first experience with that.

I've made the trivial change (removed a const in llvm-codegen.h) and am running 
it through the relevant gauntlet locally.


--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 20:56:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..


Patch Set 7: Code-Review+1

Carrying the +1.

I've not made any changes here (except rebases). I've run the core tests 
successfully. I had run into https://issues.apache.org/jira/browse/IMPALA-6118 
and https://issues.apache.org/jira/browse/IMPALA-6106 in previous testing. I 
didn't see any evidence that this change breaks the build specifically.

The specific failed GVO build had the following assertion. That's IMPALA-6099, 
which is also reasonably unrelated to the Avro-only changes here.


F1023 19:06:33.677640 43548 filter-context.cc:49] Check failed: it != 
counters.end() Tried to increment unknown counter group
*** Check failure stack trace: ***
@  0x2f1e11d  google::LogMessage::Fail()
F1023 19:06:33.677934 43350 filter-context.cc:49] Check failed: it != 
counters.end() Tried to increment unknown counter group
*** Check failure stack trace: ***
@  0x2f1f9c2  google::LogMessage::SendToLog()
@  0x2f1daf7  google::LogMessage::Flush()
@  0x2f1e11d  google::LogMessage::Fail()
@  0x2f210be  google::LogMessageFatal::~LogMessageFatal()
@  0x2f1f9c2  google::LogMessage::SendToLog()
@  0x1ca4d99  impala::FilterStats::IncrCounters()   


@   
   0x1ca72a4  impala::FilterContext::CheckForAlwaysFalse()
@  0x2f1daf7  google::LogMessage::Flush()
@  0x1aa537d  impala::HdfsScanNodeBase::PartitionPassesFilters()
@  0x2f210be  google::LogMessageFatal::~LogMessageFatal()
@  0x1b0c9c6  impala::HdfsParquetScanner::GetNextInternal()
@  0x1ca4d99  impala::FilterStats::IncrCounters()
@  0x1b0acca  impala::HdfsParquetScanner::ProcessSplit()
@  0x1a99cc0  impala::HdfsScanNode::ProcessSplit()
@  0x1ca72a4  impala::FilterContext::CheckForAlwaysFalse()
@  0x1a99136  impala::HdfsScanNode::ScannerThread()
@  0x1a985d3  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv
@  0x1aa537d  impala::HdfsScanNodeBase::PartitionPassesFilters()
@  0x1a9a501  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x171bdfc  boost::function0<>::operator()()
@  0x1b0c9c6  impala::HdfsParquetScanner::GetNextInternal()
@  0x19f3393  impala::Thread::SuperviseThread()
@  0x1b0acca  impala::HdfsParquetScanner::ProcessSplit()


@   
   0x19fbf26  boost::_bi::list4<>::operator()<>()
@  0x1a99cc0  impala::HdfsScanNode::ProcessSplit()
@  0x19fbe69  boost::_bi::bind_t<>::operator()()
@  0x19fbe2c  boost::detail::thread_data<>::run()
@  0x1a99136  impala::HdfsScanNode::ScannerThread()
@  0x1a985d3  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv
@  0x20a7c9a  thread_proxy
@  0x1a9a501  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@ 0x7f0cd8e3a6ba  start_thread
@ 0x7f0cd8b703dd  clone


--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 16:17:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-10-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 2:

(7 comments)

Thanks for tackling this! As a user, I hate that this was broken.

I think your changes to impala_shell.py look good. I'm a bit more wary of 
having another superclass for tests with helper functions.

http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py@729
PS2, Line 729:   def _validate_database(self, immediately=False):
Is there a case where you ever want immediately=True? i.e., could we get rid of 
the two paths here and converge to just one?


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py
File tests/common/cluster_controller.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@33
PS2, Line 33: class ClusterController(CustomClusterTestSuite):
I'm generally not fond of the inheritance and multiple-inheritance in these 
tests. Is this actually distinct from CustomClusterTestSuite?


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@64
PS2, Line 64:   def start_cluster_with_args(self, **kwargs):
This could go straight into the super class. It's tightly coupled with 
_start_impala_cluster.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py
File tests/custom_cluster/test_shell_interactive_reconnect.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@25
PS2, Line 25: TMP_HISTORY_FILE = os.path.expanduser("~/.impalahistorytmp")
Perhaps use an actual tempfile created with tempfile.[something]


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@27
PS2, Line 27: class TestShellInteractiveReconnect(ClusterController):
Any reason not to add this simply to tests/shell/test_shell_interactive.py?

I think because you need CustomCluster to do the restart? If so, please add a 
comment about this.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@40
PS2, Line 40:   @pytest.mark.execute_serially
I think all CustomCluster tests run serially? This one seems like it could be 
parallel.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py
File tests/shell/util.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@116
PS2, Line 116:   """ Moves history file to given filepath """
The underlying bug here is that the shell doesn't have a historypath option, 
but I think this is fine...



--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:58:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6108: Revert "IMPALA-6070: Parallel data load."

2017-10-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8386


Change subject: IMPALA-6108: Revert "IMPALA-6070: Parallel data load."
..

IMPALA-6108: Revert "IMPALA-6070: Parallel data load."

We may be seeing a race with errors like "java.io.FileNotFoundException:
File /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist".

This reverts commit e020c37106383be5416f882cbe11fc25efad8968.

Change-Id: I46da93f4315a5a4bdaa96fa464cb51922bd6c419
---
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
3 files changed, 5 insertions(+), 44 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/8386/1
--
To view, visit http://gerrit.cloudera.org:8080/8386
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I46da93f4315a5a4bdaa96fa464cb51922bd6c419
Gerrit-Change-Number: 8386
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-3998: deprecate --refresh after connect

2017-10-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8381 )

Change subject: IMPALA-3998: deprecate --refresh_after_connect
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id297f80c0f596a69ef8ecde948812b82d2a5c0fa
Gerrit-Change-Number: 8381
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 25 Oct 2017 18:02:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..


Patch Set 6: Code-Review-1

The test failures are real; I'm going to figure it out.


--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 25 Oct 2017 15:35:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-24 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

Hi Laszlo,

We're interested in this change for avoiding some compatibility problems with 
Hadoop 3. Any news here?

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 24 Oct 2017 23:03:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8363 )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and 
client_request_state_map_lock_
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8363/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8363/3//COMMIT_MSG@30
PS3, Line 30: shareded to a default of 4 buckets initally.
nit: sharded


http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/runtime/query-exec-mgr.cc@84
PS3, Line 84: uint8_t qs_map_bucket = QueryIdToBucket(query_id);
: lock_guard l(qs_map_lock_[qs_map_bucket]);
Would it make sense to write a helper so that this code could look like:

// I don't like my naming here...
gs_map_bucket_guard qs_map_bucket_(get_guard(query_id));
qs_map_bucket_.find(query_id); ...
...

i.e., every time you use this thing, you have to use QueryIdToBucket(), and 
then use the result of that for both the lock guard andthe map. Could that be 
shortened with a helper class that contains both the lock guard and the map 
you're allowed to use?


http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-http-handler.cc@244
PS3, Line 244:   for (int i = 0 ; i < 4 ; ++i) {
4 should be the constant here, no?


http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-server.h@562
PS3, Line 562:   /// on disk. Must be called with the corressponding 
client_request_state_map_lock_[]
nit: corresponding



--
To view, visit http://gerrit.cloudera.org:8080/8363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 23 Oct 2017 22:34:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py
File tests/util/compute_table_stats.py:

http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@71
PS2, Line 71: end(pool.apply_async(compute_stats_table
> Thanks for the clarification. I actually missed that you do the same thing
This script ignores errors by default, and that's not including silent issues 
like the one you mention. For its purposes: computing table stats on 
everything, it seems to do the trick. If I were shipping a "stats computation 
script", I wouldn't ship this one. (For one, it computes stats even if they've 
been computed.)

In fact, the current invocation tries to compute stats on a view in Kudu and 
fails, but the failure is ignored.



--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 19:28:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Hello Michael Brown, David Knupp, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8354

to look at the new patch set (#4).

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..

IMPALA-6070: Parallel compute_table_stats.py

Uses a thread pool to issue many compute stats commands in parallel to
Impala, rather than doing it serially. Where it was obvious, I combined
multiple stats commands into fewer, to reduce the number
of "show databses" and serialized "show tables" commands.

This speeds up the compute stats step in data loading significantly. My
measurements for testdata/bin/compute-table-stats.sh running before and
after this change, with the Impala daemons restarted (cold) or not
restarted (warm) on an 8-core, 32GB RAM machine were:

old, cold: 7m44s
new, cold: 1m42s

old, warm: 1m23s
new, warm:   48s

The data load in the full test build behaves in a cold fashion. It's
typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to
run this compute stats step for 9 or 10 minutes. With this change, this
will come down to about 2 minutes.

Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
---
M testdata/bin/compute-table-stats.sh
M tests/util/compute_table_stats.py
2 files changed, 59 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/4
--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py
File tests/util/compute_table_stats.py:

http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@44
PS2, Line 44:   if not continue_on_error: raise e
> Two nits: even though we tend to do this a lot, PEP 8 frowns on inline if s
Sure; fixed.

(This was pre-existing.)


http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@48
PS2, Line 48:
> I'm curious -- why is it necessary to define the client factory inside of t
It's a closure. Doing it inline in the main block lets the closure access 
options.impalad, and so on. Obviously, you could create a class, which takes as 
init parameters all the options, and then make this a class method. Or you 
could use globals.

I've not changed anything here.


http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@71
PS2, Line 71: _factory, db, table, continue_on_error,)
> It seems to me like you don't need the intersection. You already have eithe
Let's say all_tables = ["a", "b", "c"] and selected_tables = ["b", "c", "d"]. 
This code is trying to avoid running compute stats on "d", which has been 
selected but isn't in all tables.


http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@116
PS2, Line 116: yield impala_client
> Would it be possible to decorate client_factory as a @contextmanager, somet
Done



--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 18:42:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 2:

> Patch Set 2:
>
> (1 comment)

Sure. I did the @contextmanager trick. I'm a little bit ambivalent about it, 
because it's kind of "fancy python", but I think it looks ok.


--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 18:08:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Hello Michael Brown, David Knupp, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8354

to look at the new patch set (#3).

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..

IMPALA-6070: Parallel compute_table_stats.py

Uses a thread pool to issue many compute stats commands in parallel to
Impala, rather than doing it serially. Where it was obvious, I combined
multiple stats commands into fewer, to reduce the number
of "show databses" and serialized "show tables" commands.

This speeds up the compute stats step in data loading significantly. My
measurements for testdata/bin/compute-table-stats.sh running before and
after this change, with the Impala daemons restarted (cold) or not
restarted (warm) on an 8-core, 32GB RAM machine were:

old, cold: 7m44s
new, cold: 1m42s

old, warm: 1m23s
new, warm:   48s

The data load in the full test build behaves in a cold fashion. It's
typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to
run this compute stats step for 9 or 10 minutes. With this change, this
will come down to about 2 minutes.

Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
---
M testdata/bin/compute-table-stats.sh
M tests/util/compute_table_stats.py
2 files changed, 58 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/3
--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 2:

BTW,

https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/481/consoleFull

did fine. It had both data load parallelism and table stats parallelism. Ran in 
3h17m. A baseline is, say, 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/479/, which ran for 
3h52m. So, between these two changes, I've picked up about 35 minutes.

22:51:43 Computing table stats (logging to 
/home/ubuntu/Impala/logs/data_loading/compute-table-stats.log)...
22:53:28   Computing table stats OK (Took: 1 min 45 sec)


--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 17:34:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py
File tests/util/compute_table_stats.py:

http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@37
PS1, Line 37: statement = "compute stats %s" % (db_table,)
> Do these prints come out clean or all garbled due to the parallelism?
They do. At first I thought I didn't care, but you're right, and I've fixed it.

I switched to using python logging, which has its own logging to prevent 
interleaving. As a bonus, we get timestamps if we want to debug how long things 
take, and thread ids to make it obvious that there's parallelism.


http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@119
PS1, Line 119: impala_client.connect()
> please add newline to separate more cleanly
Done



--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 17:29:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Philip Zeyliger (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8354

to look at the new patch set (#2).

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..

IMPALA-6070: Parallel compute_table_stats.py

Uses a thread pool to issue many compute stats commands in parallel to
Impala, rather than doing it serially. Where it was obvious, I combined
multiple stats commands into fewer, to reduce the number
of "show databses" and serialized "show tables" commands.

This speeds up the compute stats step in data loading significantly. My
measurements for testdata/bin/compute-table-stats.sh running before and
after this change, with the Impala daemons restarted (cold) or not
restarted (warm) on an 8-core, 32GB RAM machine were:

old, cold: 7m44s
new, cold: 1m42s

old, warm: 1m23s
new, warm:   48s

The data load in the full test build behaves in a cold fashion. It's
typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to
run this compute stats step for 9 or 10 minutes. With this change, this
will come down to about 2 minutes.

Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
---
M testdata/bin/compute-table-stats.sh
M tests/util/compute_table_stats.py
2 files changed, 62 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/2
--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.

2017-10-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
..


Patch Set 8:

(4 comments)

Thanks for the review, Alex!

I believe I addressed your comments. I've re-run 
test_alter_table_create_many_partitions (the new test). I'm kicking off a full 
test run (https://jenkins.impala.io/view/Utility/job/pre-review-test/68/) to 
make sure my Lists.partition() change wasn't fat-fingered somehow.

LD_LIBRARY_PATH=./toolchain/kudu-bec2a24/release/lib:$LD_LIBRARY_PATH 
infra/python/env/bin/py.test tests/metadata/test_ddl.py -k 
test_alter_table_create_many_partitions -s

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958
PS7, Line 1958:   Lists.partition(allHmsPartitionsToAdd, 
MAX_PARTITION_UPDATES_PER_RPC)) {
> While you are here, can you also change bulkAlterPartitions() to use Lists.
Done


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514
PS7, Line 514: service.wait_for_metric_value(class_cache_misses_metric, 
class_cache_misses)
> This test doesn't belong in the TestLibCache class.
I moved it up in this faile in test_ddl. Good catch that this file had multiple 
classes; I missed that somehow.

I found a test test_hms_integration.py:test_add_overlapping_partitions that 
does have multi-partition add alter, but it's all about Hive/Impala 
compatibility, so I think test_ddl.py is a slightly better home here. I'm very 
happy to defer to you.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519
PS7, Line 519: cache. create_stmts is the list of CREATE statements to be 
executed in order
> "use" can generally cause problems in tests because we sometimes run a test
Done


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527
PS7, Line 527: 
self.client.set_configuration(vector.get_value("exec_option"))
> This comment can become stale quickly. I suggest not adding it an output ro
Done



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 17:20:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.

2017-10-23 Thread Philip Zeyliger (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#9).

Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
..

IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.

This commit allows users to add more than 500
(=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE
command. We batch the operations against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.  A new,
simple, test has been added that confirms that creating 502 partitions
works.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/metadata/test_ddl.py
4 files changed, 45 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/9
--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 9
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.

2017-10-23 Thread Philip Zeyliger (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#8).

Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
..

IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.

This commit allows users to add more than 500
(=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE
command. We batch the operations against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.  A new,
simple, test has been added that confirms that creating 502 partitions
works.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/metadata/test_ddl.py
4 files changed, 45 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/8
--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-22 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8354


Change subject: IMPALA-6070: Parallel compute_table_stats.py
..

IMPALA-6070: Parallel compute_table_stats.py

Uses a thread pool to issue many compute stats commands in parallel to
Impala, rather than doing it serially. Where it was obvious, I combined
multiple stats commands into fewer, to reduce the number
of "show databses" and serialized "show tables" commands.

This speeds up the compute stats step in data loading significantly. My
measurements for testdata/bin/compute-table-stats.sh running before and
after this change, with the Impala daemons restarted (cold) or not
restarted (warm) on an 8-core, 32GB RAM machine were:

old, cold: 7m44s
new, cold: 1m42s

old, warm: 1m23s
new, warm:   48s

The data load in the full test build behaves in a cold fashion. It's
typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to
run this compute stats step for 9 or 10 minutes. With this change, this
will come down to about 2 minutes.

Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
---
M testdata/bin/compute-table-stats.sh
M tests/util/compute_table_stats.py
2 files changed, 58 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/1
--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-21 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..


Patch Set 4:

(4 comments)

Thanks. I think I got all the long lines out.

"git show | egrep '^\+.{91}' | grep -v '^+//'" doesn't return anything any 
more. (Obviously, we're not wrapping the example codegen.)

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h@241
PS3, Line 241:   const SchemaPath& path, const AvroSchemaElement& record, 
int child_start,
> long line
Done


http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@856
PS3, Line 856:   BasicBlock* helper_block = BasicBlock::Create(
> Long line
Done


http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@867
PS3, Line 867:   Function* fnHelper = helper_functions[i];
> Long line.
Done


http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@889
PS3, Line 889:
> long line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 22 Oct 2017 04:13:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-21 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#5).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in
size to the number of columns. On 1000 column tables, codegen time is
significant. This commit roughly halves it for wide columns.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

   step time
 10 2.4
 50 2.5
 75 2.9
100 3.0
125 3.0
150 1.4
175 1.3
200 1.3 <-- chosen step size
225 1.5
250 1.4
300 1.6
400 1.6
500 1.8
   1000 2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for 
try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
  sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 158 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/5
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-21 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#4).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in
size to the number of columns. On 1000 column tables, codegen time is
significant. This commit roughly halves it for wide columns.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

   step time
 10 2.4
 50 2.5
 75 2.9
100 3.0
125 3.0
150 1.4
175 1.3
200 1.3 <-- chosen step size
225 1.5
250 1.4
300 1.6
400 1.6
500 1.8
   1000 2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for 
try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
  sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 156 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/4
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-21 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 2:

(9 comments)

Thanks for the reviews!

I observed memory when watching this, and on my 32GB machine, I always has 
~20GB available.

I agree with Alex on adding in more things: there are similar changes that can 
continue to help here, but I'm doing them one at a time.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9
PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in
> nit: Can you wrap this at the red line provided by gerrit? I think it is 72
Done. "gqip" does it in vi. It looks like it's 72 chars.


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12
PS1, Line 12: 13 minut
> nit: minutes
Done


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33: 16:14:33Loading workload 'tpcds' using exploration strategy 
'core' OK (Took: 16 min 29 sec)
> What testing did you do? Does the data load still run on a non-beefy local
Define non-beefy?

My desktop is 32 GB and 8 cores. This ran fine.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480
PS1, Line 480:   # Run some steps in parallel, with run-step-backgroundable / 
run-step-wait-all.
> Could add a comment about what you decided to background and what you decid
Done.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492
PS1, Line 492: LOAD_NESTED_ARGS="--cm-host $CM_HOST"
> I don't see any reason this also couldn't run in parallel.
Yes, but I've not tested this one.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505
PS1, Line 505:   load-data "functional-query" "core" "hbase/none"
 : fi
 :
 : if $KUDU_IS_SUPPORTED; then
 :   # Tests depend on the kudu data being clean, so load
> It should be possible to do the same thing for these. That will only save a
Yes. I am testing this one, but I'll do a separate patch for it.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="512" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> :). I'm still using that good-old machine, mem should be fine (fingers cros
512 works, so that's what I've changed it to. I'm not investigating using -Xms 
-Xmx to give this more flexibility (but even less predictability).


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh
File testdata/bin/run-step.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53
PS1, Line 53:
> nit: only one empty line, to match context
Done


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84
PS1, Line 84:   RUN_STEP_MSGS=()
> Do you want to reset MSGS, too?
Good catch. Done.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 21:32:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-21 Thread Philip Zeyliger (Code Review)
Hello Jim Apple, Joe McDonnell, Alex Behm, Zach Amsden,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8320

to look at the new patch set (#2).

Change subject: IMPALA-6070: Parallel data load.
..

IMPALA-6070: Parallel data load.

This commit loads functional-query, TPC-H data, and TPC-DS data in
parallel. In parallel, these take about 37 minutes, dominated by
functional-query. Serially, these take about 30 minutes more, namely the
13 minutes of tpcds and 16 minutes of tpcds. This works out nicely
because CPU usage during data load is very low in aggregate. (We don't
sustain more than 1 CPU of load, whereas build machines are likely to
have many CPUs.)

To do this, I added support to run-step.sh to have a notion of a
backgroundable task, and support waiting for all tasks.

I also increased the heapsize of our HiveServer2 server. When datasets
were being loaded in parallel, we ran out of memory at 256MB of heap.

The resulting log output is currently like so (but without the
timestamps):

15:58:04  Started Loading functional-query data in background; pid 8105.
15:58:04  Started Loading TPC-H data in background; pid 8106.
15:58:04  Loading functional-query data (logging to 
/home/impdev/Impala/logs/data_loading/load-functional-query.log)...
15:58:04  Started Loading TPC-DS data in background; pid 8107.
15:58:04  Loading TPC-H data (logging to 
/home/impdev/Impala/logs/data_loading/load-tpch.log)...
15:58:04  Loading TPC-DS data (logging to 
/home/impdev/Impala/logs/data_loading/load-tpcds.log)...
16:11:31Loading workload 'tpch' using exploration strategy 'core' OK (Took: 
13 min 27 sec)
16:14:33Loading workload 'tpcds' using exploration strategy 'core' OK 
(Took: 16 min 29 sec)
16:35:08Loading workload 'functional-query' using exploration strategy 
'exhaustive' OK (Took: 37 min 4 sec)

I tested dataloading with the following command on an 8-core, 32GB
machine. I saw 19GB of available memory during my run:
  ./buildall.sh -testdata -build_shared_libs -start_minicluster 
-start_impala_cluster -format

Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
---
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
3 files changed, 44 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/2
--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8327


Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH.
..

IMPALA-6073: Fail on misconfigured CLASSPATH.

Asserts, early, that $CLASSPATH is set and has no wildcards.

Several developers have been tripped up running C++ tests without first
running 'bin/set-classpath.sh'. This causes them to run into HDFS-11851
(getGlobalJNIEnv() may deadlock on exception), wherein, instead of a
relatively clear "Class not found" error, we hang forever.

I considered limiting this to tests, but we clearly need a $CLASSPATH
for the daemons themselves.

Testing:
* Ran a backend test without $CLASSPATH set.
* Started Impala cluster with this change.

Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
---
M be/src/common/init.cc
1 file changed, 7 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/8327/1
--
To view, visit http://gerrit.cloudera.org:8080/8327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
Gerrit-Change-Number: 8327
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8262 )

Change subject: IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04
..


Patch Set 3:

> Patch Set 3:
>
> > Sorry to get to this a bit late.
>
> No apology required.
>
> I agree with all of your suggestions.
>
> If you want to file a ticket and assign it to me or if you want to send me a 
> patch to review, I'd be amenable.

Filed https://issues.apache.org/jira/browse/IMPALA-6079.

I've left it unassigned. It's not clear to me yet whether some other work will 
lead me right back to this :)


--
To view, visit http://gerrit.cloudera.org:8080/8262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8196a2a87bce5893a349a1b290c3f3d04fd80317
Gerrit-Change-Number: 8262
Gerrit-PatchSet: 3
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 18 Oct 2017 22:38:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8320


Change subject: IMPALA-6070: Parallel data load.
..

IMPALA-6070: Parallel data load.

This commit loads functional-query, TPC-H data, and TPC-DS data in parallel. In
parallel, these take about 37 minutes, dominated by functional-query. Serially,
these take about 30 minutes more, namely the 13 minutes of tpcds and 16
minuites of tpcds. This works out nicely because CPU usage during data load is
very low in aggregate. (We don't sustain more than 1 CPU of load, whereas build
machines are likely to have many CPUs.)

To do this, I added support to run-step.sh to have a notion of a backgroundable
task, and support waiting for all tasks.

I also increased the heapsize of our HiveServer2 server. When datasets were
being loaded in parallel, we ran out of memory at 256MB of heap.

The resulting log output is currently like so (but without the timestamps):

15:58:04  Started Loading functional-query data in background; pid 8105.
15:58:04  Started Loading TPC-H data in background; pid 8106.
15:58:04  Loading functional-query data (logging to 
/home/impdev/Impala/logs/data_loading/load-functional-query.log)...
15:58:04  Started Loading TPC-DS data in background; pid 8107.
15:58:04  Loading TPC-H data (logging to 
/home/impdev/Impala/logs/data_loading/load-tpch.log)...
15:58:04  Loading TPC-DS data (logging to 
/home/impdev/Impala/logs/data_loading/load-tpcds.log)...
16:11:31Loading workload 'tpch' using exploration strategy 'core' OK (Took: 
13 min 27 sec)
16:14:33Loading workload 'tpcds' using exploration strategy 'core' OK 
(Took: 16 min 29 sec)
16:35:08Loading workload 'functional-query' using exploration strategy 
'exhaustive' OK (Took: 37 min 4 sec)

Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
---
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
3 files changed, 40 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/1
--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-18 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#3).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in
size to the number of columns. On 1000 column tables, codegen time is
significant. This commit roughly halves it for wide columns.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

   step time
 10 2.4
 50 2.5
 75 2.9
100 3.0
125 3.0
150 1.4
175 1.3
200 1.3 <-- chosen step size
225 1.5
250 1.4
300 1.6
400 1.6
500 1.8
   1000 2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for 
try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
  sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 148 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/3
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-18 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#2).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in
size to the number of columns. On 1000 column tables, codegen time is
significant. This commit roughly halves it for wide columns.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

   step time
 10 2.4
 50 2.5
 75 2.9
100 3.0
125 3.0
150 1.4
175 1.3
200 1.3 <-- chosen step size
225 1.5
250 1.4
300 1.6
400 1.6
500 1.8
   1000 2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for 
try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
  sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 148 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/2
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7: Code-Review+1

Carry post rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 6: Code-Review+1

(1 comment)

Carrying Dimitris' +1.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> Well, you're only calling it for the diff, right? I would expect in most ca
Yep, I agree.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: 169.254.169.254
> What is this address? Can we use a domain name and https?
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

This is EC2 magic. If you're not on EC2, this will tell you to set the 
environment variables (if you're testing S3). If you're on EC2, this will 
return something. We don't actually check that those credentials can access the 
relevant bucket, but at least it lets you through.



--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:06:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256
PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then
> We may want to include a check here that "set -x" has not been enabled?
(Bash hackery)

If you want, you can use a subshell:

$(set -x; (set +x; [ $USER == philip ])) && echo yes || echo no
+ set +x
yes
10:46:02 mystery  ~
$(set -x; (set +x; [ $USER == phili ])) && echo yes || echo no
+ set +x
no


Return values survive, and you can just do

if (set +x; [[ -z ${AWS_SECRET_ACCESS_KEY_ID-} ]]); then
  ...
fi

I tested something similar above.



--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:47:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8306 )

Change subject: Add 'psmisc' to bootstrap_system.sh.
..


Patch Set 1:

> Would you like me to submit this for you, Phil?

Yes, please. I don't have permission to do it.

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:10:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-17 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8306


Change subject: Add 'psmisc' to bootstrap_system.sh.
..

Add 'psmisc' to bootstrap_system.sh.

testdata/bin/kill-all.sh uses /usr/bin/killall, which
is provided by the 'psmisc' package on Ubuntu. This
commit adds the package to the list of packages we install.

The Docker base image for Ubuntu 16.04 doesn't contain this package,
which is how this came up.

Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
---
M bin/bootstrap_system.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/8306/1
--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] Making bin/bootstrap system.sh executable.

2017-10-16 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8292


Change subject: Making bin/bootstrap_system.sh executable.
..

Making bin/bootstrap_system.sh executable.

Script should be executable, since it's reasonable to
run it standalone.

Change-Id: I15973f68e0d6cba86da58a104a607cac0627c4cb
---
M bin/bootstrap_system.sh
1 file changed, 0 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/8292/1
--
To view, visit http://gerrit.cloudera.org:8080/8292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I15973f68e0d6cba86da58a104a607cac0627c4cb
Gerrit-Change-Number: 8292
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Philip Zeyliger (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#5).

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..

IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC)
partitions in a single ALTER TABLE command. We batch the operations
against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.
A new, simple, test has been added that confirms that creating 502
partitions works.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/metadata/test_ddl.py
4 files changed, 40 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/5
--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-16 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..


Patch Set 4: Code-Review+1

(2 comments)

Thanks!

http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc@308
PS4, Line 308: if (!diagnostic_err.empty()) ss <<" "<< diagnostic_err;
nit: Do we do spacing around "<<" operators?


http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc@1625
PS4, Line 1625: diagnostic_printer <<"LLVM diagnostic error: ";
nit: space around "<<"?



--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 16 Oct 2017 23:28:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 4:

(2 comments)

Thanks for the review!

I ran the core tests. Added a python test, as you suggested.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954
PS3, Line 1954:
> Why did you remove this?
Mistake. Re-instated.

(I missed that MetaStoreClient was autocloseable...)


http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> I think you can refactor this so that you make only one call to getPartitio
Sure. I tightened the partitioned loop to only include the msClient call.

Note that this implies that getPartitionFromHms() is ok to call with a big (> 
MAX_PARTITION_UPDATES_PER_RPC) batch. Is that against the spirit here?



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 16 Oct 2017 18:43:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Philip Zeyliger (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#4).

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..

IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC)
partitions in a single ALTER TABLE command. We batch the operations
against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.
A new, simple, test has been added that confirms that creating 502
partitions works.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/metadata/test_ddl.py
4 files changed, 39 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/4
--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-13 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 1: Code-Review+1

(3 comments)

This looks fine to me, with minor comment nits.

http://gerrit.cloudera.org:8080/#/c/8274/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8274/1//COMMIT_MSG@13
PS1, Line 13: N.B. this is probably worth fixing upstream.
We are upstream!


http://gerrit.cloudera.org:8080/#/c/8274/1/tests/metadata/test_hdfs_encryption.py
File tests/metadata/test_hdfs_encryption.py:

http://gerrit.cloudera.org:8080/#/c/8274/1/tests/metadata/test_hdfs_encryption.py@191
PS1, Line 191: # exists. This behavior is expected due to the difference in 
encryption zones
"the difference in encryption zones between the .Trash and the warehouse 
directory"

If that's true, could we elaborate on "difference in encryption zones" perhaps 
as suggested here?


http://gerrit.cloudera.org:8080/#/c/8274/1/tests/metadata/test_hdfs_encryption.py@198
PS1, Line 198:   # New HDFS behavior succeeds the query and creates trash; 
the partition removal
s/New HDFS/HDFS 2.8+/?



--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 1
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 13 Oct 2017 22:07:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04

2017-10-13 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8262 )

Change subject: IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04
..


Patch Set 3:

(2 comments)

Sorry to get to this a bit late.

I find the distinctions between the (now 3) bootstrap_* scripts to be really 
hard to grok. Since we're multiplying scripts in this commit, it seems like we 
should clarify those distinctions.

I very much like separating bootstrap_system from bootstrap_development, but I 
think it doesn't make sense to separate it into two scripts. Rather, I think 
this would be clearer to use if it looked like:

  bootstrap_development  [install-packages] [checkout-impala] [configure] 
[build] [loadtestdata] [runtests]

(Or something like it; point being to have on entry point with a few variants.)

The default actions would be whatever bootstrap_development does today.

There's a mild prototype at 
https://github.com/philz/incubator-impala/commit/e0e7ad3ce07335709987b1b4088ac66ef312e7af
 if you're interested. This came up for me while trying to time the different 
parts of the bootstrap script for some Docker work I was doing. In it, I just 
broke the existing script into ~5 bash functions.

http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_build.sh
File bin/bootstrap_build.sh:

http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_build.sh@35
PS3, Line 35: python-dev python-setuptools libffi-dev libkrb5-dev
Consider just adding openjdk-${JDK_VERSION}-jdk (and friends) here? I don't 
know that one apt-get command is faster than two, but it's nice to have them 
all together.


http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_system.sh
File bin/bootstrap_system.sh:

http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_system.sh@20
PS3, Line 20: # This script bootstraps a system for Impala development from 
almost nothing; it is known
This is very heavily replicated with bootstrap_development; highlighting the 
differences may make sense.



--
To view, visit http://gerrit.cloudera.org:8080/8262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8196a2a87bce5893a349a1b290c3f3d04fd80317
Gerrit-Change-Number: 8262
Gerrit-PatchSet: 3
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 13 Oct 2017 16:54:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-10-13 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8038 )

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-Change-Number: 8038
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 13 Oct 2017 16:09:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-12 Thread Philip Zeyliger (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#3).

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..

IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC)
partitions in a single ALTER TABLE command. We batch the operations
against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.
I'm very open to additional suggestions for how to test this.
I can certainly add a query test that reproduces my manual testing,
if reviewers feel like it's the right direction.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
3 files changed, 12 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-10-11 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8038 )

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 4:

(10 comments)

Thanks! I think this is much clearer than the previous iteration. All of my 
comments are either nits or requests to add a few more comments.

http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@686
PS4, Line 686: tmp_set = {}
Add a comment:

# Use a temporary to avoid changing set_query_options during iteration.

(An alternative is to use del, but use .items() instead of .iteritems())


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1330
PS4, Line 1330: if __name__ == "__main__":
Perhaps we need a big comment:

"""
There are two types of options: shell options and query_options. You can set 
these in the shell itself, which overrides settings on the command line, which 
override the defaults in impala_shell_config_defaults.py. Query options have no 
defaults within the impala-shell, but they do have defaults on the server.
"""


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1350
PS4, Line 1350:   # default options loaded in from 
impala_shell_config_defaults.py
Let's take the time to update this comment by disambiguating "shell options" vs 
"query options" here?


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1354
PS4, Line 1354: impala_shell_defaults.update(loaded_shell_options)
I think "impala_query_options_default" is empty, but this assymetry between 
impala_shell_defaults and query_options is weird. It's weird that you're 
updating impala_shell_defaults, rather than updating "shell_options".


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1437
PS4, Line 1437:   query_options.update(
Add comment:

Override query_options with those specified on the command line.


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py
File shell/option_parser.py:

http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@36
PS4, Line 36: def convert_loaded_shell_option(option, value, default_options, 
config_filename):
Please document config_filename. I'm unclear how it's used.


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@47
PS4, Line 47: # if the option is not set to either true or false, use 
the default
Do we need to log about the ignored value here?


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@36
PS4, Line 36: def convert_loaded_shell_option(option, value, default_options, 
config_filename):
: """Converts some values based on their type in default_options
: """
: if default_options[option] in [True, False]:
:   # validate the option if it can only be a boolean value
:   # the only choice for these options is true or false
:   if value.lower() == "true":
: return True
:   elif value.lower() == 'false':
: return False
:   else:
: # if the option is not set to either true or false, use 
the default
: return default_options[option]
: elif value.lower() == "none":
:   return None
: elif option.lower() == "config_file":
:   return config_filename
This function is mixing 2-space indent and 4-space indent. Please clean up.


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@164
PS4, Line 164:   "Only specify this as a option in the 
commandline."
s/as a option/as an option/


http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@183
PS4, Line 183: help="Sets default query options."
Add: "May be specified multiple times." Unless it's clear from the usage?



--
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-Change-Number: 8038
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 11 Oct 2017 18:42:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-11 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@308
PS3, Line 308:   }
nit: This might look like "... to main module.Bla bla." I.e., no space after 
the period.

You might want to add '<< " "' in here to make it look nicer.


http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@1635
PS3, Line 1635:
I might be confused about the C++11 stuff, but given that you just std::move'd 
it, I don't think you need to clear it.


http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py@45
PS1, Line 45:   def test_codegen_diagnostic_handler(self, vector):
> I tried moving this to llvm-codegen-test.cc but it turns out  that a lot ug
I don't know LLVM well enough to suggest how to get at these errors.

That said, LoadModuleFromFile() takes a regular file, I think, and not one on 
HDFS.  You might be able to just cp ./llvm-ir/test-loop.bc to a temporary file 
to induce this. (Or create a second test-loop that has a link-time conflict: 
presumably name collisions cause errors?)



--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 11 Oct 2017 17:43:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6027: Retry downloading toolchain components.

2017-10-11 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8258 )

Change subject: IMPALA-6027: Retry downloading toolchain components.
..

IMPALA-6027: Retry downloading toolchain components.

We've seen intermittent 500 errors when downloading the toolchain from
S3 over the HTTPS URLs. As a first stab, this commit retries 3 times,
with some jitter.

I also changed the threadpool introduced previously to have a limit
of 4 threads, because that's sufficient to get the speed improvement.
The 500 errors have been observed both before and after the threadpool
change.

For testing, I ran the straight-forward case directly. I introduced
a broken version string to observe that retries would happen on
any error from wget.

Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
---
M bin/bootstrap_toolchain.py
1 file changed, 15 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8258/2
--
To view, visit http://gerrit.cloudera.org:8080/8258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
Gerrit-Change-Number: 8258
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6027: Retry downloading toolchain components.

2017-10-11 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8258


Change subject: IMPALA-6027: Retry downloading toolchain components.
..

IMPALA-6027: Retry downloading toolchain components.

We've seen intermittent 500 errors when downloading the toolchain from
S3 over the HTTPS URLs. As a first stab, this commit retries 3 times,
with some jitter.

I also changed the threadpool introduced previously to have a limit
of 4 threads, because that's sufficient to get the speed improvement.
The 500 errors have been observed both before and after the threadpool
change.

For testing, I ran the straight-forward case directly. I introduced
a broken version string to observe that retries would happen on
any error from wget.

Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
---
M bin/bootstrap_toolchain.py
1 file changed, 15 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8258/1
--
To view, visit http://gerrit.cloudera.org:8080/8258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
Gerrit-Change-Number: 8258
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] Download toolchain in parallel.

2017-10-09 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8237


Change subject: Download toolchain in parallel.
..

Download toolchain in parallel.

By downloading from the toolchain S3 buckets in parallel with
extracting them, this improves bootstrap_toolchain on my machine
from about 1m5s to about 30s.

  $rm -rf toolchain; time bin/bootstrap_toolchain.py > /dev/null

  real0m29.226s
  user0m46.516s
  sys 0m33.820s

On a large EC2 machine, closer to the S3 buckets, the new time is 21s.

Because multiprocessing hasn't always been available (python2.4 on RHEL5
won't have it), I fall back to a simpler implementation

Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec
---
M bin/bootstrap_toolchain.py
1 file changed, 30 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/8237/1
--
To view, visit http://gerrit.cloudera.org:8080/8237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec
Gerrit-Change-Number: 8237
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-10-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8100 )

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8100/3/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

http://gerrit.cloudera.org:8080/#/c/8100/3/be/src/runtime/client-cache.h@246
PS3, Line 246:
> Maybe we should include the RPC name in the log message and that will disam
I followed Michael's foot steps and looked at the ~17 callers of DoRpc. Far 
from all of them log.

I'm now proposing we do this patch without the DoRpc() changes. From a 
performance standpoint, if you're timing out on RPCs, the backtrace is the 
least of your problems, and, yet, I think knowing which RPC is timing out is 
very helpful. I wasn't able to see a quick way to get the name of the RPC 
without discouraged C++ shenanigans. I checked that the Thrift Exception 
doesn't have that, nor do the Thrift structs seem to have self-identifying 
information.



--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:28:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-10-06 Thread Philip Zeyliger (Code Review)
Hello Michael Ho, Sailesh Mukil, Alex Behm, Mostafa Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8100

to look at the new patch set (#6).

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..

IMPALA-5940: Avoid log spew by using Status::Expected.

In IMPALA-5926, we fixed a case where closing the session triggered a
stack trace in the logs which impacted performance for short-running
queries. Looking at log files from several active clusters, I identified a few
other cases where we could clean up log spew with the same (trivial)
approach.

The following messages will no longer log:

  "Failed to codegen MaterializeTuple() due to unsupported type: $0"
  "Not implemented for this format."
  "ScalarFnCall Codegen not supported for CHAR"
  "Could not codegen CodegenMaterializeExprs: "
  "Could not codegen TupleRowComparator::Compare(): $0"

These codegen related messages were happening at every execution node, and were
therefore very common.

In addition, the following messages, which were very frequent in the logs, now
will log, but without the back trace:

  "Query Id ... not found."

To do this, I changed them from logging implicitly via Status(...)
to logging explicitly.

I have not fixed this message:
  "... Client ... timed-out during recv call."
It's in DoRpc(), and not all callers of DoRpc() have independent
logging. For now, we'll trade a little bit of log spam for clearer
debugging.

The snippet I used to identify these was:

  find . -type f -name '*IMPALAD*.gz' | xargs gzcat  | awk '/^I/ { if(x) { 
print x; } x = "" } /status.cc/ { x=" "; } { if(x) { x=x  $0 } }'  | sed -e 
's/0x[0-9a-fx]* //g' | sed -e 's/[0-9a-f]\{16\}:[0-9a-f]*/QUERYID/g' |  tr -s 
'\t' ' ' | tr '[0-9]' 'N' | sort | uniq -c  | sort -n | tee output.txt

I also analyzed some logs using SQL, against a pre-processed logs table:

  with v as (
select regexp_replace(
regexp_replace(
  translate(substr(message, 42), "\n\t", "  "),
  "[a-zA-Z0-9.-]*[.][a-zA-Z0-9-]*:[0-9]*",
  ""),
"@.*$", "@@@...") as m
from logs_table where `class`="status.cc")
  select m, count(*) from v group by 1 order by 2 desc limit 100

Testing:
* Automated tests.
* Manual testing for one of the new back-trace-suppressed paths:

  $ impala-python shell/gen-py/ImpalaService/ImpalaService-remote -h 
localhost:21000 GetRuntimeProfile 'beeswaxd.ttypes.QueryHandle()'
  Traceback (most recent call last):
File "shell/gen-py/ImpalaService/ImpalaService-remote", line 106, in 

  pp.pprint(client.GetRuntimeProfile(eval(args[0]),))
File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", 
line 161, in GetRuntimeProfile
  return self.recv_GetRuntimeProfile()
File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", 
line 184, in recv_GetRuntimeProfile
  raise result.error
  beeswaxd.ttypes.BeeswaxException: 
BeeswaxException(handle=QueryHandle(log_context='', id=''), log_context='', 
SQLState='HY000', _message='GetRuntimeProfile error: Query id 0:0 not 
found.\n', errorCode=0)

  $ grep 'Query id' logs/cluster/impalad.INFO | tail -n 1
  I0926 20:29:09.44  6787 impala-server.cc:642] Query id 0:0 not found.

Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/runtime/tuple.cc
M be/src/service/impala-server.cc
M be/src/util/tuple-row-compare.cc
6 files changed, 16 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8100/6
--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..


Patch Set 1:

(5 comments)

I'm excited by anything that makes LLVM debugging easier!

http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.h@783
PS1, Line 783:   class DiagnosticState {
Based on http://llvm.org/doxygen/DiagnosticHandler_8h_source.html, it looks 
like they recommend us subclassing their class to implement this, and that 
there are different "remarks" that we can get.

It's tricky, but if they've got different warnings, I sure as heck would want 
to see them during development.


http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@186
PS1, Line 186:   >diagnostic_state_, true);
Could you comment on what this third argument does?

The llvm header says "expects enabled diagnostics", which I don't understand.


http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@304
PS1, Line 304:   if (error || diagnostic_state_.EncounteredError()) {
So do the diagnostic messages get printed anywhere?


http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@1622
PS1, Line 1622: info.print(diagnostic_printer);
Does this make it to our logs?


http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py@45
PS1, Line 45:   def test_codegen_diagnostic_handler(self, vector):
You could probably test this in be/src/codegen/llvm-codegen-test.cc with 
considerably less fanfare. I think the end-to-end test that we don't fail on a 
custom UDF is lovely, but a more targeted test makes sense too.



--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:22:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6021: Revert "IMPALA-6009: Upgrade Guava to 14.0.1"

2017-10-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8225 )

Change subject: IMPALA-6021: Revert "IMPALA-6009: Upgrade Guava to 14.0.1"
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8225
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd07d9b011ebd407d7a274725a10b6371d024186
Gerrit-Change-Number: 8225
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 06 Oct 2017 16:00:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

2017-10-05 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@318
PS2, Line 318:   private final char[] hexDigits_ = 
"0123456789abcdef".toCharArray();
You can make this static and call it HEX_DIGITS to mark it as a constant?


http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@593
PS2, Line 593:   private String getQueryHash(String queryStr) {
It's a bit pedantic, but it'd be nice to write a test that executes this code. 
Ideally, we'd also confirm that some query hashes to the same result as it used 
to with the Guava implementation.


http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@599
PS2, Line 599:   throw new IllegalStateException("Query hash failed");
typically this would be "throw new IllegalStateException("Query hash failed", 
e)"

I think that chains the exceptions, which is handy since we'll want to know 
what e.getMessage() says.

(This will never actually happen, but it's more idiomatic I think.)



--
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04bb436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 17:48:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-05 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 3:

(2 comments)

Thanks for the updates!

http://gerrit.cloudera.org:8080/#/c/8202/3/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

http://gerrit.cloudera.org:8080/#/c/8202/3/be/src/testutil/in-process-servers.cc@75
PS3, Line 75: if (started.ok()) return impala;
You're no longer logging status.getDetail() any more. I suspect that's 
intentional because it was logged when it was create?


http://gerrit.cloudera.org:8080/#/c/8202/3/tests/custom_cluster/test_catalog_wait.py
File tests/custom_cluster/test_catalog_wait.py:

http://gerrit.cloudera.org:8080/#/c/8202/3/tests/custom_cluster/test_catalog_wait.py@26
PS3, Line 26:
nit: whitespace?



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 05 Oct 2017 17:24:38 +
Gerrit-HasComments: Yes


  1   2   >