[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/588/

-- 
To view, visit http://gerrit.cloudera.org:8080/6889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5246: Verify that a UDF test completely finishes before 
moving on
..


IMPALA-5246: Verify that a UDF test completely finishes before moving on

A memory intensive UDF test takes a while to completely finish and for
the memory in Impala to be completely freed. This caused a problem in
ASAN builds (and potentially in normal builds) because we would start
the next test right away, before the memory is freed.

We fix the issue by checking that all fragments finish executing before
starting the next test.

Testing:
- Ran a private ASAN build which passed.

Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Reviewed-on: http://gerrit.cloudera.org:8080/6893
Reviewed-by: Alex Behm 
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins
---
M tests/query_test/test_udfs.py
1 file changed, 9 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Ho: Looks good to me, approved
  Alex Behm: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/6893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5246: Verify that a UDF test completely finishes before 
moving on
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_permanent_udfs.py
10 files changed, 290 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_permanent_udfs.py
10 files changed, 290 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java:

Line 30:  * Representation of a REFRESH/INVALIDATE METADATA statement.
> Can you spell out the different variants to make this clearer?
Done


Line 45:   public ResetMetadataStmt(TableName name, boolean isRefresh,
> This c'tor is starting to look scary. Please make it private and add static
Done


http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 616:   List javaFns =
> Let's move as much of the heavy work as possible outside of the global lock
Done.


Line 624:   Db db = dbCache_.get().get(dbName);
> need to handle db == null
Done


Line 627:   Db tmpDb = new Db(dbName, this, msDb);
> I understand this is convenient, but it would be preferable to not RPC to H
The msDb is not actually needed, it can be null, I don't think it's used 
anywhere.


Line 636:   for (String fn_name: tmpDb.getAllFunctions().keySet()) {
> easier to read if you iterate over the entrySet()
Done


Line 638:   boolean result = db.addFunction(fn);
> We need to handle the !result case. We could have a totally different funct
It will only be false if an identical function exists (we don't compare just 
the name, we compare with IS_INDISTINGUISHABLE). So I don't think anything 
needs to be changed.


Line 648:   for (String fn_name: db.getAllFunctions().keySet()) {
> easier to read if you iterate over the entrySet()
Done


Line 653: toRemove.add(fn);
> why not remove immediately here?
We can't modify the list as we are iterating over it. There will be some 
exception (something like concurrent modification).


http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/catalog/Function.java
File fe/src/main/java/org/apache/impala/catalog/Function.java:

Line 318: result.setCatalog_version(getCatalogVersion());
> using catalogVersion_ is a little clearer to me
Done


http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 225: if (req.is_delta) {
> single line and remove the else block
Done


http://gerrit.cloudera.org:8080/#/c/6878/1/tests/custom_cluster/test_permanent_udfs.py
File tests/custom_cluster/test_permanent_udfs.py:

Line 220: REFRESH_COMMANDS = ["INVALIDATE METADATA",
> Let's think a little about how to test these scenarios:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_permanent_udfs.py
10 files changed, 288 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_permanent_udfs.py
9 files changed, 282 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5319: Fix test hdfs scan node errors failures

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5319: Fix test_hdfs_scan_node_errors failures
..


Patch Set 1: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/586/

-- 
To view, visit http://gerrit.cloudera.org:8080/6891
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5246: Verify that a UDF test completely finishes before 
moving on
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/587/

-- 
To view, visit http://gerrit.cloudera.org:8080/6893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5301: Set Kudu minicluster memory limit

2017-05-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5301: Set Kudu minicluster memory limit
..


Patch Set 4:

http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/9740/

-- 
To view, visit http://gerrit.cloudera.org:8080/6844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5301: Set Kudu minicluster memory limit

2017-05-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5301: Set Kudu minicluster memory limit
..


Patch Set 4:

passed a gvo dry run

running a private exhaustive job now

-- 
To view, visit http://gerrit.cloudera.org:8080/6844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5301: Set Kudu minicluster memory limit

2017-05-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#4).

Change subject: IMPALA-5301: Set Kudu minicluster memory limit
..

IMPALA-5301: Set Kudu minicluster memory limit

By default, Kudu assumes it has 80% of system memory which
is far too high for the minicluster. This sets a mem limit
of 2gb and lowers the limit of the block cache. These values
were tested on a gerrit-verify-dryrun job as well as an
exhaustive run.

This patch also simplifies TestKuduMemLimits which was
unnecessarily creating a large table during test execution.

Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
---
M testdata/cluster/node_templates/cdh5/etc/init.d/kudu-common
M tests/query_test/test_kudu.py
2 files changed, 5 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6844/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on

2017-05-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5246: Verify that a UDF test completely finishes before 
moving on
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6707/6//COMMIT_MSG
Commit Message:

PS6, Line 16: While the actual fix for this is to not hold the QES::lock_ during
: metadata load,
I disagree. what you are doing is the actual fix for the lock acquisition 
order.  Not holding QES::lock_ while planning is a partial fix for a different 
issue (not being able to cancel while planning).


PS6, Line 20: We make sure that QES::lock_ doesn't block 
query_exec_state_map_lock_
:   in hot paths.
After this change, isn't it the case that we'll never try to acquire another 
lock while holding the map lock? If so, this should say make 
query_exec_state_map_lock_ a terminal lock.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 192:   DEFINE_int64(metadata_loading_pause_injection_ms, 0, "Simulates 
metadata loading for a "
how about prefixing the flag with "stress_", like we do for the one in 
free-pool.h, to give more of a clue that this is just for stressing the system.


PS6, Line 822: // Note: this acquires the exec_state lock *before* the
 : // query_exec_state_map_ lock. This is the opposite of
 : // GetQueryExecState(..., true), and therefore looks like a
 : // candidate for deadlock. The reason this works here is that
 : // GetQueryExecState cannot find exec_state (under the exec 
state
 : // map lock) and take it's lock until RegisterQuery has
 : // finished. By that point, the exec state map lock will 
have been
 : // given up, so the classic deadlock interleaving is not 
possible.
this comment is outdated now.


http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS6, Line 85: /// 4. query_exec_state_map_lock_
i think this should be updated. with this change, do we ever hold this lock in 
conjunction with another lock?


http://gerrit.cloudera.org:8080/#/c/6707/6/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS6, Line 55: GetRuntimeProfileStr
This doesn't make sense. GetRuntimeProfileStr() doesn't hold the map lock while 
getting the QES::lock_.  I think what's happening is that the call on line 76 
to get_in_flight_queries() is hitting the QuerySummaryHandler() handler (please 
double check that), which then causes the lock cycle. 

Without the fix, does the test ever get past line 76? I would expect the 
webserver to block on the QES::lock_ at that point.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 6:

> I haven't rebased this yet on to new QES changes so that you can
 > diff it from previous versions. Please let me know if you need a
 > rebased patch.

Please go ahead and rebase.

-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1
..


Patch Set 1: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/585/

-- 
To view, visit http://gerrit.cloudera.org:8080/6889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5319: Fix test hdfs scan node errors failures

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5319: Fix test_hdfs_scan_node_errors failures
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/586/

-- 
To view, visit http://gerrit.cloudera.org:8080/6891
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[native-toolchain-CR] Remove redundant libevent flag when building Thrift

2017-05-16 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6904

Change subject: Remove redundant libevent flag when building Thrift
..

Remove redundant libevent flag when building Thrift

--with-libevent was specified twice, and overridden by the second
  option. Libevent is not required for Thrift if the non-blocking server
  is not used.

Change-Id: Ibf6ef921f4389bd875578cd681f008b86f633af8
---
M source/thrift/build.sh
1 file changed, 0 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/04/6904/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibf6ef921f4389bd875578cd681f008b86f633af8
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-5286: Query fails due to Kudu column name case

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5286: Query fails due to Kudu column name case
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6902/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 946: String key = slotPath.toString().toLowerCase();
This works just fine for other tables. The slotPath and names in Columns are 
generally assumed to be in lower case already.

I suspect the real bug might be in the path that creates Impala columns from 
Kudu columns. If that is correct, then there could still be other issues 
lurking due to casing.


-- 
To view, visit http://gerrit.cloudera.org:8080/6902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5246: Verify that a UDF test completely finishes before 
moving on
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5286: Query fails due to Kudu column name case

2017-05-16 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6902

Change subject: IMPALA-5286: Query fails due to Kudu column name case
..

IMPALA-5286: Query fails due to Kudu column name case

Impala is case insensitive for column names and generally deals
with them in all lower case. Kudu is case sensitive. This can
lead to a problem when a table is created externally in Kudu
with a column name with upper case letters.

The particular bug here is that the Analyzer can end up creating
two SlotDescriptors that point to the same column because (despite
comment in Analyzer.java to the contrary) it does a case sensitive
comparison of slot paths to determine if a SlotDescriptor for the
path already exists.

The fix is to convert slot paths to lower case before doing the
comparison.

Testing:
- Added an e2e test in test_kudu.py.

Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M tests/query_test/test_kudu.py
2 files changed, 27 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6902/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Add a script to test performance on a developer machine

2017-05-16 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a script to test performance on a developer machine
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6818/2//COMMIT_MSG
Commit Message:

PS2, Line 20: buildall.sh
> Did you mean to add buildall.sh to this review?
I only mean that the script single_node_perf_run.py calls buildall.sh and the 
others. It is the mortar. buildall.sh was one of the ricks that didn't need to 
change.


-- 
To view, visit http://gerrit.cloudera.org:8080/6818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a script to test performance on a developer machine

2017-05-16 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: Add a script to test performance on a developer machine
..


Patch Set 2:

(1 comment)

Jim, this is mainly just to confirm that this review request has been noted. 
Will try to find time in the next couple of days to comment.

http://gerrit.cloudera.org:8080/#/c/6818/2//COMMIT_MSG
Commit Message:

PS2, Line 20: buildall.sh
Did you mean to add buildall.sh to this review?


-- 
To view, visit http://gerrit.cloudera.org:8080/6818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers
..


Patch Set 4: Code-Review+1

Changed lgtm. I do wonder whether our tests are still covering both paths 
though.

-- 
To view, visit http://gerrit.cloudera.org:8080/6876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-16 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Loads all TPC-DS tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6877/2/testdata/datasets/tpcds/tpcds_schema_template.sql
File testdata/datasets/tpcds/tpcds_schema_template.sql:

Line 659: WHERE ss_sold_date_sk IS NULL;
Replace the multiple insert statements with 
insert overwrite table   store_sales partition(ss_sold_date_sk)  /*+ 
clustered,shuffle */ select 


-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers

2017-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6876/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 1037: decompressed_data_pool_->FreeAll();
> Makes sense. We could selectively Clear() and FreeAll() depending on the un
I'd prefer to defer it - I think a policy like that would be better implemented 
within Clear() since it's generally applicable.


http://gerrit.cloudera.org:8080/#/c/6876/2/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 476: && slot_desc_ != nullptr && slot_desc_->type().IsVarLenType();
> I think checking IsStringType() is more accurate because IsVarLenType() inc
You're right, CHAR columns could reference the pages directly in some cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/6876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1

2017-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/585/

-- 
To view, visit http://gerrit.cloudera.org:8080/6889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1

2017-05-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers

2017-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers
..

IMPALA-5304: reduce transfer of Parquet decompression buffers

The buffers contain the Parquet DataPages, which need to be
attached to the row batch if the rows point to var-len data
stored directly in the page. Otherwise the buffers can be
discarded once the values in the page have been materialized.

This reduces the amount of memory transferred between threads, which is
a known TCMalloc anti-pattern. It also allows us to free memory
earlier, which may help reduce memory consumption slightly.

Also fix a latent bug I noticed where needs_conversion_ is not
always initialised in the constructor.

Testing
Ran exhaustive build. Most of the Parquet tests use compressed Parquet,
which should exercise this code path.

Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
2 files changed, 18 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/6876/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..


Patch Set 2:

> I think the more information the better. If we can determine that
 > one of those messages is always superseded by the 
 > get-last-exception-root-cause
 > msg then we can elide it, but for now it seems ok to me to have
 > more verbose msgs that could help debugging.

Ok, makes sense. I've added it for all types of errors now. Figuring out if 
both the messages are redundant is not trivial, so I'm not solving that for now.

-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2).

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..

IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

We use the new libHDFS API hdfsGetLastExceptionRootCause() to return
the last seen HDFS error on that thread.

This patch depends on the recent HDFS commit:
https://github.com/apache/hadoop/commit/fda86ef2a32026c02d9b5d4cca1ecb7b4decd872

Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
---
M be/src/util/hdfs-util.cc
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/6894/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5297: Set Kudu minicluster memory limit

2017-05-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#3).

Change subject: IMPALA-5297: Set Kudu minicluster memory limit
..

IMPALA-5297: Set Kudu minicluster memory limit

By default, Kudu assumes it has 80% of system memory which
is far too high for the minicluster. This sets a mem limit
of 1gb and lowers the limit of the block cache.

Also moves TestKuduMemLimits to exhaustive since those tests
put more memory pressure on Kudu than the gvo VMs can
reliably handle, and the test was very slow anyway.

Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
---
M testdata/cluster/node_templates/cdh5/etc/init.d/kudu-common
M tests/query_test/test_kudu.py
2 files changed, 11 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6844/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..


Patch Set 1:

I think the more information the better. If we can determine that one of those 
messages is always superseded by the get-last-exception-root-cause msg then we 
can elide it, but for now it seems ok to me to have more verbose msgs that 
could help debugging.

-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..


Patch Set 1:

> If hdfsGetLastExceptionRootCause() always contains relevant
 > information, I think we should include it in all cases. What's to
 > be gained by leaving it out?

In many cases, we avoid redundant error information by leaving it out. For 
example, a file not found error with this change for all types of errors:

[localhost:21000] > select * from s3_test;
Query: select * from s3_test
Query submitted at: 2017-05-16 11:42:13 (Coordinator: http://localhost:25000)
Query progress can be monitored at: 
http://localhost:25000/query_plan?query_id=7b4fea9564c4c1d5:fd45dd87
ERROR: Failed to open HDFS file s3a://sailesh-impala-s3/s3_test_parts/asbasda
Error(2): No such file or directory
Root cause: FileNotFoundException: No such file or directory: 
s3a://sailesh-impala-s3/s3_test_parts/asbasda

-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794

2017-05-16 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo 
IMPALA-3794
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..


Patch Set 1:

If hdfsGetLastExceptionRootCause() always contains relevant information, I 
think we should include it in all cases. What's to be gained by leaving it out?

-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..


Patch Set 1:

Was it a clean revert? If not, where were the conflicts?

-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters

2017-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5166: clean up BufferPool counters
..


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6690/6/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

PS6, Line 104: "InitialReservation"
does that counter still exist? or is this meant to check one of the counters 
being added below?


http://gerrit.cloudera.org:8080/#/c/6690/6/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

PS6, Line 228: Reservation=5.00 MB
is it not useful to indicate how much of the reservation is actually in use as 
well? don't need to add it to this change but wondering if that will help with 
tuning reservations in extreme cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/6690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 1:

This passed tests in both debug and release mode. I haven't run the stress test 
yet.

-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (FE)

2017-05-16 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6898

Change subject: IMPALA-5167: Reduce the number of Kudu clients created (FE)
..

IMPALA-5167: Reduce the number of Kudu clients created (FE)

Creating Kudu clients is very expensive as each will fetch
metadata from the Kudu master, so we should minimize the
number of Kudu clients that get created.

This patch stores a map from Kudu master addressed to Kudu
clients in KuduUtil to be used across the FE and catalog.
Another patch will address the BE.

This relies on a change on the Kudu side that clears
non-covered range entries from the client's cache on
table open (d07ecd6ded01201c912d2e336611a6a941f48d98).

Testing:
- Ran a stress test on a 10 node cluster: scan of a small
  Kudu table, 1000 concurrent queries, load on the Kudu
  master was reduced signficantly, from ~50% cpu to ~5%.
  (with the BE changes included)
- Ran the Kudu e2e tests.
- Manually ran a test with concurrent INSERTs and
  'ALTER TABLE ADD PARTITION' (which is affected by the
  Kudu side change mentiond above) and verified
  correctness.

Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
4 files changed, 41 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/6898/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6897

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..

IMPALA-4890/5143: Coordinator race involving TearDown()

TearDown() releases resources and destroys control
structures (the QueryState reference), and it can be called
while a concurrent thread executes Exec() or might call
GetNext() in the future. The solution is not to destroy
the control structures.

This also releases resources automatically at the end
of query execution.

Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
3 files changed, 29 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/6897/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-16 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6896

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..

Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

Reverting IMPALA-2716 as SparkSQL does not agree with the approach
taken.

More details can be found at:
https://issues.apache.org/jira/browse/SPARK-12297

Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
---
M be/src/benchmarks/CMakeLists.txt
D be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
D tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 84 insertions(+), 850 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters

2017-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5166: clean up BufferPool counters
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6690/3/be/src/runtime/bufferpool/buffer-pool.cc
File be/src/runtime/bufferpool/buffer-pool.cc:

PS3, Line 279: profile->GetOrCreateChild
> Right, but why not just add that (or use the existing AddChid() pattern)?
:S forgot what was in my own patch. 

Switched to CreateChild() and using an existing API in buffer-pool-test. I 
think CreateChild() is a better API than AddChild() because it doesn't require 
threading the ObjectPool through - there's no logical reason I can see that the 
child profile would have a different lifetime to its parent.


http://gerrit.cloudera.org:8080/#/c/6690/3/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

PS3, Line 237: BufferPoolUsed
> update
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters

2017-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-5166: clean up BufferPool counters
..

IMPALA-5166: clean up BufferPool counters

Misc changes to improve usability of the profiles.

* Separate out detailed BufferPool metrics into a "Buffer pool"
  sub-profile.
* Only create the limit counter if there is a limit
* Show BufferPool using in query MemTracker (it was accidentally
  disabled before because there was no query-level profile).
* Reduce clutter in MemTracker dump by only showing buffer pool
  reservation, not usage (the usage was misleading anyway because
  it didn't include child usage).
* Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes
  and PeakUnpinnedBytes can answer most of the same questions - i.e.
  did it unpin any pages, and how many did it need to write to disk.
* Add buffer pool metrics to /memz (if buffer pool is enabled) and
  reorder /memz so more useful information is up the top.

Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
---
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-pool-counters.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/mem-tracker.cc
M be/src/util/default-path-handlers.cc
M be/src/util/memory-metrics.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/metrics.json
M www/memz.tmpl
12 files changed, 145 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/6690/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)

2017-05-16 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE)
..


Patch Set 6:

(3 comments)

> So I just spoke with the Kudu folks and it sounds like there will
 > be an issue with the Java client that will prevent it from being
 > used for more than 7 days w/ Kerberos (some token issue), and I
 > don't think that'll get fixed ASAP. The issue does exist for the
 > C++ client as well but that's being fixed now. Can you separate out
 > the Java side so we can submit them independently?
Done.

http://gerrit.cloudera.org:8080/#/c/6792/5/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 403: KuduClientPtr* kudu_client_ptr = new KuduClientPtr;
> no std:: in .cc
Done


Line 404: RETURN_IF_ERROR(CreateKuduClient(master_addresses, 
_client_ptr->kudu_client));
> I suggested it, but we can take it out if you think it's superfluous.
If I understand correctly, this list comes from a table property, which by 
default is populated from a start up flag. I assume the common case we expect 
is for people to mostly create tables with the default list, so they should 
generally be in the same order.

Its also probably not a big deal if we end up creating a small number of Kudu 
clients for different orderings, its still a significant reduction in the total 
number created.


http://gerrit.cloudera.org:8080/#/c/6792/5/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

Line 72: 
b.defaultOperationTimeoutMs(BackendConfig.INSTANCE.getKuduClientTimeoutMs());
> move above the function comment.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)

2017-05-16 Thread Thomas Tauber-Marshall (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6792

to look at the new patch set (#6).

Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE)
..

IMPALA-5167: Reduce the number of Kudu clients created (BE)

Creating Kudu clients is very expensive as each will fetch
metadata from the Kudu master, so we should minimize the
number of Kudu clients that get created.

This patch stores a map from Kudu master addressed to Kudu
clients in the ExecEnv to be used across the BE for all
queries. Another patch will address the FE.

This relies on a change on the Kudu side that clears
non-covered range entries from the client's cache on
table open (fdc022fe6231af20e307012d98c35b16cbfa7b33)

Testing:
- Ran a stress test on a 10 node cluster: scan of a small
  Kudu table, 1000 concurrent queries, load on the Kudu
  master was reduced signficantly, from ~50% cpu to ~5%.
  (with the FE changes included)
- Ran the Kudu e2e tests.
- Manually ran a test with concurrent INSERTs and
  'ALTER TABLE ADD PARTITION' (which is affected by the
  Kudu side change mentiond above) and verified
  correctness.

Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
---
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
8 files changed, 73 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/6792/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters

2017-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-5166: clean up BufferPool counters
..

IMPALA-5166: clean up BufferPool counters

Misc changes to improve usability of the profiles.

* Separate out detailed BufferPool metrics into a "Buffer pool"
  sub-profile.
* Only create the limit counter if there is a limit
* Show BufferPool using in query MemTracker (it was accidentally
  disabled before because there was no query-level profile).
* Reduce clutter in MemTracker dump by only showing buffer pool
  reservation, not usage (the usage was misleading anyway because
  it didn't include child usage).
* Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes
  and PeakUnpinnedBytes can answer most of the same questions - i.e.
  did it unpin any pages, and how many did it need to write to disk.
* Add buffer pool metrics to /memz (if buffer pool is enabled) and
  reorder /memz so more useful information is up the top.

Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
---
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-pool-counters.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/mem-tracker.cc
M be/src/util/default-path-handlers.cc
M be/src/util/memory-metrics.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/metrics.json
M www/memz.tmpl
12 files changed, 145 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/6690/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables.

2017-05-16 Thread Jakub Kukul (Code Review)
Jakub Kukul has uploaded a new patch set (#4).

Change subject: IMPALA-2525: Treat parquet ENUMs as STRINGs when creating 
impala tables.
..

IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables.

Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475
---
M docs/topics/impala_parquet.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/bin/create-load-data.sh
A testdata/data/schemas/logicaltypes.parquet
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
5 files changed, 21 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6550/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jakub Kukul 
Gerrit-Reviewer: Jakub Kukul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5144: Remove sortby() hint

2017-05-16 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5144: Remove sortby() hint
..


Patch Set 2:

(3 comments)

Thank you for having a look. I added a test in PS3, and inline comments 
pointing to the remaining tests you asked for.

http://gerrit.cloudera.org:8080/#/c/6885/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

PS2, Line 1792: 
> Do we have any tests that use shuffle and clustered? If not, maybe we shoul
Tests for shuffle and clustered are above this removed block, starting in Line 
1726. The tests here have been added for the sortby() hint in particular, and 
were also only executed for non-legacy syntax. The tests above run for all hint 
syntaxes.


http://gerrit.cloudera.org:8080/#/c/6885/2/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

PS2, Line 508: 
> Since you're at it, maybe add a simply test with clustered as well. Didn't 
Good idea, done.


http://gerrit.cloudera.org:8080/#/c/6885/2/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS2, Line 328: 
> Same comment as before. Just make sure we have some coverage for shuffle + 
They are covered in multiple combinations in insert.test.


-- 
To view, visit http://gerrit.cloudera.org:8080/6885
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5144: Remove sortby() hint

2017-05-16 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#3).

Change subject: IMPALA-5144: Remove sortby() hint
..

IMPALA-5144: Remove sortby() hint

The sortby() hint is superseded by the SORT BY SQL clause, which has
been introduced in IMPALA-4166. This changes removes the hint.

Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594
---
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_insert_parquet.py
14 files changed, 62 insertions(+), 279 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/6885/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6885
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.

2017-05-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
File fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java:

Line 68: if (randomSeed_ != null) {
single line?


http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 812: result.numRows_ = numRows_;
this is a bit weird outside of the context of sampling (why should #rows stay 
the same for a different set of files?).

instead of doing the sampling in the metadata, would it make more sense to do 
it during scan range computation?


http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1961: List> allFiles =
this is an expensive list, and i don't think you need it. you can achieve the 
same effect with a list that mirrors orderedparts and contains the cumulative # 
of files up to the preceding partition (e.g., {0, 517, 1120, ...}). you then 
generate a random sequence of numbers between 0 and #totalfiles-1. this 
requires space proportional to the sample size, but not the total number of 
files.

you could even return this as a map (the value being the 
indices into the per-partition list of file descriptors), and have hdfsscannode 
use that to access the sampled files in the original partitions.


http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 132:   // Total number of files from the effective partitions (partitions_ 
or sample_).
update comment


Line 135:   // Total number of bytes from the effective partitions (partitions_ 
or sample_).
same


Line 678:   cardinality_ = addCardinalities(cardinality_, 
p.getNumRows());
shouldn't the cardinality reflect the reduction from sampling? if you're 
sampling, you reduce scan output, which should affect join order, no?


http://gerrit.cloudera.org:8080/#/c/6868/3/tests/query_test/test_tablesample.py
File tests/query_test/test_tablesample.py:

Line 47: # 2. The results of queries without a repeatable clause could be 
change due to
remove "be"


-- 
To view, visit http://gerrit.cloudera.org:8080/6868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 113:   boost::scoped_ptr materialized_row_batches_;
> If I understood correctly transfer-of-ownership currently is based on uniqu
I'll make the change or give a better explanation why we should not do it now. 
It will take some time.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/base-sequence-scanner.h
File be/src/exec/base-sequence-scanner.h:

Line 86:   virtual Status GetNextInternal(RowBatch* row_batch);
> Sounds fine - it would be good though to do the follow-on patch before comm
No point in waiting then. Added WARN_UNUSED_RESULT to all scanners and the 
hdfs-relevant scan nodes.


http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/hdfs-rcfile-scanner.h
File be/src/exec/hdfs-rcfile-scanner.h:

Line 404:   uint8_t* row_group_buffer_;
> I think maybe StringBuffer should be called MemoryBuffer or something like 
Agree about changing the name/comments of StringBuffer. Leaving as is for now, 
lmk if you still want to me to change it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5319: Fix test hdfs scan node errors failures

2017-05-16 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5319: Fix test_hdfs_scan_node_errors failures
..


Patch Set 1:

> Good idea, thanks. I cancelled that job and started:
 > http://sandbox.jenkins.cloudera.com/view/Impala/view/Private-Utility/job/impala-private-build-and-test/5609/

People who do not work at Cloudera cannot see this. If you want, you can run 
exhaustive builds on jenkins.impala.io:

http://jenkins.impala.io:8080/view/Utility/job/ubuntu-14.04-from-scratch/parambuild/?EXPLORATION_STRATEGY=exhaustive

-- 
To view, visit http://gerrit.cloudera.org:8080/6891
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..


Patch Set 1:

Questions for reviewers:

Do you think we should call hdfsGetLastExceptionRootCause() for all HDFS errors 
(and not only 255)? In some cases, there may be more useful information, while 
in most other cases, the error message will be redundant (eg: 'file not found' 
printed twice).

Also, does anyone know of a test case that deterministically reproduces an 
"Unknown Error 255"? I spent a few hours trying to recreate one, but none seem 
simple enough to automate into our test suite.

-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

2017-05-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6894

Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
..

IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"

We use the new libHDFS API hdfsGetLastExceptionRootCause() to return
the last seen HDFS error on that thread.

This patch depends on the recent HDFS commit:
https://github.com/apache/hadoop/commit/fda86ef2a32026c02d9b5d4cca1ecb7b4decd872

Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
---
M be/src/util/hdfs-util.cc
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/6894/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-05-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

Line 137:   
static_cast(scan_node_)->AddMaterializedRowBatch(row_batch);
> Right, but the transfer of ownership is different between the two cases - I
Done


http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 113:   boost::scoped_ptr materialized_row_batches_;
> If I understood correctly transfer-of-ownership currently is based on uniqu
I'll make the change or give a better explanation why we should not do it now.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes