[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 31 May 2018 05:29:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..

IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python SSL error

When we upgraded to thrift-0.9.3, the TSSLSocket.py logic changed quite
a bit. Our RHEL7 machines come equipped with Python 2.7.5. Looking at
these comments, that means that we'll be unable to create a 'SSLContext'
but be able to explicitly specify ciphers:
https://github.com/apache/thrift/blob/88591e32e710a0524327153c8b629d5b461e35e0/lib/py/src/transport/TSSLSocket.py#L37-L41
# SSLContext is not available for Python < 2.7.9
_has_ssl_context = sys.hexversion >= 0x020709F0

# ciphers argument is not available for Python < 2.7.0
_has_ciphers = sys.hexversion >= 0x020700F0

If we cannot create a 'SSLContext', then we cannot use TLSv1.2 and have
to use TLSv1:
https://github.com/apache/thrift/blob/88591e32e710a0524327153c8b629d5b461e35e0/lib/py/src/transport/TSSLSocket.py#L48-L49
# For python >= 2.7.9, use latest TLS that both client and server
# supports.
# SSL 2.0 and 3.0 are disabled via ssl.OP_NO_SSLv2 and ssl.OP_NO_SSLv3.
# For python < 2.7.9, use TLS 1.0 since TLSv1_X nor OP_NO_SSLvX is
# unavailable.
_default_protocol = ssl.PROTOCOL_SSLv23 if _has_ssl_context else \
ssl.PROTOCOL_TLSv1

Our custom cluster test forces the server to use TLSv1.2 and also forces
a specific cipher:
https://github.com/apache/impala/blob/2f22a6f67ff363a0832a7ceee5d0020c8fd9b15a/tests/custom_cluster/test_client_ssl.py#L118-L119

So this combination of configuration values causes a failure in RHEL7
because we only allow a specific cipher which works with TLSv1.2, but
the client cannot use TLSv1.2 due to the Python version as mentioned above.

We've not noticed these failures on older-than-RHEL7-systems since the
OpenSSL versions on those systems don't support TLSv1.2. (< OpenSSL 1.0.1)

To fix this, we need to change the Python version on RHEL 7 to be
>= Python 2.7.9. This patch skips the test if an older version of
Python than 2.7.9 is detected.

Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Reviewed-on: http://gerrit.cloudera.org:8080/10529
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins 
---
M tests/custom_cluster/test_client_ssl.py
1 file changed, 6 insertions(+), 6 deletions(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7091: Address NullPointerException in HBaseTable.getRegionSize().

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10531 )

Change subject: IMPALA-7091: Address NullPointerException in 
HBaseTable.getRegionSize().
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2572/


--
To view, visit http://gerrit.cloudera.org:8080/10531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f06daf69e7f7e97c9ecc13997147530c2f9d3f
Gerrit-Change-Number: 10531
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 31 May 2018 03:32:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2571/


--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 31 May 2018 02:17:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 4: Code-Review+2

(5 comments)

Thanks for the reviews! Upgrading to +2 since 2 reviewers +1'd it.

http://gerrit.cloudera.org:8080/#/c/10529/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10529/3//COMMIT_MSG@9
PS3, Line 9: When we upgraded to thrift-0.9.3, the TSSLSocket.py logic changed 
quite
> To be super clear: we used to ignore this silently but now TSSLSocket.py br
I explained this in a previous top level comment.


http://gerrit.cloudera.org:8080/#/c/10529/3//COMMIT_MSG@13
PS3, Line 13: 
https://github.com/apache/thrift/blob/88591e32e710a0524327153c8b629d5b461e35e0/lib/py/src/transport/TSSLSocket.py#L37-L41
> You can press "y" in the github UI to get code a permalink UI. Or specify a
Done


http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py
File tests/custom_cluster/test_client_ssl.py:

http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py@36
PS3, Line 36: HAS_LEGACY_OPENSSL = getattr(ssl, "OPENSSL_VERSION_NUMBER", None)
> It's super confusing that you'd change a constant on line 43. Can you just
Done


http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py@45
PS3, Line 45:
> If you want, you can use "getattr" with a default to avoid this.
Done


http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py@132
PS3, Line 132: 
statestored_args=SSL_WILDCARD_ARGS,
> Consider just inlining
Done



--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 31 May 2018 02:17:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Sailesh Mukil (Code Review)
Hello Michael Brown, Philip Zeyliger,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10529

to look at the new patch set (#4).

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..

IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python SSL error

When we upgraded to thrift-0.9.3, the TSSLSocket.py logic changed quite
a bit. Our RHEL7 machines come equipped with Python 2.7.5. Looking at
these comments, that means that we'll be unable to create a 'SSLContext'
but be able to explicitly specify ciphers:
https://github.com/apache/thrift/blob/88591e32e710a0524327153c8b629d5b461e35e0/lib/py/src/transport/TSSLSocket.py#L37-L41
# SSLContext is not available for Python < 2.7.9
_has_ssl_context = sys.hexversion >= 0x020709F0

# ciphers argument is not available for Python < 2.7.0
_has_ciphers = sys.hexversion >= 0x020700F0

If we cannot create a 'SSLContext', then we cannot use TLSv1.2 and have
to use TLSv1:
https://github.com/apache/thrift/blob/88591e32e710a0524327153c8b629d5b461e35e0/lib/py/src/transport/TSSLSocket.py#L48-L49
# For python >= 2.7.9, use latest TLS that both client and server
# supports.
# SSL 2.0 and 3.0 are disabled via ssl.OP_NO_SSLv2 and ssl.OP_NO_SSLv3.
# For python < 2.7.9, use TLS 1.0 since TLSv1_X nor OP_NO_SSLvX is
# unavailable.
_default_protocol = ssl.PROTOCOL_SSLv23 if _has_ssl_context else \
ssl.PROTOCOL_TLSv1

Our custom cluster test forces the server to use TLSv1.2 and also forces
a specific cipher:
https://github.com/apache/impala/blob/2f22a6f67ff363a0832a7ceee5d0020c8fd9b15a/tests/custom_cluster/test_client_ssl.py#L118-L119

So this combination of configuration values causes a failure in RHEL7
because we only allow a specific cipher which works with TLSv1.2, but
the client cannot use TLSv1.2 due to the Python version as mentioned above.

We've not noticed these failures on older-than-RHEL7-systems since the
OpenSSL versions on those systems don't support TLSv1.2. (< OpenSSL 1.0.1)

To fix this, we need to change the Python version on RHEL 7 to be
>= Python 2.7.9. This patch skips the test if an older version of
Python than 2.7.9 is detected.

Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
---
M tests/custom_cluster/test_client_ssl.py
1 file changed, 6 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/10529/4
--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7012: Fix NPE when parsing unexpected tokens

2018-05-30 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/10512 )

Change subject: IMPALA-7012: Fix NPE when parsing unexpected tokens
..

IMPALA-7012: Fix NPE when parsing unexpected tokens

Currently some token are not added to tokenIdMap in the sql scanner and
an NPE will be thrown if such are parsed as unexpected tokens. This
patch fixes this bug and adds a case to ParserTest.

Change-Id: I9c846fdfb22ba37bfc3b1985b9a044014ab58968
---
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
3 files changed, 35 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/10512/2
--
To view, visit http://gerrit.cloudera.org:8080/10512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c846fdfb22ba37bfc3b1985b9a044014ab58968
Gerrit-Change-Number: 10512
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

2018-05-30 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10550

to look at the new patch set (#4).

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
..

PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

Revert to the pre-IMPALA-3905 algorithm for deciding when to return a
batch from an Avro scan. The post-IMPALA-3905 algorithm is bad for
wide tables where there are only a small number of rows per Avro block.

Optimise memory transfer for selective scans - don't attach unused
decompression buffers to the output batch. Combined with the previous
change, this dramatically reduces the amount of memory transferred out
of scanner threads for selective scans of wide tables.

Cap the maximum row batch queue size at 5 * the number of active
scanner threads, so that we guarantee lower amounts of memory in
the row batch queue with fewer scanner threads, rather than
just achieving it statistically. It does not reduce the default
significantly on typical server configurations that would have
24+ cores except under high concurrency or low memory environments
where the number of scanner threads is limited. We should evaluate
reducing the default further or otherwise better controlling
memory consumption in a follow-up, based on experiments.

Includes some observability improvements including additional
counters that will help diagnose issues like this more easily:
* Add counters to give some insight into row batch queue. Here's
  an excerpt:
   - RowBatchBytesEnqueued: 20.89 MB (21903380)
   - RowBatchQueueCapacity: 5 (5)
   - RowBatchQueueGetWaitTime: 59.187ms
   - RowBatchQueuePeakMemoryUsage: 8.85 MB (9279347)
   - RowBatchQueuePutWaitTime: 0.000ns
   - RowBatchesEnqueued: 6 (6)
* Don't create AverageScannerThreadConcurrency for MT scan node where
  it's not actually used.
* Track the row batch queue memory consumption against a sub-tracker
  HDFS_SCAN_NODE (id=2): Reservation=48.00 MB OtherMemory=588.00 KB Total=48.57 
MB Peak=48.62 MB
Queued Batches: Total=588.00 KB Peak=637.00 KB

Ran the repro in the JIRA. Memory consumption was reduced from ~500MB
to ~220MB on my system.

TODO:
- running stress test on a single node against Avro
- Running exhaustive tests

Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/scan-node.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/util/blocking-queue.h
13 files changed, 274 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10550/4
--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7091: Address NullPointerException in HBaseTable.getRegionSize().

2018-05-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10531 )

Change subject: IMPALA-7091: Address NullPointerException in 
HBaseTable.getRegionSize().
..


Patch Set 1: Code-Review+2

Given that I'm seeing this on 2.x (using the old HBase), the code we copied 
from HBase should be tracked as a separate issue. It seems reasonable to handle 
this null appropriately.


--
To view, visit http://gerrit.cloudera.org:8080/10531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f06daf69e7f7e97c9ecc13997147530c2f9d3f
Gerrit-Change-Number: 10531
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 31 May 2018 01:29:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10550 )

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG@18
PS2, Line 18: Cap the maximum row batch queue size at 5 * the number of active
: scanner threads.
> adding this heuristic on top of the current one seems hard to understand ho
I want to get a tighter upper bound on the amount of memory in the queue and 
some guarantee that scaling down the scanner thread count (automatically or 
manually) will actually reduce the amount of queued memory.

We could leave this out of the change - it's not essential - but it felt like 
it should include some harder guarantees about queue size to make it easier to 
reason about.

We can usually control memory consumption from queueing batches by 
automatically limiting num_scanner_threads based on concurrency and 
availability of reservation (or by setting in manually), but there's no reason, 
with the current code, that a single scanner thread couldn't fill up the whole 
queue, e.g. if the consumer thread was off doing some other work for a long 
time.


http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG@32
PS2, Line 32: Track the row batch queue memory consumption against a sub-tracker
> what does that look like in terms of the memtracker hierarchy?
Added the MemTracker dump here.


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc@496
PS2, Line 496: Check
 :   // AtCapacity() at the end of the loop to guarantee that we 
process at least one row
 :   // so that we mke progress even if the batch starts off with 
AtCapacity() == true.
> that's kinda confusing because it's counter to our usual which is that once
Done


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc@568
PS2, Line 568: keep_current_chunk = true to
> but line 573 passes false. i think you meant false here.
For some reason my brain likes reversing things sometimes.


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-scan-node.cc@238
PS2, Line 238:   row_batch->SetMemTracker(row_batches_mem_tracker_);
> which mem-tracker are these accounted to before this line?
They're accounted against ExecNode::mem_tracker_ - see 
HdfsScanner::ProcessSplit() then transferred to the caller of GetNext() with 
AcquireState()


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/runtime/mem-tracker.cc@243
PS2, Line 243: }
> why is it that we need this new interface now but didn't need it in other c
The motivation for doing it in this patch specifically is to avoid adding more 
contention on the global MemTracker consumption counters (process and pool 
level) in case that started to become more of a bottleneck.

I think this is a better solution in general than separately calling Consume() 
and Release() because it avoids the unnecessary updates of common ancestor 
counters and also avoids the window of inconsistency where the memory isn't 
accounted against the query (which can be hard to reason about).

We should probably switch to using this in other places where it applies, e.g. 
the various DataStream* classes in a follow-on.



--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 31 May 2018 01:27:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

2018-05-30 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10550

to look at the new patch set (#3).

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
..

PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

Revert to the pre-IMPALA-3905 algorithm for deciding when to return a
batch from an Avro scan. The post-IMPALA-3905 algorithm is bad for
wide tables where there are only a small number of rows per Avro block.

Optimise memory transfer for selective scans - don't attach unused
decompression buffers to the output batch. Combined with the previous
change, this dramatically reduces the amount of memory transferred out
of scanner threads for selective scans of wide tables.

Cap the maximum row batch queue size at 5 * the number of active
scanner threads, so that we guarantee lower amounts of memory in
the row batch queue with fewer scanner threads, rather than
just achieving it statistically. It does not reduce the default
significantly on typical server configurations that would have
24+ cores except under high concurrency or low memory environments
where the number of scanner threads is limited. We should evaluate
reducing the default further or otherwise better controlling
memory consumption in a follow-up, based on experiments.

Includes some observability improvements including additional
counters that will help diagnose issues like this more easily:
* Add counters to give some insight into row batch queue. Here's
  an excerpt:
   - RowBatchBytesEnqueued: 20.89 MB (21903380)
   - RowBatchQueueCapacity: 5 (5)
   - RowBatchQueueGetWaitTime: 59.187ms
   - RowBatchQueuePeakMemoryUsage: 8.85 MB (9279347)
   - RowBatchQueuePutWaitTime: 0.000ns
   - RowBatchesEnqueued: 6 (6)
* Don't create AverageScannerThreadConcurrency for MT scan node where
  it's not actually used.
* Track the row batch queue memory consumption against a sub-tracker
  HDFS_SCAN_NODE (id=2): Reservation=48.00 MB OtherMemory=588.00 KB Total=48.57 
MB Peak=48.62 MB
Queued Batches: Total=588.00 KB Peak=637.00 KB

Ran the repro in the JIRA. Memory consumption was reduced from ~500MB
to ~220MB on my system.

TODO:
- running stress test on a single node against Avro
- Running exhaustive tests

Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/scan-node.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/util/blocking-queue.h
13 files changed, 272 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10550/3
--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 10:

(1 comment)

Yeah I agree with Phil that doing JNI to java to unzip a file may be the least 
of all evils - we wouldn't be adding another dependency and there's no risk of 
DOSing the NameNode. I doubt it would take the NameNode down, but it could 
disrupt other things happening on the cluster unnecessarily.

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
> We're adding a ton of files. Do we need such a big database for our testing
We'll also need to exclude these files from RAT checks (and document how they 
are licensed).



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 31 May 2018 00:30:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7092: Re-enable EC tests broken by HDFS-13539

2018-05-30 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10553


Change subject: IMPALA-7092: Re-enable EC tests broken by HDFS-13539
..

IMPALA-7092: Re-enable EC tests broken by HDFS-13539

This patch re-enables previously OOM-failed tests broken by HDFS-13539
and attributes other disabled EC tests to their root cause, replacing
the "fix_later" tag. There are 2 types of failed tests remaining that
are not inherently EC incompatible:
1. HDFS caching is currently buggy under EC: HDFS-9879.
2. The unit of an EC scan range is a block group, which is of 3X the
   regular block size in the mini-cluster. It breaks some tests and is
   tracked in IMPALA-7098.

Change-Id: I38df02a5cb1e9f6edcd4713eb8c13ba4081754e6
---
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/metadata/test_explain.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_mt_dop.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_query_mem_limit.py
M tests/query_test/test_scanners.py
11 files changed, 15 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/10553/1
--
To view, visit http://gerrit.cloudera.org:8080/10553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I38df02a5cb1e9f6edcd4713eb8c13ba4081754e6
Gerrit-Change-Number: 10553
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5380: [DOCS] Added additional filesystems supported in URI

2018-05-30 Thread Alex Rodoni (Code Review)
Alex Rodoni has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/10551 )

Change subject: IMPALA-5380: [DOCS] Added additional filesystems supported in 
URI
..

IMPALA-5380: [DOCS] Added additional filesystems supported in URI

Change-Id: I11d57efd564fb5779539a88d7b209cee93d313c6
---
M docs/topics/impala_authorization.xml
1 file changed, 9 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/10551/2
--
To view, visit http://gerrit.cloudera.org:8080/10551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11d57efd564fb5779539a88d7b209cee93d313c6
Gerrit-Change-Number: 10551
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 


[Impala-ASF-CR] IMPALA-5380: [DOCS] Added additional filesystems supported in URI

2018-05-30 Thread Alex Rodoni (Code Review)
Alex Rodoni has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10551


Change subject: IMPALA-5380: [DOCS] Added additional filesystems supported in 
URI
..

IMPALA-5380: [DOCS] Added additional filesystems supported in URI

Change-Id: I11d57efd564fb5779539a88d7b209cee93d313c6
---
M docs/topics/impala_authorization.xml
1 file changed, 9 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/10551/1
--
To view, visit http://gerrit.cloudera.org:8080/10551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I11d57efd564fb5779539a88d7b209cee93d313c6
Gerrit-Change-Number: 10551
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 3: Code-Review+1

...pending Phil's comments, which are fair. The Python version comparison looks 
like the accepted idiom (Impyla does it, for example)


--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 23:58:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10550 )

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
..


Patch Set 2:

Here's an example of the profile with the new additions:

  HDFS_SCAN_NODE (id=0):(Total: 60.443ms, non-child: 60.443ms, % non-child: 
100.00%)
 - AverageHdfsReadThreadConcurrency: 0.00
 - AverageScannerThreadConcurrency: 0.00
 - BytesRead: 9.36 MB (9816312)
 - BytesReadDataNodeCache: 0
 - BytesReadLocal: 9.36 MB (9816312)
 - BytesReadRemoteUnexpected: 0
 - BytesReadShortCircuit: 9.36 MB (9816312)
 - BytesSkipped: 0
 - CachedFileHandlesHitCount: 0 (0)
 - CachedFileHandlesMissCount: 2 (2)
 - CollectionItemsRead: 0 (0)
 - DecompressionTime: 21.409ms
 - MaxCompressedTextFileLength: 0
 - NumDisksAccessed: 1 (1)
 - NumScannerThreadReservationsDenied: 0 (0)
 - NumScannerThreadsStarted: 1 (1)
 - PeakMemoryUsage: 24.99 MB (26205298)
 - PerReadThreadRawHdfsThroughput: 3.80 GB/sec
 - RemoteScanRanges: 0 (0)
 - RowBatchBytesEnqueued: 20.89 MB (21903380)
 - RowBatchQueueCapacity: 5 (5)
 - RowBatchQueueGetWaitTime: 59.187ms
 - RowBatchQueuePeakMemoryUsage: 8.85 MB (9279347)
 - RowBatchQueuePutWaitTime: 0.000ns
 - RowBatchesEnqueued: 6 (6)
 - RowsRead: 200.00K (20)
 - RowsReturned: 747 (747)
 - RowsReturnedRate: 12.36 K/sec
 - ScanRangesComplete: 1 (1)
 - ScannerThreadsInvoluntaryContextSwitches: 1 (1)
 - ScannerThreadsTotalWallClockTime: 63.681ms
   - MaterializeTupleTime(*): 38.682ms
   - ScannerThreadsSysTime: 0.000ns
   - ScannerThreadsUserTime: 60.000ms
 - ScannerThreadsVoluntaryContextSwitches: 4 (4)
 - TotalRawHdfsOpenFileTime(*): 1.449ms
 - TotalRawHdfsReadTime(*): 2.408ms
 - TotalReadThroughput: 0.00 /sec


--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 23:14:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10550 )

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
..


Patch Set 2:

I'm still testing out the change but it would be good if you could take a look 
to make sure that the approach makes sense.


--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 23:13:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/10550 )

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
..

PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

Revert to the pre-IMPALA-3905 algorithm for deciding when to return a
batch from an Avro scan. The post-IMPALA-3905 algorithm is bad for
wide tables where there are only a small number of rows per Avro block.

Optimise memory transfer for selective scans - don't attach unused
decompression buffers to the output batch. Combined with the previous
change, this dramatically reduces the amount of memory transferred out
of scanner threads for selective scans of wide tables.

Cap the maximum row batch queue size at 5 * the number of active
scanner threads. This means that num_scanner_threads gives better
control over memory consumption. It does not reduce the default
significantly on typical server configurations that would have
24+ cores except under high concurrency or low memory environments
where the number of scanner threads is limited. We should evaluate
reducing the default further or otherwise better controlling
memory consumption in a follow-up, based on experiments.

Includes some observability improvements including additional
counters that will help diagnose issues like this more easily:
* Add counters to give some insight into row batch queue.
* Don't create AverageScannerThreadConcurrency for MT scan node where
  it's not actually used.
* Track the row batch queue memory consumption against a sub-tracker

Testing:
Ran the repro in the JIRA. Memory consumption was reduced from ~500MB
to ~220MB on my system.

TODO:
- running stress test on a single node against Avro
- Running exhaustive tests

Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/scan-node.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/util/blocking-queue.h
13 files changed, 272 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10550/2
--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 3:

> > (5 comments)
 > >
 > > My main question is to confirm that we understand why the Python
 > > upgrade broke this. Before it was also broken, but we were
 > silently
 > > ignoring that?
 > >
 > > The rest is python nits, which you can ignore if you'd like.
 >
 > I did some more digging to answer the questions and this is what
 > I've come up with (I've posted this in the JIRA as well, but
 > pasting here for better context):
 >
 > I missed a detail which was that this test never ran on RHEL6 due
 > to all our RHEL6 machines having OpenSSL 1.0.0 which doesn't
 > support TLSv1.2, causing them to be skipped.
 >
 > On RHEL7, this used to work before the Thrift upgrade because the
 > old Thrift cpp library (0.9.0) was somehow accepting TLSv1
 > connections even though we explicitly set TLSv1.2 on the server.
 > I'm unable to figure out why that was happening, and it looks like
 > a bug, but I'll keep looking. It could be a bug in the Python 'ssl'
 > library, or the Thrift 0.9.0 python library, or the Thrift 0.9.0
 > CPP library, or even OpenSSL.
 >
 > In Thrift 0.9.3, we explicitly select TLSv1.2 if that's what the
 > user specified which fixes the above mentioned bug. Our test caught
 > this bug, since the client side doesn't support TLSv1.2 unless we
 > are equipped with Python 2.7.9 or up.
 >
 > As for a weaker test, we already run test_ssl() which is a weaker
 > test as it doesn't enforce any ciphers or TLS versions which allows
 > the client and server to negotiate a protocol that they're both
 > aware of.
 >
 > So to summarize, things weren't working as expected before and we
 > were passing the test when it should have been failing, but the
 > upgrade fixed the original problem and caused the test to start
 > failing.
 >
 > Carry +1. Thanks for the review!

I'll address the nits in a separate patchset.


--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 22:32:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 3: Code-Review+1

> (5 comments)
 >
 > My main question is to confirm that we understand why the Python
 > upgrade broke this. Before it was also broken, but we were silently
 > ignoring that?
 >
 > The rest is python nits, which you can ignore if you'd like.

I did some more digging to answer the questions and this is what I've come up 
with (I've posted this in the JIRA as well, but pasting here for better 
context):

I missed a detail which was that this test never ran on RHEL6 due to all our 
RHEL6 machines having OpenSSL 1.0.0 which doesn't support TLSv1.2, causing them 
to be skipped.

On RHEL7, this used to work before the Thrift upgrade because the old Thrift 
cpp library (0.9.0) was somehow accepting TLSv1 connections even though we 
explicitly set TLSv1.2 on the server. I'm unable to figure out why that was 
happening, and it looks like a bug, but I'll keep looking. It could be a bug in 
the Python 'ssl' library, or the Thrift 0.9.0 python library, or the Thrift 
0.9.0 CPP library, or even OpenSSL.

In Thrift 0.9.3, we explicitly select TLSv1.2 if that's what the user specified 
which fixes the above mentioned bug. Our test caught this bug, since the client 
side doesn't support TLSv1.2 unless we are equipped with Python 2.7.9 or up.

As for a weaker test, we already run test_ssl() which is a weaker test as it 
doesn't enforce any ciphers or TLS versions which allows the client and server 
to negotiate a protocol that they're both aware of.

So to summarize, things weren't working as expected before and we were passing 
the test when it should have been failing, but the upgrade fixed the original 
problem and caused the test to start failing.

Carry +1. Thanks for the review!


--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 22:32:42 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-7061: Rework HBase splitting and assignment

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10545 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2570/


--
To view, visit http://gerrit.cloudera.org:8080/10545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10545
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 22:25:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location

2018-05-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@189
PS1, Line 189: in String as
nit: remove, kinda obvious


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@189
PS1, Line 189: ::
nit:Java style


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@191
PS1, Line 191:   private final HashMap> 
locationToPartMap_ =
Unfortunate that a lot of state maintained in this class and is pretty 
confusing and can lead to subtle bugs if we forget to update/remove from one of 
these data structures. We need to clean these up eventually.


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1105
PS1, Line 1105: partition.getLocation()
can be inferred from the partition object?


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1204
PS1, Line 1204: ::
nit:Java


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1205
PS1, Line 1205: maintains
updates


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1207
PS1, Line 1207: chang
nit: update?


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1214
PS1, Line 1214: Adds 'location' as key and 'partition' as value to 
locationToPartMap_.
May be just say "Adds 'partition' to locationTo..." Rest is implicit.


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1216
PS1, Line 1216: insertToLocationPartMap
call it addToLocationPartMap may be?


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1468
PS1, Line 1468: a Map that contains all the HdfsPartitions
  :* that point to the same locations as the ones received
I think the earlier description is more clear.


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2434
PS1, Line 2434: change
nit: call it updatePartitionLocation()?


http://gerrit.cloudera.org:8080/#/c/10543/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2435
PS1, Line 2435: } else {
  : partition.setLocation(location);
  :   }
I don't think non HdfsTable is possible here. Look at 
catalog_.getHdfsPartition()


http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159
PS1, Line 159:   assert data.split('\t') == ['21', '6']
Could you drop one of the partitions that point to the same partition dir and 
make sure the other partition still counts it?



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 30 May 2018 22:10:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6035: Add query options to limit thread reservation

2018-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10365 )

Change subject: IMPALA-6035: Add query options to limit thread reservation
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/admission-controller.cc@426
PS7, Line 426:   max_thread_reservation = make_pair(, 
e.second.thread_reservation);
I think the use of pairs hurts readability (since fields don't have meaningful 
names) but given that the new pair is declared right here, I don't feel 
strongly in this case. So if you and Bikram feel this is clearest, leave it.


http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/admission-controller.cc@432
PS7, Line 432: chedule->query_options().__isset.buffer_pool_limit
 :   && schedule->query_options()
query_opts


http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/admission-controller.cc@440
PS7, Line 440:  else if
Preexisting, but I think it's non-obvious why this is "else if" with the buffer 
pool option check. I had to do a double take. How about adding a brief 
explanation (maybe just reference InitBufferPoolState())?


http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/admission-controller.cc@461
PS7, Line 461: else if
given that these aren't really mutually exclusive (unlike the buffer pool limit 
vs mem_limit thing), how about just use 'if' here? The "return true" pattern is 
used throughout the exclude downstream checks.


http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/10365/7/be/src/scheduling/query-schedule.h@49
PS7, Line 49: input to a BackendState
input to AdmissionController and a BackendState
(especially given that the thread reservation has no execution-side counterpart)



--
To view, visit http://gerrit.cloudera.org:8080/10365
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007
Gerrit-Change-Number: 10365
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 21:53:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7044: Prevent overflow when computing Parquet block size

2018-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10483 )

Change subject: IMPALA-7044: Prevent overflow when computing Parquet block size
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-parquet-table-writer.cc@795
PS4, Line 795: num_cols
it's a bit misleading to say "number of columns in the target table" and give 
this value since the value is the columns in the parquet file not the table 
(when partitioned).


http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-parquet-table-writer.cc@796
PS4, Line 796:   }
The preexisting block size logic is pretty convoluted. It appears that this 
check is completely redundant with the one you are adding to hdfs-table-writer 
and the check in HdfsParquetTableWriter::InitNewFile(). I'm not opposed to 
adding this since the error message is more informative, but I don't think that 
was your intent, so wanted to check that logic. And am wondering if we could 
consolidate the checks rather than having them sprinkled around (like move this 
check to HdfsParquetTableWriter::InitNewFile(). hmm, but then it might be too 
late since we'd have already hit the "generic" 2GB check you're adding to 
HdfsTableWriter.


http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-parquet-table-writer.cc@925
PS4, Line 925: num_cols
maybe call this num_file_columns to be clear that it's the file columns not the 
table columns.


http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-parquet-table-writer.cc@979
PS4, Line 979: a table with " << num_cols << " columns
not your change, but this is also confused.


http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/10483/4/be/src/exec/hdfs-table-sink.cc@371
PS4, Line 371: output_partition->partition_descriptor->block_size();
in what cases can this be > 2GB? I'm wondering if we should just cap this value 
to 2GB, and only issue the 2GB error for the cases where user can actually do 
something about it.



--
To view, visit http://gerrit.cloudera.org:8080/10483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e63420e5a093c0bbc789201771708865b16e138
Gerrit-Change-Number: 10483
Gerrit-PatchSet: 4
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Wed, 30 May 2018 21:19:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..


Patch Set 8: Code-Review+1

(1 comment)

Carry Vuk's +1

http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@635
PS5, Line 635: sues.apach
> thx for the info.. yeah that's ugly. if there is a jira to either fix this
Done



--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 21:10:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6020: [DOCS] REFRESH statement cannot detect HDFS block movement

2018-05-30 Thread Alex Rodoni (Code Review)
Alex Rodoni has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10470 )

Change subject: IMPALA-6020: [DOCS] REFRESH statement cannot detect HDFS block 
movement
..


Patch Set 1:

Juan,
You also filed this downstream. Could you comment whether to keep or remove the 
passage?


--
To view, visit http://gerrit.cloudera.org:8080/10470
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6de1469e0f375511fd415e79074f4ec72cf109b
Gerrit-Change-Number: 10470
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Juan Yu 
Gerrit-Comment-Date: Wed, 30 May 2018 21:10:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..

IMPALA-5552: Add support for authorized proxy groups

The patch adds support for mapping of users to a list of proxy groups.

The following flags are added in impalad:
- authorized_proxy_group_config
- authorized_proxy_group_config_delimiter

Example:
--authorized_proxy_group_config=hue=group1,group2;user1=*

This feature is not supported on Shell-based Hadoop groups mapping
providers.

Testing:
- Added FE unit test to check for groups mapping provider
- Added BE unit test for the parsing logic
- Added a new test in test_authorization.py
- Ran all end-to-end test_authorization.py

Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
---
M be/src/service/CMakeLists.txt
M be/src/service/frontend.cc
M be/src/service/frontend.h
A be/src/service/impala-server-test.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/backend-gflag-util.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
M tests/authorization/test_authorization.py
11 files changed, 385 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10510/8
--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..


Patch Set 7: Code-Review+1

(1 comment)

thx for the changes. lgtm, but let others settle on their reviews as well.

http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@635
PS5, Line 635: startsWith
> Unfortunately there's no way to do this at the moment. I agree this is very
thx for the info.. yeah that's ugly. if there is a jira to either fix this api 
or for impala to use something else, please add it. in either case, pls add a 
comment about why this is done (e.g., // HACK: ... )



--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 20:13:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5216: Make admission control queuing async

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: IMPALA-5216: Make admission control queuing async
..


Patch Set 16: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 16
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 19:34:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@406
PS5, Line 406: thorized_proxy_user_config or FLAGS_authorized_proxy_group_config
> can be specific to to the authorized_proxy_config for which this method was
I initially did that with lambda: 
https://gerrit.cloudera.org/c/10510/2/be/src/service/impala-server.cc#323. Let 
me know what you think about it and I can use the old code back.


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@412
PS5, Line 412: st string& config: proxy_confi
> if ws is included around a group or user string, e.g., " *   ", will (and s
Make sense I'll update it.


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@416
PS5, Line 416:
> can use a literal pair constructor: { ... }
Done


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@1383
PS5, Line 1383:   if (user == "*" |
> its an error, no need to wrap.
Done


http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@635
PS5, Line 635: startsWith
> This makes the error messages in the security package part of the api. Is t
Unfortunately there's no way to do this at the moment. I agree this is very 
ugly. This is another example in Impala that relies on the same exact exception 
message: 
https://github.com/apache/impala/blob/e9bd917a218b5e1717fced983f70f64850c6e02f/fe/src/main/java/org/apache/impala/util/RequestPoolService.java#L302

The source in Hadoop API: 
https://github.com/apache/hadoop/blob/47c31ff16b452d47afc6ffc1cf936ac2de9b788d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java#L198-L200



--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 18:55:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..

IMPALA-5552: Add support for authorized proxy groups

The patch adds support for mapping of users to a list of proxy groups.

The following flags are added in impalad:
- authorized_proxy_group_config
- authorized_proxy_group_config_delimiter

Example:
--authorized_proxy_group_config=hue=group1,group2;user1=*

This feature is not supported on Shell-based Hadoop groups mapping
providers.

Testing:
- Added FE unit test to check for groups mapping provider
- Added BE unit test for the parsing logic
- Added a new test in test_authorization.py
- Ran all end-to-end test_authorization.py

Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
---
M be/src/service/CMakeLists.txt
M be/src/service/frontend.cc
M be/src/service/frontend.h
A be/src/service/impala-server-test.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/backend-gflag-util.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
M tests/authorization/test_authorization.py
11 files changed, 382 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10510/7
--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-2751: Matching quotes are not required in comments

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10541 )

Change subject: IMPALA-2751: Matching quotes are not required in comments
..


Patch Set 2:

I ran all shell tests on both Python 2.6 and Python 2.7. This should also fix 
IMPALA-7089.


--
To view, visit http://gerrit.cloudera.org:8080/10541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
Gerrit-Change-Number: 10541
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Wed, 30 May 2018 18:53:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2751: Matching quotes are not required in comments

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10541


Change subject: IMPALA-2751: Matching quotes are not required in comments
..

IMPALA-2751: Matching quotes are not required in comments

This patch fixes the issue where non-matching quotes inside comments
will cause the shell to not terminate.

The fix is to strip any SQL comments before sending to shlex since shlex
does not understand SQL comments and will raise an exception when it
sees
unmatched quotes regardless whether the quotes are in the comments or
not.

Testing:
- Added new shell tests
- Ran all end-to-end shell tests on Python 2.6 and Python 2.7

Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
3 files changed, 26 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/10541/2
--
To view, visit http://gerrit.cloudera.org:8080/10541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
Gerrit-Change-Number: 10541
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 


[Impala-ASF-CR] IMPALA-7091: Address NullPointerException in HBaseTable.getRegionSize().

2018-05-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10531 )

Change subject: IMPALA-7091: Address NullPointerException in 
HBaseTable.getRegionSize().
..


Patch Set 1:

(1 comment)

I think handling the null is the right thing to do.

It bothers me a bit that we got the list of HRegionLocation from HBase and then 
one of these calls fails.

http://gerrit.cloudera.org:8080/#/c/10531/1/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/10531/1/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java@711
PS1, Line 711: This is copied from org.apache.hadoop.hbase.client.HTable in 
HBase 0.95
This is where the HRegionLocation is coming from, maybe this needs to be 
updated for the new HBase? We had a TODO here, but we removed it in IMPALA-4082.



--
To view, visit http://gerrit.cloudera.org:8080/10531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f06daf69e7f7e97c9ecc13997147530c2f9d3f
Gerrit-Change-Number: 10531
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 18:53:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6990: TestClientSsl.test tls v12 failing due to Python SSL error

2018-05-30 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10529 )

Change subject: IMPALA-6990: TestClientSsl.test_tls_v12 failing due to Python 
SSL error
..


Patch Set 3: Code-Review+1

(5 comments)

My main question is to confirm that we understand why the Python upgrade broke 
this. Before it was also broken, but we were silently ignoring that?

The rest is python nits, which you can ignore if you'd like.

http://gerrit.cloudera.org:8080/#/c/10529/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10529/3//COMMIT_MSG@9
PS3, Line 9: When we upgraded to thrift-0.9.3, the TSSLSocket.py logic changed 
quite
To be super clear: we used to ignore this silently but now TSSLSocket.py broke 
this? i.e., why is it that the test started failing after the Thrift change...


http://gerrit.cloudera.org:8080/#/c/10529/3//COMMIT_MSG@13
PS3, Line 13: 
https://github.com/apache/thrift/blob/master/lib/py/src/transport/TSSLSocket.py#L37-L41
You can press "y" in the github UI to get code a permalink UI. Or specify a tag 
instead of "master".


http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py
File tests/custom_cluster/test_client_ssl.py:

http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py@36
PS3, Line 36: HAS_TLSV12_INCOMPATIBLE_PYTHON = True
It's super confusing that you'd change a constant on line 43. Can you just put 
line 43 here?

The same comment sort of applies to the other stuff, but you've not changed it.


http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py@45
PS3, Line 45:   # Old ssl module versions don't even have 
OPENSSL_VERSION_NUMBER as a member.
If you want, you can use "getattr" with a default to avoid this.

e.g.:
getattr(ssl, "OPENSSL_VERSION_NUMBER", None)

Feel free to ignore since this was pre-existing.


http://gerrit.cloudera.org:8080/#/c/10529/3/tests/custom_cluster/test_client_ssl.py@132
PS3, Line 132:   @pytest.mark.skipif(HAS_TLSV12_INCOMPATIBLE_PYTHON, 
reason=SKIP_TLSV12_MSG)
Consider just inlining

@pytest.mark.skipif(sys.version_info < REQUIRED_MIN_PYTHON_VERSION_FOR_TLSV12, 
reason="Python version too old to allow Thrift client to use TLSv1.2")



--
To view, visit http://gerrit.cloudera.org:8080/10529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92c66ecaeb94b0c83ee6f1396c082709c21b3187
Gerrit-Change-Number: 10529
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 18:51:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6714: [DOCS] ORC file format support

2018-05-30 Thread Alex Rodoni (Code Review)
Alex Rodoni has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10525 )

Change subject: IMPALA-6714: [DOCS] ORC file format support
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10525/1/docs/topics/impala_orc.xml
File docs/topics/impala_orc.xml:

http://gerrit.cloudera.org:8080/#/c/10525/1/docs/topics/impala_orc.xml@133
PS1, Line 133: select * from
Could you make all SQL keywords in uppercase as in the Hive examples below?



--
To view, visit http://gerrit.cloudera.org:8080/10525
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1ee23ed844653c274babdce5a332dbe5c79b630
Gerrit-Change-Number: 10525
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Wed, 30 May 2018 18:45:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](2.x) IMPALA-7061: Rework HBase splitting and assignment

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10545 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2570/


--
To view, visit http://gerrit.cloudera.org:8080/10545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10545
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 18:39:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5216: Make admission control queuing async

2018-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: IMPALA-5216: Make admission control queuing async
..


Patch Set 16: Code-Review+2

Tim, do you have any more comments?


--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 16
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 18:37:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5216: Make admission control queuing async

2018-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: IMPALA-5216: Make admission control queuing async
..


Patch Set 16:

> - GetOperationStatus() after rejection should show reason.
 > => GetOperationStatus contains the operation state and the status
 > message. Unfortunately for a CANCELLED status the message is empty.
 > Instead we can take care of this coverage after IMPALA-1262 where
 > we can check for operation state to verify this scenario.

I assume the CANCELLED case is covered by your sleeps. Is the 
GetOperationStatus() (or beeswax equivalent) after rejection/timeout covered by 
some test? (Presumably the admission controller test?)


--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 16
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 18:35:26 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-7061: Rework HBase splitting and assignment

2018-05-30 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10545 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10545
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 18:17:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10510/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10510/5//COMMIT_MSG@11
PS5, Line 11: impalad
> question(s) on the design, since I've not seen this feature (or the related
This is consistent with the existing user delegation flags.

For what it's worth, we have a many options (including HDFS's core-site.xml and 
hadoop-site.xml) where effectively you need coherent configs across the 
cluster. We assume people use tools to make it happen.



--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 18:17:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](2.x) IMPALA-7061: Rework HBase splitting and assignment

2018-05-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10545 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 1:

2.x cherrypick required moving some files around, but no major changes 
otherwise. Dataload + fe tests succeeded locally.


--
To view, visit http://gerrit.cloudera.org:8080/10545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10545
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 18:17:17 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-7061: Rework HBase splitting and assignment

2018-05-30 Thread Joe McDonnell (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/10545

to review the following change.


Change subject: IMPALA-7061: Rework HBase splitting and assignment
..

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being
arranged in a deterministic way. Specifically, the
HBase tables need to be split with specific region
boundaries and those regions need to be assigned to
specific HBase region servers.

Currently, the tables are created without splits and
testdata/bin/split-hbase.sh runs Java code in
HBaseTestDataRegionAssignment to split and assign
the tables. This runs during dataload via
testdata/bin/create-load-data.sh and during tests
with bin/run-all-tests.sh. There are problems with
both parts of this process. The table splitting is
flaky. Since significant time can pass between the
assignments and the tests, rebalancing means the
assignments are not always stable.

This changes the process so that the HBase tables are
created with the splits already specified via the
HBase shell. The splits remain stable over time.
PlannerTestBase runs the assignment code in
HBaseTestDataRegionAssignment at the start of
the PlannerTests. This makes the assignments
deterministic. No other tests depends on the
exact assignments, so this does not regress anything.

Testing:
 - Local testing
 - Ran gerrit-verify-dryrun-external

2.x does not have minicluster profiles, so the
HBaseTestDataRegionAssignment.java is in the regular
fe/src/test directories.

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Reviewed-on: http://gerrit.cloudera.org:8080/10447
Reviewed-by: Philip Zeyliger 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 9a5410570e25431813b96e00f7b91db44f672f38)
---
M bin/run-all-tests.sh
A 
fe/src/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D 
testdata/src/main/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
8 files changed, 165 insertions(+), 390 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/10545/1
--
To view, visit http://gerrit.cloudera.org:8080/10545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10545
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] [DOCS] Fixed an inconsistent table ordering recommendations with STRAIGHT JOIN

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10519 )

Change subject: [DOCS] Fixed an inconsistent table ordering recommendations 
with STRAIGHT_JOIN
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff2245b6d8e3160a10aae64f622813a40121ee8
Gerrit-Change-Number: 10519
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 17:50:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Fixed an inconsistent table ordering recommendations with STRAIGHT JOIN

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10519 )

Change subject: [DOCS] Fixed an inconsistent table ordering recommendations 
with STRAIGHT_JOIN
..

[DOCS] Fixed an inconsistent table ordering recommendations with STRAIGHT_JOIN

Change-Id: I9ff2245b6d8e3160a10aae64f622813a40121ee8
Reviewed-on: http://gerrit.cloudera.org:8080/10519
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M docs/topics/impala_perf_joins.xml
1 file changed, 4 insertions(+), 5 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9ff2245b6d8e3160a10aae64f622813a40121ee8
Gerrit-Change-Number: 10519
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..

IMPALA-5552: Add support for authorized proxy groups

The patch adds support for mapping of users to a list of proxy groups.

The following flags are added in impalad:
- authorized_proxy_group_config
- authorized_proxy_group_config_delimiter

Example:
--authorized_proxy_group_config=hue=group1,group2;user1=*

This feature is not supported on Shell-based Hadoop groups mapping
providers.

Testing:
- Added FE unit test to check for groups mapping provider
- Added BE unit test for the parsing logic
- Added a new test in test_authorization.py
- Ran all end-to-end test_authorization.py

Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
---
M be/src/service/CMakeLists.txt
M be/src/service/frontend.cc
M be/src/service/frontend.h
A be/src/service/impala-server-test.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/backend-gflag-util.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
M tests/authorization/test_authorization.py
11 files changed, 367 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10510/6
--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10510/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10510/5//COMMIT_MSG@11
PS5, Line 11: impalad
question(s) on the design, since I've not seen this feature (or the related, 
user delegation support). are these flags only relevant for coordinators? why 
are these flags added to impalad? is there a case for different impalads having 
different auth lists? if there is no case, then this looks like it opens up 
opportunities for misconfiguration. perhaps catalogd is a better place for 
this? (I could be wrong on this since I didn't see how these proxies are used)


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@406
PS5, Line 406: --authorized_proxy_user_config flag or 
--authorized_proxy_group_config
can be specific to to the authorized_proxy_config for which this method was 
called.


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@412
PS5, Line 412: parsed_allowed_users_or_groups
if ws is included around a group or user string, e.g., " *   ", will (and 
should) that ws be trimmed from begin/end?


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@416
PS5, Line 416: make_pair(
can use a literal pair constructor: { ... }


http://gerrit.cloudera.org:8080/#/c/10510/5/be/src/service/impala-server.cc@1383
PS5, Line 1383: RETURN_IF_ERROR(s);
its an error, no need to wrap.


http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/10510/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@635
PS5, Line 635: startsWith
This makes the error messages in the security package part of the api. Is there 
any existing alternative to avoid this? If not, is there a JIRA filed to change 
that?



--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 17:47:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Fixed an inconsistent table ordering recommendations with STRAIGHT JOIN

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10519 )

Change subject: [DOCS] Fixed an inconsistent table ordering recommendations 
with STRAIGHT_JOIN
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/301/


--
To view, visit http://gerrit.cloudera.org:8080/10519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff2245b6d8e3160a10aae64f622813a40121ee8
Gerrit-Change-Number: 10519
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 17:41:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] A typo fix in runtime filtering doc

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10544 )

Change subject: [DOCS] A typo fix in runtime filtering doc
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
Gerrit-Change-Number: 10544
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 17:13:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] A typo fix in runtime filtering doc

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10544 )

Change subject: [DOCS] A typo fix in runtime filtering doc
..

[DOCS] A typo fix in runtime filtering doc

Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
Reviewed-on: http://gerrit.cloudera.org:8080/10544
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins 
---
M docs/topics/impala_runtime_filtering.xml
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
Gerrit-Change-Number: 10544
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] [DOCS] A typo fix in runtime filtering doc

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10544 )

Change subject: [DOCS] A typo fix in runtime filtering doc
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/300/


--
To view, visit http://gerrit.cloudera.org:8080/10544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
Gerrit-Change-Number: 10544
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 17:07:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] A typo fix in runtime filtering doc

2018-05-30 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10544 )

Change subject: [DOCS] A typo fix in runtime filtering doc
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
Gerrit-Change-Number: 10544
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 30 May 2018 16:55:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] A typo fix in runtime filtering doc

2018-05-30 Thread Alex Rodoni (Code Review)
Alex Rodoni has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10544


Change subject: [DOCS] A typo fix in runtime filtering doc
..

[DOCS] A typo fix in runtime filtering doc

Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
---
M docs/topics/impala_runtime_filtering.xml
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/10544/1
--
To view, visit http://gerrit.cloudera.org:8080/10544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I43006540ed24c957c9ad2ec54707bb9c678d6fb3
Gerrit-Change-Number: 10544
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 


[Impala-ASF-CR] IMPALA-6812: Fix flaky Kudu scan tests

2018-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10503 )

Change subject: IMPALA-6812: Fix flaky Kudu scan tests
..


Patch Set 1:

> I'm sympathetic to the problem that people may still encounter
 > these test failures if they run the tests through a means other
 > than run-all-tests.sh
 >

Perhaps one option is to make kudu_read_mode a query option, so that it can be 
set in the query options by the python code.

Thomas, can you think of any meaningful Impala/Kudu integration/functional 
testing that we'll miss out if we change the tests to all rn with 
READ_AT_SNAPSHOT. It does seem like we need to enable this in order to get 
predictable scan results, but I also agree with Tim that it's best to test in 
the way most similar to the defaults.


--
To view, visit http://gerrit.cloudera.org:8080/10503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I70df84f2cbc663107f2ad029565d3c15bdfbd47c
Gerrit-Change-Number: 10503
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 30 May 2018 16:42:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Fixed an inconsistent table ordering recommendations with STRAIGHT JOIN

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10519 )

Change subject: [DOCS] Fixed an inconsistent table ordering recommendations 
with STRAIGHT_JOIN
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff2245b6d8e3160a10aae64f622813a40121ee8
Gerrit-Change-Number: 10519
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 16:34:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9943 )

Change subject: IMPALA-5706: Spilling sort optimisations
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 16:26:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations

2018-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9943 )

Change subject: IMPALA-5706: Spilling sort optimisations
..


Patch Set 17:

Sorry for the delay here, got busy with some unplanned things.


--
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 May 2018 16:26:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-30 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 10:

(5 comments)

> > > > > > Uploaded patch set 9.
 > > > > >
 > > > > > Patch -set 9 contains the following changes:
 > > > > > - Added a full timezone db to testdata/tzdb.
 > > > > > - End-to-end tests and BE-tests were changed to use this
 > > > timezone
 > > > > > db. This was necessary because some timezone-tests were
 > > failing
 > > > > on
 > > > > > older jenkins workers that had an older tzdata package
 > > > installed.
 > > > >
 > > > > It might be a good idea to store the timezone-db files in one
 > > > .tar
 > > > > file and extract them before running the tests. What do you
 > > > think?
 > > >
 > > > I agree, .taring or compressing the tz db would be much better,
 > > if
 > > > it does not make the code too complicated. Having less file
 > would
 > > > make the review more readable,

>From a review ability perspective, there's absolutely no need for this commit 
>to have 221 timezone files. You can test it with ~3, or do a separate commit. 
>i.e., it's neither here nor there.

 > >
 > > Alternatively we can store timezone files in a JAR archive
 > instead.
 > > The BE can call into the java FE to extract files from it.
 >
 > Tim, Dan, what do you think?

The Yarn equivalent here has this notion of a "distributed cache" which is to 
say it stores the files locally and re-uses them across jobs. I can't tell if 
we should be worried that all impalads, at boot time, will slam HDFS with 
reading the timezonedb. I think reading ~200 files per impalad times 200 
impalad daemons may be a lot of HDFS metadata load, but maybe it's comparable 
to what we do for queries anyway. I certainly think that tar or jar is a better 
way to go. Since this is happening at boot, we can probably still fork to tar, 
which we can assume is available, or use Java, which has tar libraries and 
native support for zip.

http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@35
PS10, Line 35:   specify an HDFS/S3/ADLS location that contains the shared 
compiled
In what format? Does it add to the host one or override it?

My /usr/share/zoneinfo is full of symlinks which HDFS doesn't support in some 
configurations (and S3 certainly doesn't).


http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@41
PS10, Line 41: - The name of the coordinator node’s local time-zone is saved to 
the
Is it easy to tell what the local time zone is of an impalad node? (E.g., do we 
log it?)


http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@45
PS10, Line 45: - Introduces a new startup flag (--hdfs_zone_abbrev_conf) to 
impalad
What's the distinction between this and --hdfs_zone_info_dir? Do we need both?


http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
We're adding a ton of files. Do we need such a big database for our testing 
purposes?


http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py
File tests/custom_cluster/test_shared_tzdb.py:

http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py@38
PS10, Line 38: cls.ImpalaTestMatrix.add_constraint(lambda v:
Add a comment about what this is trying to do?



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 30 May 2018 15:59:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6987: [DOCS] Refactor the INVALIDATE METADATA and REFRESH docs

2018-05-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10339 )

Change subject: IMPALA-6987: [DOCS] Refactor the INVALIDATE METADATA and 
REFRESH docs
..


Patch Set 4:

Sorry for the delay. Taking a look today.


--
To view, visit http://gerrit.cloudera.org:8080/10339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2124e14900d0f82569c061cc46006447bb054b36
Gerrit-Change-Number: 10339
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 30 May 2018 15:40:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 10:

> > > > > Uploaded patch set 9.
 > > > >
 > > > > Patch -set 9 contains the following changes:
 > > > > - Added a full timezone db to testdata/tzdb.
 > > > > - End-to-end tests and BE-tests were changed to use this
 > > timezone
 > > > > db. This was necessary because some timezone-tests were
 > failing
 > > > on
 > > > > older jenkins workers that had an older tzdata package
 > > installed.
 > > >
 > > > It might be a good idea to store the timezone-db files in one
 > > .tar
 > > > file and extract them before running the tests. What do you
 > > think?
 > >
 > > I agree, .taring or compressing the tz db would be much better,
 > if
 > > it does not make the code too complicated. Having less file would
 > > make the review more readable, and would also make the tz db
 > > consume much less space on hdfs, as the many small files will be
 > > rounded up to hdfs block size.
 >
 > Extracting files from a .tar file can be tricky. Probably we would
 > have to add libtar library to the native-toolchain to handle .tar
 > files.
 >
 > Alternatively we can store timezone files in a JAR archive instead.
 > The BE can call into the java FE to extract files from it.

Tim, Dan, what do you think?


--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 30 May 2018 15:34:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 10:

> > > > Uploaded patch set 9.
 > > >
 > > > Patch -set 9 contains the following changes:
 > > > - Added a full timezone db to testdata/tzdb.
 > > > - End-to-end tests and BE-tests were changed to use this
 > timezone
 > > > db. This was necessary because some timezone-tests were failing
 > > on
 > > > older jenkins workers that had an older tzdata package
 > installed.
 > >
 > > It might be a good idea to store the timezone-db files in one
 > .tar
 > > file and extract them before running the tests. What do you
 > think?
 >
 > I agree, .taring or compressing the tz db would be much better, if
 > it does not make the code too complicated. Having less file would
 > make the review more readable, and would also make the tz db
 > consume much less space on hdfs, as the many small files will be
 > rounded up to hdfs block size.

Extracting files from a .tar file can be tricky. Probably we would have to add 
libtar library to the native-toolchain to handle .tar files.

Alternatively we can store timezone files in a JAR archive instead. The BE can 
call into the java FE to extract files from it.


--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 30 May 2018 15:25:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location

2018-05-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10543


Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
..

IMPALA-6119: Fix issue with multiple partitions sharing same location

When multiple partitions point to the same location and a new
data file is added to any of them then the expected behaviour is that
this new file is added to the other partitions pointing to the same
location as well. Apparently, this is not the case and right after
the insertion the new file is only visible in the partition where it
was inserted to and an invalidate metadata is needed to resolve this
inconsistency.
This fix addresses this issue with keeping track of a mapping between
locations and the HdfsPartitions pointing to it. When new files are
inserted into a partition then all the other partition's metadata are
reloaded that point to the same location as the one where the files
are inserted.

Testing:
There was an existing test that covered partitions pointing to the
same location. However, after each insert it executed a refresh to
reload the metadata for the entire table. This reload was removed
to cover the changes of this fix.
Another test is introduced to cover the case when the location of a
partition is altered.

Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_partition_metadata.py
3 files changed, 107 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10543/1
--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-7090: Limit the size of expr created by EqualityDisjunctsToInRule

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10528 )

Change subject: IMPALA-7090: Limit the size of expr created by 
EqualityDisjunctsToInRule
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie40c3210271a9e3c7f1b2b869a8c2ec8bacaa72a
Gerrit-Change-Number: 10528
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 09:28:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] Revert IMPALA-2751: Matching quotes are not requirerd in comments

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10537 )

Change subject: Revert IMPALA-2751: Matching quotes are not requirerd in 
comments
..

Revert IMPALA-2751: Matching quotes are not requirerd in comments

This patch is causing a large number of builds to fail, see
IMPALA-7089.

Change-Id: Id9995a91408d86a5ae1ecd70d07b02622ae26b43
Reviewed-on: http://gerrit.cloudera.org:8080/10537
Reviewed-by: Philip Zeyliger 
Tested-by: Impala Public Jenkins 
---
M shell/impala_shell.py
M tests/shell/test_shell_interactive.py
2 files changed, 5 insertions(+), 25 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id9995a91408d86a5ae1ecd70d07b02622ae26b43
Gerrit-Change-Number: 10537
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] Revert IMPALA-2751: Matching quotes are not requirerd in comments

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10537 )

Change subject: Revert IMPALA-2751: Matching quotes are not requirerd in 
comments
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id9995a91408d86a5ae1ecd70d07b02622ae26b43
Gerrit-Change-Number: 10537
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 08:12:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6338: Disable more flaky bloom filter tests

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10530 )

Change subject: IMPALA-6338: Disable more flaky bloom filter tests
..

IMPALA-6338: Disable more flaky bloom filter tests

Until IMPALA-6338 is fixed, temporarily disable tests that are
affected by it - any test that has a 'limit' and relies on the
contents of the runtime profile. This patch disables the runtime
profile check for all such tests in bloom_filter.test

Change-Id: Ifc9da892efa3b27d63056ad8e3befac82808ffdb
Reviewed-on: http://gerrit.cloudera.org:8080/10530
Reviewed-by: Bikramjeet Vig 
Tested-by: Impala Public Jenkins 
---
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
1 file changed, 12 insertions(+), 8 deletions(-)

Approvals:
  Bikramjeet Vig: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifc9da892efa3b27d63056ad8e3befac82808ffdb
Gerrit-Change-Number: 10530
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-6338: Disable more flaky bloom filter tests

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10530 )

Change subject: IMPALA-6338: Disable more flaky bloom filter tests
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc9da892efa3b27d63056ad8e3befac82808ffdb
Gerrit-Change-Number: 10530
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 30 May 2018 08:00:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..


Patch Set 5:

(2 comments)

> Patch Set 3:
>
> Update the commit message - IMPALA-5552 vs 5522

Fixed the typo.

http://gerrit.cloudera.org:8080/#/c/10510/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/10510/2/be/src/service/impala-server.cc@319
PS2, Line 319:   }
 :
 :   if (!FLAGS_authorized_proxy_group_config.empty()) {
 : AddAuthorizedProxyConfig(
 : aut
> The error can be more generic saying that there's something invalid about t
Done. I made the error more generic and removed the use of lambda.


http://gerrit.cloudera.org:8080/#/c/10510/2/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/10510/2/fe/src/main/java/org/apache/impala/service/JniFrontend.java@623
PS2, Line 623:
> For common Impala deployments, do you know what's in use?
That's weird that CM defaults to ShellBasedUnixGroupsMapping and I think that 
needs to be updated to use the JNI with fallback implementation instead. FYI, 
Sentry also relies calls the same API to get the groups: 
https://github.com/apache/sentry/blob/97f666345c1c068fed56a29512f9dcea77e44c8a/sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/HadoopGroupMappingService.java#L60

I added a check to fail hard when the groups mapping provider is a shell-based 
implementation.



--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 06:28:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5552: Add support for authorized proxy groups

2018-05-30 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/10510 )

Change subject: IMPALA-5552: Add support for authorized proxy groups
..

IMPALA-5552: Add support for authorized proxy groups

The patch adds support for mapping of users to a list of proxy groups.

The following flags are added in impalad:
- authorized_proxy_group_config
- authorized_proxy_group_config_delimiter

Example:
--authorized_proxy_group_config=hue=group1,group2;user1=*

This feature is not supported on Shell-based Hadoop groups mapping
providers.

Testing:
- Added FE unit test to check for groups mapping provider
- Added BE unit test for the parsing logic
- Added a new test in test_authorization.py
- Ran all end-to-end test_authorization.py

Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
---
M be/src/service/CMakeLists.txt
M be/src/service/frontend.cc
M be/src/service/frontend.h
A be/src/service/impala-server-test.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/backend-gflag-util.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
M tests/authorization/test_authorization.py
11 files changed, 364 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10510/5
--
To view, visit http://gerrit.cloudera.org:8080/10510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
Gerrit-Change-Number: 10510
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7090: Limit the size of expr created by EqualityDisjunctsToInRule

2018-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10528 )

Change subject: IMPALA-7090: Limit the size of expr created by 
EqualityDisjunctsToInRule
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2568/


--
To view, visit http://gerrit.cloudera.org:8080/10528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie40c3210271a9e3c7f1b2b869a8c2ec8bacaa72a
Gerrit-Change-Number: 10528
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 30 May 2018 06:07:11 +
Gerrit-HasComments: No