[07/19] impala git commit: IMPALA-6447: remove Python 2.7 dictionary comprehensions

2018-02-02 Thread philz
IMPALA-6447: remove Python 2.7 dictionary comprehensions In the fix for IMPALA-6441, we began importing from the stress test (concurrent_select.py). The import fails on some downstream environments that use Python 2.6. The failure is due to the fact that concurrent_select.py uses a few dictionary
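
For illustration only, a minimal sketch of the kind of rewrite the subject line describes. The data and variable names here are hypothetical and are not taken from concurrent_select.py; the sketch assumes the dictionary comprehensions are replaced with the dict() constructor over a generator expression, which Python 2.6 accepts.

```python
# Hypothetical sample data, not from the Impala stress test.
pairs = [("q1", 3), ("q2", 5)]

# Python 2.7+ only (a SyntaxError on Python 2.6):
#   doubled = {name: count * 2 for name, count in pairs}

# Equivalent form that also parses on Python 2.6:
# dict() consumes a generator of (key, value) tuples.
doubled = dict((name, count * 2) for name, count in pairs)
```

Both forms build the same dictionary; only the syntax differs.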

[2/3] impala git commit: IMPALA-3916: Reserve SQL:2016 reserved words

2018-02-02 Thread philz
IMPALA-3916: Reserve SQL:2016 reserved words This patch reserves SQL:2016 reserved words, excluding: 1. Impala builtin function names. 2. Time unit words (year, month, etc.). 3. An exception list based on a discussion. Some test cases are modified to avoid these words. An impalad and catalogd

[02/19] impala git commit: IMPALA-5528: Bump total thread cache size when KRPC is enabled

2018-02-02 Thread philz
IMPALA-5528: Bump total thread cache size when KRPC is enabled KRPC in general tends to put more pressure on the thread caches due to allocations of more small objects (i.e. <1MB). While some of them are being addressed in KUDU-1865, it's shown that the following TCMalloc workarounds will provide

[04/19] impala git commit: IMPALA-6454: CTAS into Kudu fails with mixed-case partition or primary key column names.

2018-02-02 Thread philz
IMPALA-6454: CTAS into Kudu fails with mixed-case partition or primary key column names. CTAS into Kudu fails if the primary key and/or the partition column names are not specified in lower case. The problem is that we pass in the primary key column names directly from the parser instead we

[14/19] impala git commit: IMPALA-6450: fix EventSequence::Start()

2018-02-02 Thread philz
IMPALA-6450: fix EventSequence::Start() It looks like this newly-added DCHECK is being hit because of the same underlying issue as IMPALA-4631. This patch loosens the DCHECK to accept time going backward 1 tick, the same as the original workaround for IMPALA-4631. 'offset_' also isn't being

[3/3] impala git commit: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

2018-02-02 Thread philz
IMPALA-6346: Potential deadlock in KrpcDataStreamMgr In KrpcDataStreamMgr::CreateRecvr() we take the lock_ and then call recvr->TakeOverEarlySender() for all contexts. recvr->TakeOverEarlySender() then calls recvr_->mgr_->EnqueueDeserializeTask(), which can block if the deserialize pool queue is

[18/19] impala git commit: IMPALA-3562: support column restriction for compute stats

2018-02-02 Thread philz
IMPALA-3562: support column restriction for compute stats The 'compute stats' statement currently computes column-level statistics for all columns of a table. This adds potentially unneeded work for columns whose stats are not needed by queries. It can be especially costly for very wide tables

[09/19] impala git commit: IMPALA-6303: [DOCS] Fix incorrect mention of DataNodes

2018-02-02 Thread philz
IMPALA-6303: [DOCS] Fix incorrect mention of DataNodes The documentation on Impala Components (https://impala.apache.org/docs/build/html/topics/impala_components.html) incorrectly states "... catalog service relays the metadata changes from Impala SQL statements to all the DataNodes in a cluster

[11/19] impala git commit: IMPALA-6455: unique tmpdirs for test_partition_metadata_compatibility

2018-02-02 Thread philz
IMPALA-6455: unique tmpdirs for test_partition_metadata_compatibility Concurrent hive statements running in local mode can race to modify the contents of temporary directories - see IMPALA-6108. This applies the workaround for IMPALA-6108 to the run_stmt_in_hive() utility function, which is used

[15/19] impala git commit: IMPALA-6430: Log relevant debug pages if wait_for_metric_value times out

2018-02-02 Thread philz
IMPALA-6430: Log relevant debug pages if wait_for_metric_value times out Log the memz, metrics and query page if the method wait_for_metric_value times out. This would help us understand the state of the defaulting impalad when the time out happens. Change-Id:

[05/19] impala git commit: IMPALA-6242: Change runtime-profile-test into using the same clock

2018-02-02 Thread philz
IMPALA-6242: Change runtime-profile-test into using the same clock In runtime-profile-test, both MonotonicStopWatch::Now() and MonotonicNanos() are used. The former may use CLOCK_MONOTONIC_COARSE or CLOCK_MONOTONIC while the latter always uses CLOCK_MONOTONIC. This may contribute to the flakiness

[06/19] impala git commit: IMPALA-6215: Removes race when using LibCache.

2018-02-02 Thread philz
IMPALA-6215: Removes race when using LibCache. LibCache's api to provide access to locally cached files has a race. Currently, the client of the cache accesses the locally cached path as a string, but nothing guarantees that the associated file is not removed before the client is done using it.

[16/19] impala git commit: IMPALA-6193: Track memory of incoming data streams

2018-02-02 Thread philz
IMPALA-6193: Track memory of incoming data streams This change adds memory tracking to incoming transmit data RPCs when using KRPC. We track memory against a global tracker called "Data Stream Service" until it is handed over to the stream manager. There we track it in a global tracker called

[13/19] impala git commit: IMPALA-3282: Adds regexp_escape built-in function

2018-02-02 Thread philz
IMPALA-3282: Adds regexp_escape built-in function Escapes the following special characters in RE2 library: .\+*?[^]$(){}=!<>|:- Testing: Add some unit tests into ExprTest.StringRegexpFunctions Add some E2E tests into exprs.test Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b Reviewed-on:
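
As a rough illustration of the behavior described above, the following Python sketch prefixes each listed special character with a backslash. It is an assumption about the semantics, not the Impala implementation; the Python function and the SPECIAL_CHARS constant are hypothetical names that only mirror the built-in.

```python
# Characters listed in the commit message as special in RE2.
SPECIAL_CHARS = set(r".\+*?[^]$(){}=!<>|:-")


def regexp_escape(text):
    """Return text with each special character preceded by a backslash."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS else ch
        for ch in text
    )


escaped = regexp_escape("1+1=2")
```

Characters outside the list pass through unchanged, so plain alphanumeric input is returned as-is.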

[19/19] impala git commit: IMPALA-6429: Fix decimal division

2018-02-02 Thread philz
IMPALA-6429: Fix decimal division Before this patch, it was possible for an overflow to not be detected when doing a decimal division. When scaling up the dividend before doing the division, we do not check for overflow. This is ok if we are scaling up by 10^38 or less because the result is

[17/19] impala git commit: IMPALA-6441 addendum: fix reading rows from HS2 via Impyla

2018-02-02 Thread philz
IMPALA-6441 addendum: fix reading rows from HS2 via Impyla When fetching explain output from HS2 using Impyla, rows come back in lists of 1-tuples. This patch exhibits the need to do end-to-end testing when the case warrants. In this case, although the unit test for

[10/19] impala git commit: IMPALA-6024: Min sample bytes for COMPUTE STATS TABLESAMPLE

2018-02-02 Thread philz
IMPALA-6024: Min sample bytes for COMPUTE STATS TABLESAMPLE Adds a new query option COMPUTE_STATS_MIN_SAMPLE_SIZE which is the minimum number of bytes that will be scanned in COMPUTE STATS TABLESAMPLE, regardless of the user-supplied sampling percent. The motivation is to prevent sampling for
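
One way to read the minimum described above, sketched in Python. The function and parameter names are hypothetical, and capping the result at the table size is an assumption that the excerpt does not state.

```python
def sample_bytes(total_bytes, sample_percent, min_sample_bytes):
    """Bytes to scan: the requested percentage, raised to the minimum.

    Assumption: the result never exceeds the total size of the table.
    """
    requested = total_bytes * sample_percent // 100
    return min(total_bytes, max(requested, min_sample_bytes))


# A 10% sample of a 1000-byte table is raised to the 500-byte minimum.
small_table = sample_bytes(1000, 10, 500)
# A 10% sample of a 10000-byte table already exceeds the minimum.
large_table = sample_bytes(10000, 10, 500)
```

Under this reading, the minimum only changes the outcome for tables where the requested percentage would scan very little data.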

[12/19] impala git commit: IMPALA-2642: Fix a potential deadlock in statestore

2018-02-02 Thread philz
IMPALA-2642: Fix a potential deadlock in statestore The statestored can deadlock if the number of subscribers has reached STATESTORE_MAX_SUBSCRIBERS, because the DoSubscriberUpdate() method calls OfferUpdate(), while holding subscribers_lock_, which also tries to take the same lock in this
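
The pattern described above (a method calling, while holding a non-reentrant lock, another method that takes the same lock) can be sketched in Python. This is not the statestore code, and the excerpt does not show how the patch resolves the problem; the class and method names below are hypothetical, and releasing the lock before the nested call is only one common remedy.

```python
import threading


class Registry(object):
    """Toy subscriber registry guarded by a single non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = {}
        self.rejected = []

    def _offer_update(self, name):
        # Acquires the lock itself, so callers must not already hold it.
        with self._lock:
            self.rejected.append(name)

    def do_update(self, name, limit):
        # Decide under the lock, but do not call _offer_update() here:
        # doing so would block forever on the lock this thread holds.
        with self._lock:
            full = len(self._subscribers) >= limit
            if not full:
                self._subscribers[name] = True
        # The lock has been released, so the nested acquire is safe.
        if full:
            self._offer_update(name)
        return not full


registry = Registry()
first = registry.do_update("a", limit=1)
second = registry.do_update("b", limit=1)
```

Moving the nested call outside the critical section keeps the lock non-reentrant while removing the self-deadlock.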

[1/4] impala git commit: IMPALA-6471: [docs] Corrected ALTER TABLE ADD PARTITION syntax for Kudu table

2018-02-02 Thread tarmstrong
Repository: impala Updated Branches: refs/heads/master ff86feaa6 -> a018038df IMPALA-6471: [docs] Corrected ALTER TABLE ADD PARTITION syntax for Kudu table Change-Id: I70c49286ed6e250707a6edb5ecd77448d1142d0c Reviewed-on: http://gerrit.cloudera.org:8080/9187 Reviewed-by: Thomas

[3/4] impala git commit: IMPALA-5990: End-to-end compression of metadata

2018-02-02 Thread tarmstrong
IMPALA-5990: End-to-end compression of metadata Currently the catalog data is compressed in the statestore, but uncompressed when passed between FE and BE. It results in a ~2GB limit on the metadata. IMPALA-3499 introduced a workaround in the impalad but there isn't one in the catalogd. This