[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the u

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6070

Change subject: IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most 
of these fixes entailed hiding the paragraph where there are mentions of CM and 
writing a replacement paragraph for the upstream docs. This removes the CM 
references from the rendered docs. A 
..

IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager"
removal. Most of these fixes entailed hiding the
paragraph where there are mentions of CM and
writing a replacement paragraph for the upstream
docs. This removes the CM references from the
rendered docs. A subsequent cleanup project will
remove occurrences of CM from the XML.

Change-Id: I4967fae275a8822274aece14a15107237445aba5
---
M docs/topics/impala_isilon.xml
M docs/topics/impala_logging.xml
M docs/topics/impala_s3.xml
M docs/topics/impala_webui.xml
4 files changed, 78 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6070/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4967fae275a8822274aece14a15107237445aba5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph s

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6069

Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most 
of these fixes involved hiding the paragraphs with the DITA attribute 
'audience="hidden"' and then inserting a paragraph suitable for upstream 
documentation. This hides the mention of Cloudera 
..

IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal.
Most of these fixes involved hiding the paragraphs with
the DITA attribute 'audience="hidden"' and then inserting
a paragraph suitable for upstream documentation. This
hides the mention of Cloudera Manager in the rendered
documentation. In a subsequent cleanup project, the
"Cloudera Manager" mentions will be removed from the
XML.

Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1
---
M docs/shared/impala_common.xml
M docs/topics/impala_perf_resources.xml
M docs/topics/impala_perf_skew.xml
M docs/topics/impala_perf_testing.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_txtfile.xml
6 files changed, 49 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/6069/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 215: int msk_transition_day = 2456956;
const for these three vars


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 226: tv.time().hours() < (msk_transition_hour_utc + 
msk_utc_offset) % 24)) {
Is this correct? Before, we had tv.time().hours() < 1. Now we have 22 + 4 % 24 
which is 2.


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6068/1//COMMIT_MSG
Commit Message:

Line 22: decimal V2 clauses.  Will investigate further.
I think this probably should be fixed in this commit.


-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS5, Line 1122: ((double) state->source_size - r) / state->source_size
I'm not yet convinced this is correct. I'm not convinced it is correct in HEAD.

First, a nit: I think the denominator should be increased by 1. (I think the 
expected value of the maximum of two independent random variables over the 
uniform distribution on (0,1) is 2/3, not 3/4)

Second, while I understand that this produces values in the given range, I 
don't think I understand how this implements the algorithm mentioned in the .h 
file, in which values of weight k are mapped to r^(1/k), where r is uniform 
random from (0,1).


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Inline big number strings

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-2020: Inline big number strings
..

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.
Avoiding the table lookup seems to be a win for other types as well,
giving anywhere from 18% to ~33% speedup over baseline.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Before:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d92070
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.83s

After:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 02:23:33 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=97438c98a3d950a4:1a45eeb9
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/expr-value.h
M be/src/runtime/decimal-test.cc
M be/src/util/CMakeLists.txt
D be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
7 files changed, 19 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..

IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: updated expr test with correctly fixed test cases.
For some reason, it seemed the test passed before that, and I
found that it wasn't correctly validating some of the NULL cases.

Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
2 files changed, 14 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/6068/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: int r = rand() % state->num_samples;
I know you didn't author this line, but could you switch to a better PRNG than 
rand, like 
http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine?


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 147 insertions(+), 34 

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 147 insertions(+), 34 

[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6068

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..

IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: updated expr test with correctly fixed test cases.
For some reason, it seemed the test passed before that, so it
is possible that it isn't correctly validating some of the non
decimal V2 clauses.  Will investigate further.

Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
2 files changed, 7 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/6068/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6025/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1025:   // can fit 10 times as many samples, up to a maximum of 
MAX_NUM_SAMPLES. We do not
> update references to '10'
Done


Line 1026:   // allocate room for MAX_NUM_SAMPLES samples up front in order to 
conserve memory.
> in order to -> to
Done


Line 1031:   if (new_capacity * 2 >= MAX_NUM_SAMPLES) new_capacity = 
MAX_NUM_SAMPLES;
> DCHECK_LT(state->capacity, new_capacity);
Done


Line 1051:   // ReservoirSamples. Initially, the array has enough room for 
MAX_NUM_SAMPLES / 1000
> update to reflect the new initial size
Done


Line 1063:   *state = ReservoirSampleState();
> It would be clearer what's happening if we added a constructor for Reservoi
Done


Line 1076:   // If the container is full, increase it's size by 10x.
> update 10
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..

IMPALA-4546: Fix Moscow timezone conversion after 2014

In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4
with no DST. A special case has been added to timestamp functions to
handle this.

Testing:
Added BE Expr tests.

Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
4 files changed, 114 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..

IMPALA-4546: Fix Moscow timezone conversion after 2014

In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4
with no DST. A special case has been added to timestamp functions to
handle this.

Testing:
Added BE Expr tests.

Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
4 files changed, 114 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5969/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 195:   // March 27, 2011 Moscow time transitioned to UTC+4 with no 
DST. NOTE: We currently
> we do not handle them (error/NULL?), or do we not handle them correctly?
We do not handle correctly (see patch 4)


Line 199:   return TIMEZONE_MSK_PRE_2011_DST;
> mention the time also 22:00:000 UTC because we check for that below
Done


Line 209:   }
> might be clearer to have const variable here int transition_hour_utc = 22 a
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Inline big number strings

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5//COMMIT_MSG
Commit Message:

PS5, Line 27: Query: select sum(cast(cast(l_extendedprice as double) as
: decimal(14,2))) from tpch10_parquet.lineitem
: Query submitted at: 2017-02-18 01:23:54 (Coordinator:
: http://impala-dev:25000)
: Query progress can be monitored at:
: 
http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d92070
elide


PS5, Line 38: 0.83s
error bars?


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h
File be/src/exprs/expr-value.h:

Line 177: decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4;
> Unary plus necessary to convert to an Rvalue
Why do you want to convert it to an Rvalue? Just to avoid having to define it 
in a .cc file? If so, does defining (not declaring or initializing) in a .cc 
file slow down performance?

Also, add that comment in the file for future readers.


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 57:   template
Describe what it does.

Maybe call it Pow10 and call the parameter exponent?

private:


PS5, Line 58: T
Always is_integral?


PS5, Line 59: boost::
static_assert is now part of the language, but I'm not sure if it works on 
parameters, since parameters can't be constexpr.


Line 60: return scale == 0 ? 1 : 10 * GetConstScaleMultiplier(scale-1);
Check for no overflow, if it can be done simply.


Line 136:   if (__builtin_const_p(scale) && p < 10) return 
GetConstScaleMultiplier(scale);
Does this by itself speed anything up? If scale is constant at compile time, 
since this is inlined, won't it just inline to the right value?

https://godbolt.org/g/Tlfl8l


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 4:

(6 comments)

I'm pretty happy with this patch.

http://gerrit.cloudera.org:8080/#/c/6025/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1025:   // can fit 10 times as many samples, up to a maximum of 
MAX_NUM_SAMPLES. We do not
update references to '10'


Line 1026:   // allocate room for MAX_NUM_SAMPLES samples up front in order to 
conserve memory.
in order to -> to


Line 1031:   if (new_capacity * 2 >= MAX_NUM_SAMPLES) new_capacity = 
MAX_NUM_SAMPLES;
DCHECK_LT(state->capacity, new_capacity);

in case someone decides to significantly muck with the sampling constants such 
that this can overflow


Line 1051:   // ReservoirSamples. Initially, the array has enough room for 
MAX_NUM_SAMPLES / 1000
update to reflect the new initial size


Line 1063:   *state = ReservoirSampleState();
It would be clearer what's happening if we added a constructor for 
ReservoirSampleState that initializes the struct fields. Right now, it appears 
that num_samples and source_size could be uninitialized.


Line 1076:   // If the container is full, increase it's size by 10x.
update 10


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Inline big number strings

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h
File be/src/exprs/expr-value.h:

Line 177: decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4;
Unary plus necessary to convert to an Rvalue


http://gerrit.cloudera.org:8080/#/c/5902/4/be/src/util/decimal-util.cc
File be/src/util/decimal-util.cc:

Line 25
> If it's going to be this specific, why not
Let's go full tilt and actually fully inline this :)


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Inline big number strings

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-2020: Inline big number strings
..

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.
Avoiding the table lookup seems to be a win for other types as well,
giving anywhere from 18% to ~33% speedup over baseline.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Before:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d92070
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.83s

After:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 02:23:33 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=97438c98a3d950a4:1a45eeb9
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/expr-value.h
M be/src/runtime/decimal-test.cc
M be/src/util/CMakeLists.txt
D be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
7 files changed, 20 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 4: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/284/

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1024: new_size 
> Queries with a large number buckets will run out of memory much quicker tha
2x works for me but:

For buckets with many samples you spend more time reallocating, and the peak 
memory consumption can be higher (space for 1.5 * MAX_NUM_SAMPLES).

See the experiment Taras posted in the commit msg. I'm fine with the 2x 
behavior, just pointing out the tradeoffs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 

[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6067

Change subject: IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from 
upstream docs. Hid instances of CM and rewrote for upstream docs when 
necessary. This still leaves occurences of CM in the XML, but not in the 
rendered documentation. A later project will remove
..

IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream
docs. Hid instances of CM and rewrote for upstream docs when necessary.
This still leaves occurences of CM in the XML, but not in the rendered
documentation. A later project will remove all occurrences of CM from
the XML.

Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d
---
M docs/topics/impala_impala_shell.xml
M docs/topics/impala_proxy.xml
M docs/topics/impala_resource_management.xml
M docs/topics/impala_timeouts.xml
M docs/topics/impala_udf.xml
5 files changed, 95 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/6067/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#15).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Note there is no free lunch:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:49 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb667
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.76s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:52 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b03
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.73s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:59 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.85s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:17:02 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c6
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.96s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

[localhost:21000] > set decimal_v2=0;

[Impala-ASF-CR] Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Mana

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: Removing 'Cloudera Manager' from upstream docs. Used DITA 
attribute 'audience=hidden' when needed to preserve information for a future 
clean-up step. When it made sense, 'Cloudera Manager' was removed and sections 
were rewritten. This clean-up task does n
..


Patch Set 1:

Forgot to prepend "IMPALA-3401 [DOCS]. Will not forget next time.

-- 
To view, visit http://gerrit.cloudera.org:8080/6064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/285/

-- 
To view, visit http://gerrit.cloudera.org:8080/6030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Mana

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6064

Change subject: Removing 'Cloudera Manager' from upstream docs. Used DITA 
attribute 'audience=hidden' when needed to preserve information for a future 
clean-up step. When it made sense, 'Cloudera Manager' was removed and sections 
were rewritten. This clean-up task does n
..

Removing 'Cloudera Manager' from upstream docs. Used DITA attribute
'audience=hidden' when needed to preserve information for a future
clean-up step. When it made sense, 'Cloudera Manager' was removed
and sections were rewritten. This clean-up task does not completely
remove 'Cloudera Manager' from the upstream docs, but it is now
hidden in these files so it won't show up in the rendered version.

Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0
---
M docs/topics/impala_admission.xml
M docs/topics/impala_config_performance.xml
M docs/topics/impala_noncm_installation.xml
M docs/topics/impala_prereqs.xml
M docs/topics/impala_tutorial.xml
5 files changed, 72 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/6064/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 4:

PS4 adds the "statistics predicates" to the PlannerTest and removes the custom 
predicate normalization from KuduScanNode.java, after it is now done during 
query rewrite.

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-17 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#4).

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..

IMPALA-2328: Read support for min/max Parquet statistics

This change adds support for skipping row groups based on Parquet row
group statistics. With this change we only support reading statistics
from Parquet files for numerical types (bool, integer, floating point)
and for simple predicates of the formsor
  , where  is LT, LE, GE, GT, and EQ.

Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exprs/expr.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
27 files changed, 729 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6038/3//COMMIT_MSG
Commit Message:

Line 12: 2. precision will be augmented if the scale is adjusted.
In addition to my other comment, I think we should mention some of the 
limitations here, for example, that it won't be possible to compute the AVG() 
of some decimal types at all.


http://gerrit.cloudera.org:8080/#/c/6038/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 337:   // an implicit divide. This is similar to the MS SQL server's 
behavior which takes
Suggest these minor fixes and reordering:

AVG() gets at least MIN_ADJUSTED_SCALE decimal places since it performs an 
implicit divide. However, the output type is not always the same as 
SUM()/COUNT(). Our behavior is similar to MS SQL Server's which takes the max 
of the input's scale and MIN_ADJUSTED_SCALE.

The observations are ok, but at a high level:
* the type is not the same as SUM()/COUNT()
* the behavior is not the same as MS SQL Server's

It would be good to provide a justification/intuition for why we chose yet 
another behavior (also in the commit msg).


http://gerrit.cloudera.org:8080/#/c/6038/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

Line 208: 
We should also add tests for those decimal types where AVG() cannot be computed 
(because we always overflow).


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Reviewed-on: http://gerrit.cloudera.org:8080/5840
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
3 files changed, 227 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:

Operator   #Hosts   Avg Time  Max Time#Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
04:EXCHANGE 11.869ms   1.869ms   14.82K  15.21K 0   
 -1.00 B  UNPARTITIONED
03:AGGREGATE3   34.225ms  37.978ms   14.82K  15.21K  20.67 MB   
10.00 MB  FINALIZE
02:EXCHANGE 3  860.854us   1.024ms   15.60K  15.21K 0   
   0  HASH(ss_customer_sk)
01:AGGREGATE3   74.048ms  93.605ms   15.60K  15.21K   9.99 MB   
10.00 MB  STREAMING
00:SCAN HDFS3   68.162ms  78.181ms  183.59K 183.59K   3.84 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1036:   state->max_num_samples = new_size;
> The previous state gets copied in ReallocBuffer. The only thing we need to 
Ah, didn't remember the details of reallocate. Comment will be helpful, thanks.


PS2, Line 1167: (dst->num_samples < MAX_NUM_SAMPLES
> If DST does not have enough capacity for MAX_NUM_SAMPLES, it will be resize
I see, I missed that call before. Thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:

Operator   #Hosts   Avg Time  Max Time#Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
04:EXCHANGE 11.869ms   1.869ms   14.82K  15.21K 0   
 -1.00 B  UNPARTITIONED
03:AGGREGATE3   34.225ms  37.978ms   14.82K  15.21K  20.67 MB   
10.00 MB  FINALIZE
02:EXCHANGE 3  860.854us   1.024ms   15.60K  15.21K 0   
   0  HASH(ss_customer_sk)
01:AGGREGATE3   74.048ms  93.605ms   15.60K  15.21K   9.99 MB   
10.00 MB  STREAMING
00:SCAN HDFS3   68.162ms  78.181ms  183.59K 183.59K   3.84 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:

Operator   #Hosts   Avg Time  Max Time#Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
04:EXCHANGE 11.869ms   1.869ms   14.82K  15.21K 0   
 -1.00 B  UNPARTITIONED
03:AGGREGATE3   34.225ms  37.978ms   14.82K  15.21K  20.67 MB   
10.00 MB  FINALIZE
02:EXCHANGE 3  860.854us   1.024ms   15.60K  15.21K 0   
   0  HASH(ss_customer_sk)
01:AGGREGATE3   74.048ms  93.605ms   15.60K  15.21K   9.99 MB   
10.00 MB  STREAMING
00:SCAN HDFS3   68.162ms  78.181ms  183.59K 183.59K   3.84 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(17 comments)

Marcel, I will run the targeted perf query later today.

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 963: max_num_samples
> confusing to have MAX_NUM_SAMPLES as well. maybe this should be capacity
Done


PS2, Line 973: { return begin() + idx; }
> it'd be good to add bounds checks w/ dchecks
Done


PS2, Line 984: max_num_samples
> capacity
Done


Line 996:   bool IsFull() const {
> dcheck that num_samples <= capacity, i.e. that it is NOT over
Done


PS2, Line 990:   size_t byte_size() const {
 : return byte_size(max_num_samples);
 :   }
 : 
 :   // Returns true if the array of ReservoirSamples that follows 
this struct in memory is
 :   // full, and no more elements can be pushed back without 
resizing.
 :   bool IsFull() const {
 : return num_samples == max_num_samples;
 :   }
> i think both of these can be one line
One lined one of them, added a DCHECK to the other.


PS2, Line 1001: // make this const
> ?
Done


PS2, Line 1018: resizeReservoirSampleState
> nit: inconsistent casing in fn names
Make this one start with a capital letter. But other ones like begin() should 
probably start with a lowercase.


PS2, Line 1021: N_
> NUM
Done


PS2, Line 1024: new_size 
> new_capacity to avoid conflating with memory sizes
Done


PS2, Line 1024: new_size 
> Queries with a large number buckets will run out of memory much quicker tha
Done


PS2, Line 1024:   int new_size = state->max_num_samples * 10;
  :   DCHECK_EQ(state->max_num_samples, state->num_samples);
  :   DCHECK_LE(new_size, MAX_NUM_SAMPLES);
> Looks like it'll dcheck when capacity reaches MAX_NUM_SAMPLES. Shouldn't th
In this implementation it couldn't DCHECK, because we would start with capacity 
20, then go to 200, 2000, and finally 2. This is now changed because 
Mostafa suggested doubling instead.


PS2, Line 1036:   state->max_num_samples = new_size;
> why don't any other fields need to be initialized? e.g. the rng will be gar
The previous state gets copied in ReallocBuffer. The only thing we need to 
change is is the capacity. Added comment.


PS2, Line 1048: size
> again I'd vote for capacity
I got rid of this variable.


PS2, Line 1058:   *state = ReservoirSampleState();
> this looks like it'll initialize the memory properly. I think we'd probably
We don't need to do that on line 1035 because everything gets copied over in 
ReallocBuffer()


PS2, Line 1084:   ++state->source_size;
> I think this gets lost when you allocate a new State.
As mentioned above, everything is copied in realloc


PS2, Line 1152: ReservoirSampleMerge
> I think this algorithm will require some changes to handle varying length S
The capacity shouldn't matter in this algorithm. This algorithm only cares 
about num samples in src and dst states.


PS2, Line 1167: (dst->num_samples < MAX_NUM_SAMPLES
> I don't think you can rely on this anymore, dst may not have capacity MAX_N
If DST does not have enough capacity for MAX_NUM_SAMPLES, it will be resized on 
line 1170


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) publishing Impala 2.8 rendered docs

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6062

Change subject: publishing Impala 2.8 rendered docs
..

publishing Impala 2.8 rendered docs

Change-Id: Ied1bbe1650e54a045fbf48b3f91867fd15a1525b
---
A docs/build/html/commonltr.css
A docs/build/html/commonrtl.css
A docs/build/html/images/howto_access_control.png
A docs/build/html/images/howto_per_node_peak_memory_usage.png
A docs/build/html/images/howto_show_histogram.png
A docs/build/html/images/howto_static_server_pools_config.png
A docs/build/html/images/impala_arch.jpeg
A docs/build/html/images/support_send_diagnostic_data.png
A docs/build/html/index.html
A docs/build/html/shared/ImpalaVariables.html
A docs/build/html/shared/impala_common.html
A docs/build/html/topics/impala.html
A docs/build/html/topics/impala_abort_on_default_limit_exceeded.html
A docs/build/html/topics/impala_abort_on_error.html
A docs/build/html/topics/impala_admin.html
A docs/build/html/topics/impala_admission.html
A docs/build/html/topics/impala_aggregate_functions.html
A docs/build/html/topics/impala_aliases.html
A docs/build/html/topics/impala_allow_unsupported_formats.html
A docs/build/html/topics/impala_alter_table.html
A docs/build/html/topics/impala_alter_view.html
A docs/build/html/topics/impala_analytic_functions.html
A docs/build/html/topics/impala_appx_count_distinct.html
A docs/build/html/topics/impala_appx_median.html
A docs/build/html/topics/impala_array.html
A docs/build/html/topics/impala_auditing.html
A docs/build/html/topics/impala_authentication.html
A docs/build/html/topics/impala_authorization.html
A docs/build/html/topics/impala_avg.html
A docs/build/html/topics/impala_avro.html
A docs/build/html/topics/impala_batch_size.html
A docs/build/html/topics/impala_bigint.html
A docs/build/html/topics/impala_bit_functions.html
A docs/build/html/topics/impala_boolean.html
A docs/build/html/topics/impala_breakpad.html
A docs/build/html/topics/impala_char.html
A docs/build/html/topics/impala_cluster_sizing.html
A docs/build/html/topics/impala_comments.html
A docs/build/html/topics/impala_complex_types.html
A docs/build/html/topics/impala_components.html
A docs/build/html/topics/impala_compression_codec.html
A docs/build/html/topics/impala_compute_stats.html
A docs/build/html/topics/impala_concepts.html
A docs/build/html/topics/impala_conditional_functions.html
A docs/build/html/topics/impala_config.html
A docs/build/html/topics/impala_config_options.html
A docs/build/html/topics/impala_config_performance.html
A docs/build/html/topics/impala_connecting.html
A docs/build/html/topics/impala_conversion_functions.html
A docs/build/html/topics/impala_count.html
A docs/build/html/topics/impala_create_database.html
A docs/build/html/topics/impala_create_function.html
A docs/build/html/topics/impala_create_role.html
A docs/build/html/topics/impala_create_table.html
A docs/build/html/topics/impala_create_view.html
A docs/build/html/topics/impala_databases.html
A docs/build/html/topics/impala_datatypes.html
A docs/build/html/topics/impala_datetime_functions.html
A docs/build/html/topics/impala_ddl.html
A docs/build/html/topics/impala_debug_action.html
A docs/build/html/topics/impala_decimal.html
A docs/build/html/topics/impala_default_order_by_limit.html
A docs/build/html/topics/impala_delegation.html
A docs/build/html/topics/impala_delete.html
A docs/build/html/topics/impala_describe.html
A docs/build/html/topics/impala_development.html
A docs/build/html/topics/impala_disable_codegen.html
A docs/build/html/topics/impala_disable_row_runtime_filtering.html
A docs/build/html/topics/impala_disable_streaming_preaggregations.html
A docs/build/html/topics/impala_disable_unsafe_spills.html
A docs/build/html/topics/impala_disk_space.html
A docs/build/html/topics/impala_distinct.html
A docs/build/html/topics/impala_dml.html
A docs/build/html/topics/impala_double.html
A docs/build/html/topics/impala_drop_database.html
A docs/build/html/topics/impala_drop_function.html
A docs/build/html/topics/impala_drop_role.html
A docs/build/html/topics/impala_drop_stats.html
A docs/build/html/topics/impala_drop_table.html
A docs/build/html/topics/impala_drop_view.html
A docs/build/html/topics/impala_exec_single_node_rows_threshold.html
A docs/build/html/topics/impala_explain.html
A docs/build/html/topics/impala_explain_level.html
A docs/build/html/topics/impala_explain_plan.html
A docs/build/html/topics/impala_faq.html
A docs/build/html/topics/impala_file_formats.html
A docs/build/html/topics/impala_fixed_issues.html
A docs/build/html/topics/impala_float.html
A docs/build/html/topics/impala_functions.html
A docs/build/html/topics/impala_functions_overview.html
A docs/build/html/topics/impala_grant.html
A docs/build/html/topics/impala_group_by.html
A docs/build/html/topics/impala_group_concat.html
A docs/build/html/topics/impala_hadoop.html
A docs/build/html/topics/impala_having.html
A 

[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5969/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 3720:   TestStringValue(UTC_TO_MSC("2014-10-25 22:00:00"), "2014-10-26 
01:00:00");
> What do you think should be done about this?
After speaking to Greg, we decided that we should return NULL for that 1 hour 
to be consistent.


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 1:

(10 comments)

Thanks for the suggestions Henry. New screenshots are here:
https://drive.google.com/file/d/0B7VW0hRyzVlKTzZPR3VEVlVEYWc/view?usp=sharing
https://drive.google.com/file/d/0B7VW0hRyzVlKOXowTUM2QWVySm8/view?usp=sharing

http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.cc
File be/src/util/thread.cc:

Line 178:   void ThreadOverviewUrlCallback(bool track_jvm_threads,
> Shouldn't this go above ThreadGroupUrlCallback() to match with the sample j
No, this is the output of the overview callback.


PS1, Line 194: [this, track_jvm_threads] (const Webserver::ArgumentMap& args, 
Document* doc) {
 : this->ThreadOverviewUrlCallback(track_jvm_threads, args, doc)
> Make this a method? MakeUrlCallBack or something, used multiple times.
This has changed.


PS1, Line 209: RegisterUrlCallback
> Rather than adding another top-level link, consider making two tabs on the 
Done


PS1, Line 234: bool track_jvm_threads,
> how about just making this a member of ThreadMgr? Actually - I don't think 
Done


Line 260: << status.GetDetail();
> return, then remove else { }
Done


PS1, Line 262: jvmThreadsVal
> C++ naming styles
power of habit.. :) Done


PS1, Line 327: INFO
> warn/error? How about bubbling this error up to the web endpoint by setting
Changed it to WARN. What is the "error" member?


http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.h
File be/src/util/thread.h:

Line 191: /// the "thread-manager."
> Describe track_jvm_threads in the method comment?
Done


http://gerrit.cloudera.org:8080/#/c/6013/1/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

Line 720:   // Summary of a JVM thread. Includes stacktraces, locked monitors
> Do you think its good to include the locked synchronizers or the lock it is
It already does that.


http://gerrit.cloudera.org:8080/#/c/6013/1/www/jvm-threadz.tmpl
File www/jvm-threadz.tmpl:

PS1, Line 35: Is n
> Native
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

2017-02-17 Thread Michael Brown (Code Review)
Hello David Knupp,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6002

to look at the new patch set (#5).

Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
..

IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

This patch allows any test suites whose workload is "targeted-stress" to
be run in so-called "exhaustive" mode. Before this patch, only suites
whose workload was "functional-query" would be run exhaustively. More on
this flaw is in IMPALA-3947.

The net effects are:

1. We fix IMPALA-4904, which allows test_ddl_stress to start running
   again.
2. We also improve the situation in IMPALA-4914 by getting
   TestSpillStress to run, though we don't fix its
   not-running-concurrently problem.

The other mini-cluster stress tests were disabled in this commit:
  IMPALA-2605: Omit the sort and mini stress tests
so they are not directly affected here.

I also tried to clarify what "exhaustive" means in some of our shell
scripts, via help text and comments.

An exhaustive build+test run showed test_ddl_stress and TestSpillStress
now get run and passed. This adds roughly 12 minutes to a build that's
on the order of 13-14 hours.

Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
---
M bin/run-all-tests.sh
M buildall.sh
M tests/custom_cluster/test_spilling.py
M tests/stress/test_ddl_stress.py
4 files changed, 67 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6002/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR](asf-site) publishing Impala 2.8 rendered docs

2017-02-17 Thread Laurel Hale (Code Review)
Laurel Hale has abandoned this change.

Change subject: publishing Impala 2.8 rendered docs
..


Abandoned

practicing with Jim to publish on Impala ASF site

-- 
To view, visit http://gerrit.cloudera.org:8080/6062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ied1bbe1650e54a045fbf48b3f91867fd15a1525b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/284/

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

2017-02-17 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for 
exhaustive tests
> Thanks for the point Henry. I once saw on an Impala code review (I can't fi
Done


http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py
File tests/stress/test_ddl_stress.py:

Line 59
> Given the number of catalog race conditions we tend to see, I think it is p
TEST_IDS (LHS L45) is equivalent to RHS L52. The RHS is simply the standard 
py.test way of going it. The real difference here is the introduction of 
unique_database, which I agree removes the potential of "CREATE DATABASE IF NOT 
EXISTS ddl_stress_testdb"  to be run concurrently. From a test perspective, 
partially reverting this to the previous behavior should not cause the test to 
fail. Let me address that in a separate patch set.


-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) publishing Impala 2.8 rendered docs

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: publishing Impala 2.8 rendered docs
..


Patch Set 1: Code-Review-2 Verified-1

Practicing with Laurel

-- 
To view, visit http://gerrit.cloudera.org:8080/6062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied1bbe1650e54a045fbf48b3f91867fd15a1525b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#13).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..

IMPALA-4885: Expose Jvm thread info in web UI

This commit exposes information about JVM threads to the impalad and
catalogd web UIs. This information includes statistics about the number
of threads running in the JVM as well as per-thread stacktraces, monitors and
synchronizers. Total CPU, user CPU and blocked time is also reported per
thread.

Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
---
M be/src/catalog/catalogd-main.cc
M be/src/service/impalad-main.cc
M be/src/statestore/statestored-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/thread.cc
M be/src/util/thread.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
A www/jvm-threadz.tmpl
M www/threadz.tmpl
A www/threadz_tabs.tmpl
12 files changed, 401 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests

2017-02-17 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" 
for exhaustive tests
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for 
exhaustive tests
> Although I'm not keen on the strict 72-char limit (we have many other commi
Thanks for the point Henry. I once saw on an Impala code review (I can't find 
it to cite it) a fair refute of this idea: writing Git commit headers like 
"IMPALA-{4904, 4914}" was a bad idea, because if someone needs to search Git 
logs for IMPALA-4914, he won't match this particular header.


-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests

2017-02-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" 
for exhaustive tests
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for 
exhaustive tests
> I agree this loses some information, but I think it's the right tradeoff:
Although I'm not keen on the strict 72-char limit (we have many other commit 
messages that go over this - what's the reason for enforcing it?) - note you 
can save some chars by compressing the list of fixed JIRAs:

  IMPALA-{4904, 4914}: etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" 
for exhaustive tests
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for 
exhaustive tests
> Can you please suggest a meaningful header that summarizes the patch that a
I agree this loses some information, but I think it's the right tradeoff:

IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests


http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py
File tests/stress/test_ddl_stress.py:

Line 59
> No change in ordering of DDL calls. The CREATE DATABASE that is executed no
Given the number of catalog race conditions we tend to see, I think it is 
possible that this changes the test meaning.

My feeling is that it would be good to test both things: multiple tables in one 
DB and multiple tables each with its own DB.

If this method and TEST_IDS were left unchanged, would the test fail?


-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests

2017-02-17 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" 
for exhaustive tests
..


Patch Set 4: Code-Review+1

(9 comments)

Thanks for the review, Jim. Patch set 4 does not have any functional changes 
relative to patch set 3. Some of your comments I answered by making changes to 
the comments in code; most I answer as comments in Gerrit.

Carry +1

http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for 
exhaustive tests
> long line
Can you please suggest a meaningful header that summarizes the patch that also 
fits in 72 columns? "Long line" on its own isn't helpful. I realize it's long, 
but I'm not sure how to shorten it without loss of information.


Line 29: An exhaustive build+test run showed test_ddl_stress and TestSpillStress
> Roughly how much longer does this make exhaustive runs?
Done


http://gerrit.cloudera.org:8080/#/c/6002/3/bin/run-all-tests.sh
File bin/run-all-tests.sh:

PS3, Line 100: XXX
> For my education: what do the three X's mean
Vim highlights it (like how it highlights "TODO"). I like to use them as 
notices.


PS3, Line 102: `
> I don't think I've seen this idiom before. I recognize it is in HEAD alread
That's the gist. I expect it's either a Martinism or Caseyism. I like it: it 
allows for clearer formatting and organization of multi-line quoted text on RHS 
of an assignment. Note that parentheses won't work, nor will backslash.


http://gerrit.cloudera.org:8080/#/c/6002/3/buildall.sh
File buildall.sh:

Line 188:"its dependencies. If already running, all services are 
restarted."\
> What happened to the spaces?
I noticed this when running buildall.sh -help They are not needed and 
./buidall.sh -help will show two spaces (on master). See this gist for more 
info, if you care. :) I don't want to clutter the review with the explanation.

https://gist.github.com/mikesbrown/ca0c5e4a2ba13dd929f72c57adf4e8bc


PS3, Line 196: ONLY APPLIES to suites with workloads:"\
 :"functional-query, targeted-stress"
> This seems likely to become stale if that changes, as it may not show up to
I think it's important to convey it *somewhere* what "running exhaustively" 
actually means. buildall.sh is the common entry point to our system, and we 
already have help text here. I simply clarified the help text. I also left a 
notice (that "XXX" you asked about) in hopes people will be prompted to update 
it.


http://gerrit.cloudera.org:8080/#/c/6002/3/tests/custom_cluster/test_spilling.py
File tests/custom_cluster/test_spilling.py:

PS3, Line 56: ) and ca
> Would it be worthwhile or even possible to fix it to be more typical? If so
Done


PS3, Line 66: equivalent
> This one I don't quite understand. The CustomClusterTestSuite presumably sh
Done


http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py
File tests/stress/test_ddl_stress.py:

Line 59
> Does removing this (to pytest.mark.parametrize) change the possible orderin
No change in ordering of DDL calls. The CREATE DATABASE that is executed now is 
part of the CREATE DATABASE from the unique_database fixture before the test 
proper starts. The difference is each test will use its own unique database 
instead of sharing across "ddl_stres_testdb".


-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests

2017-02-17 Thread Michael Brown (Code Review)
Hello David Knupp,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6002

to look at the new patch set (#4).

Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" 
for exhaustive tests
..

IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive 
tests

This patch allows any test suites whose workload is "targeted-stress" to
be run in so-called "exhaustive" mode. Before this patch, only suites
whose workload was "functional-query" would be run exhaustively. More on
this flaw is in IMPALA-3947.

The net effects are:

1. We fix IMPALA-4904, which allows test_ddl_stress to start running
   again.
2. We also improve the situation in IMPALA-4914 by getting
   TestSpillStress to run, though we don't fix its
   not-running-concurrently problem.

The other mini-cluster stress tests were disabled in this commit:
  IMPALA-2605: Omit the sort and mini stress tests
so they are not directly affected here.

I also tried to clarify what "exhaustive" means in some of our shell
scripts, via help text and comments.

An exhaustive build+test run showed test_ddl_stress and TestSpillStress
now get run and passed. This adds roughly 12 minutes to a build that's
on the order of 13-14 hours.

Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
---
M bin/run-all-tests.sh
M buildall.sh
M tests/custom_cluster/test_spilling.py
M tests/stress/test_ddl_stress.py
4 files changed, 67 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6002/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

PS2, Line 197: 0
> Yes, I tried verifying with sum()/num_rows(). That's why I used tpch_parque
I meant I verified with sum(col)/count(*).


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-17 Thread Michael Ho (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6038

to look at the new patch set (#3).

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The major differences are:
1. output type's scale has a minimum of 6
2. precision will be augmented if the scale is adjusted.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 231 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6038/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 447:   // The scale of the accumulated sum is set to the scale of the 
output type.
> I think this comment would be better in the frontend code since that's wher
Done


http://gerrit.cloudera.org:8080/#/c/6038/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 336:   // AVG() always get at least MIN_ADJUSTED_SCALE decimal places 
since it performs
> gets
Done


Line 338:   int resultScale = Math.max(ScalarType.MIN_ADJUSTED_SCALE, 
digitsAfter);
> Is the resulting type the same as if we did (SUM(col) / COUNT(col))? In any
Done


http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

PS2, Line 197: 0
> do these results make sense? seems surprising that this decimal place is al
Yes, I tried verifying with sum()/num_rows(). That's why I used 
tpch_parquet.lineitem to help verify the result.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering

2017-02-17 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#8).

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
..

IMPALA-4624: Implement Parquet dictionary filtering

Here is a basic summary of the changes:
Frontend looks for conjuncts that operate on a single slot and pass a
map from slot id to the conjunct index through thrift to the backend.
The conjunct indices are the incides into the normal PlanNode conjuncts list.
The conjuncts need to satisfy certain conditions:
1. They are bound on a single slot
2. They are deterministic (no random functions)
3. They evaluate to FALSE on a NULL input. This is because the dictionary
does not include NULLs, so any condition that evaluates to TRUE on NULL
cannot be evaluated by looking only at the dictionary.

The backend converts the indices into ExprContexts. These are cloned in
the scanner threads.

The dictionary read codepath has been removed from ReadDataPage into its
own function, InitDictionary. This has also been turned into its own step
in row group initialization. ReadDataPage will not see any dictionary
pages unless the parquet file is invalid.

For dictionary filtering, we initialize dictionaries only as needed to evaluate
the conjuncts. The Parquet scanner evaluates the single slot conjuncts on the
dictionary to see if any dictionary entry passes. If no entry passes, the row
group is eliminated. If the row group passes the dictionary filtering, then we
initialize all remaining dictionaries.

Since column chunks can have a mixture of encodings, dictionary filtering
uses three tests to determine whether this is purely dictionary encoded:
1. If the encoding_stats is in the parquet file, then use it to determine if
there are only dictionary encoded pages (i.e. there are no data pages with
an encoding other than PLAIN_DICTIONARY).
-OR-
2. If the encoding stats are not present, then look at the encodings. The column
is purely dictionary encoded if:
a) PLAIN_DICTIONARY is present
AND
b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed
-OR-
3. If this file was written by an older version of Impala, then we know that
dictionary failover happens when the dictionary reaches 40,000 values.
Dictionary filtering can proceed as long as the dictionary is smaller than
that.

parquet-mr writes the encoding list correctly in the current version in our
environment (1.5.0). This means that check #2 works on some existing files
(potentially most existing parquet-mr files).
parquet-mr writes the encoding stats starting in 1.9.0. This is the version
where check #1 will start working.

Impala's parquet writer now implements both, so either check above will work.

Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M common/thrift/PlanNodes.thrift
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
M tests/query_test/test_scanners.py
19 files changed, 1,056 insertions(+), 161 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/5904/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

Taras, why don't you run a targeted perf query to measure runtime overhead of 
2x vs. 10x resizing.

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-17 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 49: double
> shouldn't this be T?
No, T may not have enough precision to hold the value of 10^scale - look at the 
old code, which also does the multiplication in double.


PS12, Line 52: max_value < 0
> why is the max_value < 0 check needed?
If we cast to a decimal type whose underlying width is not large enough to hold 
the precision, this will become true.

Make it a DCHECK?  This should be impossible, guaranteed by the type system.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for exhaustive tests

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" 
for exhaustive tests
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6002/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4904,IMPALA-4914: whitelist workload "targeted-stress" for 
exhaustive tests
long line


Line 29: An exhaustive build+test run showed test_ddl_stress and TestSpillStress
Roughly how much longer does this make exhaustive runs?


http://gerrit.cloudera.org:8080/#/c/6002/3/bin/run-all-tests.sh
File bin/run-all-tests.sh:

PS3, Line 100:   `
For my education: what do the three X's mean


PS3, Line 102: 
I don't think I've seen this idiom before. I recognize it is in HEAD already, 
but, for the purposes of my own education, is this command substitution where 
the command contains only whitespace, thereby just acting like "remove these 
backticks and all the space between them"?


http://gerrit.cloudera.org:8080/#/c/6002/3/buildall.sh
File buildall.sh:

Line 188:"its dependencies. If already running, all services are 
restarted."\
What happened to the spaces?


PS3, Line 196: ONLY APPLIES to suites with workloads:"\
 :"functional-query, targeted-stress"
This seems likely to become stale if that changes, as it may not show up to the 
author of a change to run-all-tests.sh. Is there anything to be done about that?


http://gerrit.cloudera.org:8080/#/c/6002/3/tests/custom_cluster/test_spilling.py
File tests/custom_cluster/test_spilling.py:

PS3, Line 56: 
Would it be worthwhile or even possible to fix it to be more typical? If so, 
should that be this patch or should it be a new JIRA?


PS3, Line 66: 
This one I don't quite understand. The CustomClusterTestSuite presumably shuts 
down impalad, but that's OK once this test is done, yes?


http://gerrit.cloudera.org:8080/#/c/6002/3/tests/stress/test_ddl_stress.py
File tests/stress/test_ddl_stress.py:

Line 59
Does removing this (to pytest.mark.parametrize) change the possible orderings 
of the DDL calls?


-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6056/3/bin/impala-config.sh
File bin/impala-config.sh:

PS3, Line 75: IMPALA_TOOLCHAIN_BUILD_ID
> I actually don't need to because I built this Kudu with a new jenkins job (
Sounds ok. If you're revving nightly, it might be better just to have a 
'latest' symlink (just like a SNAPSHOT version) than to make a commit every 
night, but whatever is easiest.


-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#4).

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..

IMPALA-4934: Disable Kudu OpenSSL initialization

Bumps the Kudu version to include the change to the client
that allows Impala to disable SSL initialization.

In authentication.cc, after Impala initializes OpenSSL,
Impala then disables Kudu's OpenSSL init.

Change-Id: I3f13f3af512c6d771979638da593685524c73086
---
M be/src/rpc/authentication.cc
M bin/impala-config.sh
2 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/6056/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6056/3/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

PS3, Line 663: impala
> remove?
Done


http://gerrit.cloudera.org:8080/#/c/6056/3/bin/impala-config.sh
File bin/impala-config.sh:

PS3, Line 75: IMPALA_TOOLCHAIN_BUILD_ID
> Don't you need to change this as well?
I actually don't need to because I built this Kudu with a new jenkins job 
(script still in review [1]) which builds Kudu with an existing toolchain, and 
adds the newer Kudu. I still had to add the gerrit commit to the toolchain 
buildall.sh (as you saw) so that future toolchain builds build the latest Kudu 
version. The goal of this is to start revving the Kudu versions more regularly, 
perhaps at least testing nightly. 

1: https://gerrit.cloudera.org/#/c/6014/


-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6056/3/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

PS3, Line 663: impala
remove?


http://gerrit.cloudera.org:8080/#/c/6056/3/bin/impala-config.sh
File bin/impala-config.sh:

PS3, Line 75: IMPALA_TOOLCHAIN_BUILD_ID
Don't you need to change this as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4926: Upgrade LZ4 to 1.7.5

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4926: Upgrade LZ4 to 1.7.5
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I10e4561d0e940fa15ca8248c8277acfc6dff3da3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for subsequent phases to remove remaining instances of CM.

2017-02-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' 
Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for 
subsequent phases to remove remaining instances of CM.
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6049/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote 
sections to eliminate 'Cloudera Manager' from topics. Look for subsequent 
phases to remove remaining instances of CM.
Commit messages should wrap at 72 characters


-- 
To view, visit http://gerrit.cloudera.org:8080/6049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Matthew Jacobs (Code Review)
Hello Thomas Tauber-Marshall, Sailesh Mukil, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5840

to look at the new patch set (#9).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
3 files changed, 227 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 9: Code-Review+2

fixed spelling and carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No