[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-08 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#5).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default
2. Share the file handle between scan ranges from the same
file

The scan range no longer maintains its own Hdfs file
handle. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

Scan ranges that are accessing data cached by Hdfs
will get a file handle from the cache, but the file
handle will be kept on the scan range for the time
that the scan range is in use. This prevents the
cache from closing the file handle while the data
buffer is in use.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.

If max_cached_file_handles is set to 0, file handle
caching is off and the previous behavior applies.

Tests:
test_hdfs_fd_caching.py copies the files from an existing
table into a new directory and uses that to create an
external table. It queries the external table, then
uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". In
the delete case, we expect zero rows. In the move case,
we expect the same number of rows.

TODO:
1. Determine appropriate defaults.
2. Other tests
  a. File overwrite
  b. Any others?
3. For scan ranages that use Hdfs caching, should there
be some sharing at the scanner level?

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/query_test/test_hdfs_fd_caching.py
11 files changed, 732 insertions(+), 368 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 14: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Michael Ho (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6787

to look at the new patch set (#6).

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..

IMPALA-5197: Erroneous corrupted Parquet file message

The Parquet file column reader may fail in the middle
of producing a scratch tuple batch for various reasons
such as exceeding memory limit or cancellation. In which
case, the scratch tuple batch may not have materialized
all the rows in a row group. We shouldn't erroneously
report that the file is corrupted in this case as the
column reader didn't completely read the entire row group.

A new test case is added to verify that we won't see this
error message. A new failpoint phase GETNEXT_SCANNER is
also added to differentiate it from the GETNEXT in the
scan node itself.

Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/common/impala_service.py
M tests/failure/test_failpoints.py
9 files changed, 40 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6787/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..


Patch Set 5: Code-Review+2

(1 comment)

Rebase. Carry +2.

http://gerrit.cloudera.org:8080/#/c/6787/4/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

Line 140:   # IMPALA-5197: None of the debug action should trigger 
corrupted file message
> actions
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-05-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG
Commit Message:

PS1, Line 10: statistic
> How about "we use the Parquet field RowGroup.num_rows"?
Works for me.


http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 440:   *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows;
> There's also FileMetaData::num_rows. Can't we use that instead of looping o
We could, but not sure it's worth it. One scanner does not necessarily process 
an entire Parquet file, so we'd need to make sure that exactly one scanner 
thread deals with the entire file just for this special case. Taras, maybe you 
can take a look and see how invasive that would be?


Line 1455: // Column readers are not needed because we are not reading from 
any columns if this
> Can we then optimize something like 
The transformation is only valid if l_comment is non-nullable. We have no 
concept of nullability for HDFS tables.


http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: |  output: 
sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
> i don't know what this means.
Personally, I prefer to show what is actually being executed in the explain 
plan. Otherwise, if something goes wrong it could be hard to debug because we 
do not know which code path it is taking.

Do you have an alternative proposal for showing that the optimized path is 
being taken? How would we debug/support/test this feature? How will users 
understand their the query plan?

Let's start a thread/doc about these usability/supportability issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 7:

(2 comments)

Thanks for the reviews. I'm going to squash in the "TimestampValue constructors 
refactoring" change [1] since that also has a +2 and it's related.

1: https://gerrit.cloudera.org/#/c/6510/11

http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 89:   if (COPY_STRINGS) {
> it doesn't feel like this particular if should be very performance-relevant
Done


http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> this file is included in bunch of .h files. now that you pulled out a bunch
Good call


-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


IMPALA-5287: Test skip.header.line.count on gzip

This change fixed IMPALA-4873 by adding the capability to supply a dict
'test_file_vars' to run_test_case(). Keys in this dict will be replaced
with their values inside test queries before they are executed.

Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Reviewed-on: http://gerrit.cloudera.org:8080/6817
Reviewed-by: Michael Brown 
Tested-by: Impala Public Jenkins
---
M testdata/bin/generate-schema-statements.py
M testdata/data/README
A testdata/data/table_with_header.gz
A testdata/data/table_with_header_2.gz
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
9 files changed, 75 insertions(+), 39 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Brown: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-05-08 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 455: DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
What if file_metadata_.num_rows or 
file_metadata_.row_groups[row_group_idx_].num_rows have negative values?

We have seen cases where a single file had too many rows which causes an 
overflow and stats had a negative value.


Line 1455: // Column readers are not needed because we are not reading from 
any columns if this
> DCHECK that there is exactly one materialized slot
Can we then optimize something like 
select count(l_comment) from lineitem to select count(*) from lineitem

The later is 7x faster.


http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: |  output: 
sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
> i don't know what this means.
Why do we need to print this information in the plan?
Won't this be enabled for all Parquet files moving forward?


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


IMPALA-5245: fix ASAN buffer-allocator-test

* Use the allocator_may_return_null=1 ASAN option so that the allocation
  just fails instead of crashing the process.
* Work around an bug in the implementation of the option where
  posix_memalign() does not return ENOMEM. I filed an upstream bug
  https://bugs.llvm.org/show_bug.cgi?id=32968.

Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Reviewed-on: http://gerrit.cloudera.org:8080/6819
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/testutil/mem-util.h
M be/src/util/aligned-new.h
M be/src/util/bloom-filter.cc
M bin/run-backend-tests.sh
M bin/start-catalogd.sh
M bin/start-impalad.sh
M bin/start-statestored.sh
9 files changed, 12 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-05-08 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6778/1//COMMIT_MSG
Commit Message:

PS1, Line 11: exchange
> typo
Done


http://gerrit.cloudera.org:8080/#/c/6778/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 506: DCHECK(status.ok());
> there are a lot of other places that this loop can exit. Can't the out_batc
I have moved the check outside the while loop, so that this gets invoked 
everytime we break out of it. 
Ran a bunch of queries and tried to verify that the limit gets applied 
correctly Found a few cases where num_rows_returned was not incremented and the 
ReachedLimit check was not correctly applied.


PS1, Line 580: 
> >= ?
Removed this check now. Please  ignore.


Line 581:   DCHECK(current_probe_row_ == NULL);
> I think you need to update num_rows_returned_ as well
Removed this check. Please  ignore.


http://gerrit.cloudera.org:8080/#/c/6778/1/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

PS1, Line 114: # Don't run with NUM_NODES=1 due to IMPALA-561
 : # ALL_CLUSTER_SIZES = [0, 1]
> update comment
Ran the private tests by changing this value ( but without the fix to correctly 
apply the limits). Current tests did not catch this. 
I am reverting it back and adding a few tests which can catch this issue.


-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: |  output: 
sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
i don't know what this means.

could we reference 'count(*)' directly? that's going to be more intuitive than 
(num_rows)


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-05-08 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..

IMPALA-4866: Hash join node does not apply limits correctly

Hash join node currently does not apply the limits correctly.
This issue gets masked most of the times since the planner sticks
an exchange node on top of most of the joins. This issue gets
exposed when NUM_NODES=1.

Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
---
M be/src/exec/partitioned-hash-join-node.cc
A 
testdata/workloads/functional-query/queries/QueryTest/single-node-joins-with-limits.test
M tests/query_test/test_join_queries.py
3 files changed, 56 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6778/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 7: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 89:   if (COPY_STRINGS) {
it doesn't feel like this particular if should be very performance-relevant. 
let's make it a standard function again, with an extra bool param.


http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
this file is included in bunch of .h files. now that you pulled out a bunch of 
function bodies, those includes may not be necessary anymore. (i see one in 
util/promise.h, for instance). would be nice to clean that up as well.

doesn't have to be as part of this change, but it's good to do some 
housecleaning as you go.


-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..

IMPALA-5085: large rows in BufferedTupleStreamV2

The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately advance
to the next page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements.  E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 14 + max_page_len_ * 2 buffers because when
processing a large row we only need one large write buffer at a time.

The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation. I tried to avoid adding empty pages to the
stream, but it was tricky to avoid in the case of a read/write
stream where the read iterator was already pointing at the write
page - in that case we can end up with an empty page.

To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.

Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.

Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
---
M be/src/runtime/buffered-tuple-stream-v2-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/buffered-tuple-stream-v2.inline.h
M common/thrift/generate_error_codes.py
5 files changed, 537 insertions(+), 231 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..


Patch Set 3:

Rebased onto the async pin change

-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 66: singleton_intermediate_tuple_(nullptr),
initialize in .h where possible


Line 88:   RowDescriptor build_row_desc(intermediate_tuple_desc_, false);
i would intuitively think that the build row is the input row.


Line 97: RETURN_IF_ERROR(build_expr->Init(state, build_row_desc));
it feels dicey to pass a reference to an automatic variable, Init() might hold 
on to it (and there is no reason it shouldn't - descriptors should last for the 
lifetime of the query).

this row descriptor should be member var and created in the c'tor.


Line 232:   ScalarExprEvaluator* const* evaluators = _evaluators_[0];
isn't that the same as const X**?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 156:   friend class AggFnEvaluator;
why are these all friends? it looks like more of the "protected" functions need 
to be public. (why is Init() protected?)


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5158: Part 1: include untracked memory in MemTracker dumps

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5158: Part 1: include untracked memory in MemTracker 
dumps
..

IMPALA-5158: Part 1: include untracked memory in MemTracker dumps

The MemTracker dumps are easier to understand if all children of the
process tracker actually add up to the process consumption. There is
always a discrepancy, which corresponds (mostly) to untracked memory.
This commit adds a new line item in the MemTracker dump which is the
process consumption value minus the sum of all its children.

Here is an example of the new output:

  Process: Limit=8.35 GB Total=123.74 MB Peak=219.93 MB
Free Disk IO Buffers: Total=1.37 MB Peak=1.37 MB
RequestPool=default-pool: Total=609.09 KB Peak=10.14 MB
  Query(fd40f6b7542f7e7c:4aef2723): Total=609.09 KB Peak=1.19 MB
Fragment fd40f6b7542f7e7c:4aef2723: Total=609.09 KB Peak=1.19 MB
  AGGREGATION_NODE (id=3): Total=24.00 KB Peak=36.00 KB
Exprs: Total=4.00 KB Peak=4.00 KB
  NESTED_LOOP_JOIN_NODE (id=2): Total=412.90 KB Peak=412.90 KB
Nested Loop Join Builder: Total=392.00 KB Peak=392.00 KB
  HDFS_SCAN_NODE (id=1): Total=160.00 KB Peak=768.00 KB
Exprs: Total=4.00 KB Peak=4.00 KB
  HDFS_SCAN_NODE (id=0): Total=0 Peak=344.00 KB
  PLAN_ROOT_SINK: Total=0 Peak=0
  CodeGen: Total=4.19 KB Peak=267.50 KB
Block Manager: Limit=6.68 GB Total=0 Peak=0
RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
Untracked Memory: Total=121.78 MB

Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f
---
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
2 files changed, 46 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/6820/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 14:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/549/

-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 14: Code-Review+2

fixing some 'unused results' warnings.

-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Marcel Kornacker (Code Review)
Hello Impala Public Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6535

to look at the new patch set (#14).

Change subject: IMPALA-2550: Switch to per-query exec rpc
..

IMPALA-2550: Switch to per-query exec rpc

Coordinator:
- FragmentInstanceState -> BackendState, which in turn records
  FragmentInstanceStats

QueryState
- does query-wide setup in a separate thread (which also launches
  the instance exec threads)
- has a query-wide 'prepared' state at which point all static setup
  is done and all FragmentInstanceStates are accessible

Also renamed QueryExecState to ClientRequestState.

Simplified handling of execution status (in FragmentInstanceState):
- status only transmitted via ReportExecStatus rpc
- in particular, it's not returned anymore from the Cancel rpc

FIS: Fixed bugs related to partially-prepared state (in Close() and 
ReleaseThreadToken())

Change-Id: I20769e420711737b6b385c744cef4851cee3facd
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/codegen-anyval.h
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exprs/expr-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-tuple-stream-v2-test.cc
M be/src/runtime/bufferpool/buffer-pool-test.cc
A be/src/runtime/coordinator-backend-state.cc
A be/src/runtime/coordinator-backend-state.h
A be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
A be/src/runtime/debug-options.cc
A be/src/runtime/debug-options.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
D be/src/runtime/plan-fragment-executor.cc
D be/src/runtime/plan-fragment-executor.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/service/CMakeLists.txt
M be/src/service/child-query.cc
M be/src/service/child-query.h
R be/src/service/client-request-state.cc
R be/src/service/client-request-state.h
M be/src/service/fe-support.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/fault-injection-util.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/uid-util.h
M common/thrift/ExecStats.thrift
M common/thrift/ImpalaInternalService.thrift
M tests/common/test_result_verifier.py
73 files changed, 3,884 insertions(+), 3,782 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis
If you don't populate it for CREATE TABLE SORT BY() then I think you should
remove it altogether.

Dimitris

On Mon, May 8, 2017 at 4:33 PM, Lars Volker  wrote:

>
>
> On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> The other alternative would be to populate it with the partitioning
>> columns, if any. Thoughts?
>>
>>
> Adding the partitioning columns to the sort by list is not supported. They
> can be added to the pre-insert sort not using the clustered hint. I'm not
> sure though if I understood your statement correctly.
>
> My question was what to do if a user executes "ALTER TABLE SORT BY();",
> meaning to remove all sort by columns. Should we remove all columns from
> the property, or remove the property altogether?
>
>
>> Dimitris
>>
>> On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:
>>
>>>
>>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
>>> wrote:
>>>


 On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
 dtsirogian...@cloudera.com> wrote:

> I believe it sends the wrong message and is probably confusing to
> throw an error when someone writes a CREATE TABLE with an empty SORT BY()
> but allow the same clause in an ALTER. No doubt the users can read the
> documentation and figure it out but its an extra step. Also, as Marcel
> mentions, scripting may be easier as you don't have to figure out whether
> to add a clause or not. Anyways, the patch is great as is, I just wanted 
> to
> mention this.
>

 Sounds like we should allow an empty list then.

>>>
>>> I will change the parser accordingly. On a related note, ALTER TABLE
>>> SORT BY () will leave an empty property 'sort.by.columns'. Should it remove
>>> the property altogether instead?
>>>


>
>
> Dimitris
>
> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
> wrote:
>
>>
>>
>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
>> dtsirogian...@cloudera.com> wrote:
>>
>>> For consistency, I believe we should at least allow an empty SORT
>>> BY() clause in the CREATE TABLE statement, but I'll defer the decision 
>>> to
>>> Alex or Marcel.
>>>
>>
>> Do we think there'll be scripts generating create table statements
>> with empty sort-by clauses? I'm not sure there's a benefit to supporting
>> that (but happy to stand corrected).
>>
>>
>>>
>>> Dimitris
>>>
>>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
>>> ger...@cloudera.org> wrote:
>>>
 Lars Volker has posted comments on this change.

 Change subject: IMPALA-4166: Add SORT BY sql clause
 
 ..


 Patch Set 20:

 > There is still one thing that is not clear to me. Why is it
 allowed
  > to do an ALTER TABLE with an empty SORT BY clause but not a
 CREATE
  > TABLE?

 The easiest way to specify an empty list of sort by columns during
 CREATE TABLE is to omit the SORT BY clause altogether. To keep things
 simple I did not add additional support for an empty SORT BY () clause.

 For ALTER TABLE there needs to be a way to specify an empty list of
 columns, but since SORT BY is used to identify the command, the most 
 simple
 form seemed to be an empty column list().

 Do you think we should allow CREATE TABLE SORT BY() in addition to
 specify an empty set of column?

 --
 To view, visit http://gerrit.cloudera.org:8080/6495
 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

 Gerrit-MessageType: comment
 Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
 Gerrit-PatchSet: 20
 Gerrit-Project: Impala-ASF
 Gerrit-Branch: master
 Gerrit-Owner: Lars Volker 
 Gerrit-Reviewer: Alex Behm 
 Gerrit-Reviewer: Dimitris Tsirogiannis 
 Gerrit-Reviewer: Lars Volker 
 Gerrit-Reviewer: Marcel Kornacker 
 Gerrit-Reviewer: Thomas Tauber-Marshall 
 Gerrit-HasComments: No

>>>
>>>
>>
>

>>>
>>
>


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 9:

(10 comments)

Still trying to come to grips with it all but did an initial pass over the 
query execution changes.

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 122:   /// Output expressions to convert row batches onto output values.
It's confusing that these aren't used in all subclasses of DataSink (e.g. 
DataStreamSender, JoinBuildSink). Clarify that not all sinks use these?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS9, Line 69: expr_mem_pool
Does this mean that multiple scanner threads share a single MemPool? That 
doesn't seem safe.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

PS9, Line 68: materialize_tuple_
materialize_tuple_ doesn't seem to be defined. I think we always need these 
exprs anyway, right?


Line 69:   std::vector slot_materialize_exprs_;
See my comment about the name in sorter.h.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h
File be/src/exec/subplan-node.h:

Line 65:   /// tree rooted at 'node'.
* and do any initialization that is required as a result of setting the subplan.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS9, Line 78: slot_materialize_exprs_
output_tuple_exprs_? Usually when we have exprs materializing tuples it's named 
after the tuple it's materializing.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS9, Line 94: Clone
Maybe this should be called ShallowClone() to make it more explicit. Without 
reading the function comment, I would assume it did a deep clone.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS9, Line 182: slot_materialize_exprs_
Maybe something like 'sort_tuple_exprs_'? I think the important thing is that 
there is one expr per slot in the sort tuple.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 49:   ScalarExprEvaluator::Close(key_expr_evaluators_lhs_, state);
opened_ = false?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

PS9, Line 141: key_expr_evaluators_lhs_
ordering_expr_evaluators_lhs_ to match the expr name?


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 21:

Rebased, added support for CREATE TABLE SORT BY() with empty sort by list. Will 
need to rebase again though, since some conflicting change got merge in the 
meantime.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#21).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort by columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,379 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/21
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-08 Thread Henry Robinson (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5715

to look at the new patch set (#10).

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
M be/src/exec/kudu-util.h
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
D cmake_modules/FindOpenSSL.cmake
16 files changed, 137 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker
On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> The other alternative would be to populate it with the partitioning
> columns, if any. Thoughts?
>
>
Adding the partitioning columns to the sort by list is not supported. They
can be added to the pre-insert sort not using the clustered hint. I'm not
sure though if I understood your statement correctly.

My question was what to do if a user executes "ALTER TABLE SORT BY();",
meaning to remove all sort by columns. Should we remove all columns from
the property, or remove the property altogether?


> Dimitris
>
> On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:
>
>>
>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
>> wrote:
>>
>>>
>>>
>>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
>>> dtsirogian...@cloudera.com> wrote:
>>>
 I believe it sends the wrong message and is probably confusing to throw
 an error when someone writes a CREATE TABLE with an empty SORT BY() but
 allow the same clause in an ALTER. No doubt the users can read the
 documentation and figure it out but its an extra step. Also, as Marcel
 mentions, scripting may be easier as you don't have to figure out whether
 to add a clause or not. Anyways, the patch is great as is, I just wanted to
 mention this.

>>>
>>> Sounds like we should allow an empty list then.
>>>
>>
>> I will change the parser accordingly. On a related note, ALTER TABLE SORT
>> BY () will leave an empty property 'sort.by.columns'. Should it remove the
>> property altogether instead?
>>
>>>
>>>


 Dimitris

 On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
 wrote:

>
>
> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> For consistency, I believe we should at least allow an empty SORT
>> BY() clause in the CREATE TABLE statement, but I'll defer the decision to
>> Alex or Marcel.
>>
>
> Do we think there'll be scripts generating create table statements
> with empty sort-by clauses? I'm not sure there's a benefit to supporting
> that (but happy to stand corrected).
>
>
>>
>> Dimitris
>>
>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
>> ger...@cloudera.org> wrote:
>>
>>> Lars Volker has posted comments on this change.
>>>
>>> Change subject: IMPALA-4166: Add SORT BY sql clause
>>> 
>>> ..
>>>
>>>
>>> Patch Set 20:
>>>
>>> > There is still one thing that is not clear to me. Why is it allowed
>>>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>>>  > TABLE?
>>>
>>> The easiest way to specify an empty list of sort by columns during
>>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
>>> simple I did not add additional support for an empty SORT BY () clause.
>>>
>>> For ALTER TABLE there needs to be a way to specify an empty list of
>>> columns, but since SORT BY is used to identify the command, the most 
>>> simple
>>> form seemed to be an empty column list().
>>>
>>> Do you think we should allow CREATE TABLE SORT BY() in addition to
>>> specify an empty set of column?
>>>
>>> --
>>> To view, visit http://gerrit.cloudera.org:8080/6495
>>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>>
>>> Gerrit-MessageType: comment
>>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
>>> Gerrit-PatchSet: 20
>>> Gerrit-Project: Impala-ASF
>>> Gerrit-Branch: master
>>> Gerrit-Owner: Lars Volker 
>>> Gerrit-Reviewer: Alex Behm 
>>> Gerrit-Reviewer: Dimitris Tsirogiannis 
>>> Gerrit-Reviewer: Lars Volker 
>>> Gerrit-Reviewer: Marcel Kornacker 
>>> Gerrit-Reviewer: Thomas Tauber-Marshall 
>>> Gerrit-HasComments: No
>>>
>>
>>
>

>>>
>>
>


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 7533364
..


Patch Set 2: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/545/

-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis
The other alternative would be to populate it with the partitioning
columns, if any. Thoughts?

Dimitris

On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:

>
> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
> wrote:
>
>>
>>
>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
>> dtsirogian...@cloudera.com> wrote:
>>
>>> I believe it sends the wrong message and is probably confusing to throw
>>> an error when someone writes a CREATE TABLE with an empty SORT BY() but
>>> allow the same clause in an ALTER. No doubt the users can read the
>>> documentation and figure it out but its an extra step. Also, as Marcel
>>> mentions, scripting may be easier as you don't have to figure out whether
>>> to add a clause or not. Anyways, the patch is great as is, I just wanted to
>>> mention this.
>>>
>>
>> Sounds like we should allow an empty list then.
>>
>
> I will change the parser accordingly. On a related note, ALTER TABLE SORT
> BY () will leave an empty property 'sort.by.columns'. Should it remove the
> property altogether instead?
>
>>
>>
>>>
>>>
>>> Dimitris
>>>
>>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
>>> wrote:
>>>


 On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
 dtsirogian...@cloudera.com> wrote:

> For consistency, I believe we should at least allow an empty SORT BY()
> clause in the CREATE TABLE statement, but I'll defer the decision to Alex
> or Marcel.
>

 Do we think there'll be scripts generating create table statements with
 empty sort-by clauses? I'm not sure there's a benefit to supporting that
 (but happy to stand corrected).


>
> Dimitris
>
> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
> ger...@cloudera.org> wrote:
>
>> Lars Volker has posted comments on this change.
>>
>> Change subject: IMPALA-4166: Add SORT BY sql clause
>> 
>> ..
>>
>>
>> Patch Set 20:
>>
>> > There is still one thing that is not clear to me. Why is it allowed
>>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>>  > TABLE?
>>
>> The easiest way to specify an empty list of sort by columns during
>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
>> simple I did not add additional support for an empty SORT BY () clause.
>>
>> For ALTER TABLE there needs to be a way to specify an empty list of
>> columns, but since SORT BY is used to identify the command, the most 
>> simple
>> form seemed to be an empty column list().
>>
>> Do you think we should allow CREATE TABLE SORT BY() in addition to
>> specify an empty set of column?
>>
>> --
>> To view, visit http://gerrit.cloudera.org:8080/6495
>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>
>> Gerrit-MessageType: comment
>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
>> Gerrit-PatchSet: 20
>> Gerrit-Project: Impala-ASF
>> Gerrit-Branch: master
>> Gerrit-Owner: Lars Volker 
>> Gerrit-Reviewer: Alex Behm 
>> Gerrit-Reviewer: Dimitris Tsirogiannis 
>> Gerrit-Reviewer: Lars Volker 
>> Gerrit-Reviewer: Marcel Kornacker 
>> Gerrit-Reviewer: Thomas Tauber-Marshall 
>> Gerrit-HasComments: No
>>
>
>

>>>
>>
>


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/status.h
File be/src/kudu/util/status.h:

Line 122: #ifndef KUDU_LOG
> Fixed it properly: the alternative definition was coming from stubs.h, but 
I don't see a change here


-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 13: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/544/

-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker
On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
wrote:

>
>
> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> I believe it sends the wrong message and is probably confusing to throw
>> an error when someone writes a CREATE TABLE with an empty SORT BY() but
>> allow the same clause in an ALTER. No doubt the users can read the
>> documentation and figure it out but its an extra step. Also, as Marcel
>> mentions, scripting may be easier as you don't have to figure out whether
>> to add a clause or not. Anyways, the patch is great as is, I just wanted to
>> mention this.
>>
>
> Sounds like we should allow an empty list then.
>

I will change the parser accordingly. On a related note, ALTER TABLE SORT
BY () will leave an empty property 'sort.by.columns'. Should it remove the
property altogether instead?

>
>
>>
>>
>> Dimitris
>>
>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
>> wrote:
>>
>>>
>>>
>>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
>>> dtsirogian...@cloudera.com> wrote:
>>>
 For consistency, I believe we should at least allow an empty SORT BY()
 clause in the CREATE TABLE statement, but I'll defer the decision to Alex
 or Marcel.

>>>
>>> Do we think there'll be scripts generating create table statements with
>>> empty sort-by clauses? I'm not sure there's a benefit to supporting that
>>> (but happy to stand corrected).
>>>
>>>

 Dimitris

 On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
 ger...@cloudera.org> wrote:

> Lars Volker has posted comments on this change.
>
> Change subject: IMPALA-4166: Add SORT BY sql clause
> ..
>
>
> Patch Set 20:
>
> > There is still one thing that is not clear to me. Why is it allowed
>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>  > TABLE?
>
> The easiest way to specify an empty list of sort by columns during
> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
> simple I did not add additional support for an empty SORT BY () clause.
>
> For ALTER TABLE there needs to be a way to specify an empty list of
> columns, but since SORT BY is used to identify the command, the most 
> simple
> form seemed to be an empty column list().
>
> Do you think we should allow CREATE TABLE SORT BY() in addition to
> specify an empty set of column?
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6495
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
> Gerrit-PatchSet: 20
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Lars Volker 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Dimitris Tsirogiannis 
> Gerrit-Reviewer: Lars Volker 
> Gerrit-Reviewer: Marcel Kornacker 
> Gerrit-Reviewer: Thomas Tauber-Marshall 
> Gerrit-HasComments: No
>


>>>
>>
>


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#7).

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..

IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP

Adds Impala support for TIMESTAMP types stored in Kudu.

Impala stores TIMESTAMP values in 96-bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.

When writing to Kudu, TIMESTAMP values in nanoseconds are
averaged to the nearest microsecond.

When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.
TODO: Kudu still needs to provide a knob to enable this:
  https://gerrit.cloudera.org/#/c/6624/

Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.

TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions

Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
A be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/datasets/functional/functional_schema_template.sql
A 
testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts-abort-on-error.test
A testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M tests/query_test/test_kudu.py
M tests/query_test/test_queries.py
M tests/query_test/test_scanners.py
26 files changed, 497 insertions(+), 231 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6526/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6563

to look at the new patch set (#12).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..

IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for 
remaining types

This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).

This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format pull request #46.

The HdfsParquetScanner will preferably read the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.

This change removes the validation of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.

This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
19 files changed, 976 insertions(+), 349 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..


Patch Set 11:

Thanks for having a look.

-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6563/11/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

Line 184: aggregation(SUM, NumDictFilteredRowGroups): 2
> I don't think it will reduce coverage. '01/01/11' is a value that doesn't e
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5158: Part 1: include untracked memory in MemTracker dumps

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6820

Change subject: IMPALA-5158: Part 1: include untracked memory in MemTracker 
dumps
..

IMPALA-5158: Part 1: include untracked memory in MemTracker dumps

The MemTracker dumps are easier to understand if all children of the
process tracker actually add up to the process consumption. There is
always a discrepancy, which corresponds (mostly) to untracked memory.
This commit adds a new line item in the MemTracker dump which is the
process consumption value minus the sum of all its children.

Here is an example of the new output:

  Process: Limit=8.35 GB Total=123.74 MB Peak=219.93 MB
Free Disk IO Buffers: Total=1.37 MB Peak=1.37 MB
RequestPool=default-pool: Total=609.09 KB Peak=10.14 MB
  Query(fd40f6b7542f7e7c:4aef2723): Total=609.09 KB Peak=1.19 MB
Fragment fd40f6b7542f7e7c:4aef2723: Total=609.09 KB Peak=1.19 MB
  AGGREGATION_NODE (id=3): Total=24.00 KB Peak=36.00 KB
Exprs: Total=4.00 KB Peak=4.00 KB
  NESTED_LOOP_JOIN_NODE (id=2): Total=412.90 KB Peak=412.90 KB
Nested Loop Join Builder: Total=392.00 KB Peak=392.00 KB
  HDFS_SCAN_NODE (id=1): Total=160.00 KB Peak=768.00 KB
Exprs: Total=4.00 KB Peak=4.00 KB
  HDFS_SCAN_NODE (id=0): Total=0 Peak=344.00 KB
  PLAN_ROOT_SINK: Total=0 Peak=0
  CodeGen: Total=4.19 KB Peak=267.50 KB
Block Manager: Limit=6.68 GB Total=0 Peak=0
RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
Untracked Memory: Total=121.78 MB

Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f
---
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
2 files changed, 47 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/6820/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6563/11/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

Line 184: aggregation(SUM, NumDictFilteredRowGroups): 2
> Joe, stats filtering reduced the number or row groups that make it into dic
I don't think it will reduce coverage. '01/01/11' is a value that doesn't 
exist, and I just want to make sure we can eliminate row groups based on that. 
Could you add a line showing that stats filtering is eliminating the rest:
aggregation(SUM, NumStatsFilteredRowGroups): 22


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5207,IMPALA-5214: distcc fixes
..


IMPALA-5207,IMPALA-5214: distcc fixes

enable_distcc should reset IMPALA_DISTCC_ENABLED, otherwise
disable_distcc is not reversible.

Remove the requirement of a toolchain at /opt/Impala-Toolchain. We can
easily identify paths starting with $IMPALA_TOOLCHAIN and remap them in
distcc.sh.

Testing:
Did a local build with distcc with IMPALA_TOOLCHAIN at a different
location. Tried toggling disable_distcc and enable_distcc.

Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26
Reviewed-on: http://gerrit.cloudera.org:8080/6655
Reviewed-by: Tim Armstrong 
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M bin/distcc/README.md
M bin/distcc/distcc.sh
M bin/distcc/distcc_env.sh
3 files changed, 12 insertions(+), 25 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved
  Tim Armstrong: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/6655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..


Patch Set 4: Code-Review+1

(1 comment)

Thanks for trying alternate testing routes. I'm ok with these changes.

http://gerrit.cloudera.org:8080/#/c/6787/4/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

Line 140:   # IMPALA-5197: None of the debug action should trigger 
corrupted file message
actions


-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6563/11/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

Line 184: aggregation(SUM, NumDictFilteredRowGroups): 2
Joe, stats filtering reduced the number or row groups that make it into dict 
filtering. Will this reduce test coverage for dict filtering?


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6563

to look at the new patch set (#11).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..

IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for 
remaining types

This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).

This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format pull request #46.

The HdfsParquetScanner will preferably read the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.

This change removes the validation of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.

This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
19 files changed, 975 insertions(+), 349 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-08 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 8:

(2 comments)

PS8 is just a rebase. PS9 includes the review responses.

http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/compression/compression_codec.cc
File be/src/kudu/util/compression/compression_codec.cc:

Line 28: #ifdef DISALLOW_COPY_AND_ASSIGN
> Maybe comment that snappy.h redefines this so that we don't lose track of w
Done


http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/status.h
File be/src/kudu/util/status.h:

Line 122: #ifndef KUDU_LOG
> Is this still needed? I don't see how this can get defined twice.
Fixed it properly: the alternative definition was coming from stubs.h, but we 
can prevent that include with a compile-time definition.


-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-08 Thread Henry Robinson (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5715

to look at the new patch set (#9).

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
M be/src/exec/kudu-util.h
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
M be/src/kudu/util/status.h
D cmake_modules/FindOpenSSL.cmake
17 files changed, 139 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 150:   row_format_flags |= 
kudu::client::KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES;
> why start with no_flags instead of setting this directly in l149?
Done


Line 201:   if (slot->type().type != TYPE_TIMESTAMP) continue;
> pull out timestamp slots into a separate vector (which you set up in prepar
Done


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

Line 111: Status WriteKuduRowValue(kudu::KuduPartialRow* row, int col, 
PrimitiveType type,
> fix signature (output params go at end).
Done.

I think the order may have been such that copy_strings can have a default 
parameter. I moved 'copy_strings' to a template variable though to avoid 
branching when calling this. That also means this fn is moved to the header, 
feel free to push back if you think that warrants more build-time impact than 
it is worth.


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 19: #include "runtime/timestamp-value.inline.h"
> remove this, unless you're calling these functions here (which doesn't appe
Done


Line 98: if (UNLIKELY(nullptr == localtime_r(, ))) {
> fix order
Done


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 25
> are there other includes you can remove?
Looks like I can remove another boost header. I think I still need gregorian 
and local_time for types used in function definitions in this file. 
compiler_config appears to affect compilation options [1], and seems risky to 
touch without understanding it more.

1: http://www.boost.org/doc/libs/1_64_0/boost/date_time/compiler_config.hpp


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

Line 34: bool TimestampValue::UtcToUnixTime(time_t* unix_time) const {
> you need to prefix all of these with inline.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/547/

-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 3:

(1 comment)

Thanks for the reviews!

http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

PS3, Line 2092: '${{env:IMPALA_HOME}}
> Sorry that you did more work. I just wanted to know the difference and why 
No worries, I had tried {impala_home} myself but it didn't work. Your question 
made me curious why, so I learned something.


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5239: transfer reservations between trackers

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5239: transfer reservations between trackers
..

IMPALA-5239: transfer reservations between trackers

This is a primitive needed to implement claiming and distribution
of initial reservations. It supports transferring reservation between
any two ReservationTrackers under the same query.

Testing:
* Added a test that exercises transfer between trackers at different
  levels and with different relationships.

Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
---
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
3 files changed, 243 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#4).

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..

IMPALA-5197: Erroneous corrupted Parquet file message

The Parquet file column reader may fail in the middle
of producing a scratch tuple batch for various reasons
such as exceeding memory limit or cancellation. In which
case, the scratch tuple batch may not have materialized
all the rows in a row group. We shouldn't erroneously
report that the file is corrupted in this case as the
column reader didn't completely read the entire row group.

A new test case is added to verify that we won't see this
error message. A new failpoint phase GETNEXT_SCANNER is
also added to differentiate it from the GETNEXT in the
scan node itself.

Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/common/impala_service.py
M tests/failure/test_failpoints.py
9 files changed, 40 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6787/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/546/

-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

PS3, Line 2092: '{impala_home}/testda
> Specifying {impala_home} here didn't work, since it does not get replaced i
Sorry that you did more work. I just wanted to know the difference and why it 
existed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG
Commit Message:

PS1, Line 10: statistic
> Suggestions? How important do you think it is to distinguish the Parquet nu
How about "we use the Parquet field RowGroup.num_rows"?


http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 440:   *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows;
There's also FileMetaData::num_rows. Can't we use that instead of looping over 
the row groups?


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6787/3/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

PS3, Line 140: IMPALA-5251
Oops. I meant IMPALA-5197.


-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5184: build fe against both Hive 1 & 2 APIs

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5184: build fe against both Hive 1 & 2 APIs
..


IMPALA-5184: build fe against both Hive 1 & 2 APIs

This adds a compatibility shim layer with Hive 1 and Hive 2
implementations. The version-specific code lives in
fe/src/compat-hive-$IMPALA_HIVE_MAJOR_VERSION and
common/thrift/hive-$IMPALA_HIVE_MAJOR_VERSION-api/

The shim adds wrapper methods to handle differing method signatures
and and config variables that changed slightly.

Some thrift classes were also moved from the 'cli' to 'rpc' package.
We work around these by implementing subclasses with the same name
in a different package for compatibility or by implementing shim
methods that operate on the classes. We also need to change the
package in the TCLIService.thrift, which is done with a
search-and-replace.

Also avoid the sticky config variable problem with some of the source
paths by requiring an _OVERRIDE suffix on the variable to override it
from the environment.

Testing:
Made sure that I could build Impala on master as normal, and also
with the following config overrides in bin/impala-config-local.sh:

  export IMPALA_HADOOP_VERSION=3.0.0-alpha1-cdh6.x-SNAPSHOT
  export IMPALA_HBASE_VERSION=2.0.0-cdh6.x-SNAPSHOT
  export IMPALA_HIVE_VERSION=2.1.0-cdh6.x-SNAPSHOT
  export IMPALA_SENTRY_VERSION=1.5.1-cdh6.x-SNAPSHOT
  export IMPALA_PARQUET_VERSION=1.5.0-cdh6.x-SNAPSHOT

I manually assembled the dependencies by copying the following files
from the Hive 2 source and from a Hadoop 3 build:
$CDH_COMPONENTS_HOME/hive-2.1.0-cdh6.x-SNAPSHOT/src/metastore/if/hive_metastore.thrift
$CDH_COMPONENTS_HOME/hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/*
$CDH_COMPONENTS_HOME/hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/include/hdfs.h

Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a
Reviewed-on: http://gerrit.cloudera.org:8080/5538
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
R common/thrift/hive-1-api/TCLIService.thrift
M fe/pom.xml
A 
fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java
A 
fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java
A 
fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java
A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java
A 
fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java
A 
fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java
A fe/src/compat-hive-1/java/org/apache/impala/compat/MetastoreShim.java
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/analysis/TypeDef.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
27 files changed, 492 insertions(+), 85 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5184: build fe against both Hive 1 & 2 APIs

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5184: build fe against both Hive 1 & 2 APIs
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..


Patch Set 3:

PS3 added a new assert in test_failpoints.py and removed the new test in 
test_scanners.py.

-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 3:

(5 comments)

Thanks for the comments. Please see PS4.

http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/data/README
File testdata/data/README:

PS3, Line 100: ,,
> double comma
Done


http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

PS3, Line 2092: '${{env:IMPALA_HOME}}
> How does this differ from {impala_home} ?
Specifying {impala_home} here didn't work, since it does not get replaced in 
DEPENDENT_LOAD sections. This is due to the somewhat convoluted way we build 
these. Instead, one pair of {} will be stripped by the .format() call and the 
result will be passed to Hive, which (I think) will then expand the variable.

I added the replacement. Should we replace all other occurrences in this file 
in this commit?


http://gerrit.cloudera.org:8080/#/c/6817/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS3, Line 288: replace 
> "replaced"
Done


Line 289: values in queries before they are executed.
> Mention that callers need to be careful about key name choices. For example
Done


http://gerrit.cloudera.org:8080/#/c/6817/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS3, Line 685: cls.ImpalaTestMatrix.add_constraint(lambda v:\
 : v.get_value('table_format').file_format == 'text' and\
> The backslashes are not needed.
Fixed here and elsewhere.


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6817

to look at the new patch set (#4).

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..

IMPALA-5287: Test skip.header.line.count on gzip

This change fixed IMPALA-4873 by adding the capability to supply a dict
'test_file_vars' to run_test_case(). Keys in this dict will be replaced
with their values inside test queries before they are executed.

Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
---
M testdata/bin/generate-schema-statements.py
M testdata/data/README
A testdata/data/table_with_header.gz
A testdata/data/table_with_header_2.gz
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
9 files changed, 75 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/6817/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 9:

(45 comments)

first half

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 285:   boost::scoped_ptr expr_mem_tracker_;
> There will be one per PlanNode in the future. Currently, it still counts to
if this is per querystate, the mempool would need to be thread-safe, not sure 
that's worth it (and if this is for evaluators, why have it per plannode?).

also, since we now clearly distinguish exprs and their evaluators, and this is 
specifically for evaluators, let's express that in the variable name. 
expr_eval_mem_pool_?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 96: const AggFn& agg_fn, AggFnEvaluator** result) {
move input params to front


Line 161: Status AggFnEvaluator::Open(vector& evaluators, 
RuntimeState* state) {
why not const&?


Line 514:   for (int i = 0; i < evaluators.size(); ++i) {
single line, here and subsequent functions, where possible


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// and caches all constant input arguments.
regarding the caching: is there already a todo to move that into the expr setup 
(as opposed to evaluator setup)?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 68: input_expr->AssignFnCtxIdx(_ctx_idx);
scalarexpr does that in create. do the same for this class?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 87:   const int num_children = nodes[*node_idx].num_children;
we don't use 'const ..' elsewhere for automatic variables.


Line 88:   const bool isAggFn = root->IsAggFn();
formatting


Line 92: // Reset the function context index for each input expression.
what is this referring to?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 81:   /// status on failure.
> 'fn_ctx_idx_ptr' removed. It doesn't take Expr** as 'root' is expected to b
should this be a private function and the two subclasses be friends (to call 
this)?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 82:   static Status CreateTree(ObjectPool* pool, const TExpr& texpr, Expr* 
root);
this is really CreateChildren, not CreateTree.

move in/out params to end


Line 134:   /// Creates an expr tree with root 'parent' via depth-first 
traversal.
misleading: the root of the created tree is not 'parent' (the root is installed 
as a new child into 'parent')


Line 135:   /// Called recursively to create expr trees of sub-expressions.
"for subexpressions"


Line 140:   ///   node_idx: index in 'nodes' of the root of the next child 
TExprNode tree
let's call this root_idx, makes it easier to follow


Line 146:   static Status CreateTreeFromThrift(ObjectPool* pool,
while we're cleaning up, move in/out params to end


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
could you flag what's new in this file that i should look at?


Line 69: inited_(false),
initialize vars in .h where practical


Line 82:   (*evaluator)->fn_ctxs_.clear();
why do you need that (for a newly created object)?


Line 86: for (FunctionContext* fn_ctx : (*evaluator)->fn_ctxs_) 
DCHECK(fn_ctx != nullptr);
do that in CreateFnCtxs?


Line 133:   if (opened_) return Status::OK();
what's the need for guarding against multiple open calls? (how would that 
happen?)


Line 138:   is_clone_? FunctionContext::THREAD_LOCAL : 
FunctionContext::FRAGMENT_LOCAL;
is this really instance_local? if so, leave todo in the right places 
(presumably at least udf.h)


Line 155: delete fn_ctxs_[i];
instead put a unique_ptr into fn_ctxs?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 72:   std::vector* evaluators) 
WARN_UNUSED_RESULT;
> Yes, ideally, that should be done in Expr::Init(). This is being tracked by
todo?


Line 111:   /// should only be called after Open() has been called on this 
expr. Returns an error if
> There is subtlety in this. "expr" is potentially a sub-expression of root_ 
but why not just call GetValue(nullptr) on the subexpr?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 55: /// rooted at a node is defind by [fn_ctx_idx_start_, fn_ctx_idx_end_).
defined


Line 65:   static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,

[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/compression/compression_codec.cc
File be/src/kudu/util/compression/compression_codec.cc:

Line 28: #ifdef DISALLOW_COPY_AND_ASSIGN
Maybe comment that snappy.h redefines this so that we don't lose track of why 
this is here.


http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/status.h
File be/src/kudu/util/status.h:

Line 122: #ifndef KUDU_LOG
Is this still needed? I don't see how this can get defined twice.


-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-05-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 1:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG
Commit Message:

PS1, Line 10: statistic
> I would prefer to use a different word than "statistic" in this context. We
Suggestions? How important do you think it is to distinguish the Parquet 
num_rows from parquet::Statistics? In the not-too-distant future we will use 
parquet::Statistics for optimizing min/max aggregates also. Seems fair enough 
to call all these "Parquet stats".


Line 13: 
Mention the new rewrite rule


http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 426: // Populate the slot with Parquet num rows statistic.
Populate the single slot with the Parquet num rows statistic.


Line 431: memset(tuple_buf, 0, tuple_buf_size);
Don't think we usually do this, and I don't think it's necessary.


Line 432: while (!row_batch->AtCapacity()) {
Do we have a suitable file with many row groups for testing this logic? We can 
decrease the batch_size in the test.


Line 488: DCHECK(row_group_idx_ <= file_metadata_.row_groups.size());
Why this change?


Line 1455: // Column readers are not needed because we are not reading from 
any columns if this
DCHECK that there is exactly one materialized slot


http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 158:   const bool optimize_parquet_count_star() { return 
optimize_parquet_count_star_; }
const function


Line 373:   bool optimize_parquet_count_star_;
const


http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 585: if (copiedInputExprs.size() != inputExprs.size()) return;
Have you tried also substituting the FunctionCallExpr.mergeAggInputFn_ in 
AggregateInfo.substitute() so the Update() and Merge() are consistent again?

I understand what the issue is, but this does not seem quite right.


http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 331:* Return true if the slots being materialized are all partition 
columns.
The new behavior is tricky to reason about, can we simplify it?


Line 335: if (materializedSlots.isEmpty()) return true;
I think this might break the behavior of OPTIMIZE_PARTITION_KEY_SCANS=true for 
unpartitioned tables.

What happens when you SET OPTIMIZE_PARTITION_KEY_SCANS=true and then run 
"select count(distinct 1) from unpartitioned_table"?


http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 131:   // Set to true when this scan node can optimize a count(*) query by 
populating the
Sentence is a little misleading because this flag is initialized to true if the 
query block has an aggregation that is amenable to being optimized with Parquet 
stats. But then the flag is changed to have a different meaning, so it's a 
little difficult to understand what it does.

How about something along the lines of this:
1. Pass in a 'private final' flag that indicates whether the aggregation is 
amenable to optimization. This flag never changes.
2. If the optimization can be performed, set the smap. Use the smap nullness to 
determine whether the optimization was applied.


Line 136:   protected ExprSubstitutionMap aggSmap_;
// Should be applied to the AggregateInfo from the same query block.

Expand the comment to explain why we cannot use PlanNode.outputSmap_ for this 
purpose.

Suggest: optimizedAggSmap_


Line 243:   if (optimizeParquetCountStar_ && fileFormats.size() == 1 && 
conjuncts_.isEmpty() &&
Create a helper function for this optimization, and describe in its function 
comment what this optimization does and how it works. In particular, that we 
are adding a new scan slot.


Line 245: // Create two functions that we will put into an smap. We 
want to replace the
Preconditions.checkState(desc_.getPath().destTable() == null);
Preconditions.checkState(collectionConjuncts_.isEmpty());

The first condition asserts that we are not scanning a nested collection.


Line 246: // count function with a sum function.
count() and sum()

mention that it is a special sum() function


http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 213:* @param limit
can just remove this


http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File 

[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message

2017-05-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-5197: Erroneous corrupted Parquet file message
..

IMPALA-5197: Erroneous corrupted Parquet file message

The Parquet file column reader may fail in the middle
of producing a scratch tuple batch for various reasons
such as exceeding memory limit or cancellation. In which
case, the scratch tuple batch may not have materialized
all the rows in a row group. We shouldn't erroneously
report that the file is corrupted in this case as the
column reader didn't completely read the entire row group.

A new test case is added to verify that we won't see this
error message. A new failpoint phase GETNEXT_SCANNER is
also added to differentiate it from the GETNEXT in the
scan node itself.

Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/common/impala_service.py
M tests/failure/test_failpoints.py
9 files changed, 40 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6787/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS1, Line 688: 
> Cool idea. I did this.
Can you add this JIRA to the commit message? :-)

https://issues.apache.org/jira/browse/IMPALA-4873


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/data/README
File testdata/data/README:

PS3, Line 100: ,,
double comma


http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

PS3, Line 2092: '${{env:IMPALA_HOME}}
How does this differ from {impala_home} ?


http://gerrit.cloudera.org:8080/#/c/6817/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS3, Line 288: replace 
"replaced"


Line 289: values in queries before they are executed.
Mention that callers need to be careful about key name choices. For example, 
'$FILESYSTEM_PREFIX' is a bad choice for a key when using this dict.


http://gerrit.cloudera.org:8080/#/c/6817/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS3, Line 685: cls.ImpalaTestMatrix.add_constraint(lambda v:\
 : v.get_value('table_format').file_format == 'text' and\
The backslashes are not needed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6817

to look at the new patch set (#3).

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..

IMPALA-5287: Test skip.header.line.count on gzip

This change adds the capability to supply a dict 'test_file_vars' to
run_test_case(). Keys in this dict will be replaced with their values
inside test queries before they are executed.

Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
---
M testdata/data/README
A testdata/data/table_with_header.gz
A testdata/data/table_with_header_2.gz
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
8 files changed, 60 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/6817/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 2:

(1 comment)

Thanks for the review comments and the good suggestions. Please see PS 3.

http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS1, Line 688: 
> I didn't realize there was magic that changed the default database dependin
Cool idea. I did this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5169: Add support for async pins in buffer pool
..


Patch Set 7:

Rebased

-- 
To view, visit http://gerrit.cloudera.org:8080/6612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-5169: Add support for async pins in buffer pool
..

IMPALA-5169: Add support for async pins in buffer pool

Makes Pin() do async reads behind-the-scenes, instead of
blocking until the read completes. The blocking is done
instead when the client tries to access the buffer via
PageHandle::GetBuffer() or ExtractBuffer().

This is implemented with a new sub-state of "pinned"
where the page has a buffer and consumes reservation
but the buffer does not contain valid data.

Motivation:
This unlocks various opportunities to overlap read I/Os
with other work:
* Reads to different disks can execute in parallel
* I/O and computation can be overlapped.

This initially benefits BufferedTupleStream::PinStream(),
where many pages are pinned at once. With this change
the reads run asynchronously. This can potentially lead
to large speedups when spilling. E.g. if the pages for a Hash
Join's partition are spread across 10 disks, we could get 10x
the read throughput, plus overlap the I/O with hash table build.

In future we can use this to do read-ahead over unpinned
BufferedTupleStreams or for unpinned Runs in Sorter, but
this requires changes to the client code to Pin() pages
in advance.

Testing:
* BufferedTupleStreamV2 already exercises this.
* Various BufferPool tests already exercise this.
* Added a basic test to cover edge cases made possible by the
  new state transitions.
* Extended the randomised test to cover this.

Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
---
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
8 files changed, 418 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6612/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6817

to look at the new patch set (#2).

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..

IMPALA-5287: Test skip.header.line.count on gzip

This change adds the capability to supply a dict 'test_file_vars' to
run_test_case(). Keys in this dict will be replaced with their values
inside test queries before they are executed.

Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
---
A testdata/data/table_with_header.gz
A testdata/data/table_with_header_2.gz
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
7 files changed, 46 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/6817/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 7533364
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/545/

-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bump Kudu version to 7533364
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS1, Line 688: 
> Good catch. I want to use the default databases for the scans, so that the 
I didn't realize there was magic that changed the default database depending on 
fileformat/encoding/compression/etc. I was going to suggest you preserve the 
unique_database usage and access the 'mixed' table via full path: 
"$DATABASE.mixed". But I just looked at the code, and use_db is overloaded to 
deal with both what you want to use, and what I was going to suggest, and both 
aren't possible together, I think.

What you really need is a way to pass in another database to be used in 
substitution in this test and access it in the test files, say "$DB2.mixed".

Idea:

What do you think about adding a new parameter to run_test_case() that takes in 
a dictionary, where the keys are testfile substitution variables and the values 
are the value to be substituted? This means you don't add a separate parameter 
"db2" to run_test_suite() but instead support arbitrary substitution.

The literal way this would be done would be:

  def test_text_scanner_with_header(self, vector, unique_database):
self.run_test_case(
'QueryTest/hdfs-text-scan-with-header',
vector,
testfile_vars={
  '$DB2': unique_database,
})

run_test_case() would just iterate over the keys and values in testfile_vars, 
and your testfile now can specify any variables it wants for substitution. Your 
default database will still be correct based on file format etc. You refer to 
the mixed table as "$DB2.mixed".


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5166: clean up BufferPool counters
..

IMPALA-5166: clean up BufferPool counters

Misc changes to improve usability of the profiles.

* Separate out detailed BufferPool metrics into a "Buffer pool"
  sub-profile.
* Only create the limit counter if there is a limit
* Show BufferPool using in query MemTracker (it was accidentally
  disabled before because there was no query-level profile).
* Reduce clutter in MemTracker dump by only showing buffer pool
  reservation, not usage (the usage was misleading anyway because
  it didn't include child usage).
* Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes
  and PeakUnpinnedBytes can answer most of the same questions - i.e.
  did it unpin any pages, and how many did it need to write to disk.
* Add buffer pool metrics to /memz (if buffer pool is enabled) and
  reorder /memz so more useful information is up the top.

Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
---
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-pool-counters.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/mem-tracker.cc
M be/src/util/default-path-handlers.cc
M be/src/util/memory-metrics.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/metrics.json
M www/memz.tmpl
12 files changed, 137 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/6690/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 13:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/544/

-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 13: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-08 Thread Marcel Kornacker (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6535

to look at the new patch set (#13).

Change subject: IMPALA-2550: Switch to per-query exec rpc
..

IMPALA-2550: Switch to per-query exec rpc

Coordinator:
- FragmentInstanceState -> BackendState, which in turn records
  FragmentInstanceStats

QueryState
- does query-wide setup in a separate thread (which also launches
  the instance exec threads)
- has a query-wide 'prepared' state at which point all static setup
  is done and all FragmentInstanceStates are accessible

Also renamed QueryExecState to ClientRequestState.

Simplified handling of execution status (in FragmentInstanceState):
- status only transmitted via ReportExecStatus rpc
- in particular, it's not returned anymore from the Cancel rpc

FIS: Fixed bugs related to partially-prepared state (in Close() and 
ReleaseThreadToken())

Change-Id: I20769e420711737b6b385c744cef4851cee3facd
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/codegen-anyval.h
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exprs/expr-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-tuple-stream-v2-test.cc
M be/src/runtime/bufferpool/buffer-pool-test.cc
A be/src/runtime/coordinator-backend-state.cc
A be/src/runtime/coordinator-backend-state.h
A be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
A be/src/runtime/debug-options.cc
A be/src/runtime/debug-options.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
D be/src/runtime/plan-fragment-executor.cc
D be/src/runtime/plan-fragment-executor.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/service/CMakeLists.txt
M be/src/service/child-query.cc
M be/src/service/child-query.h
R be/src/service/client-request-state.cc
R be/src/service/client-request-state.h
M be/src/service/fe-support.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/fault-injection-util.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/uid-util.h
M common/thrift/ExecStats.thrift
M common/thrift/ImpalaInternalService.thrift
M tests/common/test_result_verifier.py
73 files changed, 3,873 insertions(+), 3,775 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS1, Line 688: 
> Can you talk about why you removed unique_database? When I look at QueryTes
Good catch. I want to use the default databases for the scans, so that the test 
runs with functional and functional_text_gzip. Do you see a better option than 
splitting it up into two tests, one for the scans and one containing the DDLs?


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-08 Thread Tim Armstrong (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6626

to look at the new patch set (#9).

Change subject: IMPALA-5220: memory maintenance cleanup
..

IMPALA-5220: memory maintenance cleanup

Remove logic that tries to release pages from TcMalloc's page heap:
TCMalloc's behaviour changed so that it automatically does this with
"aggressive decommit" and committed spans can't accumulate in the page
heap.

Also increase the memory maintenance interval - 1s is too aggressive and
can free memory that will be imminently reused by a running query, e.g.
particularly buffer pool buffers.

Testing:
Tried to reproduce IMPALA-2800 in a couple of ways:
* Ran a big aggregation locally and cancelled it
* Looked at memz/ of some live clusters (production and stress test).
In all cases "Bytes in page heap freelist" was 0.

This confirms that IMPALA-2800 was already solved, probably
by the gperftools 2.5 upgrade, where aggressive decommit
would mean that memory is released to the system in
free() instead of the ReleaseFreeMemory() callst.

I was able to confirm that the ReleaseFreeMemory() calls were unnecessary
to avoid retaining memory by running a couple of stress tests
locally with this patch and checking that "Bytes in page
heap freelist" was 0 after the change and that memory consumption
was generally sensible.

Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
---
M be/src/common/init.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/util/memory-metrics.h
5 files changed, 25 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6626/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-08 Thread Tim Armstrong (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6626

to look at the new patch set (#8).

Change subject: IMPALA-5220: memory maintenance cleanup
..

IMPALA-5220: memory maintenance cleanup

Remove logic that tries to release pages from TcMalloc's page heap:
TCMalloc's behaviour changed so that it automatically does this with
"aggressive decommit" and committed spans can't accumulate in the page
heap.

Also increase the memory maintenance interval - 1s is too aggressive and
can free memory that will be imminently reused by a running query, e.g.
particularly buffer pool buffers.

Testing:
Tried to reproduce IMPALA-2800 in a couple of ways:
* Ran a big aggregation locally and cancelled it
* Looked at memz/ of some live clusters (production and stress test).
In all cases "Bytes in page heap freelist" was 0.

This confirms that IMPALA-2800 was already solved, probably
by the gperftools 2.5 upgrade, where aggressive decommit
would mean that memory is released to the system in
free() instead of the ReleaseFreeMemory() callst.

I was able to confirm that the ReleaseFreeMemory() calls were unnecessary
to avoid retaining memory by running a couple of stress tests
locally with this patch and checking that "Bytes in page
heap freelist" was 0 after the change and that memory consumption
was generally sensible.

Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
---
M be/src/common/init.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/util/memory-metrics.h
5 files changed, 25 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6626/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6626/7/be/src/common/init.cc
File be/src/common/init.cc:

PS7, Line 137: whenever
 : // consumption is checked
> is that true?  consumption() doesn't refresh it -- I think only mem-tracker
You're right. Updated the comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6817/1/testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
File 
testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test:

PS1, Line 115: drop table if exists mixed;
DDL


http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS1, Line 688: 
Can you talk about why you removed unique_database? When I look at 
QueryTest/hdfs-text-scan-with-header I see DDL statements which could cause 
collisions.

There are 4 versions of this test in exhaustive:


  

  
  
  
  


-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5207,IMPALA-5214: distcc fixes
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6655/3/bin/distcc/distcc_env.sh
File bin/distcc/distcc_env.sh:

PS3, Line 101: 
> why is this not necessary?
switch_compiler does it already


-- 
To view, visit http://gerrit.cloudera.org:8080/6655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5207,IMPALA-5214: distcc fixes
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/543/

-- 
To view, visit http://gerrit.cloudera.org:8080/6655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 2:

I ran buffer-allocator-test locally with an ASAN build. I was able to reproduce 
the problem before the fix and confirmed it went away after the fix.

-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 1:

Update testdata/data/README to describe those two new files?

-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip

2017-05-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5287: Test skip.header.line.count on gzip
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes

2017-05-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5207,IMPALA-5214: distcc fixes
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6655/3/bin/distcc/distcc_env.sh
File bin/distcc/distcc_env.sh:

PS3, Line 101: 
why is this not necessary?


-- 
To view, visit http://gerrit.cloudera.org:8080/6655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 2:

Looks good. One last question - how did you test?

-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6819/1/be/src/runtime/bufferpool/system-allocator.cc
File be/src/runtime/bufferpool/system-allocator.cc:

Line 124:   // Workaround ASAN bug where posix_memalign returns 0 even when 
allocation fails.
> There are a couple of other places where posix_memalign is called, too. Can
I think we can avoid adding the workaround there since OOM in those places is 
unexpected (and ASAN is not a production config, so it's fine to fail hard). I 
added DCHECKs instead.


-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..

IMPALA-5245: fix ASAN buffer-allocator-test

* Use the allocator_may_return_null=1 ASAN option so that the allocation
  just fails instead of crashing the process.
* Work around an bug in the implementation of the option where
  posix_memalign() does not return ENOMEM. I filed an upstream bug
  https://bugs.llvm.org/show_bug.cgi?id=32968.

Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
---
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/testutil/mem-util.h
M be/src/util/aligned-new.h
M be/src/util/bloom-filter.cc
M bin/run-backend-tests.sh
M bin/start-catalogd.sh
M bin/start-impalad.sh
M bin/start-statestored.sh
9 files changed, 12 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/6819/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6563

to look at the new patch set (#10).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..

IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for 
remaining types

This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).

This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format pull request #46.

The HdfsParquetScanner will preferably read the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.

This change removes the validation of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.

This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
18 files changed, 974 insertions(+), 348 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5184: build fe against both Hive 1 & 2 APIs

2017-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5184: build fe against both Hive 1 & 2 APIs
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/541/

-- 
To view, visit http://gerrit.cloudera.org:8080/5538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test

2017-05-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5245: fix ASAN buffer-allocator-test
..


Patch Set 1:

(1 comment)

Thanks for fixing this!

http://gerrit.cloudera.org:8080/#/c/6819/1/be/src/runtime/bufferpool/system-allocator.cc
File be/src/runtime/bufferpool/system-allocator.cc:

Line 124:   // Workaround ASAN bug where posix_memalign returns 0 even when 
allocation fails.
There are a couple of other places where posix_memalign is called, too. Can you 
fix those, too?


-- 
To view, visit http://gerrit.cloudera.org:8080/6819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..


Patch Set 8:

(2 comments)

Thanks for the review. I addressed your comments in PS9. Next, I'll rebase the 
change.

http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc
File be/src/exec/parquet-column-stats.cc:

Line 104: case TYPE_DECIMAL:
> doesn't look like it does, but if i'm wrong, that would need to be disabled
The stats Update() method receives the same values that the ParquetPlainEncoder 
writes to the file as data. The len in those values is set to 
UnpaddedCharLength(). test_write_statistics_char_types() also ensures that they 
are written unpadded.


http://gerrit.cloudera.org:8080/#/c/6563/8/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 99:   // Copies the memory of 'value' into 'buffer' and make 'value' point 
to 'buffer'.
> i didn't realize that this resets 'buffer' ('copies into' can also mean app
I expanded the comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types

2017-05-08 Thread Lars Volker (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6563

to look at the new patch set (#9).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet 
Statistics for remaining types
..

IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for 
remaining types

This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).

This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format pull request #46.

The HdfsParquetScanner will preferably read the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.

This change removes the validation of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.

This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
18 files changed, 911 insertions(+), 285 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 1309:   string key = string(fname) + ":" + std::to_string(mtime);
you can get rid of dealing with strings here (and doing a bunch of allocations) 
entirely by using something out of util/hash-util.h to compute the hash on 
fname and mtime separately (no need to convert it to a string).


Line 1315:   // Get lock
that's obvious, no need to paraphrase a single statement (unless there's 
something subtle going on).


Line 1318:   // Look up in the multimap (find range of values for key)
same here


Line 1326: if (!elem->reference_count_) {
use explicit numeric comparisons, this isn't a bool (although maybe it should 
be).


Line 1331:   ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L);
hm, i'm not sure we should do that. it's still a cached file handle.


Line 1395: if (p.size_ > p.capacity_) {
single line


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 273:: fh_(fh), reference_count_(0) {}
single line


Line 276:   int reference_count_;
why refcount instead of just 'bool in_use' or something like that?


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


  1   2   >