[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Abandoned

Will rework to simplify.
--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Patch Set 1:

I had looked at that library but concluded we would not want to pull it in 
as-is because it includes a lot of stuff we don't need and also the LZ4 code - 
but we want to use the LZ4 code from our toolchain. Reimplementing the JNI 
wrappers we would need seems so simple that it's not worth the trouble pulling 
the library and integrating it into our build/toolchain.

However, looking at the library code more carefully, I think we make the 
implementation I posted here much simpler, so let me try and rework that.


--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 28 Sep 2017 05:36:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-27 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 6:

> > I totally agree about doing anything fancy as a followup.  In
 > > particular, I did not find anything at all useful in AVX to help.
 > > There is no vectored multiply large enough for this, and FMA
 > > operations won't help either as they basically only help with
 > > precision loss on floating point types.
 > >
 > > Considering the perf difference is so extreme in some cases, I
 > > think we should strongly consider either living with the broken
 > > behavior until we can come up with a solution, or else verifying
 > > with users making use of the DECIMAL_V2 feature that this will
 > not
 > > be a problem.
 >
 > AVX2 has 32x8 times 32x8 multiplication. I have elsewhere simulated
 > 64-bit multiplication with this, combines with additions and
 > shifts, and received a speedup on 64x4 times 64x4 operations.

Oh, and AVX512 has vpmullq for multiplying vectors of 64-bit values.


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 28 Sep 2017 01:56:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-27 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 9:

Pre-review-test run of EE tests using IMPALA-5986 fix pass with blue.


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Wed, 27 Sep 2017 21:05:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-27 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 6:

> I totally agree about doing anything fancy as a followup.  In
 > particular, I did not find anything at all useful in AVX to help.
 > There is no vectored multiply large enough for this, and FMA
 > operations won't help either as they basically only help with
 > precision loss on floating point types.
 >
 > Considering the perf difference is so extreme in some cases, I
 > think we should strongly consider either living with the broken
 > behavior until we can come up with a solution, or else verifying
 > with users making use of the DECIMAL_V2 feature that this will not
 > be a problem.

AVX2 has 32x8 times 32x8 multiplication. I have elsewhere simulated 64-bit 
multiplication with this, combines with additions and shifts, and received a 
speedup on 64x4 times 64x4 operations.


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 27 Sep 2017 20:31:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] Move tests related to the old join node.

2017-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8153 )

Change subject: Move tests related to the old join node.
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1271/


--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 27 Sep 2017 19:20:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..

IMPALA-5870: Improve runtime profile for partial sort

A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.

This patch removes the counters 'SpilledRuns' and 'MergesPerformed'
since they will always be 0, and it renames the 'IntialRunsCreated'
counter to 'RunsCreated' since the 'Initial' refers to the fact that
in a regular sort those runs may be spilled or merged.

It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.

Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
 SortType: Partial
 ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of 
samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)

Testing:
- Manually ran several sorting queries and inspected their profiles
- Updated a kudu_insert test that relied on the 'SpilledRuns' counter
  to be 0 for a partial sort.

Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Reviewed-on: http://gerrit.cloudera.org:8080/8123
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/sorter.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
5 files changed, 11 insertions(+), 4 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 18:55:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-27 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 9:

I've started a run of the full EE suite in pre_review_test to verify framework 
change does not affect other tests.


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Wed, 27 Sep 2017 17:53:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-27 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 9:

> Uploaded patch set 9.

Pre-review build https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/344/ 
shows no flappers!


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Wed, 27 Sep 2017 17:47:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-27 Thread Tim Wood (Code Review)
Hello Matthew Mulder, Michael Brown, David Knupp, Alex Behm, Mostafa Mokhtar, 
Michael Ho,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8102

to look at the new patch set (#9).

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..

IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

Main source for TPCDS query and result definitions: 
https://github.com/gregrahn/tpcds-kit.
TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc.
Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh 
-testdata
This commit continues previous work on IMPALA-5376 in the ASF Impala repo
and the Cloudera Gerrit service.

This commit splits multi-query tests in the TPC-DS suite definition into one
query and result set per test file, as the test framework requires.  Names for
such files have -1, -2... inner suffixes.

The complete TPC-DS test suite runs with passes, skips and xfails,
but no failures, as reflected by runs of
$IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ...
Expected result sets come from the TPC-DS kit.  Some TPC-DS test cases
in this commit have been modified in sematically-neutral ways so as to pass
on Impala; others are marked to skip or xfail due to bugs.  Tests that flap
are marked to skip, with a bug ID, since they don't reliably pass or xfail.
The tests/query_test/test_tpcds_queries.py driver file is authoritative for the
active/skip/xfail status for each case and a brief reason.  The following list
describes the current status as:
--- test-name
deviance from TPC-DS spec
changes made

--- tpcds-q22a.test
RESULT MISMATCH in LSD of AVG() values
Fixed AVG()s
--- tpcds-q30.test
UNRECOGNIZED CHARACTER
MARKED XFAIL, IMPALA-5961.
--- tpcds-q31.test
RESULT MISMATCH in LSD of DECIMAL values
ADDED TRUNCATE(2)s AROUND LAST 4 COLUMNS. MARKED SKIP, IMPALA-5956
--- tpcds-q35a.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-5950.
--- tpcds-q36a.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-4741
--- tpcds-q39.test
MULTIPLE RESULT SET not recognized by test framework
MARKED XFAIL.
--- tpcds-q47.test
RESULT MISMATCH in LSD of DECIMAL values
ADDED TRUNCATE(2) TO 8th COLUMN OF WITH TABLE, TAKE ACTUAL RESULT AS EXPECTED.
--- tpcds-q49.test
RESULT MISMATCH in LSD of DECIMAL values
MARKED XFAIL, IMPALA-5945
--- tpcds-q57.test
RESULT MISMATCH, excess scale in DECIMAL values
FIXED, ADDED TRUNCATE(2) AROUND 6th COLUMN.
--- tpcds-q58.test
RESULT MISMATCH in DECIMAL values
MARKED XFAIL. IMPALA-5946
--- tpcds-q59.test
RESULT MISMATCH, excess scale in DECIMAL values
FIXED, ADDED TRUNCATE(2) AROUND 4th-10th COLUMNS.
--- tpcds-q61.test
RESULT MISMATCH in DECIMAL value
FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q63.test
RESULT MISMATCH, excess scale in DECIMAL values
ADDED TRUNCATE(2) TO 3rd COLUMN
--- tpcds-q64.test
RESULT MISMATCH
ADDED ORDER BY COLUMNS.
--- tpcds-q66.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-4741
--- tpcds-q77a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q78.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q83.test
RESULT MISMATCH
MARKED XFAIL. IMPALA-5945.
--- tpcds-q85.test
MISSING TABLE "reason"
MARKED XFAIL, IMPALA-5960
--- tpcds-q86a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q89.test
RESULT MISMATCH, DECIMAL values flap
MARKED XFAIL. ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, 
IMPALA-5956.
--- tpcds-q90.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-5945.
--- tpcds-q93.test
MISSING TABLE "reason"
MARKED XFAIL, IMPALA-5960
--- tpcds-q98.test
RESULT MISMATCH
FIXED, ADDED ROUND() TO LAST COLUMN

IMPALA-5986: Allow SET option names to contain digits when resetting them 
between queries.

Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
---
A testdata/workloads/tpcds/queries/tpcds-q10.test
A testdata/workloads/tpcds/queries/tpcds-q10a.test
A testdata/workloads/tpcds/queries/tpcds-q11.test
A testdata/workloads/tpcds/queries/tpcds-q12.test
A testdata/workloads/tpcds/queries/tpcds-q13.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14a-1.test
A testdata/workloads/tpcds/queries/tpcds-q14a-2.test
A testdata/workloads/tpcds/queries/tpcds-q15.test
A testdata/workloads/tpcds/queries/tpcds-q16.test
A testdata/workloads/tpcds/queries/tpcds-q17.test
A testdata/workloads/tpcds/queries/tpcds-q18.test
A testdata/workloads/tpcds/queries/tpcds-q18a.test
A testdata/workloads/tpcds/queries/tpcds-q20.test
A testdata/workloads/tpcds/queries/tpcds-q21.test
A testdata/workloads/tpcds/queries/tpcds-q22.test
A testdata/workloads/tpcds/queries/tpcds-q22a.test
M testdata/workloads/tpcds/queries/tpcds-q23-1.test
M testdata/workloads/tpcds/queries/tpcds-q23-2.test
A testdata/workloads/tpcds/queries/tpcds-q24-1.test

[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-09-27 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7954 )

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..


Patch Set 5:

Ping?


--
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-Change-Number: 7954
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 27 Sep 2017 17:20:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44
PS6, Line 44: There is a significant regression (50% increase in runtime) in
> I'm pretty sure it's the memory copying - making a memory allocation should
Makes sense. Might still be worth doing a quick experiment with MT_DOP to see 
if the regression remains the same (MT_DOP is easier to reason about).

I'm definitely in favor of this change, but uncompressed Parquet with a lot of 
string columns is unfortunately very common, despite our recommendations.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 16:25:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-27 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245
PS5, Line 245:   context_->ReleaseCompletedResources(nullptr, true);
> I can obviously fix it if you feel strongly but it just doesn't seem that i
Ah, I understand.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477
PS5, Line 477:   int64_t size, const char* err_ctx, uint8_t** buffer);
> Done. Also switch to const char* so we don't need to create a std::string f
Thx. ctx somewhat reminds me of the various context objects that we use 
elsewhere, but I don't feel strongly about it.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 16:20:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-09-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Patch Set 1:

It looks like libraries like https://github.com/lz4/lz4-java exist to do this 
sort of thing. What's our thinking on using them instead of implementing our 
own wrappers?


--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 27 Sep 2017 16:20:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44
PS6, Line 44: There is a significant regression (50% increase in runtime) in
> Just to clarify my point. My hope is that most of the overhead may ultimate
I'm pretty sure it's the memory copying - making a memory allocation should at 
least an order of magnitude cheaper than doing a pass over a data page. Unsure 
if the difference is due to the extra instructions executed or the increase in 
cache pressure from having two copies of the data.

I haven't measured but I'm also not convinced that the Disk IO Mgr's buffer 
caching is necessarily more efficient or scalable than TCMalloc. The IO mgr 
just has a global lock whereas TCMalloc has locks per size class plus batching 
via the thread cache. The buffer pool should be more scalable for large 
allocations than either in any case.

The queries that regressed are close to the worst possible case since they 
don't do any work aside from materialising the strings and evaluating a 
conjunct. Plus the data is already present in the buffer cache.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 16:09:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44
PS6, Line 44: There is a significant regression (50% increase in runtime) in
> Maybe the main difference is that copying into a mempool will ultimately fr
Just to clarify my point. My hope is that most of the overhead may ultimately 
go away when backing the mem pool by the buffer pool (which I presume can 
recycle buffers?).



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:42:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44
PS6, Line 44: There is a significant regression (50% increase in runtime) in
> Do we understand where the regression comes from exactly? Somehow I find it
Maybe the main difference is that copying into a mempool will ultimately free 
the copied memory, but I/O buffers may be recycled and would not take the 
tcmalloc hit from freeing from another thread?



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:38:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44
PS6, Line 44: There is a significant regression (50% increase in runtime) in
Do we understand where the regression comes from exactly? Somehow I find it 
hard to believe a 50% overhead just from copying. Is it maybe another tcmalloc 
issue where different threads allocate/free the memory? Have you tried a 
similar experiment with MT_DOP?



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:34:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: Part 3: remove TODO from RCFile

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8151 )

Change subject: IMPALA-5307: Part 3: remove TODO from RCFile
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594bb246cab64e15de750114890881a2ad9f504d
Gerrit-Change-Number: 8151
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:27:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: Part 3: remove TODO from RCFile

2017-09-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8151


Change subject: IMPALA-5307: Part 3: remove TODO from RCFile
..

IMPALA-5307: Part 3: remove TODO from RCFile

Our RCFile implementations already copies out data (it sets
set_contains_tuple_data to false). Remove a TODO that suggests undoing
this. The current implementation is suboptimal but improving RCFile
performance is not a priority.

Change-Id: I594bb246cab64e15de750114890881a2ad9f504d
---
M be/src/exec/hdfs-rcfile-scanner.cc
1 file changed, 0 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/8151/1
--
To view, visit http://gerrit.cloudera.org:8080/8151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I594bb246cab64e15de750114890881a2ad9f504d
Gerrit-Change-Number: 8151
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro

2017-09-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/8146 )

Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro
..

IMPALA-5307: Part 2: copy out strings in uncompressed Avro

The approach is to re-materialize strings in those tuples that
survive conjunct evaluation and may reference disk I/O buffers
directly. This means that perf should not regress for the
following cases:
* Compressed Avro files.
* Non-string columns.
* Selective scans where the majority of tuples are filtered out.

This approach will also work for the Sequence and Text scanners.

Includes some improvements to Avro codegen to replace more constants to
help win back some performance (with limited success): replaced
InitTuple() with an optimised version and substituted
tuple_byte_size() with a constant.

Removes dead code for handling CHAR(n) - CHAR(n) is now always fixed
length.

Perf:
Did microbenchmarks on uncompressed Avro files, one with all columns
from lineitem and one with only l_comment. Tests were run with:
  set num_scanner_threads=1;

I ran the query 5 times and extracted MaterializeTupleTime from the
profile to measure CPU cost of materialization. Overall string
materialization got significantly slower, mainly because of the
extra memcpy() calls required.

Selecting one string from a table with multiple columns:

  select min(l_comment) from biglineitem_avro
  1.814 -> 2.096

Selecting one string from a table with one column:

  select min(l_comment) from biglineitem_comment; profile;
  1.708 -> 3.7

Selecting one string from a table with one column with predicate:

  select min(l_comment) from biglineitem_comment where length(l_comment) > 
1;
  1.691 -> 1.449

Selecting all columns:

  select min(l_orderkey), min(l_partkey), min(l_suppkey), min(l_linenumber),
min(l_quantity), min(l_extendedprice), min(l_discount), min(l_tax),
min(l_returnflag), min(l_linestatus), min(l_shipdate),
min(l_commitdate), min(l_receiptdate), min(l_shipinstruct),
min(l_shipmode), min(l_comment) from biglineitem_avro; profile;
  2.335 -> 3.711

Selecting an int column (no strings):
  select min(l_linenumber) from biglineitem_avro
  1.806 -> 1.819

Testing:
Ran exhaustive tests.

Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
A be/src/runtime/tuple-ir.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
19 files changed, 329 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8146/7
--
To view, visit http://gerrit.cloudera.org:8080/8146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
Gerrit-Change-Number: 8146
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-09-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..

IMPALA-5988: optimise MemPool::TryAllocate()

Testing:
Ran core tests.

Perf:
Experiments using this on top of a WIP Avro patch for IMPALA-5307
showed noticable improvements in CPU efficiency - up to 10%

Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/runtime/mem-pool.h
6 files changed, 30 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8145/2
--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode

2017-09-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8148 )

Change subject: IMPALA-4252: Move runtime filters to ScanNode
..


Patch Set 1:

(4 comments)

Looks fine, just very minor comments. It looks like there weren't really any 
changes to the code aside from the move and method signature change.

http://gerrit.cloudera.org:8080/#/c/8148/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8148/1//COMMIT_MSG@12
PS1, Line 12:
Was the move totally mechanical? It would be good to mention if there were any 
changes required to the logic that need closer scrutiny.


http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h
File be/src/exec/scan-node.h:

http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h@195
PS1, Line 195: time_ms
Comment needs an update - this is no longer an argument.


http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h@198
PS1, Line 198: RuntimeState* state
Both HdfsScanNodeBase and KuduScanNodeBase have a "RuntimeState* 
runtime_state_" member. Might be cleaner to just move that member to ScanNode 
and avoid the inconsistency of passing RuntimeState* via arguments some of the 
time.


http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.cc@79
PS1, Line 79: // TODO: Move this to Prepare()
Does this comment still make sense? I wish whoever added it had mentioned why 
it wasn't already moved to Prepare().



--
To view, visit http://gerrit.cloudera.org:8080/8148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
Gerrit-Change-Number: 8148
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:19:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Move tests related to the old join node.

2017-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8153 )

Change subject: Move tests related to the old join node.
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1271/


--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:16:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] Move tests related to the old join node.

2017-09-27 Thread Alex Behm (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8153

to look at the new patch set (#2).

Change subject: Move tests related to the old join node.
..

Move tests related to the old join node.

No tests were added/dropped or modified. They are consolidated into
fewer .test files.

Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
---
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M testdata/workloads/functional-query/queries/QueryTest/inline-view.test
D testdata/workloads/functional-query/queries/QueryTest/joins-partitioned.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
D 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
M tests/query_test/test_join_queries.py
M tests/query_test/test_runtime_filters.py
7 files changed, 126 insertions(+), 136 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/8153/2
--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] Move tests related to the old join node.

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8153 )

Change subject: Move tests related to the old join node.
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:16:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] Move tests related to the old join node.

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8153 )

Change subject: Move tests related to the old join node.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8153/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8153/1//COMMIT_MSG@8
PS1, Line 8:
> It would be good to mention that the tests were only moved.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:15:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5951: test catalogd timeout fails to cause expected exception

2017-09-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8154


Change subject: IMPALA-5951: test_catalogd_timeout fails to cause expected 
exception
..

IMPALA-5951: test_catalogd_timeout fails to cause expected exception

test_catalogd_timeout sets a Kudu operation timeout of 1ms and then
performs various Kudu operations which it expects to fail due to a
timeout.

Since the test was written, things have sped up - for example, Impala
used to create a new Kudu client for each operation, but that was
changed in IMPALA-5167, such that the operations now occasionally
complete quickly enough that they don't timeout.

This patch disables all but the first test. The remaining tests can
likely be reactivated once we have a way of invalidating clients
(IMPALA-5685).

Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
---
M 
testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test
1 file changed, 13 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8154/1
--
To view, visit http://gerrit.cloudera.org:8080/8154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Gerrit-Change-Number: 8154
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] Test cleanup related to the old join node.

2017-09-27 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8153 )

Change subject: Test cleanup related to the old join node.
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:07:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5983: Fix crash in to/from utc timestamp("10:00:00", 'MSK')

2017-09-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8139 )

Change subject: IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 
'MSK')
..


Patch Set 2:

> Thanks for fixing this. Can you add some tests?
I have added a new check an end-to-end tests.


--
To view, visit http://gerrit.cloudera.org:8080/8139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d
Gerrit-Change-Number: 8139
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:05:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] Test cleanup related to the old join node.

2017-09-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8153 )

Change subject: Test cleanup related to the old join node.
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8153/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8153/1//COMMIT_MSG@8
PS1, Line 8:
It would be good to mention that the tests were only moved.



--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 15:06:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Test cleanup related to the old join node.

2017-09-27 Thread Alex Behm (Code Review)
Alex Behm has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8153


Change subject: Test cleanup related to the old join node.
..

Test cleanup related to the old join node.

Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
---
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M testdata/workloads/functional-query/queries/QueryTest/inline-view.test
D testdata/workloads/functional-query/queries/QueryTest/joins-partitioned.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
D 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
M tests/query_test/test_join_queries.py
M tests/query_test/test_runtime_filters.py
7 files changed, 126 insertions(+), 136 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/8153/1
--
To view, visit http://gerrit.cloudera.org:8080/8153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Gerrit-Change-Number: 8153
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5983: Fix crash in to/from utc timestamp("10:00:00", 'MSK')

2017-09-27 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8139

to look at the new patch set (#2).

Change subject: IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 
'MSK')
..

IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 'MSK')

Moscow timezone is handled differrently than other timezones,
and it was possible to cause unhandled boost exception by
trying to convert "dateless" timestamps like "10:00:00".

These timestamps cannot be handled by timestamp conversion
generally, because daylight saving time logic needs month
and day to work correctly. The condition HasDateOrTime()
incorrectly suggested that these timestamps can be handled,
so I made the condition stricter.

Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d
---
M be/src/exprs/timestamp-functions.cc
M tests/query_test/test_timezones.py
2 files changed, 9 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8139/2
--
To view, visit http://gerrit.cloudera.org:8080/8139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d
Gerrit-Change-Number: 8139
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1270/


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 14:55:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

2017-09-27 Thread Balazs Jeszenszky (Code Review)
Balazs Jeszenszky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8136 )

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8136/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8136/1//COMMIT_MSG@24
PS1, Line 24: Limitations
add: only for HDFS tables


http://gerrit.cloudera.org:8080/#/c/8136/1//COMMIT_MSG@25
PS1, Line 25: --enable_stats_extrapolation=true
Is this really required to be a startup option, wouldn't a query option (SET) 
work? I expect it will be enabled most of the time, and disabling it would be a 
troubleshooting step or workaround for some issues, particularly in planning. 
It would be better not to have to restart to do that.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@67
PS1, Line 67: Stats extrapolation disabled
How would Impala behave if stats are computed with TABLESAMPLE, and then stats 
extrapolation is disabled?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@777
PS1, Line 777: if (totalBytes_ == 0) {
totalBytes_ doesn't get a new value in the above section, why not make this 
check at the beginning, as it was?



--
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 27 Sep 2017 09:58:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

2017-09-27 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8136 )

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@55
PS1, Line 55: >)]
I was expecting to see a mention of REPEATABLE here (per change description).


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@58
PS1, Line 58: partition-level row counts are not modified.
does this allow for the state of table vs. partition stats to be inconsistent 
(e.g., combining per-partition stats may not add up to the table level stats)?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@161
PS1, Line 161: .
pls clarify if these should never be valid at the same time.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@184
PS1, Line 184: partition
nit: partitions


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@780
PS1, Line 780: cardinality_
it would be clearer if this is named outputCardinality_ (the comment for this 
method refers to it this way, so there's already some drift).


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@830
PS1, Line 830: getNumRows
used at 4 call sites in this for-loop. assign it to a variable.
will also make things more consistent with the non-partitioned case at L845.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1122
PS1, Line 1122: tbl_.getNumClusteringCols() > 0
this is used in several places-- it would be clearer if it was wrapped in 
something like "isPartitioned()"


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@744
PS1, Line 744: Impala
got confused here since previous line mentions Hive and Impala tables (why 
should the partitions be Impala partitions in all cases?)


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@795
PS1, Line 795: Hive table
given previous comment referencing Hive and Impala tables, I'm unclear about 
what is specific to Hive tables here.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@860
PS1, Line 860: totalFileBytes / sampleFileBytes
is returned result guaranteed to be >= val?



--
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 27 Sep 2017 06:54:45 +
Gerrit-HasComments: Yes