[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-09-26 Thread Alex Behm (Code Review)
Alex Behm has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8150


Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..

IMPALA-5990: Part 1: JNI-based LZ4 de/compression

Adds LZ4 de/compression in FeSupport geared towards
eventually compressing catalog objects end-to-end
in their journey from the catalogd over the statestored
to coordinator impalads.

A follow-on change will use the new LZ4 functions to
do the actual compression.

Testing:
- Added a new unit test

Change-Id: I237802944875b07080db0159ff8ec548150fd95e
---
M be/src/service/fe-support.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/service/FeSupport.java
A fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java
5 files changed, 424 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8150/1
--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-09-26 Thread Philip Zeyliger (Code Review)
Hello Alex Behm, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8100

to look at the new patch set (#3).

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..

IMPALA-5940: Avoid log spew by using Status::Expected.

In IMPALA-5926, we fixed a case where closing the session triggered a
stack trace in the logs which impacted performance for short-running
queries. Looking at log files from several active clusters, I identified a few
other cases where we could clean up log spew with the same (trivial)
approach.

The following messages will no longer log:

  "Failed to codegen MaterializeTuple() due to unsupported type: $0"
  "Not implemented for this format."
  "ScalarFnCall Codegen not supported for CHAR"
  "Could not codegen CodegenMaterializeExprs: "
  "Could not codegen TupleRowComparator::Compare(): $0"

These codegen related messages were happening at every execution node, and were
therefore very common.

In addition, the following messages, which were very frequent in the logs, now
will log, but without the back trace:

  "... Client ... timed-out during recv call."
  "Query Id ... not found."

To do this, I changed them from logging implicitly via Status(...)
to logging explicitly.

The snippet I used to identify these was:

  find . -type f -name '*IMPALAD*.gz' | xargs gzcat  | awk '/^I/ { if(x) { 
print x; } x = "" } /status.cc/ { x=" "; } { if(x) { x=x  $0 } }'  | sed -e 
's/0x[0-9a-fx]* //g' | sed -e 's/[0-9a-f]\{16\}:[0-9a-f]*/QUERYID/g' |  tr -s 
'\t' ' ' | tr '[0-9]' 'N' | sort | uniq -c  | sort -n | tee output.txt

I also analyzed some logs using SQL, against a pre-processed logs table:

  with v as (
select regexp_replace(
regexp_replace(
  translate(substr(message, 42), "\n\t", "  "),
  "[a-zA-Z0-9.-]*[.][a-zA-Z0-9-]*:[0-9]*",
  ""),
"@.*$", "@@@...") as m
from logs_table where `class`="status.cc")
  select m, count(*) from v group by 1 order by 2 desc limit 100

Testing:
* Automated tests.
* Manual testing for one of the new back-trace-suppressed paths:

  $ impala-python shell/gen-py/ImpalaService/ImpalaService-remote -h 
localhost:21000 GetRuntimeProfile 'beeswaxd.ttypes.QueryHandle()'
  Traceback (most recent call last):
File "shell/gen-py/ImpalaService/ImpalaService-remote", line 106, in 

  pp.pprint(client.GetRuntimeProfile(eval(args[0]),))
File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", 
line 161, in GetRuntimeProfile
  return self.recv_GetRuntimeProfile()
File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", 
line 184, in recv_GetRuntimeProfile
  raise result.error
  beeswaxd.ttypes.BeeswaxException: 
BeeswaxException(handle=QueryHandle(log_context='', id=''), log_context='', 
SQLState='HY000', _message='GetRuntimeProfile error: Query id 0:0 not 
found.\n', errorCode=0)

  $ grep 'Query id' logs/cluster/impalad.INFO | tail -n 1
  I0926 20:29:09.44  6787 impala-server.cc:642] Query id 0:0 not found.

Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/runtime/client-cache.h
M be/src/runtime/tuple.cc
M be/src/service/impala-server.cc
M be/src/util/tuple-row-compare.cc
7 files changed, 23 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8100/3
--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..

IMPALA-5975: Work around broken beeline clients

To make statements execute, some clients require always appending
a semi-colon to the end.  The workaround is quite simple.

Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Reviewed-on: http://gerrit.cloudera.org:8080/8132
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins
---
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
2 files changed, 4 insertions(+), 4 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Gerrit-Change-Number: 8132
Gerrit-PatchSet: 3
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Gerrit-Change-Number: 8132
Gerrit-PatchSet: 2
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 27 Sep 2017 03:27:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8100 )

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h@243
PS2, Line 243: return Status::NoTrace(TErrorCode::RPC_RECV_TIMEOUT, 
strings::Substitute(
> Is it necessary to have Status write the message to the log?  is it not suf
Yep, you're right. Much lighter change. I'll update.



--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 27 Sep 2017 03:20:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 5:

(1 comment)

> The Parquet writer does something similar for the Parquet
 > statistics, can any of that code be reused with regards to codegen?

This was discussed in the original design thread, and we felt that there was 
too much parquet specific stuff current in/going to be added soon in the 
parquet statistics code, so we decided to just keep it separate.

http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift@740
PS2, Line 740:   4: optional i64 queue_timeout_ms;
 :
 :   // Default query options that are applied to requests mapped 
to this pool.
 :
> This could be a reasonable approach, or another idea is:
Done



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 27 Sep 2017 03:08:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-26 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 6:

I totally agree about doing anything fancy as a followup.  In particular, I did 
not find anything at all useful in AVX to help.  There is no vectored multiply 
large enough for this, and FMA operations won't help either as they basically 
only help with precision loss on floating point types.

Considering the perf difference is so extreme in some cases, I think we should 
strongly consider either living with the broken behavior until we can come up 
with a solution, or else verifying with users making use of the DECIMAL_V2 
feature that this will not be a problem.


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 27 Sep 2017 03:00:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#5).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter and/or a min-max filter,
depending on if it has HDFS and/or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Testing:
- Updated planner tests.
- Ran existing runtime filter tests.
- Ran preliminary perf tests to demonstrate that it works. Will update
  with more specific results.
- Still needs more e2e tests.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
37 files changed, 1,464 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/5
--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] DRAFT - IMPALA-4252: Min-max runtime filters for Kudu

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#4).

Change subject: DRAFT - IMPALA-4252: Min-max runtime filters for Kudu
..

DRAFT - IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter and/or a min-max filter,
depending on if it has HDFS and/or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Testing:
- Updated planner tests.
- Ran existing runtime filter tests.
- Ran preliminary perf tests to demonstrate that it works. Will update
  with more specific results.
- Still needs more e2e tests.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
37 files changed, 1,464 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/4
--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8148


Change subject: IMPALA-4252: Move runtime filters to ScanNode
..

IMPALA-4252: Move runtime filters to ScanNode

As a preliminary step towards adding runtime filters for Kudu scans,
this patch moves runtime filter related code from HdfsScanNodeBase to
ScanNode so that it's available to KuduScanNodeBase.

Testing:
- Ran existing runtime filter tests.

Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
4 files changed, 126 insertions(+), 107 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8148/1
--
To view, visit http://gerrit.cloudera.org:8080/8148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
Gerrit-Change-Number: 8148
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-26 Thread Tim Wood (Code Review)
Tim Wood has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/8140 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..

IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

Main source for TPCDS query and result definitions: 
https://github.com/gregrahn/tpcds-kit.
TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc.
Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh 
-testdata
This commit continues previous work on IMPALA-5376 in the ASF Impala repo
and the Cloudera Gerrit service.

This commit splits multi-query tests in the TPC-DS suite definition into one
query and result set per test file, as the test framework requires.  Names for
such files have -1, -2... inner suffixes.

The complete TPC-DS test suite runs with passes, skips and xfails,
but no failures, as reflected by runs of
$IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ...
Expected result sets come from the TPC-DS kit.  Some TPC-DS test cases
in this commit have been modified in sematically-neutral ways so as to pass
on Impala; others are marked to skip or xfail due to bugs.  Tests that flap
are marked to skip, with a bug ID, since they don't reliably pass or xfail.
The tests/query_test/test_tpcds_queries.py driver file is authoritative for the
active/skip/xfail status for each case and a brief reason.  The following list
describes the current status as:
--- test-name
deviance from TPC-DS spec
changes made

--- tpcds-q22a.test
RESULT MISMATCH in LSD of AVG() values
Fixed AVG()s
--- tpcds-q30.test
UNRECOGNIZED CHARACTER
MARKED XFAIL, IMPALA-5961.
--- tpcds-q31.test
RESULT MISMATCH in LSD of DECIMAL values
ADDED TRUNCATE(2)s AROUND LAST 4 COLUMNS. MARKED SKIP, IMPALA-5956
--- tpcds-q35a.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-5950.
--- tpcds-q36a.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-4741
--- tpcds-q39.test
MULTIPLE RESULT SET not recognized by test framework
MARKED XFAIL.
--- tpcds-q47.test
RESULT MISMATCH in LSD of DECIMAL values
ADDED TRUNCATE(2) TO 8th COLUMN OF WITH TABLE, TAKE ACTUAL RESULT AS EXPECTED.
--- tpcds-q49.test
RESULT MISMATCH in LSD of DECIMAL values
MARKED XFAIL, IMPALA-5945
--- tpcds-q57.test
RESULT MISMATCH, excess scale in DECIMAL values
FIXED, ADDED TRUNCATE(2) AROUND 6th COLUMN.
--- tpcds-q58.test
RESULT MISMATCH in DECIMAL values
MARKED XFAIL. IMPALA-5946
--- tpcds-q59.test
RESULT MISMATCH, excess scale in DECIMAL values
FIXED, ADDED TRUNCATE(2) AROUND 4th-10th COLUMNS.
--- tpcds-q61.test
RESULT MISMATCH in DECIMAL value
FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q63.test
RESULT MISMATCH, excess scale in DECIMAL values
ADDED TRUNCATE(2) TO 3rd COLUMN
--- tpcds-q64.test
RESULT MISMATCH
ADDED ORDER BY COLUMNS.
--- tpcds-q66.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-4741
--- tpcds-q77a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q78.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q83.test
RESULT MISMATCH
MARKED XFAIL. IMPALA-5945.
--- tpcds-q85.test
MISSING TABLE "reason"
MARKED XFAIL, IMPALA-5960
--- tpcds-q86a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q89.test
RESULT MISMATCH, DECIMAL values flap
MARKED XFAIL. ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, 
IMPALA-5956.
--- tpcds-q90.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-5945.
--- tpcds-q93.test
MISSING TABLE "reason"
MARKED XFAIL, IMPALA-5960
--- tpcds-q98.test
RESULT MISMATCH
FIXED, ADDED ROUND() TO LAST COLUMN

IMPALA-5986: Allow SET option names to contain digits when resetting them 
between queries.

Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
---
A testdata/workloads/tpcds/queries/tpcds-q10.test
A testdata/workloads/tpcds/queries/tpcds-q10a.test
A testdata/workloads/tpcds/queries/tpcds-q11.test
A testdata/workloads/tpcds/queries/tpcds-q12.test
A testdata/workloads/tpcds/queries/tpcds-q13.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14a-1.test
A testdata/workloads/tpcds/queries/tpcds-q14a-2.test
A testdata/workloads/tpcds/queries/tpcds-q15.test
A testdata/workloads/tpcds/queries/tpcds-q16.test
A testdata/workloads/tpcds/queries/tpcds-q17.test
A testdata/workloads/tpcds/queries/tpcds-q18.test
A testdata/workloads/tpcds/queries/tpcds-q18a.test
A testdata/workloads/tpcds/queries/tpcds-q20.test
A testdata/workloads/tpcds/queries/tpcds-q21.test
A testdata/workloads/tpcds/queries/tpcds-q22.test
A testdata/workloads/tpcds/queries/tpcds-q22a.test
M testdata/workloads/tpcds/queries/tpcds-q23-1.test
M testdata/workloads/tpcds/queries/tpcds-q23-2.test
A testdata/workloads/tpcds/queries/tpcds-q24-1.test
A testdata/workloads/tpcds/queries/tpcds-q24-2.test
A testdata/workloads/tpcds/queries/tpcds-q25.test
A 

[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 6:

I think we should explore your suggestion to try SEE/AVX in a follow on patch. 
(to optimze both multiplication and division if it's possible)


--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 27 Sep 2017 01:24:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-09-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale down
and round.

Performance:

I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clealy seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit integer
  every time.

  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after:  126.17s

- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.

  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after:  13.06s

These regressions are possible only when the resulting precision is 38
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with DECIMAL_v2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 289 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/6
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 6
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5986: Allow SET option names to contain digits when resetting them between queries.

2017-09-26 Thread Tim Wood (Code Review)
Tim Wood has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8140 )

Change subject: IMPALA-5986: Allow SET option names to contain digits when 
resetting them between queries.
..

IMPALA-5986: Allow SET option names to contain digits when resetting them 
between queries.

Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
---
M tests/common/impala_test_suite.py
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8140/3
--
To view, visit http://gerrit.cloudera.org:8080/8140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
Gerrit-Change-Number: 8140
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Wood 


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Sep 2017 00:08:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245
PS5, Line 245:   context_->ReleaseCompletedResources(nullptr, true);
> I fixed it while I was checking the callsites of ReleaseCompletedResources(
I can obviously fix it if you feel strongly but it just doesn't seem that 
important to me.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:50:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-26 Thread Tim Armstrong (Code Review)
Hello Lars Volker, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8085

to look at the new patch set (#6).

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..

IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

This change only affects uncompressed plain-encoded Parquet where
RowBatches may directly reference strings stored in the I/O
buffers. The proposed fix is to simply copy the data pages if
needed then use the same logic that we use for decompressed data
pages.

This copy inevitably adds some CPU overhead, but I believe this is
acceptable because:
* We generally recommend using compression, and optimize for that
  case.
* Copying memory is cheaper than decompressing data.
* Scans of uncompressed data are very likely to be I/O bound.

This allows several major simplifications:
* The resource management for compressed and uncompressed
  scans is much more similar.
* We don't need to attach Disk I/O buffers to RowBatches.
* We don't need to deal with attaching I/O buffers in
  ScannerContext.
* Column readers can release each I/O buffer *before* advancing to
  the next one, making it easier to reason about resource
  consumption. E.g. each Parquet column only needs one I/O buffer at
  a time to make progress.

Future changes will apply to Avro, Sequence Files and Text. Once
all scanners are converted, ScannerContext::contains_tuple_data_
will always be false and we can remove some dead code.

Testing
===
Ran core ASAN and exhaustive debug builds.

Perf

No difference in most cases when scanning uncompressed parquet.
There is a significant regression (50% increase in runtime) in
targeted perf tests scanning non-dictionary-encoded strings (see
benchmark output below).  After the regression performance is
comparable to Snappy compression.

I also did a TPC-H run but ran into some issues with the report
generator. I manually compared times and there were no regressions.

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
++---+-++++
| TARGETED-PERF(_61) | parquet / none / none | 23.02   | +0.60% | 4.23  
 | +5.97% |
++---+-++++

+++---++-++++-+---+
| Workload   | Query  | File Format   | Avg(s) | 
Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+++---++-++++-+---+
| TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00   | 
1.98| R +52.10%  |   0.97%|   1.25%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86   | 
1.92| R +49.11%  |   0.34%|   2.34%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16   | 
2.15| R +47.04%  |   1.03%|   0.72%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16   | 
2.17| R +45.60%  |   0.14%|   1.11%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51   | 
2.55| R +37.88%  |   0.83%|   0.49%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79   | 
0.61| R +30.86%  |   1.54%|   4.10%| 1   | 5 |
| TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45  | 
35.07   |   +12.51%  |   0.29%|   0.29%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78   | 
6.10|   +11.13%  |   0.99%|   0.74%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83   | 
8.14|   +8.52%   |   0.15%|   0.32%| 1   | 5 |
...

Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.h
4 files changed, 64 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/6
--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: 

[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@7
PS5, Line 7: IMPALA-5307
> Can you add a line at the bottom what the other part(s) would look like?
Done


http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@56
PS5, Line 56: 
+++---++-++++-+---+
> Nit: You could make the second column smaller to make this more readable, a
Done


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245
PS5, Line 245:   context_->ReleaseCompletedResources(nullptr, true);
> I think it's best to change the whole file at once, or only change occurren
I fixed it while I was checking the callsites of ReleaseCompletedResources().


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@476
PS5, Line 476:   Status AllocateUncompressedDataPage(
> Should we call this "AllocateUncompressedDataBuffer"? Otherwise it sounds t
That seems vaguer. It would also be confusing since the pool is 
data_page_pool_. I updated the comment to mention that the contents are 
uncompressed rather than the page.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477
PS5, Line 477:   int64_t size, const std::string& desc, uint8_t** buffer);
> Maybe err_desc, err_detail, or detail? "desc" reminds me of descriptors.
Done. Also switch to const char* so we don't need to create a std::string for 
every call.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@485
PS5, Line 485: IsStringType
> This does not say "VarLenStringType" but above in a comment you refer to va
Fixed it. We don't need to copy if it's a CHAR.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1075
PS5, Line 1075:   uncompressed_size, "uncompressed variable-length 
data", _buffer));
> DCHECK(copy_buffer != nullptr); And maybe initialize it to nullptr, so that
It doesn't seem that necessary to me since it's part of the contract of the 
function and it will crash immediately anyway.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1105
PS5, Line 1105: *buffer = data_page_pool_->TryAllocate(size);
Bad indentation.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:49:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

2017-09-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8136 )

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
..


Patch Set 1:

(26 comments)

First pass. High level approach seems reasonable to me. Will take a look at the 
tests next.

http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc@218
PS1, Line 218: // The first column is the COUNT(*) expr of the original query.
Can you put a similar comment above L206. I had a hard time figuring out what 
rows[0].colVals[0] represented in L208.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@58
PS1, Line 58: Existing partition-level row counts are not modified.
Mention what happens to the partitions that don't have the row count set.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@60
PS1, Line 60: unmodified
remove


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@61
PS1, Line 61: extrapolated
Maybe add some details about how extrapolation is computed. I haven't looked at 
the entire patch so you may mention it elsewhere. If so, you can just point to 
the file that provides that description.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@62
PS1, Line 62:  so as not to confuse other engines like Hive/SparkSQL which may 
rely on
:  *   the shared HMS fields being accurate.
That part may be confusing to someone with no background knowledge. For 
instance, the fact that other engines require "accurate" statistics which we 
update using sampling and extrapolation :), kind of oxymoron. I think it is ok 
to just say that we store the extrapolated stats and remove that part. Thoughts?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@64
PS1, Line 64: Independently, row-count extrapolation is also done during 
planning based on the
:  *   numRows / totalSize ratio because the table contents may 
have changed since the
:  *   last time COMPUTE STATS was run.
Remove and put it near the relevant code (planning).


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@77
PS1, Line 77: Not supported for now to keep the logic/code simple.
Ha, this is the opposite in the COMPUTE STATS case.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@106
PS1, Line 106: totalFileBytes_
Can't you use the HdfsTable.totalFileBytes_ instead?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@108
PS1, Line 108: Only set when
 :   // TABLESAMPLE was specified. Set to -1 for non-HDFS tables
Maybe "Set to -1 for non-HDFS tables or when TABLESAMPLE is not specified"?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@305
PS1, Line 305: expectAllPartitions_ = !updateTableStatsOnly()
So if I do not update table stats only I should expect to compute stats on all 
partitions? Essentially, is isIncremental check blended in this function? I 
think it may make sense to extract that check.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@344
PS1, Line 344: expectAllPartitions_ = false;
Why is this needed? Isn't L305 sufficient?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@440
PS1, Line 440: // Only add group by clause for HdfsTables.
Comment doesn't seem relevant to the code that follows.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@442
PS1, Line 442: !updateTableStatsOnly()
expectAllPartitions_?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS1, Line 452: !updateTableStatsOnly()
expectAllPartitions_?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@506
PS1, Line 506: sets 'expectAllPartitions_' to false
Hm, I don't see that in the code.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@511
PS1, Line 511: base
remove?



[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-09-26 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8147


Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..

IMPALA-5448: fix invalid number of splits reported in Parquet scan node

Parquet splits with multi columns are marked as completed by using
HdfsScanNodeBase::RangeComplete(). It duplicately counts the file types
as column codec types. Thus the number of parquet splits are the real count
multiplies number of materialized columns.

Furthermore, according to the Parquet definition, it allows mixed compression
codecs on different columns. This’s handled in this patch as well. A parquet 
file
using gzip and snappy compression codec will be reported as:
FileFormats: PARQUET/(GZIP,SNAPPY):1

This patch introduces a compression types set for the above cases.

Testing:
Add end-to-end tests handling parquet files with all columns compressed in
snappy, and handling parquet files with multi compression codec.

Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/multi_compression_parquet_data/README
A testdata/multi_compression_parquet_data/tinytable_0_gzip_snappy.parq
A testdata/multi_compression_parquet_data/tinytable_1_snappy_gzip.parq
A 
testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test
M tests/query_test/test_scanners.py
9 files changed, 85 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8147/1
--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1268/


--
To view, visit http://gerrit.cloudera.org:8080/8132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Gerrit-Change-Number: 8132
Gerrit-PatchSet: 2
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:26:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..


Patch Set 2: Code-Review+2

Carrying +2


--
To view, visit http://gerrit.cloudera.org:8080/8132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Gerrit-Change-Number: 8132
Gerrit-PatchSet: 2
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:25:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-09-26 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8100 )

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h@243
PS2, Line 243: return Status::NoTrace(TErrorCode::RPC_RECV_TIMEOUT, 
strings::Substitute(
Is it necessary to have Status write the message to the log?  is it not 
sufficient to write an info message via LOG.info.



--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:08:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-09-26 Thread Philip Zeyliger (Code Review)
Hello Alex Behm, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8100

to look at the new patch set (#2).

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..

IMPALA-5940: Avoid log spew by using Status::Expected.

In IMPALA-5926, we fixed a case where closing the session triggered a
stack trace in the logs which impacted performance for short-running
queries. Looking at log files from several active clusters, I identified a few
other cases where we could clean up log spew with the same (trivial)
approach.

The following messages will no longer log:

  "Failed to codegen MaterializeTuple() due to unsupported type: $0"
  "Not implemented for this format."
  "ScalarFnCall Codegen not supported for CHAR"
  "Could not codegen CodegenMaterializeExprs: "
  "Could not codegen TupleRowComparator::Compare(): $0"

These codegen related messages were happening at every execution node, and were
therefore very common.

In addition, the following messages, which were very frequent in the logs, now
will log, but without the back trace:

  "... Client ... timed-out during recv call."
  "Query Id ... not found."

To preserve the source of the messages, I used preprocessor macros __FILE__ and
__LINE__ and stuffed them into the error string. As you can see in the worked
example in the testing section below, __FILE__ includes the full source code
path, which is uglier than the logging equivalent. (glog has a helper
called "const_basename" which handles this at
https://github.com/google/glog/blob/2a6df66252d266080489c310b8146e63b66b2add/src/utilities.cc#L279)

The snippet I used to identify these was:

  find . -type f -name '*IMPALAD*.gz' | xargs gzcat  | awk '/^I/ { if(x) { 
print x; } x = "" } /status.cc/ { x=" "; } { if(x) { x=x  $0 } }'  | sed -e 
's/0x[0-9a-fx]* //g' | sed -e 's/[0-9a-f]\{16\}:[0-9a-f]*/QUERYID/g' |  tr -s 
'\t' ' ' | tr '[0-9]' 'N' | sort | uniq -c  | sort -n | tee output.txt

I also analyzed some logs using SQL, against a pre-processed logs table:

  with v as (
select regexp_replace(
regexp_replace(
  translate(substr(message, 42), "\n\t", "  "),
  "[a-zA-Z0-9.-]*[.][a-zA-Z0-9-]*:[0-9]*",
  ""),
"@.*$", "@@@...") as m
from logs_table where `class`="status.cc")
  select m, count(*) from v group by 1 order by 2 desc limit 100

Testing:
* Automated tests.
* Manual testing for one NoTrace code path:

  $ impala-python shell/gen-py/ImpalaService/ImpalaService-remote -h 
localhost:21000 GetRuntimeProfile 'beeswaxd.ttypes.QueryHandle()'
  Traceback (most recent call last):
File "shell/gen-py/ImpalaService/ImpalaService-remote", line 106, in 

  pp.pprint(client.GetRuntimeProfile(eval(args[0]),))
File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", 
line 161, in GetRuntimeProfile
  return self.recv_GetRuntimeProfile()
File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", 
line 184, in recv_GetRuntimeProfile
  raise result.error
  beeswaxd.ttypes.BeeswaxException: 
BeeswaxException(handle=QueryHandle(log_context='', id=''), log_context='', 
SQLState='HY000', _message='GetRuntimeProfile error: 
/home/philip/src/impala/be/src/service/impala-server.cc:641: Query id 0:0 not 
found.\n', errorCode=0)

  $ tail -n 1 logs/cluster/impalad.INFO
  I0926 14:51:19.867837 30582 status.cc:147] 
/home/philip/src/impala/be/src/service/impala-server.cc:641: Query id 0:0 not 
found.

Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
---
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/runtime/client-cache.h
M be/src/runtime/tuple.cc
M be/src/service/impala-server.cc
M be/src/util/tuple-row-compare.cc
9 files changed, 70 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8100/2
--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5940. Avoid log spew by using Status::Expected.

2017-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8100 )

Change subject: IMPALA-5940. Avoid log spew by using Status::Expected.
..


Patch Set 1:

(1 comment)

Thanks for the review!

I added in the two messages that Mostafa mentioned on the JIRA. Doing so 
introduced two complexities:

* Whereas I believe we could never log the messages in the first draft, the new 
messages being silenced here (especially RPC timeout) should be logged.
* Now that there's no stack trace, the fact that the messages come as being 
from "status.cc" is not useful.

I see the following ways forward:

* Use draft 1 to silence the easy messages; ignore this new draft 2.
* Use draft 2. Notably, it uses the __FILE__ preprocessor macro. __func__ is 
also a possibility.
* Change the usage pattern. For example STATUS_AND_LOG(...) could be a macro 
that does VLOG and the Status generation. I was very hesitant to add macros for 
this so didn't go that route.
* Something else?

I want to highlight a few more things about patch 2:
* I started off with a signature like Status(..., bool silent, bool 
omit_backtrace), but there were only 3 possible good states. I had bugs in my 
implementation, so decided on an enum instead. It's all private, so there's 
little API risk here.
* There's a copy of ErrorMsg going on when Status::NoTrace() takes an error 
code and a string. We do several of them in various Status signatures. Given 
that we used to also compute an entire back trace, I think this is ok.
* I didn't add a new test. I didn't find any tests that look like they're 
testing our logging behavior. Happy for suggestions, though I think it's 
overkill.
* I don't know how to reproduce RPC timeouts. I "kill -STOP"ed an impalad, but 
that wasn't enough. Curious for pointers.

http://gerrit.cloudera.org:8080/#/c/8100/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8100/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5940. Avoid log spew by using Status::Expected.
> IMPALA-5940: (colon not dot)
Done



--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:03:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..


Patch Set 1:

Thanks for testing! Still lgtm.


--
To view, visit http://gerrit.cloudera.org:8080/8132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Gerrit-Change-Number: 8132
Gerrit-PatchSet: 1
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Tue, 26 Sep 2017 22:39:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-26 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..


Patch Set 1:

>From private build and test 6418,

14:21:05 Executing: create-load-data.sh
14:21:05 Loading Hive Builtins (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-hive-builtins.log)...
14:23:20 OK (Took: 2 min 15 sec)
14:23:20 Generating HBase data (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-hbase.log)...
 
14:23:49 OK (Took: 0 min 29 sec)
14:23:49 Creating /test-warehouse HDFS directory (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-test-warehouse-dir.log)...
14:23:53 OK (Took: 0 min 4 sec)
14:23:53 Derived params for create-load-data.sh:
14:23:53 EXPLORATION_STRATEGY=exhaustive
14:23:53 SKIP_METADATA_LOAD=0
14:23:53 SKIP_SNAPSHOT_LOAD=0
14:23:53 SNAPSHOT_FILE=
14:23:53 CM_HOST=
14:23:53 REMOTE_LOAD=
14:23:53 Starting Impala cluster (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/start-impala-cluster.log)...
14:24:06 OK (Took: 0 min 13 sec)
14:24:06 Setting up HDFS environment (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/setup-hdfs-env.log)...
14:24:21 OK (Took: 0 min 15 sec)
14:24:21 Loading custom schemas (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-custom-schemas.log)...
14:25:24 OK (Took: 1 min 3 sec)
14:25:24 Loading functional-query data (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-functional-query.log)...
 
15:17:32 OK (Took: 52 min 8 sec)
15:17:32 Loading TPC-H data (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-tpch.log)...
15:42:56 OK (Took: 25 min 24 sec)
15:42:56 Loading nested data (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-nested.log)...
15:48:33 OK (Took: 5 min 37 sec)
15:48:33 Loading TPC-DS data (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-tpcds.log)...
16:16:20 OK (Took: 27 min 46 sec)
16:16:20 Loading auxiliary workloads (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-aux-workloads.log)...
16:27:10 OK (Took: 10 min 51 sec)
16:27:10 Loading dependent tables (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/copy-and-load-dependent-tables.log)...
16:28:18 OK (Took: 1 min 7 sec)
16:28:18 Loading custom data (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-custom-data.log)...
16:30:16 OK (Took: 1 min 58 sec)
16:30:16 Creating many block table (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-table-many-blocks.log)...
16:31:18 OK (Took: 1 min 2 sec)
16:31:18 Loading Kudu functional (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-kudu.log)...
16:35:57 OK (Took: 4 min 39 sec)
16:35:57 Loading Kudu TPCH (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-kudu-tpch.log)...
16:42:17 OK (Took: 6 min 20 sec)
16:42:17 Loading Hive UDFs (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/build-and-copy-hive-udfs.log)...
16:43:46 OK (Took: 1 min 29 sec)
16:43:46 Running custom post-load steps (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/custom-post-load-steps.log)...
16:44:20 OK (Took: 0 min 34 sec)
16:44:20 Caching test tables (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/cache-test-tables.log)...
16:44:28 OK (Took: 0 min 8 sec)
16:44:28 Loading external data sources (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-ext-data-source.log)...
16:44:39 OK (Took: 0 min 11 sec)
16:44:39 Splitting HBase (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-hbase.log)...
16:45:15 OK (Took: 0 min 36 sec)
16:45:15 Creating internal HBase table (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-internal-hbase-table.log)...
16:46:01 OK (Took: 0 min 46 sec)
16:46:01 Computing table stats (logging to 
/data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/compute-table-stats.log)...
16:58:20 OK 

[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue

2017-09-26 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8143 )

Change subject: IMPALA-4682 Fix IllegalStateException issue
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8143/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8143/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-4682 Fix IllegalStateException issue
Can you state the issue in short?


http://gerrit.cloudera.org:8080/#/c/8143/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8143/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@731
PS1, Line 731: + selectListItem.toSql());
Can  we add a test  for this?



--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 26 Sep 2017 21:44:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue

2017-09-26 Thread Zoram Thanga (Code Review)
Zoram Thanga has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8143


Change subject: IMPALA-4682 Fix IllegalStateException issue
..

IMPALA-4682 Fix IllegalStateException issue

When one runs a query like

'select * from t order by count(a)'

we are throwing an IllegalStateException. The desired behavior here
is to handle the problem as an AnalysisExeption with a more
meaningful error message.

This patch fixes the handling of '*' in this context, so that the
error becomes

"ERROR: AnalysisException: select list expression not produced by
aggregation output (missing from GROUP BY clause?): *"

Note that the second changed line is required because
selectListItem.Expr_ is null when SelectListItem.isStar_ is true.

Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
1 file changed, 1 insertion(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8143/1
--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Reviewed-on: http://gerrit.cloudera.org:8080/7731
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 480 insertions(+), 441 deletions(-)

Approvals:
  Dimitris Tsirogiannis: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 9
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 20:20:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 3:

The GVO failed because there was a kudu insert test that checked the profile 
for the 'SpilledRuns: 0' line. Since that is now gone, the test checks for 
'SortType: Partial' (the real point of the test was that the query wouldn't 
have run previously due to the mem limit, which still applies, so we're not 
particularly losing any coverage).


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Sep 2017 19:46:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8123

to look at the new patch set (#3).

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..

IMPALA-5870: Improve runtime profile for partial sort

A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.

This patch removes the counters 'SpilledRuns' and 'MergesPerformed'
since they will always be 0, and it renames the 'IntialRunsCreated'
counter to 'RunsCreated' since the 'Initial' refers to the fact that
in a regular sort those runs may be spilled or merged.

It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.

Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
 SortType: Partial
 ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of 
samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)

Testing:
- Manually ran several sorting queries and inspected their profiles
- Updated a kudu_insert test that relied on the 'SpilledRuns' counter
  to be 0 for a partial sort.

Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/sorter.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
5 files changed, 11 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/3
--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-26 Thread Tim Wood (Code Review)
Tim Wood has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8140 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..

IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

Main source for TPCDS query and result definitions: 
https://github.com/gregrahn/tpcds-kit.
TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc.
Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh 
-testdata
This commit continues previous work on IMPALA-5376 in the ASF Impala repo
and the Cloudera Gerrit service.

This commit splits multi-query tests in the TPC-DS suite definition into one
query and result set per test file, as the test framework requires.  Names for
such files have -1, -2... inner suffixes.

The complete TPC-DS test suite runs with passes, skips and xfails,
but no failures, as reflected by runs of
$IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ...
Expected result sets come from the TPC-DS kit.  Some TPC-DS test cases
in this commit have been modified in sematically-neutral ways so as to pass
on Impala; others are marked to skip or xfail due to bugs.  Tests that flap
are marked to skip, with a bug ID, since they don't reliably pass or xfail.
The tests/query_test/test_tpcds_queries.py driver file is authoritative for the
active/skip/xfail status for each case and a brief reason.  The following list
describes the current status as:
--- test-name
deviance from TPC-DS spec
changes made

--- tpcds-q22a.test
RESULT MISMATCH in LSD of AVG() values
Fixed AVG()s
--- tpcds-q30.test
UNRECOGNIZED CHARACTER
MARKED XFAIL, IMPALA-5961.
--- tpcds-q31.test
RESULT MISMATCH in LSD of DECIMAL values
ADDED TRUNCATE(2)s AROUND LAST 4 COLUMNS. MARKED SKIP, IMPALA-5956
--- tpcds-q35a.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-5950.
--- tpcds-q36a.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-4741
--- tpcds-q39.test
MULTIPLE RESULT SET not recognized by test framework
MARKED XFAIL.
--- tpcds-q47.test
RESULT MISMATCH in LSD of DECIMAL values
ADDED TRUNCATE(2) TO 8th COLUMN OF WITH TABLE, TAKE ACTUAL RESULT AS EXPECTED.
--- tpcds-q49.test
RESULT MISMATCH in LSD of DECIMAL values
MARKED XFAIL, IMPALA-5945
--- tpcds-q57.test
RESULT MISMATCH, excess scale in DECIMAL values
FIXED, ADDED TRUNCATE(2) AROUND 6th COLUMN.
--- tpcds-q58.test
RESULT MISMATCH in DECIMAL values
MARKED XFAIL. IMPALA-5946
--- tpcds-q59.test
RESULT MISMATCH, excess scale in DECIMAL values
FIXED, ADDED TRUNCATE(2) AROUND 4th-10th COLUMNS.
--- tpcds-q61.test
RESULT MISMATCH in DECIMAL value
FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q63.test
RESULT MISMATCH, excess scale in DECIMAL values
ADDED TRUNCATE(2) TO 3rd COLUMN
--- tpcds-q64.test
RESULT MISMATCH
ADDED ORDER BY COLUMNS.
--- tpcds-q66.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-4741
--- tpcds-q77a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q78.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q83.test
RESULT MISMATCH
MARKED XFAIL. IMPALA-5945.
--- tpcds-q85.test
MISSING TABLE "reason"
MARKED XFAIL, IMPALA-5960
--- tpcds-q86a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q89.test
RESULT MISMATCH, DECIMAL values flap
MARKED XFAIL. ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, 
IMPALA-5956.
--- tpcds-q90.test
RESULT MISMATCH
MARKED XFAIL, IMPALA-5945.
--- tpcds-q93.test
MISSING TABLE "reason"
MARKED XFAIL, IMPALA-5960
--- tpcds-q98.test
RESULT MISMATCH
FIXED, ADDED ROUND() TO LAST COLUMN

Trial run to suppress fractional comparison diffs.
Still not sure of the general approach for DOUBLE & DECIMAL in all tests.

Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
---
A testdata/workloads/tpcds/queries/tpcds-q10.test
A testdata/workloads/tpcds/queries/tpcds-q10a.test
A testdata/workloads/tpcds/queries/tpcds-q11.test
A testdata/workloads/tpcds/queries/tpcds-q12.test
A testdata/workloads/tpcds/queries/tpcds-q13.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14a-1.test
A testdata/workloads/tpcds/queries/tpcds-q14a-2.test
A testdata/workloads/tpcds/queries/tpcds-q15.test
A testdata/workloads/tpcds/queries/tpcds-q16.test
A testdata/workloads/tpcds/queries/tpcds-q17.test
A testdata/workloads/tpcds/queries/tpcds-q18.test
A testdata/workloads/tpcds/queries/tpcds-q18a.test
A testdata/workloads/tpcds/queries/tpcds-q20.test
A testdata/workloads/tpcds/queries/tpcds-q21.test
A testdata/workloads/tpcds/queries/tpcds-q22.test
A testdata/workloads/tpcds/queries/tpcds-q22a.test
M testdata/workloads/tpcds/queries/tpcds-q23-1.test
M testdata/workloads/tpcds/queries/tpcds-q23-2.test
A testdata/workloads/tpcds/queries/tpcds-q24-1.test
A testdata/workloads/tpcds/queries/tpcds-q24-2.test
A 

[Impala-ASF-CR] Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests.

2017-09-26 Thread Tim Wood (Code Review)
Tim Wood has restored this change. ( http://gerrit.cloudera.org:8080/8140 )

Change subject: Trial run to suppress fractional comparison diffs. Still not 
sure of the general approach for DOUBLE & DECIMAL in all tests.
..


Restored
--
To view, visit http://gerrit.cloudera.org:8080/8140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
Gerrit-Change-Number: 8140
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Wood 


[Impala-ASF-CR] [DOCS] Fill in release note subtopics for Apache Impala 2.10

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7958 )

Change subject: [DOCS] Fill in release note subtopics for Apache Impala 2.10
..

[DOCS] Fill in release note subtopics for Apache Impala 2.10

Primarily just pointing to the list of issues in the changelog.
Those cover the different use cases for the different parts
of the release notes -- fixed issues, new features, and
incompatible changes.

Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3
Reviewed-on: http://gerrit.cloudera.org:8080/7958
Reviewed-by: Michael Brown 
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_fixed_issues.xml
M docs/topics/impala_incompatible_changes.xml
M docs/topics/impala_new_features.xml
4 files changed, 54 insertions(+), 1 deletion(-)

Approvals:
  Michael Brown: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3
Gerrit-Change-Number: 7958
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] [DOCS] Fill in release note subtopics for Apache Impala 2.10

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7958 )

Change subject: [DOCS] Fill in release note subtopics for Apache Impala 2.10
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3
Gerrit-Change-Number: 7958
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 26 Sep 2017 18:44:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Fill in release note subtopics for Apache Impala 2.10

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7958 )

Change subject: [DOCS] Fill in release note subtopics for Apache Impala 2.10
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/155/


--
To view, visit http://gerrit.cloudera.org:8080/7958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3
Gerrit-Change-Number: 7958
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 26 Sep 2017 18:39:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-09-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc@198
PS1, Line 198: AddData
Isn't the point of the deserialize pool to deserialize the payload early?
Here, we're just calling AddData() on the payloads for early senders after the 
corresponding receiver has been created.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@165
PS1, Line 165: Either we'll consume a row batch from batch_queue_, or it's empty
Shouldn't there always be something in the batch_queue_ if there's something in 
the blocked_senders_ list? Since we fill the blocked_senders_ only if the queue 
is at its limit.

And we also logically service the batches from batch_queue_ first before 
servicing the batches from the blocked_senders_ list.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@166
PS1, Line 166: There is a window
Just to make things clearer, could you specify what there's a window for?


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@225
PS1, Line 225:
There is a problem here. When we release lock_here, an arbitrary number of 
senders could call AddBatch(), and all their batches would get enqueued even 
though the ExceedsLimit() would be true. This breaks the guarantee of the queue 
not being over committed more than a single batch.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@284
PS1, Line 284:   for (const auto& queue_entry: batch_queue_) delete 
queue_entry.second;
batch_queue_.clear() ?



--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 26 Sep 2017 18:11:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests.

2017-09-26 Thread Tim Wood (Code Review)
Tim Wood has abandoned this change. ( http://gerrit.cloudera.org:8080/8140 )

Change subject: Trial run to suppress fractional comparison diffs. Still not 
sure of the general approach for DOUBLE & DECIMAL in all tests.
..


Abandoned

Wrong Gerrit changeID.
--
To view, visit http://gerrit.cloudera.org:8080/8140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
Gerrit-Change-Number: 8140
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Wood 


[Impala-ASF-CR] Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests.

2017-09-26 Thread Tim Wood (Code Review)
Tim Wood has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8140


Change subject: Trial run to suppress fractional comparison diffs. Still not 
sure of the general approach for DOUBLE & DECIMAL in all tests.
..

Trial run to suppress fractional comparison diffs.
Still not sure of the general approach for DOUBLE & DECIMAL in all tests.

Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
---
M testdata/workloads/tpcds/queries/tpcds-q26.test
M testdata/workloads/tpcds/queries/tpcds-q53.test
2 files changed, 3 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8140/1
--
To view, visit http://gerrit.cloudera.org:8080/8140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb
Gerrit-Change-Number: 8140
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Wood 


[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

2017-09-26 Thread Pranay Singh (Code Review)
Hello Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8056

to look at the new patch set (#5).

Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed 
parquet
..

IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

test_scanners_fuzz.py currently tests compressed parquet but
does not test uncompressed parquet. This fix adds a new test
case for uncompressed parquet.

Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
---
M tests/query_test/test_scanners_fuzz.py
1 file changed, 42 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8056/5
--
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 5
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh


[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

2017-09-26 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8056 )

Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed 
parquet
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py@97
PS2, Line 97: if table_format.file_format != 'parquet': pytest.sk
> Another option is to keep the clone code in run_fuzz_test, but change the a
Done


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@101
PS4, Line 101: """ Clone an existing parquet table with codec as none in the
 : unique database. This cloned table is passed to 
run_fuzz_test
 : which clones the table and corrupts the table. The test 
later
 : checks that there is no crash while performing SQL 
queries on
 : a corrupt table.
 : """
> I think this comment should focus on why this test is different from the ot
Done


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@111
PS4, Line 111: db_name = unique_database
> I would prefer to emphasize that the source and destination are the unique_
Done


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@117
PS4, Line 117: functional_parquet.alltypes
> Can we extend this to do fuzzing on decimal_tbl as well? I was thinking thi
Done


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@118
PS4, Line 118: .format(fq_tbl_name))
> This indentation is a bit awkward. I don't think .format should be on its o
Moved the format in the same line as select.



--
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Tue, 26 Sep 2017 17:29:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1267/


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8: Code-Review+2

Rebase and keep Dan's +2


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5983: Fix crash in to/from utc timestamp("10:00:00", 'MSK')

2017-09-26 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8139 )

Change subject: IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 
'MSK')
..


Patch Set 1:

Thanks for fixing this. Can you add some tests?


--
To view, visit http://gerrit.cloudera.org:8080/8139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d
Gerrit-Change-Number: 8139
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Tue, 26 Sep 2017 16:16:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5983: Fix crash in to/from utc timestamp("10:00:00", 'MSK')

2017-09-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8139


Change subject: IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 
'MSK')
..

IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 'MSK')

Moscow timezone is handled differrently than other timezones,
and it was possible to cause unhandled boost exception by
trying to convert "dateless" timestamps like "10:00:00".

These timestamps cannot be handled by timestamp conversion
generally, because daylight saving time logic needs month
and day to work correctly. The condition HasDateOrTime()
incorrectly suggested that these timestamps can be handled,
so I made the condition stricter.

Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d
---
M be/src/exprs/timestamp-functions.cc
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8139/1
--
To view, visit http://gerrit.cloudera.org:8080/8139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d
Gerrit-Change-Number: 8139
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-4786: Clean up how ImpalaServers are created

2017-09-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8076 )

Change subject: IMPALA-4786: Clean up how ImpalaServers are created
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.h@999
PS2, Line 999:   /// Init() were <= 0.
> add to comment:
Done


http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.cc@2078
PS2, Line 2078:   }
> i think we still need to reset these pointers since they hold a shared_ptr
Done



--
To view, visit http://gerrit.cloudera.org:8080/8076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If388c5618258a9c4529cd1d63e956566b92bd0d8
Gerrit-Change-Number: 8076
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 26 Sep 2017 14:38:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4786: Clean up how ImpalaServers are created

2017-09-26 Thread Sailesh Mukil (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8076

to look at the new patch set (#3).

Change subject: IMPALA-4786: Clean up how ImpalaServers are created
..

IMPALA-4786: Clean up how ImpalaServers are created

ImpalaServer had to be created via an awkward CreateImpalaServer()
method that took a lot of arguments. This patch refactors that code (in
anticipation of some KRPC changes) to follow a more normal lifecycle:

1. ImpalaServer* server = new ImpalaServer(ExecEnv*)
2. RETURN_IF_ERROR(server->Init()) // for error-returning init operations
3. RETURN_IF_ERROR(server->Start())
4. server->Join()

Also add ExecEnv::Init(), and move calls to ExecEnv::StartServices() to
ImpalaServer::StartServices(). This captures a dependency that KRPC will
rely on - where initialization of both ExecEnv and ImpalaServer need to
happen before services are started.

This sets up a clean-up of InProcessImpalaServer, which is too heavy for
the work that it does. That work is deferred to a follow-on patch.

This is a slightly cleaned up version of Henry's abandoned patch:
https://gerrit.cloudera.org/#/c/7673/

Change-Id: If388c5618258a9c4529cd1d63e956566b92bd0d8
---
M be/src/exprs/expr-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
M be/src/util/hdfs-util-test.cc
10 files changed, 128 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/8076/3
--
To view, visit http://gerrit.cloudera.org:8080/8076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If388c5618258a9c4529cd1d63e956566b92bd0d8
Gerrit-Change-Number: 8076
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil