[Impala-ASF-CR] IMPALA-5717: Support for reading ORC data files

2018-03-25 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9134 )

Change subject: IMPALA-5717: Support for reading ORC data files
..


Patch Set 8:

Just rebase the patch


--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 26 Mar 2018 04:55:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon

2018-03-25 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9751 )

Change subject: [experimental] Clang Tidy Diff trial balloon
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22
PS1, Line 22: The main change is to find a way to get the
> The backstory on this (which is somewhat redundant given Phil's comment) is
I like your last suggestion as an alternative to doing #3 first. Maybe it would 
be good to, in this patch, keep both systems but label them differently and 
maybe even preemptively add some more of those checks we'll eventually want but 
that we are not willing to apply to the whole codebase.



--
To view, visit http://gerrit.cloudera.org:8080/9751
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7
Gerrit-Change-Number: 9751
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 26 Mar 2018 02:39:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon

2018-03-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9751 )

Change subject: [experimental] Clang Tidy Diff trial balloon
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22
PS1, Line 22: The main change is to find a way to get the
> I see - I always buildall, so I never noticed that myself.
The backstory on this (which is somewhat redundant given Phil's comment) is 
that I have had some changes that passed Gerrit review that then failed on GVO 
due to clang-tidy. The code changes passed the jenkins.impala.io's 
pre-review-test job (which doesn't do clang-tidy). My experience is that 
failure during GVO is the worst possible time.

There are a few observations:
A. I should have run clang-tidy.
B. pre-review-test should do clang-tidy
C. There should be an automated process that flags clang-tidy problems on 
Gerrit reviews.

"A" is true. I should have run clang-tidy. clang-tidy-1604 on jenkins.impala.io 
takes about 20 minutes (17-22 minutes or so). My guess is that this fits pretty 
well with how long it takes on my development machine. I would like "A" to be 
faster. Processing only the changed files speeds this up. On Kudu, 
clang-tidy-diff.py usually runs in a couple minutes. When something takes 
longer and ties up a development environment, developers get lazy.

We fixed "B" by adding a gerrit-verify-dryrun-external, which does exactly what 
GVO does without the +1 verify. This is great. I use it all the time.

I recently did some code changes on Kudu, and "C" is a very nice place to be. 
Quite a few clang-tidy issues won't be caught by reviewers. The clang-tidy 
issue that hit me on Kudu was this:
string long_string(INT_MAX, 'a');
Clang-tidy fails due to the string initialization being too big. I didn't know 
this check existed. "C" found it within an hour of posting to Gerrit without 
tying up my development system. It could have failed much later.

We should have "C" for Impala. I'm agnostic about how this happens. The script 
in this code change has logic to interpret the diff and post to Gerrit. We can 
use that or adapt it to the existing buildall.sh system (which doesn't sound 
too hard if the diff is similar).

Jim, you listed #3 before #1,#2. I want to make sure I'm not misinterpreting 
your preferences. What would you see as steps we need to take for #3?

I don't see very much cost to having two parallel systems, so long as they are 
labeled appropriately. Having some experience from actual code changes trumps 
any of my arm-chair analysis about which is better. That's why I'm in favor of 
merging this before figuring out whether it can fully replace the existing 
system. In other words, I put strong emphasis on the "if we can disable one" 
part of #3 when I put it last. I'm also generally ok with having both for an 
extended period of time (or forever).

Making the two systems distinct and clear to developers is important. A 
developer should know whether this is a diff-based clang-tidy or a full 
clang-tidy, and the scripts should be clear about what they do and how they are 
named.



--
To view, visit http://gerrit.cloudera.org:8080/9751
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7
Gerrit-Change-Number: 9751
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 26 Mar 2018 01:38:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5717: Support for reading ORC data files

2018-03-25 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9134

to look at the new patch set (#8).

Change subject: IMPALA-5717: Support for reading ORC data files
..

IMPALA-5717: Support for reading ORC data files

This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies input needed from the orc-reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we used is release-1.4.3.

Currently, we only support reading primitive types. Writing into ORC
table has not been supported neither.

Tests
 - Most of the end-to-end tests can run on ORC format.
 - Add tpcds, tpch tests for ORC.
 - Add some ORC specific tests.
 - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
   is not robust for corrupt files (ORC-315).

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-orc-scanner.cc
A be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindOrc.cmake
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/LineItemMultiBlock/README.dox
A testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc
A testdata/LineItemMultiBlock/lineitem_sixblocks.orc
A testdata/LineItemMultiBlock/lineitem_threeblocks.orc
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
A testdata/data/chars-formats.orc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_dimensions.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/common/test_vector.py
M tests/comparison/cli_options.py
M tests/query_test/test_chars.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_tpch_queries.py
55 files changed, 1,637 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/9134/8
--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7631: Use private index in bootstrap virtualenv

2018-03-25 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9798 )

Change subject: IMPALA-7631: Use private index in bootstrap_virtualenv
..


Patch Set 1:

Passed dry-run here: https://jenkins.impala.io/job/gerrit-verify-dryrun/2172/


--
To view, visit http://gerrit.cloudera.org:8080/9798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
Gerrit-Change-Number: 9798
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 26 Mar 2018 00:25:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7631: Use private index in bootstrap virtualenv

2018-03-25 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9798 )

Change subject: IMPALA-7631: Use private index in bootstrap_virtualenv
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
Gerrit-Change-Number: 9798
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 26 Mar 2018 00:25:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7631: Use private index in bootstrap virtualenv

2018-03-25 Thread Lars Volker (Code Review)
Lars Volker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9798


Change subject: IMPALA-7631: Use private index in bootstrap_virtualenv
..

IMPALA-7631: Use private index in bootstrap_virtualenv

This change switches to using a private pypi index url when using a
private pypi mirror. This allows to run the tests without relying on the
public Python pypi mirrors.

Some packages can not detect their dependencies correctly when they get
installed together with the dependencies in the same call to pip. This
change adds a second stage of package installation to separate these
packages from their dependencies.

It also adds a few missing packages and updates some packages to newer
versions.

Testing: Ran this on a box where I blocked DNS resolution to Python's
upstream pypi.

Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
---
M infra/python/bootstrap_virtualenv.py
M infra/python/deps/compiled-requirements.txt
M infra/python/deps/pip_download.py
M infra/python/deps/requirements.txt
A infra/python/deps/stage2-requirements.txt
5 files changed, 65 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/9798/1
--
To view, visit http://gerrit.cloudera.org:8080/9798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
Gerrit-Change-Number: 9798
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon

2018-03-25 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9751 )

Change subject: [experimental] Clang Tidy Diff trial balloon
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22
PS1, Line 22: The main change is to find a way to get the
> I don't want to speak with Joe, but there are two benefits here that I see:
I see - I always buildall, so I never noticed that myself.

As long as the codebase remains tidy-clean, run_clang_tidy will show the diff 
from upstream, since upstream would produce no tidy violations. That said, I'm 
now understanding Joe's point that "it might let us push things in the 
direction we want to go without rewriting".

Given how simplistic the Jenkins job is, I'd lean towards #3, #1, #2 for the 
ordering.



--
To view, visit http://gerrit.cloudera.org:8080/9751
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7
Gerrit-Change-Number: 9751
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 25 Mar 2018 20:25:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6728: Always use Kudu based kinit if FLAGS use krpc=true

2018-03-25 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9797


Change subject: IMPALA-6728: Always use Kudu based kinit if FLAGS_use_krpc=true
..

IMPALA-6728: Always use Kudu based kinit if FLAGS_use_krpc=true

We rely on the KPRC logic to do the Kerberos authentication
when KRPC is enabled. Therefore, when FLAGS_ues_krpc=true,
we must always call kudu::security::InitKerberosForServer()
to initialize the Kerberos related logic. This change makes
Impala ignore FLAGS_use_kudu_kinit=false when FLAGS_use_krpc=true.

Change-Id: Ia7086e5c9b460233e9e957f886141b3e6bba414b
---
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test-base.h
M be/src/rpc/thrift-server-test.cc
M be/src/testutil/mini-kdc-wrapper.cc
M be/src/testutil/mini-kdc-wrapper.h
7 files changed, 136 insertions(+), 104 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/9797/1
--
To view, visit http://gerrit.cloudera.org:8080/9797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia7086e5c9b460233e9e957f886141b3e6bba414b
Gerrit-Change-Number: 9797
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon

2018-03-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9751 )

Change subject: [experimental] Clang Tidy Diff trial balloon
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22
PS1, Line 22: The main change is to find a way to get the
> I don't think I quite understand what facts about the status quo make it so
I don't want to speak with Joe, but there are two benefits here that I see:

I ran run_clang_tidy.sh once and it blew away my CMake files, so I couldn't 
iterate with "make impalad" any more. I ended up having to run buildall to get 
myself out of a confusing state. This, um, discouraged me from running tidy 
ever again.

Kudu's stuff is also showing us the tidy diff from upstream, rather than all 
the tidy output. That would let us integrate it with Gerrit and such, and 
produce a smaller amount of output to sift through for developers.

In practice, I've seen plenty of reviews where fixing up the clang-tidy thing 
is the last thing that happens, because Gerrit-Verify failed. It's usually 
harmless, but anything we can do to speed up those fixes seems like a nice 
time-savings.



--
To view, visit http://gerrit.cloudera.org:8080/9751
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7
Gerrit-Change-Number: 9751
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 25 Mar 2018 16:56:50 +
Gerrit-HasComments: Yes