[Impala-ASF-CR] IMPALA-5717: Support for reading ORC data files
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9134 ) Change subject: IMPALA-5717: Support for reading ORC data files .. Patch Set 8: Just rebase the patch -- To view, visit http://gerrit.cloudera.org:8080/9134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4 Gerrit-Change-Number: 9134 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong HuangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 26 Mar 2018 04:55:16 + Gerrit-HasComments: No
[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/9751 ) Change subject: [experimental] Clang Tidy Diff trial balloon .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22 PS1, Line 22: The main change is to find a way to get the > The backstory on this (which is somewhat redundant given Phil's comment) is I like your last suggestion as an alternative to doing #3 first. Maybe it would be good to, in this patch, keep both systems but label them differently and maybe even preemptively add some more of those checks we'll eventually want but that we are not willing to apply to the whole codebase. -- To view, visit http://gerrit.cloudera.org:8080/9751 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7 Gerrit-Change-Number: 9751 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 26 Mar 2018 02:39:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/9751 ) Change subject: [experimental] Clang Tidy Diff trial balloon .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22 PS1, Line 22: The main change is to find a way to get the > I see - I always buildall, so I never noticed that myself. The backstory on this (which is somewhat redundant given Phil's comment) is that I have had some changes that passed Gerrit review that then failed on GVO due to clang-tidy. The code changes passed the jenkins.impala.io's pre-review-test job (which doesn't do clang-tidy). My experience is that failure during GVO is the worst possible time. There are a few observations: A. I should have run clang-tidy. B. pre-review-test should do clang-tidy C. There should be an automated process that flags clang-tidy problems on Gerrit reviews. "A" is true. I should have run clang-tidy. clang-tidy-1604 on jenkins.impala.io takes about 20 minutes (17-22 minutes or so). My guess is that this fits pretty well with how long it takes on my development machine. I would like "A" to be faster. Processing only the changed files speeds this up. On Kudu, clang-tidy-diff.py usually runs in a couple minutes. When something takes longer and ties up a development environment, developers get lazy. We fixed "B" by adding a gerrit-verify-dryrun-external, which does exactly what GVO does without the +1 verify. This is great. I use it all the time. I recently did some code changes on Kudu, and "C" is a very nice place to be. Quite a few clang-tidy issues won't be caught by reviewers. The clang-tidy issue that hit me on Kudu was this: string long_string(INT_MAX, 'a'); Clang-tidy fails due to the string initialization being too big. I didn't know this check existed. "C" found it within an hour of posting to Gerrit without tying up my development system. It could have failed much later. We should have "C" for Impala. I'm agnostic about how this happens. The script in this code change has logic to interpret the diff and post to Gerrit. We can use that or adapt it to the existing buildall.sh system (which doesn't sound too hard if the diff is similar). Jim, you listed #3 before #1,#2. I want to make sure I'm not misinterpreting your preferences. What would you see as steps we need to take for #3? I don't see very much cost to having two parallel systems, so long as they are labeled appropriately. Having some experience from actual code changes trumps any of my arm-chair analysis about which is better. That's why I'm in favor of merging this before figuring out whether it can fully replace the existing system. In other words, I put strong emphasis on the "if we can disable one" part of #3 when I put it last. I'm also generally ok with having both for an extended period of time (or forever). Making the two systems distinct and clear to developers is important. A developer should know whether this is a diff-based clang-tidy or a full clang-tidy, and the scripts should be clear about what they do and how they are named. -- To view, visit http://gerrit.cloudera.org:8080/9751 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7 Gerrit-Change-Number: 9751 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 26 Mar 2018 01:38:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5717: Support for reading ORC data files
Hello Tim Armstrong, Joe McDonnell, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9134 to look at the new patch set (#8). Change subject: IMPALA-5717: Support for reading ORC data files .. IMPALA-5717: Support for reading ORC data files This patch integrates the orc library into Impala and implements HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner supplies input needed from the orc-reader, tracks memory consumption of the reader and transfers the reader's output (orc::ColumnVectorBatch) into impala::RowBatch. The ORC version we used is release-1.4.3. Currently, we only support reading primitive types. Writing into ORC table has not been supported neither. Tests - Most of the end-to-end tests can run on ORC format. - Add tpcds, tpch tests for ORC. - Add some ORC specific tests. - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library is not robust for corrupt files (ORC-315). Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/exec/CMakeLists.txt A be/src/exec/hdfs-orc-scanner.cc A be/src/exec/hdfs-orc-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M bin/bootstrap_toolchain.py M bin/impala-config.sh A cmake_modules/FindOrc.cmake M common/thrift/CatalogObjects.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/jflex/sql-scanner.flex M testdata/LineItemMultiBlock/README.dox A testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc A testdata/LineItemMultiBlock/lineitem_sixblocks.orc A testdata/LineItemMultiBlock/lineitem_threeblocks.orc M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/bin/run-hive-server.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl A testdata/data/chars-formats.orc M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test M testdata/workloads/functional-query/functional-query_core.csv M testdata/workloads/functional-query/functional-query_dimensions.csv M testdata/workloads/functional-query/functional-query_exhaustive.csv M testdata/workloads/functional-query/functional-query_pairwise.csv A testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/tpcds/tpcds_core.csv M testdata/workloads/tpcds/tpcds_dimensions.csv M testdata/workloads/tpcds/tpcds_exhaustive.csv M testdata/workloads/tpcds/tpcds_pairwise.csv M testdata/workloads/tpch/tpch_core.csv M testdata/workloads/tpch/tpch_dimensions.csv M testdata/workloads/tpch/tpch_exhaustive.csv M testdata/workloads/tpch/tpch_pairwise.csv M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/common/test_vector.py M tests/comparison/cli_options.py M tests/query_test/test_chars.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_scanners.py M tests/query_test/test_tpch_queries.py 55 files changed, 1,637 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/9134/8 -- To view, visit http://gerrit.cloudera.org:8080/9134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4 Gerrit-Change-Number: 9134 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong HuangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7631: Use private index in bootstrap virtualenv
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9798 ) Change subject: IMPALA-7631: Use private index in bootstrap_virtualenv .. Patch Set 1: Passed dry-run here: https://jenkins.impala.io/job/gerrit-verify-dryrun/2172/ -- To view, visit http://gerrit.cloudera.org:8080/9798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b Gerrit-Change-Number: 9798 Gerrit-PatchSet: 1 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 26 Mar 2018 00:25:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7631: Use private index in bootstrap virtualenv
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9798 ) Change subject: IMPALA-7631: Use private index in bootstrap_virtualenv .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b Gerrit-Change-Number: 9798 Gerrit-PatchSet: 1 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 26 Mar 2018 00:25:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7631: Use private index in bootstrap virtualenv
Lars Volker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9798 Change subject: IMPALA-7631: Use private index in bootstrap_virtualenv .. IMPALA-7631: Use private index in bootstrap_virtualenv This change switches to using a private pypi index url when using a private pypi mirror. This allows to run the tests without relying on the public Python pypi mirrors. Some packages can not detect their dependencies correctly when they get installed together with the dependencies in the same call to pip. This change adds a second stage of package installation to separate these packages from their dependencies. It also adds a few missing packages and updates some packages to newer versions. Testing: Ran this on a box where I blocked DNS resolution to Python's upstream pypi. Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b --- M infra/python/bootstrap_virtualenv.py M infra/python/deps/compiled-requirements.txt M infra/python/deps/pip_download.py M infra/python/deps/requirements.txt A infra/python/deps/stage2-requirements.txt 5 files changed, 65 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/9798/1 -- To view, visit http://gerrit.cloudera.org:8080/9798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b Gerrit-Change-Number: 9798 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker
[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/9751 ) Change subject: [experimental] Clang Tidy Diff trial balloon .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22 PS1, Line 22: The main change is to find a way to get the > I don't want to speak with Joe, but there are two benefits here that I see: I see - I always buildall, so I never noticed that myself. As long as the codebase remains tidy-clean, run_clang_tidy will show the diff from upstream, since upstream would produce no tidy violations. That said, I'm now understanding Joe's point that "it might let us push things in the direction we want to go without rewriting". Given how simplistic the Jenkins job is, I'd lean towards #3, #1, #2 for the ordering. -- To view, visit http://gerrit.cloudera.org:8080/9751 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7 Gerrit-Change-Number: 9751 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 25 Mar 2018 20:25:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6728: Always use Kudu based kinit if FLAGS use krpc=true
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9797 Change subject: IMPALA-6728: Always use Kudu based kinit if FLAGS_use_krpc=true .. IMPALA-6728: Always use Kudu based kinit if FLAGS_use_krpc=true We rely on the KPRC logic to do the Kerberos authentication when KRPC is enabled. Therefore, when FLAGS_ues_krpc=true, we must always call kudu::security::InitKerberosForServer() to initialize the Kerberos related logic. This change makes Impala ignore FLAGS_use_kudu_kinit=false when FLAGS_use_krpc=true. Change-Id: Ia7086e5c9b460233e9e957f886141b3e6bba414b --- M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/rpc-mgr-kerberized-test.cc M be/src/rpc/rpc-mgr-test-base.h M be/src/rpc/thrift-server-test.cc M be/src/testutil/mini-kdc-wrapper.cc M be/src/testutil/mini-kdc-wrapper.h 7 files changed, 136 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/9797/1 -- To view, visit http://gerrit.cloudera.org:8080/9797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia7086e5c9b460233e9e957f886141b3e6bba414b Gerrit-Change-Number: 9797 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] [experimental] Clang Tidy Diff trial balloon
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9751 ) Change subject: [experimental] Clang Tidy Diff trial balloon .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9751/1//COMMIT_MSG@22 PS1, Line 22: The main change is to find a way to get the > I don't think I quite understand what facts about the status quo make it so I don't want to speak with Joe, but there are two benefits here that I see: I ran run_clang_tidy.sh once and it blew away my CMake files, so I couldn't iterate with "make impalad" any more. I ended up having to run buildall to get myself out of a confusing state. This, um, discouraged me from running tidy ever again. Kudu's stuff is also showing us the tidy diff from upstream, rather than all the tidy output. That would let us integrate it with Gerrit and such, and produce a smaller amount of output to sift through for developers. In practice, I've seen plenty of reviews where fixing up the clang-tidy thing is the last thing that happens, because Gerrit-Verify failed. It's usually harmless, but anything we can do to speed up those fixes seems like a nice time-savings. -- To view, visit http://gerrit.cloudera.org:8080/9751 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2fb6a13400367fd3d12a4738bbb2dfc944466a7 Gerrit-Change-Number: 9751 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 25 Mar 2018 16:56:50 + Gerrit-HasComments: Yes