[2/2] incubator-impala git commit: IMPALA-3530: Clean up test_ddl.py. Part 1.

2016-06-10 Thread tarmstrong
IMPALA-3530: Clean up test_ddl.py. Part 1. This is the first in a series of patches to clean up test_ddl.py Summary of changes: - Break up test_create() and corresponding .test files into: * test_create_database() * test_create_table() * test_create_table_like_table() *

incubator-impala git commit: IMPALA-3656: Hitting DCHECK/CHECK does not write minidumps

2016-06-11 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 19ff47091 -> c69cd15a0 IMPALA-3656: Hitting DCHECK/CHECK does not write minidumps When hitting a DCHECK/CHECK the daemons do not write minidumps. This is caused by glog's own stack unwinding mechanism, which catches SIGABRT and

incubator-impala git commit: IMPALA-3682: Don't retry unrecoverable socket creation errors

2016-06-14 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 01287a3ba -> 3dff390e4 IMPALA-3682: Don't retry unrecoverable socket creation errors If a thrift client can't create a socket, all subsequent calls to Open() should fail fast since socket creation errors are treated as

incubator-impala git commit: Use toolchain binutils.

2016-06-14 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master b1af24556 -> d8d3e2391 Use toolchain binutils. This ensures that old assemblers that don't understand AVX2 instructions don't break compilation with some of our in-flight code reviews. Gated on IMPALA-3507. Change-Id:

incubator-impala git commit: IMPALA-3736: Move Impala HTTP handlers to a separate class

2016-06-14 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 3dff390e4 -> b1af24556 IMPALA-3736: Move Impala HTTP handlers to a separate class HTTP handler callbacks take a lot of header room, partly because they usually have their JSON output documented literally. This patch moves

incubator-impala git commit: Add kill cluster marker to KMS

2016-06-13 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master f77284099 -> bee537550 Add kill cluster marker to KMS If PID files of each process in the mini cluster get deleted for some reason, it should still be possible to kill them because each process is marked with

incubator-impala git commit: IMPALA-3491: Use unique_database fixture in test_catalog_service_client.py.

2016-06-13 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master bee537550 -> fc444c102 IMPALA-3491: Use unique_database fixture in test_catalog_service_client.py. Even though this is just a single test, this change introduces the unique_database test fixture that was initially created to

incubator-impala git commit: IMPALA-3507: update binutils version to fix slow linking

2016-06-13 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master c69cd15a0 -> f77284099 IMPALA-3507: update binutils version to fix slow linking Change-Id: Idc1206e881d8c781ede1a85eab79d99f5a5adf7e Reviewed-on: http://gerrit.cloudera.org:8080/3353 Reviewed-by: Tim Armstrong

[3/4] incubator-impala git commit: Refactor RuntimeState and ExecEnv dependencies

2016-05-25 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/6198d926/be/src/runtime/disk-io-mgr-reader-context.cc -- diff --git a/be/src/runtime/disk-io-mgr-reader-context.cc b/be/src/runtime/disk-io-mgr-reader-context.cc index

[4/4] incubator-impala git commit: Refactor RuntimeState and ExecEnv dependencies

2016-05-25 Thread tarmstrong
Refactor RuntimeState and ExecEnv dependencies Previously including runtime-state.h or exec-env.h pulled in a huge number of headers. By replacing all of those includes with forward declarations, we can reduce the number of headers included when building each source file. This required various

incubator-impala git commit: IMPALA-1633: GetOperationStatus should set errorMessage and sqlState

2016-06-01 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 585ee48dc -> 523130108 IMPALA-1633: GetOperationStatus should set errorMessage and sqlState Currently, we never populate the errorMessage or sqlState fields of TGetOperationStatusResp when the GetOperationStatus HiveServer2 rpc

[03/23] incubator-impala git commit: IMPALA-3223: Relocate squeasel and mustache directories

2016-06-01 Thread tarmstrong
IMPALA-3223: Relocate squeasel and mustache directories This change moves the source and header files of squeasel and mustache to be/src/thirdparty. This is a step towards removing thirdparty as a preparation to move to ASF. There is also corresponding change to Impala-lzo to update its include

[02/23] incubator-impala git commit: IMPALA-3223: Relocate squeasel and mustache directories

2016-06-01 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0b7ae6e4/be/src/thirdparty/squeasel/squeasel.c -- diff --git a/be/src/thirdparty/squeasel/squeasel.c b/be/src/thirdparty/squeasel/squeasel.c new file mode 100644 index

[11/23] incubator-impala git commit: IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING

2016-06-01 Thread tarmstrong
IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING The HdfsTableSink usually creates an HDFS connection to the filesystem that the base table resides in. However, if we create a partition in a FS different than that of the base table and set

incubator-impala git commit: download_requirements should download kudu-python and virtualenv

2016-06-13 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master fc444c102 -> ec3a1c786 download_requirements should download kudu-python and virtualenv This is required for the ASF migration, since we don't want to include all of the tarballs in the repo and we want to allow developers to

[2/3] incubator-impala git commit: IMPALA-3441, IMPALA-3659: check for malformed Avro data

2016-06-13 Thread tarmstrong
IMPALA-3441, IMPALA-3659: check for malformed Avro data This patch adds error checking to the Avro scanner (both the codegen'd and interpreted paths), including out-of-bounds checks and data validity checks. I ran a local benchmark using the following queries: set num_scanner_threads=1;

[1/3] incubator-impala git commit: IMPALA-3441, IMPALA-3659: check for malformed Avro data

2016-06-13 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master ec3a1c786 -> 01287a3ba http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/01287a3b/testdata/bad_avro_snap/README -- diff --git

[3/3] incubator-impala git commit: IMPALA-3491: Use unique_database fixture in test_shell_commandline.py.

2016-06-13 Thread tarmstrong
IMPALA-3491: Use unique_database fixture in test_shell_commandline.py. Before this change, a single test database was created for the entire suite, and each test was marked to run serially. With the addition of a test fixture in tests/conftest.py to create a unique database per each individual

[2/2] incubator-impala git commit: IMPALA-3753: Disable create table test for old aggs and joins

2016-06-23 Thread tarmstrong
IMPALA-3753: Disable create table test for old aggs and joins The "IMPALA-3530: Clean up test_ddl.py. Part 1." (7ed744e) commit made it so that a query referencing nested types is run as part of the create table test, which caused the OldAggsJoins build to fail. This commit disables the create

incubator-impala git commit: IMPALA-3762: Download Python requirements before they are needed.

2016-06-22 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 893e6f498 -> a5ae2bfd8 IMPALA-3762: Download Python requirements before they are needed. This is needed for ASF builds. It sounds expensive, but takes less than 10 seconds if the packages are already present. Change-Id:

incubator-impala git commit: IMPALA-3745: parquet invalid data handling

2016-06-15 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 2dad444c8 -> 547be27e7 IMPALA-3745: parquet invalid data handling Added checks/error handling: * Negative string lengths while decoding dictionary or data page. * Buffer overruns while decoding dictionary or data page. * Some

[1/2] incubator-impala git commit: IMPALA-3587: Get rid of not_default_fs skip marker

2016-06-16 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 547be27e7 -> 2100b3a34 IMPALA-3587: Get rid of not_default_fs skip marker The SkipIf.not_default_fs used in test_compute_stats(), was meant to skip non-HDFS filesystems. However, since we've changed our test infrastructure to

[2/2] incubator-impala git commit: IMPALA-3751: fix clang build errors and warnings

2016-06-16 Thread tarmstrong
IMPALA-3751: fix clang build errors and warnings Fix misc clang errors and warnings. Change-Id: Ie71a483789d3be06248036ab3bbee82d66580973 Reviewed-on: http://gerrit.cloudera.org:8080/3391 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins Project:

[09/50] incubator-impala git commit: Consolidate test and cluster logs under a single directory.

2016-04-12 Thread tarmstrong
Consolidate test and cluster logs under a single directory. All logs, test results and SQL files generated during data loading and testing are now consolidated under a single new directory $IMPALA_HOME/logs. The goal is to simplify archiving in Jenkins runs and debugging. The new structure is as

[05/50] incubator-impala git commit: IMPALA-3226: Increase timeout for runtime filter tests

2016-04-12 Thread tarmstrong
IMPALA-3226: Increase timeout for runtime filter tests When running with ASAN enabled, runtime filters may take a lot longer to be produced, triggering timeouts in the filter tests. This patch triples the timeout time. We still want the timeout to be reasonable as protection against excessive

[42/50] incubator-impala git commit: IMPALA-3269: Remove authz checks on default table location in CTAS queries

2016-04-12 Thread tarmstrong
IMPALA-3269: Remove authz checks on default table location in CTAS queries Bug: In CreateTableAsSelectStmt.analyze(), we set the default location of table if the query doesn't explicitly set a table location. However this is an issue with CTAS with subqueries as they follow a two pass analysis

[40/50] incubator-impala git commit: IMPALA-3324: Hive server does not start for S3 builds.

2016-04-12 Thread tarmstrong
IMPALA-3324: Hive server does not start for S3 builds. The hive server does not start for S3 builds because HDFS is marked as an unsupported service in testdata/cluster/admin; and so HDFS is not started at all, and so the Hive server is unable to start as well. Due to this, all our S3 builds

[27/50] incubator-impala git commit: Log EE XUnit test result in IMPALA_HOME/logs/ee_tests/results.

2016-04-12 Thread tarmstrong
Log EE XUnit test result in IMPALA_HOME/logs/ee_tests/results. I had missed this in my original logs consolidation patch. This change is needed for Jenkins to pick up the EE test results for reporting purposes. Change-Id: I58e6a4a6392223de87ea2ce50a36dd35cafa5b86 Reviewed-on:

[46/50] incubator-impala git commit: Fix typo in load-test-warehouse-snapshot.sh

2016-04-12 Thread tarmstrong
Fix typo in load-test-warehouse-snapshot.sh Change-Id: I2ef9b32cbc56819f80db864a6590a9a7b2732c9c Reviewed-on: http://gerrit.cloudera.org:8080/2310 Reviewed-by: Lars Volker Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo

[26/50] incubator-impala git commit: Regenerate complextypestbl files to include nested_struct.g field

2016-04-12 Thread tarmstrong
Regenerate complextypestbl files to include nested_struct.g field This field was included in the schema and data files, but the checked-in generated parquet files didn't include it. It's not referenced in any tests so we didn't catch it. Change-Id: I5d394f074e7082fa12fafb7e57a144a83b3099a6

[32/50] incubator-impala git commit: IMPALA-3274: Always start Kudu for testing

2016-04-12 Thread tarmstrong
IMPALA-3274: Always start Kudu for testing Previously Kudu would only be started when the test configuration was the standard mini-cluster. That led to failures during data loading when testing without the mini-cluster (ex: local file system). Kudu doesn't require any other services so now it'll

[16/17] incubator-impala git commit: IMPALA-3439: Only convert decimal literals in convertNumericLiteralsFromDecimal().

2016-05-23 Thread tarmstrong
IMPALA-3439: Only convert decimal literals in convertNumericLiteralsFromDecimal(). Numeric literals with a decimal point are typed as DECIMAL, if possible. Since decimals have a higher processing cost than FLOAT/DOUBLE we have special casting rules to convert DECIMAL operations to DOUBLE in

[13/17] incubator-impala git commit: IMPALA-3579: Strict handling of numeric overflow in text parsing

2016-05-23 Thread tarmstrong
IMPALA-3579: Strict handling of numeric overflow in text parsing Adds a query option 'strict_mode' which treats integer and floating pt overflows as parse errors. In the past, overflows were ignored and the max value was returned. When this query option is set, overflowing values are treated as
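
The overflow behavior described in this message can be sketched as follows. This is a minimal illustration, not Impala's text parser: the helper name `parse_int`, the 32-bit range, the use of `None` to stand for a parse error, and clamping negative overflows to the minimum value are all assumptions made for the example.

```python
# Hypothetical 32-bit INT column bounds (assumption for illustration).
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def parse_int(text, strict_mode=False):
    """Parse a decimal integer string for a 32-bit column.

    Without strict_mode, an overflowing value is replaced by the nearest
    bound (the message only states that the max value was returned; the
    symmetric handling of negative overflow is an assumption).
    With strict_mode, an overflowing value is reported as a parse error,
    represented here by None.
    """
    value = int(text)
    if INT_MIN <= value <= INT_MAX:
        return value
    if strict_mode:
        return None  # overflow treated as a parse error
    return INT_MAX if value > INT_MAX else INT_MIN
```

With these assumptions, `parse_int("5000000000")` yields the maximum value, while the same call with `strict_mode=True` yields the parse-error marker.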

[02/17] incubator-impala git commit: IMPALA-3332: Free local allocations in sorter.

2016-05-23 Thread tarmstrong
IMPALA-3332: Free local allocations in sorter. Sorter can have runaway memory consumption as it never frees local allocations made in comparator_.Less(). In addition, it doesn't check for errors generated during expression evaluation so it may keep sorting even after failures have occurred. This

[1/2] incubator-impala git commit: Revert "Revert "Add Kudu test helpers""

2016-05-24 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master d70ffa455 -> 5112e65be Revert "Revert "Add Kudu test helpers"" This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The difference between this

[2/2] incubator-impala git commit: Bump Impala version to 2.7.0

2016-05-24 Thread tarmstrong
Bump Impala version to 2.7.0 Change-Id: Ibf67f61936260e66a5bb0d7fe63e4930850252c6 Reviewed-on: http://gerrit.cloudera.org:8080/3167 Reviewed-by: Bharath Vissapragada Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit:

[4/4] incubator-impala git commit: IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread tarmstrong
IMPALA-3286: prefetching for PartitionedAggregationNode This patch builds on top of the prefetching infrastructure to add prefetching to PartitionedAggregationNode. Input batches are evaluated in prefetch groups and hash table buckets are prefetched if the prefetch_mode query option is set to

[03/10] incubator-impala git commit: IMPALA-3480: Add query options for min/max filter sizes

2016-05-13 Thread tarmstrong
IMPALA-3480: Add query options for min/max filter sizes This patch adds two query options for runtime filters: RUNTIME_FILTER_MAX_SIZE RUNTIME_FILTER_MIN_SIZE These options define the minimum and maximum filter sizes for a filter, no matter what the estimates produced by the planner are.
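
One way to read "no matter what the estimates produced by the planner are" is as a clamp on the planner's estimate. The sketch below assumes that reading; the function name `clamp_filter_size` and its parameters are hypothetical and do not come from the Impala code base.

```python
def clamp_filter_size(planner_estimate, min_size, max_size):
    """Bound a planner-estimated filter size by the configured limits.

    min_size and max_size stand in for the RUNTIME_FILTER_MIN_SIZE and
    RUNTIME_FILTER_MAX_SIZE query options (values in bytes, assumed).
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(planner_estimate, max_size))
```

An estimate below the minimum is raised to the minimum, one above the maximum is lowered to the maximum, and anything in between is left unchanged.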

[10/10] incubator-impala git commit: IMPALA-3232: Allow not-exists uncorrelated subqueries

2016-05-13 Thread tarmstrong
IMPALA-3232: Allow not-exists uncorrelated subqueries Before this patch, correlated exists and not exists subqueries were rewritten as left semi and anti joins respectively. Uncorrelated exists subqueries were rewritten as cross joins, and uncorrelated not-exists subqueries were not supported

[09/10] incubator-impala git commit: IMPALA-3534: allow overriding of CMAKE_CXX_COMPILER for ASAN

2016-05-13 Thread tarmstrong
IMPALA-3534: allow overriding of CMAKE_CXX_COMPILER for ASAN This makes it consistent with the regular toolchain and makes it easier to use wrapper scripts like distcc. Change-Id: I3ab488182c46f9ccb1850a0a2b064653e7e3da26 Reviewed-on: http://gerrit.cloudera.org:8080/3050 Reviewed-by: Jim Apple

[01/10] incubator-impala git commit: IMPALA-3527: use codegen'd ProcessProbeBatch() when spilling.

2016-05-13 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 14cdb0497 -> 46c3e43ed IMPALA-3527: use codegen'd ProcessProbeBatch() when spilling. Change-Id: I92ebfb01e370d0a842270771c9e5f1a4610dc16a Reviewed-on: http://gerrit.cloudera.org:8080/3035 Reviewed-by: Tim Armstrong

[4/4] incubator-impala git commit: IMPALA-2809: Improve scalar ByteSwap().

2016-05-13 Thread tarmstrong
IMPALA-2809: Improve scalar ByteSwap(). This patch improves our ByteSwap() function by handling more byte sizes in the fast path, as opposed to the loop-based slow path. ByteSwap() is used heavily when scanning Parquet decimals. Before this patch, VTune showed ByteSwap() among the top three

[3/5] incubator-impala git commit: IMPALA-3539: Return error status if def/rep level caches failed to allocate.

2016-05-14 Thread tarmstrong
IMPALA-3539: Return error status if def/rep level caches failed to allocate. The information in the JIRA is consistent with a failure to allocate memory for the def level cache. There was a bug where this failure status was not properly propagated, so eventually a DCHECK was hit that expected the

[1/5] incubator-impala git commit: Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf

2016-05-14 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master ff0cd823c -> 4c9c74dd3 Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf This is needed because the workload runner required a prefix of query names to run. Change-Id:

[21/50] [abbrv] incubator-impala git commit: MT: Planner for multi-threaded execution

2016-05-12 Thread tarmstrong
MT: Planner for multi-threaded execution New classes: - ParallelPlanner: creates build plans, assigns plans to cohorts - JoinBuildSink: DataSink for plan fragments that materialize build sides - ids for plans, hash tables, plan fragments Tests: this adds a new test file section PARALLELPLANS and

[49/50] [abbrv] incubator-impala git commit: IMPALA-3490: Add flag to reduce minidump size

2016-05-12 Thread tarmstrong
IMPALA-3490: Add flag to reduce minidump size IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them to write minidump files. This change introduces a flag 'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of thread stack memory it includes in a

[14/50] [abbrv] incubator-impala git commit: IMPALA-3397: Source query files from shell.

2016-05-12 Thread tarmstrong
IMPALA-3397: Source query files from shell. This patch allows you to write SOURCE or SRC , and have the shell read the file and execute all the queries in it. Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03 Reviewed-on: http://gerrit.cloudera.org:8080/2663 Reviewed-by: Henry Robinson

[01/50] [abbrv] incubator-impala git commit: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems

2016-05-12 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master f915d59aa -> 14cdb0497 http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ed7f5ebf/tests/metadata/test_ddl.py -- diff --git

[22/50] [abbrv] incubator-impala git commit: Fix Kudu hole punch check to work if /tmp is on different fs

2016-05-12 Thread tarmstrong
Fix Kudu hole punch check to work if /tmp is on different fs /tmp isn't necessarily on the same filesystem as the Kudu data directory. Fix the check so that it checks the actual Kudu directory. Change-Id: Ic6aa27569a0650db7dcf5759952cd50c8e47f8c9 Reviewed-on: http://gerrit.cloudera.org:8080/2967

[38/50] [abbrv] incubator-impala git commit: IMPALA-3491: Use unique_database fixture in test_recover_partitions.py.

2016-05-12 Thread tarmstrong
IMPALA-3491: Use unique_database fixture in test_recover_partitions.py. Testing: I ran the test 10 times in a loop locally and ran a private core/hdfs run. Change-Id: I5be5fa5d20bc6ed5b7830e0ce90201431d6aa008 Reviewed-on: http://gerrit.cloudera.org:8080/3003 Reviewed-by: Alex Behm

[03/50] [abbrv] incubator-impala git commit: Use unique_database fixture in test_compute_stats.py.

2016-05-12 Thread tarmstrong
Use unique_database fixture in test_compute_stats.py. This patch makes it a little easier to use the unique_database fixture with .test files. The RESULTS section can now contain $DATABASE which is replaced with the current database by the test framework. Testing: - ran the test locally on
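
The `$DATABASE` substitution described here amounts to a placeholder replacement applied to the expected results before comparison. The sketch below is a minimal illustration under that assumption; `expand_results` and the sample database name are hypothetical and are not taken from the Impala test framework.

```python
def expand_results(results_section, database):
    """Replace the $DATABASE placeholder with the current database name."""
    return results_section.replace("$DATABASE", database)

# Example: the database name here is made up for illustration.
expected = expand_results("'$DATABASE.parts','10'", "test_stats_1a2b3c")
```

Keeping the placeholder in the .test file lets the same expected output work with whatever per-test database name the fixture generates.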

[40/50] [abbrv] incubator-impala git commit: IMPALA-2660: Respect auth_to_local configs from hdfs configs

2016-05-12 Thread tarmstrong
IMPALA-2660: Respect auth_to_local configs from hdfs configs This patch implements a new feature to read the auth_to_local configs from hdfs configuration files, using the parameter hadoop.security.auth_to_local. This is done by modifying the User#getShortName() method to use its hdfs equivalent.

[39/50] [abbrv] incubator-impala git commit: IMPALA-3507: Use toolchain linker only if using gold

2016-05-12 Thread tarmstrong
IMPALA-3507: Use toolchain linker only if using gold This is a workaround for extremely slow linking when not using gold. Change-Id: I822a78642993e95abc279944f454fdf67dd8e1d5 Reviewed-on: http://gerrit.cloudera.org:8080/3014 Reviewed-by: Jim Apple Reviewed-by: Tim

[06/50] [abbrv] incubator-impala git commit: IMPALA-3462: Fix exec option text for old HJ w/ runtime filters

2016-05-12 Thread tarmstrong
IMPALA-3462: Fix exec option text for old HJ w/ runtime filters Change-Id: I737e261ce251b05dd89bce939ad5df8d95d39b61 Reviewed-on: http://gerrit.cloudera.org:8080/2933 Reviewed-by: Henry Robinson Reviewed-by: Dan Hecht Tested-by: Internal Jenkins

[15/50] [abbrv] incubator-impala git commit: Reuse session for executing queries (Hive on Spark)

2016-05-12 Thread tarmstrong
Reuse session for executing queries (Hive on Spark) Change-Id: I06c798dc311d63eb0a875450fd26d06db4e84a03 Reviewed-on: http://gerrit.cloudera.org:8080/2374 Reviewed-by: Taras Bobrovytsky Tested-by: Internal Jenkins Project:

[16/50] [abbrv] incubator-impala git commit: IMPALA-3460: test_grant_revoke: remove S3-specific workload

2016-05-12 Thread tarmstrong
IMPALA-3460: test_grant_revoke: remove S3-specific workload Now that we functionally support writes to S3 via Impala, test_grant_revoke should not have a special case for S3, which till this patch did the test without INSERTs. Change-Id: Id981e7f83bf86b32d1a5b267ad3781db02337e86 Reviewed-on:

[19/50] [abbrv] incubator-impala git commit: MT: Planner for multi-threaded execution

2016-05-12 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3b7d5b7c/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test -- diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test

[08/50] [abbrv] incubator-impala git commit: Enable BOOST_NO_EXCEPTIONS for codegened code

2016-05-12 Thread tarmstrong
Enable BOOST_NO_EXCEPTIONS for codegened code BOOST_NO_EXCEPTIONS lets us provide a handler for errors instead of having boost throw exceptions. This lets us crash the process in a slightly nicer way and also greatly reduces the number of static exception objects littering the cross-compiled IR

[07/50] [abbrv] incubator-impala git commit: IMPALA-3286: Software prefetching for hash table build.

2016-05-12 Thread tarmstrong
IMPALA-3286: Software prefetching for hash table build. This change pipelines the code which builds the hash table. This is based on the idea which Mostafa presented earlier. Essentially, the pipelined code will first evaluate all the rows to be inserted, compute their hash values and prefetch

[05/50] [abbrv] incubator-impala git commit: IMPALA-2198: Differentiate queries in exceptional states in web UI

2016-05-12 Thread tarmstrong
IMPALA-2198: Differentiate queries in exceptional states in web UI In order to make the query life-cycle clearer to users, added a new section to the /queries webui page for queries that are 'waiting', not actively running either due to an error or to returning all of their results, but that have

[02/50] [abbrv] incubator-impala git commit: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems

2016-05-12 Thread tarmstrong
IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems Previously Impala disallowed LOAD DATA and INSERT on S3. This patch functionally enables LOAD DATA and INSERT on S3 without making major changes for the sake of improving performance over S3. This patch also enables both

[28/50] [abbrv] incubator-impala git commit: IMPALA-3488: test_ddl.py failure on LocalFS run

2016-05-12 Thread tarmstrong
IMPALA-3488: test_ddl.py failure on LocalFS run Our test_ddl.py always had a bug where in the _cleanup() function, we used the hdfs_client on local FS runs. It always ended up passing because we caught generic exceptions in hdfs_client.delete_file_dir() while checking if a file existed which

[1/3] incubator-impala git commit: IMPALA-3864: qgen: reduce likelihood of create_query() exceptions

2016-07-22 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master e2a70388f -> e0fb432b8 IMPALA-3864: qgen: reduce likelihood of create_query() exceptions 1. Fix a bug in which the computation to produce the string for an exception was raising a TypeError. We fix the bug by changing how the

incubator-impala git commit: Adjust ASF push script names to match our repo setup.

2016-07-21 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master b94f88a69 -> e2a70388f Adjust ASF push script names to match our repo setup. Our gerrit repo is not Impala, but ImpalaASF: https://gerrit.cloudera.org/#/q/status:open+project:ImpalaASF Some Impala developers already use

incubator-impala git commit: IMPALA-1240: add back spilling sort now that sorter is not flaky

2016-07-28 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 2c9b4a9ba -> 68e9eed81 IMPALA-1240: add back spilling sort now that sorter is not flaky This test coverage was removed because of flakiness. Instead of mem_limit, use max_block_mgr_memory, which is more deterministic since it

incubator-impala git commit: IMPALA-3914: SKIP_TOOLCHAIN_BOOTSTRAP skips Python package downloads

2016-07-27 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master f60b2beb8 -> a7963e6b0 IMPALA-3914: SKIP_TOOLCHAIN_BOOTSTRAP skips Python package downloads SKIP_TOOLCHAIN_BOOTSTRAP is meant to control download of third-party components to speed up builds and allow builds to be less tied to

[2/4] incubator-impala git commit: Enable TPC-H workload for Kudu tables

2016-07-27 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/6fbd35fa/testdata/workloads/tpch/queries/tpch-kudu-q16.test -- diff --git a/testdata/workloads/tpch/queries/tpch-kudu-q16.test

[1/4] incubator-impala git commit: Enable TPC-H workload for Kudu tables

2016-07-27 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master a7963e6b0 -> c1d70f814 http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/6fbd35fa/testdata/workloads/tpch/queries/tpch-kudu-q17.test -- diff --git

[3/4] incubator-impala git commit: Enable TPC-H workload for Kudu tables

2016-07-27 Thread tarmstrong
Enable TPC-H workload for Kudu tables With this commit we enable loading of TPC-H data in Kudu tables and running the 22 TPC-H queries against Kudu. Since Kudu doesn't support the decimal data type, we had to modify the queries by using round() function and update the test results. Change-Id:

[4/4] incubator-impala git commit: IMPALA-3227: generate test TPC data sets during data load

2016-07-27 Thread tarmstrong
IMPALA-3227: generate test TPC data sets during data load The generated data is identical to the pregenerated tpch.tar.gz and tpcds.tar.gz data that were used previously and were not publicly accessible. This adds a "preload" hook to bin/load-data.py that can execute custom logic for each data

[2/2] incubator-impala git commit: IMPALA-3969: stress test: add option to set common query options

2016-08-12 Thread tarmstrong
IMPALA-3969: stress test: add option to set common query options It can be useful for debugging purposes to run the stress test with custom query options, for example with codegen disabled. This patch adds a command line option to the stress test entry point that allows a caller to set query

incubator-impala git commit: IMPALA-3946: fix MemPool integrity issues with empty chunks

2016-08-10 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master ac4f22b1b -> 88b89b872 IMPALA-3946: fix MemPool integrity issues with empty chunks There were various rare code paths that resulted in the MemPool failing its own internal integrity checks. This required various small fixes to

incubator-impala git commit: IMPALA-3886: Improve log of pip_download.py

2016-07-21 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 65806e200 -> b94f88a69 IMPALA-3886: Improve log of pip_download.py pip_download.py prints the following line for each dependency that is already up-to-date: File with matching md5sum already exists, skipping download. This

incubator-impala git commit: IMPALA-3729: batch_size=1 coverage for avro scanner

2016-07-20 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 3fbd9c338 -> bc8c55afc IMPALA-3729: batch_size=1 coverage for avro scanner Also fix a stale comment in the avro scanner header. The main work here is to fix the handling of empty result sets in the test result verifier. This is

[3/8] incubator-impala git commit: IMPALA-3253: Modify gen_build_version.sh to always output the right version

2016-07-05 Thread tarmstrong
IMPALA-3253: Modify gen_build_version.sh to always output the right version gen_build_version.sh previously had a --noclean option which did not overwrite the version information if it was already populated. Since --noclean was the default option, it never updated the version information.

[1/8] incubator-impala git commit: IMPALA-3800: Python 2.6 support for collect_minidumps.py

2016-07-05 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 3dccb0125 -> cd2ee9ecf IMPALA-3800: Python 2.6 support for collect_minidumps.py Python 2.6 is the default python version shipped with CentOS 6.6 and the minidump collection script needs to run there, too. However, the tarfile

[5/8] incubator-impala git commit: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

2016-07-05 Thread tarmstrong
IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads Currently we don't reset the file read offset if ZCR fails. Due to this, when we switch to the normal read path, we hit the eosr of the scan-range even before reading the expected data length. If both the ReadFromCache() and

incubator-impala git commit: IMPALA-2767: Web UI call to force expire sessions

2016-07-05 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 0dde1c2f8 -> d60b70769 IMPALA-2767: Web UI call to force expire sessions This change adds a "Close session" button in the sessions Web UI which destroys the session with the client when clicked. Change-Id:

incubator-impala git commit: IMPALA-3628: Fix cancellation from shell when security is enabled

2016-07-05 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master cd2ee9ecf -> 0dde1c2f8 IMPALA-3628: Fix cancellation from shell when security is enabled To cancel a query, the shell will create a separate connection inside its SIGINT handler, and send the cancellation RPC. However this

[04/11] incubator-impala git commit: Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6

2016-07-06 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/36b4f88b/www/bootstrap/fonts/glyphicons-halflings-regular.svg -- diff --git a/www/bootstrap/fonts/glyphicons-halflings-regular.svg

[06/11] incubator-impala git commit: Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6

2016-07-06 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/36b4f88b/www/bootstrap/css/bootstrap.min.css -- diff --git a/www/bootstrap/css/bootstrap.min.css b/www/bootstrap/css/bootstrap.min.css index 679272d..4cf729e 100644 ---

[02/11] incubator-impala git commit: Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6

2016-07-06 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/36b4f88b/www/bootstrap/js/bootstrap.min.js -- diff --git a/www/bootstrap/js/bootstrap.min.js b/www/bootstrap/js/bootstrap.min.js index b04a0e8..e79c065 100644 ---

[05/11] incubator-impala git commit: Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6

2016-07-06 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/36b4f88b/www/bootstrap/css/bootstrap.min.css.map -- diff --git a/www/bootstrap/css/bootstrap.min.css.map b/www/bootstrap/css/bootstrap.min.css.map new file mode 100644

[11/11] incubator-impala git commit: Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6

2016-07-06 Thread tarmstrong
Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6 Change-Id: Ia6427a4ce7cb606a03cf7e4e5c4250a845fc2145 Reviewed-on: http://gerrit.cloudera.org:8080/3322 Reviewed-by: Henry Robinson Tested-by: Henry Robinson Project:

[08/11] incubator-impala git commit: Add JQuery 1.12.4, upgrade Bootstrap to 3.3.6

2016-07-06 Thread tarmstrong
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/36b4f88b/www/bootstrap/css/bootstrap.css -- diff --git a/www/bootstrap/css/bootstrap.css b/www/bootstrap/css/bootstrap.css index 7f36651..42c79d6 100644 ---

incubator-impala git commit: IMPALA-3774: fix download_requirements for older Python versions

2016-07-06 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 36b4f88bd -> a07021775 IMPALA-3774: fix download_requirements for older Python versions Pip always runs the setup.py file in downloaded tarballs to get metadata. Impyla's setup.py does not work in some older python installations

incubator-impala git commit: IMPALA-3799: Make MAX_SCAN_RANGE_LENGTH accept formatted quantities

2016-07-07 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master a07021775 -> f407288d7 IMPALA-3799: Make MAX_SCAN_RANGE_LENGTH accept formatted quantities This patch changes MAX_SCAN_RANGE_LENGTH to accept formatted quantities like 4MB. Change-Id: I2703f7ddaa74c4256a3d4a545012332dfbf5fed8
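
As a rough illustration of what "formatted quantities like 4MB" means for a byte-valued option, here is a minimal Python sketch. It is not Impala's parser (which lives in the C++ backend); the function name parse_mem_quantity, the accepted suffixes, and the use of powers of 1024 are all assumptions made for this example.

```python
import re

# Hypothetical suffix table; the suffixes Impala actually accepts may differ.
_UNIT_MULTIPLIERS = {
    "": 1,
    "b": 1,
    "k": 1024,
    "kb": 1024,
    "m": 1024 ** 2,
    "mb": 1024 ** 2,
    "g": 1024 ** 3,
    "gb": 1024 ** 3,
}


def parse_mem_quantity(text):
    """Convert a string such as '4MB' or '1024' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", text)
    if match is None:
        raise ValueError("not a valid quantity: %r" % text)
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNIT_MULTIPLIERS:
        raise ValueError("unknown unit in quantity: %r" % text)
    return int(number) * _UNIT_MULTIPLIERS[unit]
```

With this sketch, a plain integer string is still read as a byte count, so values written without a suffix keep their meaning.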

incubator-impala git commit: IMPALA-1619: Support 64-bit allocations.

2016-07-08 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 667a778af -> ed5ec6772 IMPALA-1619: Support 64-bit allocations. This change extends MemPool, FreePool and StringBuffer to support 64-bit allocations, fixes a bug in decompressor and extends various places in the code to support

[1/2] incubator-impala git commit: IMPALA-3727: Change microbenchmarks to use percentile-based reporting

2016-07-08 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master ed5ec6772 -> 476f687b4 IMPALA-3727: Change microbenchmarks to use percentile-based reporting This doesn't make each run more robust, but by running the benchmark 60 times and reporting the 10th, 50th, and 90th percentile, it

incubator-impala git commit: Remove some code in like-predicate-ir.cc from cross-compilation

2016-07-07 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master c1da1409b -> c7b7c3ece Remove some code in like-predicate-ir.cc from cross-compilation like-predicate-ir.cc contains a lot of code which won't be called and inlined by other IR functions. (e.g. the prepare functions). To reduce

[2/2] incubator-impala git commit: Move all benchmarks to benchmark/ folder

2016-07-08 Thread tarmstrong
Move all benchmarks to benchmark/ folder This is just a cleanup patch. The immediate motivation is to exclude them from code coverage reports. Change-Id: I16d706a4f3f9f1c75f3047fca570d9fc86a46dc9 Reviewed-on: http://gerrit.cloudera.org:8080/3589 Reviewed-by: Michael Ho

incubator-impala git commit: IMPALA-3611: track unused Disk IO buffer memory

2016-08-05 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 45d059855 -> 17bf14417 IMPALA-3611: track unused Disk IO buffer memory Track I/O buffers against separate MemTrackers. This gives us better visibility into memory consumption from the debug webpage and from MemTracker

incubator-impala git commit: IMPALA-2878: Fix Base64Decode error and remove duplicate codes.

2016-08-05 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master 17bf14417 -> a46b73191 IMPALA-2878: Fix Base64Decode error and remove duplicate codes. Original impala::Base64Decode() method wouldn't return original string if there were trailing '\0'. For example, string "a\0" would be

[2/3] incubator-impala git commit: IMPALA-3499: Split catalog update

2016-06-20 Thread tarmstrong
IMPALA-3499: Split catalog update JNI does not support writing Java byte arrays larger than 2GB. Instead of passing a single serialized update to frontend, this patch splits the update into a vector of updates less than 500MB each. Then they are serialized, sent to frontend, deserialized and
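
The idea of splitting one large update into several size-bounded ones can be sketched as a greedy batching pass. This is only an illustration in Python under stated assumptions, not the patch's code: split_into_batches is a hypothetical helper, it works on a list of per-item serialized sizes, and it treats the limit as inclusive.

```python
def split_into_batches(item_sizes, max_batch_bytes):
    """Group item indexes into consecutive batches whose total size stays
    within max_batch_bytes. item_sizes holds the serialized size of each item."""
    batches = []
    current = []
    current_bytes = 0
    for index, size in enumerate(item_sizes):
        if size > max_batch_bytes:
            # A single oversized item cannot be placed in any batch.
            raise ValueError("item %d exceeds the batch limit" % index)
        # Start a new batch when adding this item would exceed the limit.
        if current and current_bytes + size > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each batch can then be serialized and handed over separately, so no single buffer has to approach the 2GB array limit mentioned above.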

[2/3] incubator-impala git commit: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported.

2017-02-02 Thread tarmstrong
IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. This test is failing on distros that don't support Kudu, but it shouldn't even be run. Tested by setting KUDU_IS_SUPPORTED to false, and then trying to run the test, confirming that it gets skipped. When the env var

[1/3] incubator-impala git commit: IMPALA-4829: Change default Kudu read behavior for "RYW"

2017-02-02 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master e3566ac04 -> 6251d8b4d IMPALA-4829: Change default Kudu read behavior for "RYW" Currently the default Kudu read mode is set to "READ_LATEST", which essentially provides no guarantees on reading except that any read issued will

[3/3] incubator-impala git commit: IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-02 Thread tarmstrong
IMPALA-3909: Populate min/max statistics in Parquet writer Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Reviewed-on: http://gerrit.cloudera.org:8080/5611 Reviewed-by: Lars Volker Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong

incubator-impala git commit: IMPALA-4617: remove IsConstant() analysis from be

2017-01-31 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master a4206d3f1 -> a4eb4705c IMPALA-4617: remove IsConstant() analysis from be This change avoids the need to duplicate the logic in Expr.getConstant() in the frontend and Expr::GetConstant() in the backend. Instead it is plumbed

[1/3] incubator-impala git commit: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.

2017-02-01 Thread tarmstrong
Repository: incubator-impala Updated Branches: refs/heads/master a4eb4705c -> e3566ac04 [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Reviewed-on: http://gerrit.cloudera.org:8080/5835 Reviewed-by: Lars Volker

[3/3] incubator-impala git commit: IMPALA-4808: old hash join can reference invalid memory

2017-02-01 Thread tarmstrong
IMPALA-4808: old hash join can reference invalid memory The bug was that 'probe_rows_exist' could be true even if there was no current probe row. The node can get into this state if it takes the branch at line 390. I tried to reproduce the crash but was unable to after a few attempts.
