Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23626
to look at the new patch set (#9).
Change subject: IMPALA-9715: Load testdata with Impala
......................................................................
IMPALA-9715: Load testdata with Impala
Switches most data loading to Impala. Adds LOAD_HIVE for a few
operations that still don't work in Impala. Adds explicit casts for a
few types where Impala failed the query due to loss of precision.
Moves uploading data to storage into load-data.py via
copy_workload_data_to_hdfs, then uses Impala's LOAD DATA INPATH to
populate the tables. LOAD DATA INPATH moves the source files into
place, so where multiple tables used to load the same source file, one
table now loads it and the others select from that table.
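
A minimal sketch of the load-once-then-select pattern described above.
The helper name build_load_statements, the table names, and the path
are hypothetical and only for illustration; this is not the code in
load-data.py or generate-schema-statements.py.

```python
# Hypothetical illustration only: names and paths are assumptions.

def build_load_statements(source_path, tables):
    """Return SQL statements for tables that share one source file.

    LOAD DATA INPATH moves the file out of source_path, so only the
    first table can load it directly. The remaining tables copy their
    rows from the first table instead.
    """
    if not tables:
        return []
    first, rest = tables[0], tables[1:]
    stmts = [
        "LOAD DATA INPATH '{0}' OVERWRITE INTO TABLE {1}".format(
            source_path, first)
    ]
    for name in rest:
        stmts.append(
            "INSERT OVERWRITE TABLE {0} SELECT * FROM {1}".format(
                name, first))
    return stmts


if __name__ == "__main__":
    for stmt in build_load_statements(
            "/tmp/example/data.csv", ["db.t1", "db.t2"]):
        print(stmt)
```

The sketch assumes the tables share a compatible schema, so that
SELECT * from the first table is a valid source for the others.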
Moves most table data under testdata/data to simplify uploads. Data
for functional-query now lives in testdata/data, with generated data
in testdata/target.
Stops passing scale_factor between functions in
generate-schema-statements.py, which simplifies their signatures; the
value always comes from the CLI input.
Improves '-devdata': most data is now loaded with Impala, so a dev
dataload is theoretically possible without Hive. Reduces devdata load
time by ~15s.
Change-Id: I43d681a89d49fde9562ea67fd250fad2edd308ae
---
M bin/create_testdata.sh
M bin/load-data.py
M bin/rat_exclude_files.txt
M testdata/bin/create-hbase.sh
M testdata/bin/generate-schema-statements.py
M testdata/common/text_delims_table.py
M testdata/common/widetable.py
R testdata/data/AllTypesError/0901.txt
R testdata/data/AllTypesError/0902.txt
R testdata/data/AllTypesError/0903.txt
R testdata/data/AllTypesErrorNoNulls/0901.txt
R testdata/data/AllTypesErrorNoNulls/0902.txt
R testdata/data/AllTypesErrorNoNulls/0903.txt
R testdata/data/ComplexTypesTbl/README
R testdata/data/ComplexTypesTbl/arrays.orc
R testdata/data/ComplexTypesTbl/arrays.parq
R testdata/data/ComplexTypesTbl/arrays_big.parq
R testdata/data/ComplexTypesTbl/nonnullable.avsc
R testdata/data/ComplexTypesTbl/nonnullable.json
R testdata/data/ComplexTypesTbl/nonnullable.orc
R testdata/data/ComplexTypesTbl/nonnullable.parq
R testdata/data/ComplexTypesTbl/nullable.avsc
R testdata/data/ComplexTypesTbl/nullable.json
R testdata/data/ComplexTypesTbl/nullable.orc
R testdata/data/ComplexTypesTbl/nullable.parq
R testdata/data/ComplexTypesTbl/structs.orc
R testdata/data/ComplexTypesTbl/structs.parq
R testdata/data/ComplexTypesTbl/structs_nested.orc
R testdata/data/ComplexTypesTbl/structs_nested.parq
R testdata/data/CustomerMultiBlock/README
R testdata/data/CustomerMultiBlock/customer_multiblock.parquet
R testdata/data/DimTbl/data.csv
R testdata/data/ImpalaDemoDataset/DEC_00_SF3_P077_with_ann_noheader.csv
R testdata/data/JoinTbl/data.csv
R testdata/data/LikeTbl/data.csv
R testdata/data/NullRows/data.csv
R testdata/data/NullTable/data.csv
R testdata/data/TblWithRaggedColumns/data.csv
R testdata/data/TinyIntTable/data.csv
R testdata/data/TinyTable/data.csv
R testdata/data/avro_null_char/000000_0
R testdata/data/bad_avro_snap/README
R testdata/data/bad_avro_snap/hive2_pre_gregorian_date.avro
R testdata/data/bad_avro_snap/hive3_pre_gregorian_date.avro
R testdata/data/bad_avro_snap/invalid_decimal_schema.avro
R testdata/data/bad_avro_snap/invalid_union.avro
R testdata/data/bad_avro_snap/negative_string_len.avro
R testdata/data/bad_avro_snap/out_of_range_date.avro
R testdata/data/bad_avro_snap/truncated_float.avro
R testdata/data/bad_avro_snap/truncated_string.avro
R testdata/data/bad_parquet_data/README
R testdata/data/bad_parquet_data/dict-encoded-negative-len.parq
R testdata/data/bad_parquet_data/dict-encoded-out-of-bounds.parq
R testdata/data/bad_parquet_data/illegal_decimals.parq
R testdata/data/bad_parquet_data/plain-encoded-negative-len.parq
R testdata/data/bad_parquet_data/plain-encoded-out-of-bounds.parq
R testdata/data/bad_seq_snap/bad_file
R testdata/data/bad_text_gzip/file_not_finished.gz
R testdata/data/empty_parquet_page_source_impala10186/data.csv
R testdata/data/hive_benchmark/grepTiny/part-00000
R testdata/data/hive_benchmark/htmlTiny/Rankings.dat
R testdata/data/hive_benchmark/htmlTiny/UserVisits.dat
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/tpcds/tpcds_schema_template.sql
M testdata/datasets/tpch/tpch_schema_template.sql
65 files changed, 304 insertions(+), 291 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/23626/9
--
To view, visit http://gerrit.cloudera.org:8080/23626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43d681a89d49fde9562ea67fd250fad2edd308ae
Gerrit-Change-Number: 23626
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>