[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139468#comment-16139468 ]

Hive QA commented on HIVE-6131:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12637927/HIVE-6131.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11000 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_change_col] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_cascade] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_data_after_schema_update] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat11] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat12] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat13] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat14] (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6504/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6504/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12637927 - PreCommit-HIVE-Build

> New columns after table alter result in null values despite data
>
> Key: HIVE-6131
> URL: https://issues.apache.org/jira/browse/HIVE-6131
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.11.0, 0.12.0, 0.13.0, 1.2.1
> Reporter: James Vaughan
> Priority: Critical
> Attachments: HIVE-6131.1.patch
>
> Hi folks,
> I found and verified a bug on our CDH 4.0.3 install of Hive when adding
> columns to tables with Partitions using 'REPLACE COLUMNS'. I dug through the
> Jira a little bit and didn't see anything for it so hopefully this isn't just
> noise on the radar.
> Basically, when you alter a table with partitions and then reupload data to
> that partition, it doesn't seem to recognize the extra data that actually
> exists in HDFS- as in, returns NULL values on the new column despite having
> the data and recognizing the new column in the metadata.
> Here's some steps to reproduce using a basic table:
> 1. Run this hive command: CREATE TABLE jvaughan_test (col1 string)
> partitioned by (day string);
> 2. Create a simple file on the system with a couple of entries, something
> like "hi" and "hi2" separated by newlines.
> 3. Run this hive command, pointing it at the file: LOAD DATA LOCAL INPATH
> '' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02');
> 4. Confirm the data with: SELECT * FROM jvaughan_test WHERE day =
> '2014-01-02';
> 5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS
[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732234#comment-15732234 ]

Hive QA commented on HIVE-6131:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12637927/HIVE-6131.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10723 tests executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=143)
	[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=148)
	[auto_sortmerge_join_13.q,union_top_level.q,vector_left_outer_join2.q,schema_evol_text_vecrow_part_all_primitive.q,constprog_semijoin.q,update_where_partitioned.q,drop_partition_with_stats.q,smb_mapjoin_14.q,skiphf_aggr.q,vectorized_ptf.q,auto_join_filters.q,join0.q,insert_orig_table.q,mergejoin.q,join_filters.q,orc_split_elimination.q,subquery_in.q,vector_outer_join0.q,schema_evol_text_vec_part_all_primitive.q,vector_complex_all.q,auto_sortmerge_join_4.q,bucket_many.q,vectorization_15.q,union3.q,vectorization_nested_udf.q,windowing_windowspec2.q,auto_smb_mapjoin_14.q,vector_mr_diff_schema_alias.q,vector_join_filters.q,reduce_deduplicate_extended.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_change_col] (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_cascade] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_data_after_schema_update] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat11] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat12] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat13] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat14] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2487/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2487/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12637927 - PreCommit-HIVE-Build

> New columns after table alter result in null values despite data
[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731797#comment-15731797 ]

Tianwan Zhao commented on HIVE-6131:

Hive provides new DDL to handle this situation in later versions, after HIVE-3833 was applied in 0.11.0.

See: [Language Manual CASCADE|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns]

??The CASCADE|RESTRICT clause is available in Hive 0.15.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.??

In this case you can change your step 5 to:
5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS (col1 string, col2 string) *CASCADE*;

> New columns after table alter result in null values despite data
>
> Key: HIVE-6131
> URL: https://issues.apache.org/jira/browse/HIVE-6131
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.11.0, 0.12.0, 0.13.0, 1.2.1
> Reporter: James Vaughan
> Priority: Critical
> Attachments: HIVE-6131.1.patch
>
> Hi folks,
> I found and verified a bug on our CDH 4.0.3 install of Hive when adding
> columns to tables with Partitions using 'REPLACE COLUMNS'. I dug through the
> Jira a little bit and didn't see anything for it so hopefully this isn't just
> noise on the radar.
> Basically, when you alter a table with partitions and then reupload data to
> that partition, it doesn't seem to recognize the extra data that actually
> exists in HDFS- as in, returns NULL values on the new column despite having
> the data and recognizing the new column in the metadata.
> Here's some steps to reproduce using a basic table:
> 1. Run this hive command: CREATE TABLE jvaughan_test (col1 string)
> partitioned by (day string);
> 2. Create a simple file on the system with a couple of entries, something
> like "hi" and "hi2" separated by newlines.
> 3. Run this hive command, pointing it at the file: LOAD DATA LOCAL INPATH
> '' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02');
> 4. Confirm the data with: SELECT * FROM jvaughan_test WHERE day =
> '2014-01-02';
> 5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS
> (col1 string, col2 string);
> 6. Edit your file and add a second column using the default separator
> (ctrl+v, then ctrl+a in Vim) and add two more entries, such as "hi3" on the
> first row and "hi4" on the second
> 7. Run step 3 again
> 8. Check the data again like in step 4
> For me, this is the results that get returned:
> hive> select * from jvaughan_test where day = '2014-01-02';
> OK
> hi      NULL    2014-01-02
> hi2     NULL    2014-01-02
> This is despite the fact that there is data in the file stored by the
> partition in HDFS.
> Let me know if you need any other information. The only workaround for me
> currently is to drop partitions for any I'm replacing data in and THEN
> reupload the new data file.
> Thanks,
> -James

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
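
For steps 2 and 6 of the reproduction, the two data files could be generated with a short script instead of being edited by hand in Vim. This is only a sketch: the function names and file names are hypothetical, and it assumes the table uses Hive's default text field delimiter, Ctrl-A (`\x01`), which is the character step 6 types.

```python
import os
import tempfile

# Hive's default field delimiter for text tables is Ctrl-A (0x01).
FIELD_DELIM = "\x01"


def render_rows(rows):
    """Return the file contents: one line per row, fields joined by Ctrl-A."""
    return "".join(FIELD_DELIM.join(fields) + "\n" for fields in rows)


def write_data_file(path, rows):
    """Write the rendered rows to `path` (a hypothetical local file)."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(render_rows(rows))


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    # Step 2: a single column per row.
    v1 = os.path.join(out_dir, "jvaughan_test_v1.txt")
    write_data_file(v1, [["hi"], ["hi2"]])
    # Step 6: the same rows with a second column appended.
    v2 = os.path.join(out_dir, "jvaughan_test_v2.txt")
    write_data_file(v2, [["hi", "hi3"], ["hi2", "hi4"]])
    print(v1)
    print(v2)
```

Either file can then be passed to the LOAD DATA LOCAL INPATH statement from step 3.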
[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289556#comment-15289556 ]

Hive QA commented on HIVE-6131:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12637927/HIVE-6131.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 69 failed/errored test(s), 9899 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniLlapCliDriver - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_sortmerge_join_7.q-orc_merge9.q-tez_union_dynamic_partition.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-union5.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-load_dyn_part2.q-selectDistinctStar.q-vector_decimal_5.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-mapjoin_mapjoin.q-insert_into1.q-vector_decimal_2.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_distinct_2.q-tez_joins_explain.q-cte_mat_1.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_interval_2.q-schema_evol_text_nonvec_mapwork_part_all_primitive.q-tez_fsstat.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorized_parquet.q-insert_values_non_partitioned.q-schema_evol_orc_nonvec_mapwork_part.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join4.q-groupby_cube1.q-auto_join20.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-parallel_join1.q-escape_distributeby1.q-auto_sortmerge_join_7.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_change_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_cascade
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_data_after_schema_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part_all_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part_all_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part_all_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part_all_complex
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part_all_complex
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part_all_complex
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_many
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_udf_udaf
[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285949#comment-15285949 ]

niklaus xiao commented on HIVE-6131:

This is not a bug, you can use `alter table t1 replace columns (c1 string, c2 string) cascade`, see https://issues.apache.org/jira/browse/HIVE-8839
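
The RESTRICT versus CASCADE behaviour discussed in this thread can be illustrated with a toy model. This is not Hive's code: the `Table` class and its methods are hypothetical, and the sketch assumes only what the thread describes, namely that each partition keeps its own column list, that a row is parsed with the partition's columns, and that table columns missing from the partition come back as NULL (`None` here).

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for table-level and partition-level column metadata."""
    columns: list
    partitions: dict = field(default_factory=dict)  # partition name -> column list

    def add_partition(self, name):
        # A new partition starts with a copy of the table's current columns.
        self.partitions[name] = list(self.columns)

    def replace_columns(self, columns, cascade=False):
        # RESTRICT (the default) touches only the table-level metadata.
        self.columns = list(columns)
        if cascade:
            # CASCADE also rewrites every partition's column list.
            for name in self.partitions:
                self.partitions[name] = list(columns)

    def read(self, name, line, delim="\x01"):
        # Parse the row with the partition's columns; zip drops extra fields.
        parsed = dict(zip(self.partitions[name], line.split(delim)))
        # Project onto the table's columns; unknown columns become None.
        return [parsed.get(col) for col in self.columns]


part = "day=2014-01-02"

restrict = Table(["col1"])
restrict.add_partition(part)
restrict.replace_columns(["col1", "col2"])
print(restrict.read(part, "hi\x01hi3"))   # ['hi', None]

cascade = Table(["col1"])
cascade.add_partition(part)
cascade.replace_columns(["col1", "col2"], cascade=True)
print(cascade.read(part, "hi\x01hi3"))    # ['hi', 'hi3']
```

In the same model, dropping the partition and adding it again after the column change also yields both values, which matches the workaround described in the issue.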