[jira] [Created] (HIVE-18142) Data corruption can cause SerializationUtils.readRemainingLongs() function hang
Dustin created HIVE-18142:
---

Summary: Data corruption can cause SerializationUtils.readRemainingLongs() function hang
Key: HIVE-18142
URL: https://issues.apache.org/jira/browse/HIVE-18142
Project: Hive
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dustin
Fix For: 2.1.0, 2.0.1

Similar to the SerializationUtils.readLongBE() function reported in [HIVE-13255|https://issues.apache.org/jira/browse/HIVE-13255], when the InStream is corrupted, the following loop can also become infinite.

{code:java}
private void readRemainingLongs(long[] buffer, int offset, InStream input, int remainder,
    int numBytes) throws IOException {
  final int toRead = remainder * numBytes;
  // bulk read to buffer
  int bytesRead = input.read(readBuffer, 0, toRead);
  while (bytesRead != toRead) {
    bytesRead += input.read(readBuffer, bytesRead, toRead - bytesRead);
  }
  ...
}
{code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
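A minimal sketch of a bounded read loop, for illustration only. It assumes the hang comes from read() signalling end of stream with -1 (as java.io.InputStream does), so that the running total never reaches toRead. The class BoundedRead and the method readFully are hypothetical names, and the sketch uses a plain InputStream rather than Hive's InStream; it is not the patch applied to Hive.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Hive.
final class BoundedRead {

    /**
     * Reads exactly toRead bytes into buffer, or throws EOFException
     * if the stream ends first. Each return value is checked before it
     * is added to the running total, so a -1 cannot corrupt the count.
     */
    public static void readFully(InputStream input, byte[] buffer, int toRead)
            throws IOException {
        int bytesRead = 0;
        while (bytesRead < toRead) {
            int n = input.read(buffer, bytesRead, toRead - bytesRead);
            if (n < 0) {
                throw new EOFException(
                    "Stream ended after " + bytesRead + " of " + toRead + " bytes");
            }
            bytesRead += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];

        // A stream that holds all 8 requested bytes: the loop terminates normally.
        readFully(new ByteArrayInputStream(new byte[8]), buffer, 8);
        System.out.println("full read ok");

        // A truncated stream (3 of 8 bytes): the loop fails fast instead of spinning.
        try {
            readFully(new ByteArrayInputStream(new byte[3]), buffer, 8);
        } catch (EOFException e) {
            System.out.println("truncated: " + e.getMessage());
        }
    }
}
```

Whether to throw EOFException or report the corruption some other way is a design choice; the point of the sketch is only that the return value is validated before it is accumulated.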
[jira] [Created] (HIVE-18141) Fix StatsUtils.combineRange to combine intervals
Zoltan Haindrich created HIVE-18141:
---

Summary: Fix StatsUtils.combineRange to combine intervals
Key: HIVE-18141
URL: https://issues.apache.org/jira/browse/HIVE-18141
Project: Hive
Issue Type: Sub-task
Components: Statistics
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich

The current [combineRange implementation|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1984] "combines" only ranges where one contains the other, but the comments suggest that the intention was to cover the case where the two intervals overlap. This can be checked with the following test case:

{code}
@Test
public void test11() {
  Range r1 = new Range(0, 1);
  Range r2 = new Range(1, 11);
  Range r3 = StatsUtils.combineRange(r1, r2);
  assertNotNull(r3);
}
{code}
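A minimal sketch of what combining overlapping intervals could look like, under the assumption that ranges are closed intervals over long values and that disjoint ranges should yield null. The classes IntervalCombine and Interval are hypothetical stand-ins for illustration; this is not the StatsUtils code.

```java
// Hypothetical illustration, not Hive's StatsUtils or ColStatistics.Range.
final class IntervalCombine {

    /** Closed interval [min, max]. */
    static final class Interval {
        final long min;
        final long max;

        Interval(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the smallest interval covering both inputs when they share
     * at least one point, or null when they are disjoint.
     */
    public static Interval combine(Interval a, Interval b) {
        if (a.max < b.min || b.max < a.min) {
            return null; // no overlap
        }
        return new Interval(Math.min(a.min, b.min), Math.max(a.max, b.max));
    }

    public static void main(String[] args) {
        // Same inputs as the test case above: [0, 1] and [1, 11] touch at 1.
        Interval r3 = combine(new Interval(0, 1), new Interval(1, 11));
        System.out.println(r3.min + ".." + r3.max); // prints 0..11

        // Disjoint intervals are not combined.
        System.out.println(combine(new Interval(0, 1), new Interval(5, 11))); // prints null
    }
}
```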
[jira] [Created] (HIVE-18140) Partitioned tables statistics can go wrong in basic stats mixed case
Zoltan Haindrich created HIVE-18140:
---

Summary: Partitioned tables statistics can go wrong in basic stats mixed case
Key: HIVE-18140
URL: https://issues.apache.org/jira/browse/HIVE-18140
Project: Hive
Issue Type: Sub-task
Reporter: Zoltan Haindrich

Suppose the following scenario:

* part1 has basic stats {{RC=10,DS=1K}}
* all other partitions have no basic stats (and a bunch of rows)

Then [this|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L378] condition would be false, which in turn produces estimations for the whole partitioned table: {{RC=10,DS=1K}}
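The scenario can be illustrated with a small sketch. It assumes that a missing row count is modelled as null and that a table-level total should only be trusted when every partition reports one. The class PartitionStatsCheck and the method totalRowCount are hypothetical; this is neither the condition linked above nor a proposed fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical illustration, not Hive code.
final class PartitionStatsCheck {

    /**
     * Sums per-partition row counts, but only when every partition has one.
     * A null entry marks a partition without basic stats; in that case the
     * result is empty so the caller can fall back to another estimate.
     */
    public static OptionalLong totalRowCount(List<Long> rowCounts) {
        long sum = 0;
        for (Long rc : rowCounts) {
            if (rc == null) {
                return OptionalLong.empty();
            }
            sum += rc;
        }
        return OptionalLong.of(sum);
    }

    public static void main(String[] args) {
        // part1 has RC=10, the other partitions have no basic stats.
        List<Long> mixed = Arrays.asList(10L, null, null);
        System.out.println(totalRowCount(mixed).isPresent()); // prints false

        // Every partition has a row count, so the sum is usable.
        List<Long> complete = Arrays.asList(10L, 20L, 30L);
        System.out.println(totalRowCount(complete).getAsLong()); // prints 60
    }
}
```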
[jira] [Created] (HIVE-18139) spark may miss results in case column stats are gathered
Zoltan Haindrich created HIVE-18139:
---

Summary: spark may miss results in case column stats are gathered
Key: HIVE-18139
URL: https://issues.apache.org/jira/browse/HIVE-18139
Project: Hive
Issue Type: Bug
Reporter: Zoltan Haindrich

To reproduce, add {{set hive.stats.column.autogather=true;}} at the beginning of {{ql/src/test/queries/clientpositive/auto_sortmerge_join_13.q}}.
[jira] [Created] (HIVE-18138) Fix columnstats problem in case schema evolution
Zoltan Haindrich created HIVE-18138:
---

Summary: Fix columnstats problem in case schema evolution
Key: HIVE-18138
URL: https://issues.apache.org/jira/browse/HIVE-18138
Project: Hive
Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich

Column stats are kept when the main table schema is altered, and this causes all kinds of problems.
[jira] [Created] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null
Zoltan Haindrich created HIVE-18137:
---

Summary: Schema evolution: newly inserted column value in pre-existing partition is masked to null
Key: HIVE-18137
URL: https://issues.apache.org/jira/browse/HIVE-18137
Project: Hive
Issue Type: Bug
Reporter: Zoltan Haindrich

{code}
set hive.explain.user=false;
set hive.fetch.task.conversion=none;
set hive.mapred.mode=nonstrict;
set hive.cli.print.header=true;
SET hive.exec.schema.evolution=true;
SET hive.vectorized.use.vectorized.input.format=true;
SET hive.vectorized.use.vector.serde.deserialize=false;
SET hive.vectorized.use.row.serde.deserialize=false;
SET hive.vectorized.execution.enabled=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.metastore.disallow.incompatible.col.type.changes=true;
set hive.default.fileformat=textfile;
set hive.llap.io.enabled=false;

CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) PARTITIONED BY(part INT);
insert into table part_add_int_permute_select partition(part=1) VALUES (1, , 'new');
alter table part_add_int_permute_select add columns(c int);
insert into table part_add_int_permute_select partition(part=1) VALUES (2, , 'new', );
select insert_num,part,a,b,c from part_add_int_permute_select;
{code}

Results for the last select:

{code}
1 1 new NULL
2 1 new NULL
{code}

I think the following result should be expected:

{code}
1 1 new NULL
2 1 new
{code}