[jira] [Created] (HIVE-18142) Data corruption can cause SerializationUtils.readRemainingLongs() function hang

2017-11-23 Thread Dustin (JIRA)
Dustin created HIVE-18142:
-

 Summary: Data corruption can cause 
SerializationUtils.readRemainingLongs() function hang
 Key: HIVE-18142
 URL: https://issues.apache.org/jira/browse/HIVE-18142
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dustin
 Fix For: 2.1.0, 2.0.1


Similar to the SerializationUtils.readLongBE() function reported in 
[HIVE-13255|https://issues.apache.org/jira/browse/HIVE-13255], when the InStream 
is corrupted, the following loop can also become infinite.


{code:java}
  private void readRemainingLongs(long[] buffer, int offset, InStream input,
      int remainder, int numBytes) throws IOException {
    final int toRead = remainder * numBytes;
    // bulk read to buffer
    int bytesRead = input.read(readBuffer, 0, toRead);
    while (bytesRead != toRead) {
      bytesRead += input.read(readBuffer, bytesRead, toRead - bytesRead);
    }
    ...
  }
{code}
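
For illustration only, here is a minimal sketch of a read loop that cannot spin on a truncated stream. It assumes the stream follows the java.io.InputStream contract, where read() returns -1 at end of stream; in that case the loop quoted above can never satisfy bytesRead == toRead. The class BoundedRead and the helper readFully are hypothetical names, not part of Hive, and this is not necessarily how the issue was or will be fixed.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class BoundedRead {
    // Hypothetical helper (not a Hive API): fills dst[0..toRead) or throws.
    static void readFully(InputStream input, byte[] dst, int toRead) throws IOException {
        int bytesRead = 0;
        while (bytesRead < toRead) {
            int n = input.read(dst, bytesRead, toRead - bytesRead);
            if (n < 0) {
                // End of stream before toRead bytes arrived: fail instead of looping.
                throw new EOFException(
                    "Stream ended after " + bytesRead + " of " + toRead + " bytes");
            }
            bytesRead += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[8];
        // A complete input is read normally.
        readFully(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}), buf, 8);
        // A truncated input raises EOFException rather than hanging.
        try {
            readFully(new ByteArrayInputStream(new byte[] {1, 2, 3}), buf, 8);
        } catch (EOFException e) {
            System.out.println("truncated input detected: " + e.getMessage());
        }
    }
}
```

The key design choice is to treat a negative return value as an error before adding it to the running total, so the loop condition always makes progress or terminates.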





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18141) Fix StatsUtils.combineRange to combine intervals

2017-11-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-18141:
---

 Summary: Fix StatsUtils.combineRange to combine intervals
 Key: HIVE-18141
 URL: https://issues.apache.org/jira/browse/HIVE-18141
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


The current [combineRange 
implementation|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1984]
 only "combines" ranges where one contains the other.

However, the comments suggest that the intention was to also cover the case 
where the two intervals overlap. This can be checked with the following test case:

{code}
  @Test
  public void test11() {
    Range r1 = new Range(0, 1);
    Range r2 = new Range(1, 11);
    Range r3 = StatsUtils.combineRange(r1, r2);
    assertNotNull(r3);
  }
{code}
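
As a rough sketch of the behaviour the test case asks for, the following combines two closed intervals whenever they overlap or touch, and returns null only when they are disjoint. The Interval class and the combine method are hypothetical stand-ins used for illustration; they are not Hive's Range class or the actual StatsUtils.combineRange code.

```java
final class IntervalCombine {
    // Hypothetical closed interval [min, max]; stands in for a stats range.
    static final class Interval {
        final long min;
        final long max;

        Interval(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Returns the smallest interval covering both inputs if they overlap
    // (sharing a single endpoint counts as overlap), otherwise null.
    static Interval combine(Interval a, Interval b) {
        if (a.max < b.min || b.max < a.min) {
            return null; // disjoint: nothing to combine
        }
        return new Interval(Math.min(a.min, b.min), Math.max(a.max, b.max));
    }

    public static void main(String[] args) {
        Interval r3 = combine(new Interval(0, 1), new Interval(1, 11));
        System.out.println(r3.min + ".." + r3.max);
    }
}
```

With this definition, the intervals from the test case, [0, 1] and [1, 11], share the endpoint 1 and therefore combine into [0, 11].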






[jira] [Created] (HIVE-18140) Partitioned tables statistics can go wrong in basic stats mixed case

2017-11-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-18140:
---

 Summary: Partitioned tables statistics can go wrong in basic stats 
mixed case
 Key: HIVE-18140
 URL: https://issues.apache.org/jira/browse/HIVE-18140
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


Suppose the following scenario:

* part1 has basic stats {{RC=10,DS=1K}}
* all other partitions have no basic stats (and a bunch of rows)

Then 
[this|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L378]
 condition would be false, which in turn produces estimates for the whole 
partitioned table of {{RC=10,DS=1K}}.
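
The scenario can be illustrated with a small standalone sketch. It does not reproduce the StatsUtils logic; it only shows, under the assumption that row counts are summed over the partitions that happen to have stats, why the table-level figure comes out as 10, and how a check that every partition has stats would flag the mixed case. All names here (PartitionStatsSketch, naiveRowCount, allHaveStats) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

final class PartitionStatsSketch {
    // Sums row counts, silently skipping partitions without basic stats.
    static long naiveRowCount(List<OptionalLong> partitionRowCounts) {
        long total = 0;
        for (OptionalLong rc : partitionRowCounts) {
            if (rc.isPresent()) {
                total += rc.getAsLong();
            }
        }
        return total;
    }

    // True only when every partition carries a row count.
    static boolean allHaveStats(List<OptionalLong> partitionRowCounts) {
        for (OptionalLong rc : partitionRowCounts) {
            if (!rc.isPresent()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // part1 has RC=10; the other partitions have no basic stats.
        List<OptionalLong> counts = Arrays.asList(
            OptionalLong.of(10), OptionalLong.empty(), OptionalLong.empty());
        System.out.println("naive row count: " + naiveRowCount(counts));
        System.out.println("all partitions have stats: " + allHaveStats(counts));
    }
}
```

In the mixed case the naive sum reflects only part1, so an estimator would need to notice the missing stats and fall back to some other estimate for the remaining partitions.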






[jira] [Created] (HIVE-18139) spark may miss results in case column stats are gathered

2017-11-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-18139:
---

 Summary: spark may miss results in case column stats are gathered
 Key: HIVE-18139
 URL: https://issues.apache.org/jira/browse/HIVE-18139
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


To reproduce, add {{set hive.stats.column.autogather=true;}} at the beginning of 
{{ql/src/test/queries/clientpositive/auto_sortmerge_join_13.q}}.





[jira] [Created] (HIVE-18138) Fix columnstats problem in case schema evolution

2017-11-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-18138:
---

 Summary: Fix columnstats problem in case schema evolution
 Key: HIVE-18138
 URL: https://issues.apache.org/jira/browse/HIVE-18138
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Column stats are kept when the main table schema is altered, and this causes 
all kinds of problems.





[jira] [Created] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

2017-11-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-18137:
---

 Summary: Schema evolution: newly inserted column value in 
pre-existing partition is masked to null
 Key: HIVE-18137
 URL: https://issues.apache.org/jira/browse/HIVE-18137
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich



{code}
set hive.explain.user=false;
set hive.fetch.task.conversion=none;
set hive.mapred.mode=nonstrict;
set hive.cli.print.header=true;
SET hive.exec.schema.evolution=true;
SET hive.vectorized.use.vectorized.input.format=true;
SET hive.vectorized.use.vector.serde.deserialize=false;
SET hive.vectorized.use.row.serde.deserialize=false;
SET hive.vectorized.execution.enabled=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.metastore.disallow.incompatible.col.type.changes=true;
set hive.default.fileformat=textfile;
set hive.llap.io.enabled=false;

CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) 
PARTITIONED BY(part INT);

insert into table part_add_int_permute_select partition(part=1) VALUES (1, 
, 'new');

alter table part_add_int_permute_select add columns(c int);

insert into table part_add_int_permute_select partition(part=1) VALUES (2, 
, 'new', );

select insert_num,part,a,b,c from part_add_int_permute_select;
{code}

Results for the last select:
{code}
1  1   new NULL
2  1   new NULL
{code}

I think the following result should be expected:
{code}
1  1   new NULL
2  1   new 
{code}



