[jira] [Comment Edited] (ORC-539) Exception in double to timestamp schema evolution

2019-09-12 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928308#comment-16928308
 ] 

Laszlo Bodor edited comment on ORC-539 at 9/12/19 10:57 AM:


It seems the internal branch needs some changes, after which this exception is 
gone. However, I found another issue, specific to floats, which also 
reproduces on apache/master after applying 
https://issues.apache.org/jira/secure/attachment/12980169/ORC-539.repro.patch

{code}
org.junit.ComparisonFailure: row 0 expected:<1960-01-27 12:3[4:56.1] Australia/Sydney> but was:<1960-01-27 12:3[5:12.0] Australia/Sydney>
    at org.junit.Assert.assertEquals(Assert.java:115)
    at org.apache.orc.impl.TestSchemaEvolution.testEvolutionToTimestamp(TestSchemaEvolution.java:2224)
    ...
{code}

I filed ORC-554 for this, as it is a separate issue.
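
One possible reading of the 15.9-second gap in the failure above is plain float precision. This is a sketch only, assuming the value is held as fractional seconds since the epoch and narrowed to a 32-bit float somewhere along the way; the class and method names are hypothetical and none of this is ORC code. At a magnitude of about 3.1e8 seconds, adjacent float values are 32 seconds apart, so the expected value cannot be represented exactly.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical illustration, not taken from ORC.
class FloatTimestampPrecision {

  // Converts a local time to epoch seconds, narrows the value to a
  // 32-bit float, and converts the rounded value back to a local time.
  static LocalDateTime throughFloat(LocalDateTime local, ZoneId zone) {
    double seconds = local.atZone(zone).toInstant().toEpochMilli() / 1000.0;
    float narrowed = (float) seconds;  // only 24 bits of mantissa survive
    long millis = Math.round((double) narrowed * 1000.0);
    return Instant.ofEpochMilli(millis).atZone(zone).toLocalDateTime();
  }

  public static void main(String[] args) {
    LocalDateTime expected =
        LocalDateTime.of(1960, 1, 27, 12, 34, 56, 100_000_000);
    System.out.println(throughFloat(expected, ZoneId.of("Australia/Sydney")));
  }
}
```

Running `main` prints `1960-01-27T12:35:12`, which lines up with the "but was" value in the assertion, so the test expectation may simply be finer than a float column can hold.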



> Exception in double to timestamp schema evolution
> -------------------------------------------------
>
> Key: ORC-539
> URL: https://issues.apache.org/jira/browse/ORC-539
> Project: ORC
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Laszlo Bodor
> Priority: Major
> Attachments: ORC-539.repro.patch
>
>
> I backported ORC-189 to my own branch and ran the tests in Hive. After 
> applying ORC-189, I am getting the following exception in a test related to 
> schema evolution from double to timestamp:
> {noformat}
> Caused by: java.io.IOException: Error reading file: file:/Users/jcamachorodriguez/src/workspaces/hive/itests/qtest/target/localfs/warehouse/part_change_various_various_timestamp_n6/part=1/00_0
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1289)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:87)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:103)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:252)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:227)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
>     ... 23 more
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 7 kind DATA position: 15 length: 15 range: 0 offset: 122 limit: 122 range 0 = 0 to 15 uncompressed: 12 to 12
>     at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:125)
>     at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:108)
>     at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:104)
>     at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:783)
>     at org.apache.orc.impl.ConvertTreeReaderFactory$TimestampFromDoubleTreeReader.nextVector(ConvertTreeReaderFactory.java:1883)
>     at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2012)
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1282)
>     ... 28 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (ORC-539) Exception in double to timestamp schema evolution

2019-09-11 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
 ] 

Laszlo Bodor edited comment on ORC-539 at 9/11/19 3:00 PM:
---

small repro without partitions and with a single column

{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean, 
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
decimal(38,18), float1 float, double1 double, string1 string, string2 string, 
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data_n41;

CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);

insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM 
schema_evolution_data_n41;

alter table part_change_various_various_timestamp_n6 replace columns (c6 
TIMESTAMP);

select c6 from part_change_various_various_timestamp_n6;
{code}

The problem is that the internal branch does not contain ORC-531, which is 
responsible for handling float / double types in the convert tree reader:
https://github.com/apache/orc/blame/master/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L1397-L1399
so it probably tries to read the float as if it were a double, hence the error.
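
A minimal sketch of that failure mode, assuming nothing about ORC internals beyond value width (a float takes 4 bytes, a double 8). It uses `java.io.DataInputStream` rather than ORC's `SerializationUtils`, ignores byte order and compression, and the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration, not taken from ORC.
class FloatReadAsDouble {

  // Serializes each value as a 4-byte float.
  static byte[] writeFloats(float... values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (float v : values) {
        out.writeFloat(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Tries to read `rows` 8-byte doubles and returns how many were read
  // before the stream ran out of bytes.
  static int doublesReadBeforeEof(byte[] data, int rows) {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int read = 0;
    try {
      for (int i = 0; i < rows; i++) {
        in.readDouble();
        read++;
      }
    } catch (EOFException e) {
      // Fewer than 8 bytes were left: the reader expected wider values.
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return read;
  }

  public static void main(String[] args) {
    byte[] data = writeFloats(1f, 2f, 3f);  // 3 rows, 12 bytes
    System.out.println(doublesReadBeforeEof(data, 3));
  }
}
```

Three float rows occupy 12 bytes, so a reader that expects three doubles gets one (meaningless) value and then hits end of stream on the second read.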

With this check the issue disappears (although I got a result mismatch, which I 
am still checking). I still think TestSchemaEvolution#testEvolutionToTimestamp 
needs to be improved to test float evolution, because it can reproduce the 
error even without an "external" Hive test:
https://github.com/apache/orc/commit/a7255f3669146e7697215e75720c74ca831b374c#diff-a6311862d24b863a3d394b89ed9d0495R2158-R2159




[jira] [Comment Edited] (ORC-539) Exception in double to timestamp schema evolution

2019-09-11 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
 ] 

Laszlo Bodor edited comment on ORC-539 at 9/11/19 2:52 PM:
---

small repro without partitions and with a single column

{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean, 
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
decimal(38,18), float1 float, double1 double, string1 string, string2 string, 
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data_n41;

CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);

insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM 
schema_evolution_data_n41;

alter table part_change_various_various_timestamp_n6 replace columns (c6 
TIMESTAMP);

select c6 from part_change_various_various_timestamp_n6;
{code}

The problem is that the internal branch does not contain ORC-531, which is 
responsible for handling float / double types in the convert tree reader:
https://github.com/apache/orc/blame/master/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L828-L830
so it probably tries to read the float as if it were a double, hence the error.

With this check the issue disappears (although I got a result mismatch, which I 
am still checking). I still think TestSchemaEvolution#testEvolutionToTimestamp 
needs to be improved to test float evolution, because it can reproduce the 
error even without an "external" Hive test:
https://github.com/apache/orc/commit/a7255f3669146e7697215e75720c74ca831b374c#diff-a6311862d24b863a3d394b89ed9d0495R2158-R2159





[jira] [Comment Edited] (ORC-539) Exception in double to timestamp schema evolution

2019-09-11 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
 ] 

Laszlo Bodor edited comment on ORC-539 at 9/11/19 2:51 PM:
---

small repro without partitions and with a single column

{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean, 
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
decimal(38,18), float1 float, double1 double, string1 string, string2 string, 
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data_n41;

CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);

insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM 
schema_evolution_data_n41;

alter table part_change_various_various_timestamp_n6 replace columns (c6 
TIMESTAMP);

select c6 from part_change_various_various_timestamp_n6;
{code}

The problem is that the internal branch does not contain ORC-531, which is 
responsible for handling float / double types in the convert tree reader:
https://github.com/apache/orc/blame/master/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L828-L830
so it probably tries to read the float as if it were a double, hence the error.

With this check the issue disappears (although I got a result mismatch, which I 
am still checking). I still think TestSchemaEvolution#testEvolutionToTimestamp 
needs to be improved to test float evolution, because it can reproduce the 
error even without an "external" Hive test.






[jira] [Comment Edited] (ORC-539) Exception in double to timestamp schema evolution

2019-09-11 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
 ] 

Laszlo Bodor edited comment on ORC-539 at 9/11/19 2:49 PM:
---

small repro without partitions and with a single column

{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean, 
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
decimal(38,18), float1 float, double1 double, string1 string, string2 string, 
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data_n41;

CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);

insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM 
schema_evolution_data_n41;

alter table part_change_various_various_timestamp_n6 replace columns (c6 
TIMESTAMP);

select c6 from part_change_various_various_timestamp_n6;
{code}

The problem is that the internal branch does not contain ORC-531, which is 
responsible for handling float / double types in the convert tree reader:
https://github.com/apache/orc/blame/master/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L828-L830
so it probably tries to read the float as if it were a double, hence the error.

With this check the issue disappears (although I got a result mismatch, which I 
am still checking). I still think TestSchemaEvolution#testEvolutionToTimestamp 
needs to be improved to test float evolution.





[jira] [Comment Edited] (ORC-539) Exception in double to timestamp schema evolution

2019-09-11 Thread Laszlo Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
 ] 

Laszlo Bodor edited comment on ORC-539 at 9/11/19 1:15 PM:
---

It fails for both float and double. Here is a simple repro that can be used 
with either a double or a float source (1 column, no partitions):

{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean, 
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
decimal(38,18), float1 float, double1 double, string1 string, string2 string, 
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data_n41;

CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);

insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM 
schema_evolution_data_n41;

alter table part_change_various_various_timestamp_n6 replace columns (c6 
TIMESTAMP);

select c6 from part_change_various_various_timestamp_n6;
{code}


was (Author: abstractdog):
it fails for float and double too, simple repro which can be used with 
double/float source:

{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean, 
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
decimal(38,18), float1 float, double1 double, string1 string, string2 string, 
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data_n41;

CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);

insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM 
schema_evolution_data_n41;

alter table part_change_various_various_timestamp_n6 replace columns (c6 
TIMESTAMP);

select c6 from part_change_various_various_timestamp_n6;
{code}
