[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720 ]

Seonguk Kim edited comment on HIVE-24066 at 7/14/22 6:01 AM:

I am facing the same problem.
I hope the null check for `context.os` works. (null check for struct column that not exists in file)

was (Author: JIRAUSER292443):
null check support for `context.os` would be useful. (null check for struct column that not exists in file)

> Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.3.5, 3.1.2
> Reporter: Jainik Vora
> Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type
>
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
>     `app`: struct<
>       `build`: string,
>       `name`: string,
>       `namespace`: string,
>       `version`: string
>     >,
>     `device`: struct<
>       `adtrackingenabled`: boolean,
>       `advertisingid`: string,
>       `id`: string,
>       `manufacturer`: string,
>       `model`: string,
>       `type`: string
>     >,
>     `locale`: string,
>     `library`: struct<
>       `name`: string,
>       `version`: string
>     >,
>     `os`: struct<
>       `name`: string,
>       `version`: string
>     >,
>     `screen`: struct<
>       `height`: bigint,
>       `width`: bigint
>     >,
>     `network`: struct<
>       `carrier`: string,
>       `cellular`: boolean,
>       `wifi`: boolean
>     >,
>     `timezone`: string,
>     `userAgent`: string
>   >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>
> All columns are nullable hence the parquet files read by the table don't
> always contain all columns. If any file in a partition doesn't have
> "context.os" struct and if "context.os.name" is queried, Hive throws an
> exception as below. Same for "context.screen" as well.
>
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with exception java.io.IOException:java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with exception java.io.IOException:java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
> java.io.IOException: java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
>   at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>   at org.apache.hadoop.
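
The behaviour requested in this issue (a NULL result instead of an exception when a nested struct such as `context.os` is absent from a file) can be illustrated with a small sketch. This is not Hive's code and does not use any Hive or Parquet API: the function name `read_nested` and the sample rows are hypothetical, and rows are modelled as plain Python dictionaries purely to show the lookup rule.

```python
from typing import Any, Optional, Sequence


def read_nested(record: Optional[dict], path: Sequence[str]) -> Any:
    """Return the value at `path`, or None if any struct on the way is absent."""
    current: Any = record
    for name in path:
        # A struct that is missing from the file is treated like a NULL struct,
        # so the lookup stops and yields None instead of raising an error.
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
    return current


# Hypothetical rows: the first comes from a file written without `context.os`.
row_without_os = {"context": {"locale": "en-US"}}
row_with_os = {"context": {"locale": "en-US", "os": {"name": "iOS", "version": "14.0"}}}

print(read_nested(row_without_os, ["context", "os", "name"]))  # None
print(read_nested(row_with_os, ["context", "os", "name"]))     # iOS
```

Under this rule, a query on `context.os.name` would yield NULL for rows from files that lack the `os` struct and the stored value for all other rows, which matches the outcome the issue title asks for.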
[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720 ]

Seonguk Kim edited comment on HIVE-24066 at 7/8/22 6:25 AM:

null check support for `context.os` would be useful. (null check for struct column that not exists in file)

was (Author: JIRAUSER292443):
It would be useful if null check for context.os works. (null check for struct column that not exists in file)
[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720 ]

Seonguk Kim edited comment on HIVE-24066 at 7/7/22 11:52 AM:

It would be useful if null check for context.os works. (null check for struct column that not exists in file)

was (Author: JIRAUSER292443):
It seems null check for context.os (struct column that not exists in file) should work.
[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720 ]

Seonguk Kim edited comment on HIVE-24066 at 7/7/22 11:33 AM:

It seems null check for context.os (struct column that not exists in file) should work.

was (Author: JIRAUSER292443):
It seems null check for context.os (column that not exists in file) should work.
[jira] [Commented] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720 ]

Seonguk Kim commented on HIVE-24066:

It seems null check for context.os (column that not exists in file) should work.

> Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
> ---
>
> [...]
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedSchema(DataWritableReadSupport.java:249)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:379)
>   at org.apache.