[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386075#comment-14386075 ]

Chaoyu Tang commented on HIVE-10086:

Interesting, [~csun], I was not able to find the commit history in trunk git log.

Hive throws error when accessing Parquet file schema using field name match
---

Key: HIVE-10086
URL: https://issues.apache.org/jira/browse/HIVE-10086
Project: Hive
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
Fix For: 1.2.0
Attachments: HIVE-10086.5.patch, HiveGroup.parquet

When a Hive table schema contains only a portion of the schema of a Parquet file, access to the values should still work if the field names match the schema. This does not work when a struct data type is in the schema and the Hive schema contains just a portion of the struct elements. Hive throws an error instead.

This is the example and how to reproduce:

First, create a Parquet table and add some values to it:

{code}
CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;
{code}

Note: {{srcpart}} could be any table. It is just used to leverage the INSERT statement.

The above table example generates the following Parquet file schema:

{code}
message hive_schema {
  optional int32 id;
  optional binary name (UTF8);
  optional group address {
    optional int32 number;
    optional binary street (UTF8);
    optional binary zip (UTF8);
  }
}
{code}

Afterwards, I create a table that contains just a portion of the schema and load the Parquet file generated above. A query on the struct column of that table fails:

{code}
CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;
LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;

hive> SELECT name FROM test1;
OK
Roger
Time taken: 0.071 seconds, Fetched: 1 row(s)

hive> SELECT address FROM test1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
Time taken: 0.085 seconds
{code}

I would expect that Parquet can access the matched names, but Hive throws an error instead.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
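For readers of this thread, the field-name matching that the issue description asks for can be illustrated with a small standalone sketch. This is not Hive's implementation and not the attached patch: the list-of-tuples schema model and the read_by_name helper are hypothetical, chosen only to mirror the schemas and the row from the reproduction steps. The last few lines show what a purely positional read of the struct would return, which is one plausible reading of the reported IntWritable error, not a confirmed diagnosis.

```python
# Hypothetical model (not Hive code): a struct is an ordered list of
# (field_name, field_type); a field_type is a primitive type name (str)
# or a nested struct (list).

# Schema of the Parquet file written by the first CREATE TABLE / INSERT.
FILE_SCHEMA = [
    ("id", "int32"),
    ("name", "binary"),
    ("address", [("number", "int32"), ("street", "binary"), ("zip", "binary")]),
]

# Schema of the second table, which declares only part of the file schema.
TABLE_SCHEMA = [
    ("name", "binary"),
    ("address", [("street", "binary")]),
]

# The single row stored in the file, in file-schema order.
FILE_ROW = [1, "Roger", [8600, "Congress Ave.", "87366"]]


def read_by_name(file_schema, row, wanted):
    """Pick values from `row` by matching field names, recursing into structs."""
    positions = {name: i for i, (name, _) in enumerate(file_schema)}
    out = []
    for name, wanted_type in wanted:
        if name not in positions:
            out.append(None)  # field absent from the file: treat as NULL
            continue
        i = positions[name]
        value, file_type = row[i], file_schema[i][1]
        both_structs = isinstance(wanted_type, list) and isinstance(file_type, list)
        if both_structs and value is not None:
            value = read_by_name(file_type, value, wanted_type)
        out.append(value)
    return out


print(read_by_name(FILE_SCHEMA, FILE_ROW, TABLE_SCHEMA))
# ['Roger', ['Congress Ave.']]

# For contrast: reading the struct positionally hands back the int stored
# under `number` where the table expects the string `street`.
wanted_address = TABLE_SCHEMA[1][1]
positional = [FILE_ROW[2][i] for i in range(len(wanted_address))]
print(positional)
# [8600]
```

The sketch resolves every level of nesting by name, so the order and number of fields in the file do not matter, and a field missing from the file comes back as None.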
[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384228#comment-14384228 ]

Szehon Ho commented on HIVE-10086:

+1 thanks

Attachments: HIVE-10086.4.patch, HiveGroup.parquet
[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380937#comment-14380937 ]

Sergio Peña commented on HIVE-10086:

[~xuefuz] [~szehon] Could you help me review this patch?

Attachments: HIVE-10086.1.patch, HiveGroup.parquet
[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381006#comment-14381006 ]

Sergio Peña commented on HIVE-10086:

[~rdblue] Could you help me review this?

Attachments: HIVE-10086.1.patch, HiveGroup.parquet