[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match

2015-03-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386075#comment-14386075
 ] 

Chaoyu Tang commented on HIVE-10086:


Interesting, [~csun], I was not able to find the commit history in trunk git 
log. 

 Hive throws error when accessing Parquet file schema using field name match
 ---

 Key: HIVE-10086
 URL: https://issues.apache.org/jira/browse/HIVE-10086
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Fix For: 1.2.0

 Attachments: HIVE-10086.5.patch, HiveGroup.parquet


 When a Hive table schema contains only a portion of a Parquet file's schema, 
 access to the values should still work as long as the field names match. 
 This fails when the schema includes a struct and the Hive schema declares 
 only some of the struct's fields: Hive throws an error instead.
 To reproduce, first create a Parquet table and insert a row into it:
 {code}
 CREATE TABLE test1 (id int, name string, address 
 struct<number:int,street:string,zip:string>) STORED AS PARQUET;
 INSERT INTO TABLE test1 SELECT 1, 'Roger', 
 named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM 
 srcpart LIMIT 1;
 {code}
 Note: {{srcpart}} can be any existing table. It is only there to give the 
 INSERT ... SELECT statement a source.
 The table above produces the following Parquet file schema:
 {code}
 message hive_schema {
   optional int32 id;
   optional binary name (UTF8);
   optional group address {
     optional int32 number;
     optional binary street (UTF8);
     optional binary zip (UTF8);
   }
 }
 {code} 
 Next, create a table that declares only a portion of that schema and load the 
 Parquet file generated above. A query on the struct column of that table fails:
 {code}
 CREATE TABLE test1 (name string, address struct<street:string>) STORED AS 
 PARQUET;
 LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
 hive> SELECT name FROM test1;
 OK
 Roger
 Time taken: 0.071 seconds, Fetched: 1 row(s)
 hive> SELECT address FROM test1;
 OK
 Failed with exception 
 java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.UnsupportedOperationException: Cannot inspect 
 org.apache.hadoop.io.IntWritable
 Time taken: 0.085 seconds
 {code}
 I would expect that Parquet can access the matched names, but Hive throws an 
 error instead.
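
 The expected behavior can be illustrated with a small sketch. This is not 
 Hive's or Parquet's code. It is a toy model, with hypothetical names 
 (FILE_SCHEMA, TABLE_SCHEMA, project_by_name, project_struct_by_position), of 
 what matching by field name at every nesting level would return for the row 
 above, next to what a purely positional read of the struct would return. The 
 positional variant is only an assumption about how an int could end up where 
 a string was expected.

```python
# Toy model only: schemas are lists of (name, type) pairs, and a struct type
# is itself a list of (name, type) pairs. None of this is Hive/Parquet API.
FILE_SCHEMA = [
    ("id", "int"),
    ("name", "string"),
    ("address", [("number", "int"), ("street", "string"), ("zip", "string")]),
]

# The second table declares only part of the file schema.
TABLE_SCHEMA = [
    ("name", "string"),
    ("address", [("street", "string")]),
]

# One stored row, laid out in file-schema order.
ROW = [1, "Roger", [8600, "Congress Ave.", "87366"]]


def project_by_name(file_schema, table_schema, row):
    """Pick the values for table_schema by field name, recursing into structs."""
    index = {name: i for i, (name, _) in enumerate(file_schema)}
    out = []
    for name, wanted_type in table_schema:
        if name not in index:
            out.append(None)  # field is not present in the file
            continue
        i = index[name]
        value = row[i]
        file_type = file_schema[i][1]
        both_structs = isinstance(wanted_type, list) and isinstance(file_type, list)
        if both_structs and value is not None:
            value = project_by_name(file_type, wanted_type, value)
        out.append(value)
    return out


def project_struct_by_position(table_struct, stored_struct):
    """Assumed faulty variant: take the first N stored values, ignoring names."""
    return stored_struct[: len(table_struct)]


if __name__ == "__main__":
    print(project_by_name(FILE_SCHEMA, TABLE_SCHEMA, ROW))
    print(project_struct_by_position(TABLE_SCHEMA[1][1], ROW[2]))
```

 In this model, name-based projection yields the street for the address 
 column, while the positional variant yields the int stored in number, which 
 is the kind of type mismatch the exception message points at.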



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match

2015-03-27 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384228#comment-14384228
 ] 

Szehon Ho commented on HIVE-10086:
--

+1 thanks

 Hive throws error when accessing Parquet file schema using field name match
 ---

 Key: HIVE-10086
 URL: https://issues.apache.org/jira/browse/HIVE-10086
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-10086.4.patch, HiveGroup.parquet




[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match

2015-03-25 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380937#comment-14380937
 ] 

Sergio Peña commented on HIVE-10086:


[~xuefuz] [~szehon] Could you help me review this patch?

 Hive throws error when accessing Parquet file schema using field name match
 ---

 Key: HIVE-10086
 URL: https://issues.apache.org/jira/browse/HIVE-10086
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-10086.1.patch, HiveGroup.parquet




[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match

2015-03-25 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381006#comment-14381006
 ] 

Sergio Peña commented on HIVE-10086:


[~rdblue] Could you help me review this?

 Hive throws error when accessing Parquet file schema using field name match
 ---

 Key: HIVE-10086
 URL: https://issues.apache.org/jira/browse/HIVE-10086
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-10086.1.patch, HiveGroup.parquet

