[jira] [Created] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles?
Tim Robertson created HIVE-13539:
------------------------------------

             Summary: HiveHFileOutputFormat searching the wrong directory for HFiles?
                 Key: HIVE-13539
                 URL: https://issues.apache.org/jira/browse/HIVE-13539
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
    Affects Versions: 1.1.0
         Environment: Built into CDH 5.4.7
            Reporter: Tim Robertson
            Assignee: Sushanth Sowmyan
            Priority: Blocker

When creating HFiles for a bulk load into HBase, I believe HiveHFileOutputFormat looks in the wrong directory to find the HFiles, resulting in the following exception:

{code}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
    ... 7 more
Caused by: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
    at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
    ... 11 more
{code}

The issue is that it looks for the HFiles in {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}} when I believe it should be looking in the task attempt subfolder, such as {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_00_1000}}.

This can be reproduced with any HBase load, such as:

{code:sql}
CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,o:x,o:y',
  'hbase.table.default.storage.type' = 'binary');

SET hfile.family.path=/tmp/coords_hfiles/o;
SET hive.hbase.generatehfiles=true;

INSERT OVERWRITE TABLE coords_hbase
SELECT id, decimalLongitude, decimalLatitude
FROM source
CLUSTER BY id;
{code}

Any advice greatly appreciated.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
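The suspected mismatch can be modelled outside Hive. What follows is a minimal Python sketch, not HiveHFileOutputFormat's actual code. It assumes the failing check amounts to "exactly one subdirectory must exist under the searched path", and it uses hypothetical attempt names (attempt_a, attempt_b) together with the column family o from the example above. Whether several sibling directories are really what triggers the error on the cluster is also an assumption.

```python
import os
import tempfile


def subdirectories(path):
    """Return the sorted names of the immediate subdirectories of `path`."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def single_family_dir(path):
    """Return the only subdirectory of `path`; raise IOError otherwise.

    Assumption: this mimics a 'one family directory expected' check.
    """
    found = subdirectories(path)
    if len(found) > 1:
        raise IOError("Multiple family directories found in " + path)
    if not found:
        raise IOError("No family directories found in " + path)
    return found[0]


def build_layout(root, attempts=("attempt_a", "attempt_b"), family="o"):
    """Create <root>/<attempt>/<family>/ for each hypothetical task attempt."""
    for attempt in attempts:
        os.makedirs(os.path.join(root, attempt, family))
    return [os.path.join(root, attempt) for attempt in attempts]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        attempt_paths = build_layout(root)
        # Searching inside one attempt folder finds exactly one family.
        print(single_family_dir(attempt_paths[0]))
        # Searching the shared parent sees every attempt folder at once.
        try:
            single_family_dir(root)
        except IOError as err:
            print("parent directory fails:", err)
```

Under these assumptions the parent directory fails the check as soon as it has more than one child, while each attempt subfolder passes it, which matches the behaviour described in the report.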
[jira] [Commented] (HIVE-7387) Guava version conflict between hadoop and spark [Spark-Branch]
[ https://issues.apache.org/jira/browse/HIVE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295212#comment-14295212 ]

Tim Robertson commented on HIVE-7387:
-------------------------------------

This also affects anyone using a custom UDF from the Hive CLI when the UDF depends on methods from later Guava versions. I suggest reopening this as a valid issue.

Guava version conflict between hadoop and spark [Spark-Branch]
--------------------------------------------------------------

                Key: HIVE-7387
                URL: https://issues.apache.org/jira/browse/HIVE-7387
            Project: Hive
         Issue Type: Bug
         Components: Spark
           Reporter: Chengxiang Li
           Assignee: Chengxiang Li
        Attachments: HIVE-7387-spark.patch

The Guava conflict happens in the Hive driver's compile stage, while the Spark RDD is being initialized in SparkClient, as the stack trace below shows. The Hive driver has both Guava 11 (from the Hadoop classpath) and the Spark assembly jar (which contains Guava 14 classes) on its classpath. Spark invokes HashFunction.hashInt, a method that does not exist in Guava 11. Evidently the Guava 11 version of HashFunction is the one loaded into the JVM, which leads to a NoSuchMethodError while the Spark RDD is initialized.
{code}
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
    at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
    at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
    at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
    at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
    at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
    at org.apache.spark.broadcast.HttpBroadcast.init(HttpBroadcast.scala:52)
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
    at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:112)
    at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:527)
    at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:307)
    at org.apache.hadoop.hive.ql.exec.spark.SparkClient.createRDD(SparkClient.java:204)
    at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:167)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:32)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:159)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
{code}

NO PRECOMMIT TESTS. This is for spark branch only.
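The failure mode described above, where an older copy of a class shadows a newer one, can be illustrated with a toy model. This is a simplified Python sketch that assumes a flat classpath on which the first jar defining a class wins; real JVM class loading involves class loader hierarchies and is more involved. The jar names and the method sets are illustrative placeholders, except for the one fact taken from the report: hashInt is missing from the Guava 11 HashFunction.

```python
HASH_FUNCTION = "com.google.common.hash.HashFunction"


def resolve(classpath, class_name):
    """Return (jar_name, methods) from the first jar that defines class_name."""
    for jar_name, classes in classpath:
        if class_name in classes:
            return jar_name, classes[class_name]
    raise LookupError("class not found: " + class_name)


def invoke(classpath, class_name, method):
    """Simulate a call: fail if the resolved class lacks the method."""
    jar_name, methods = resolve(classpath, class_name)
    if method not in methods:
        raise AttributeError(
            "NoSuchMethodError: %s.%s (class loaded from %s)"
            % (class_name, method, jar_name)
        )
    return jar_name


# Hypothetical jar contents; only the absence of hashInt in the older
# copy is taken from the issue description.
GUAVA_11 = ("hadoop-guava-11", {HASH_FUNCTION: {"hashLong"}})
SPARK_ASSEMBLY = ("spark-assembly", {HASH_FUNCTION: {"hashLong", "hashInt"}})

if __name__ == "__main__":
    try:
        invoke([GUAVA_11, SPARK_ASSEMBLY], HASH_FUNCTION, "hashInt")
    except AttributeError as err:
        print(err)
    # With the newer copy first, the same call resolves.
    print(invoke([SPARK_ASSEMBLY, GUAVA_11], HASH_FUNCTION, "hashInt"))
```

In this model the outcome depends only on ordering, which is why the same symptom can appear for a custom UDF that needs a newer Guava than the one found first on the Hive CLI classpath.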
[jira] [Commented] (HIVE-2958) GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
[ https://issues.apache.org/jira/browse/HIVE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256250#comment-13256250 ]

Tim Robertson commented on HIVE-2958:
-------------------------------------

Thanks Navis for such a quick turnaround! I have applied the patch and confirm it works on our cluster (with HBase 0.90.4), but can't comment on the implications of the changes.

GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
-------------------------------------------------------------------------------

                Key: HIVE-2958
                URL: https://issues.apache.org/jira/browse/HIVE-2958
            Project: Hive
         Issue Type: Bug
         Components: HBase Handler
   Affects Versions: 0.9.0
        Environment: HBase 0.90.4, Hive 0.90 snapshot (trunk) built today
           Reporter: Tim Robertson
           Assignee: Navis
           Priority: Blocker
        Attachments: HIVE-2958.D2871.1.patch

This relates to https://issues.apache.org/jira/browse/HIVE-1634.

The following work fine:

{code}
CREATE EXTERNAL TABLE tim_hbase_occurrence (
  id int,
  scientific_name string,
  data_resource_id int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
)
TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences",
  "hbase.table.default.storage.type" = "binary"
);

SELECT * FROM tim_hbase_occurrence LIMIT 3;
SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;
{code}

However, the following fails:

{code}
SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY data_resource_id;
{code}

The error given:

{code}
0 TS
2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tim_hbase_occurrence for file hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2012-04-17 16:58:45,723 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:1444,scientific_name:null,data_resource_id:1081}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
    ... 9 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150
Re: Hive 0.9 now broken on HBase 0.90 ?
Thanks for clarifying, Ashutosh.

Looks like we'll be forking Hive for a while, as we stick with CDH3. I might see if the Cloudera guys are interested in assisting in maintaining a CDH3 HBase compatible Hive 0.9 version - there are too many nice things in 0.9 for us not to use it, but we're kind of committed to CDH3.

Cheers,
Tim

On Wed, Apr 18, 2012 at 10:25 PM, Ashutosh Chauhan hashut...@apache.org wrote:

> Hi Tim,
>
> Sorry that it broke your setup. Decision to move to hbase-0.92 was made in https://issues.apache.org/jira/browse/HIVE-2748
>
> Thanks,
> Ashutosh
>
> On Wed, Apr 18, 2012 at 11:42, Tim Robertson timrobertson...@gmail.com wrote:
>
>> Hi all,
>>
>> This is my first post to hive-dev so please go easy on me...
>>
>> I built Hive from trunk (0.90) a couple of weeks ago and have been using it against HBase, and today patched it with the offering of HIVE-2958 and it all worked fine.
>>
>> I just tried an Oozie workflow, built using Maven and the Apache snapshot repository to get the 0.90 snapshot. It fails with the following:
>>
>> java.lang.NoSuchMethodError: org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(Lorg/apache/hadoop/mapred/JobConf;)V
>>     at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:419)
>>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292)
>>
>> I believe the source of the issue could be this commit, which happened after I built from trunk a couple of weeks ago:
>> http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120409202655.bdb5d2388...@eris.apache.org%3E
>>
>> Is there a decision to make Hive 0.9 require HBase 0.92.0+? It would be awesome if it still worked on 0.90.4 since CDH3 uses that.
>>
>> Hope this makes sense,
>> Tim
>> (suffering classpath hell)
[jira] [Created] (HIVE-2958) GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
-------------------------------------------------------------------------------

                Key: HIVE-2958
                URL: https://issues.apache.org/jira/browse/HIVE-2958
            Project: Hive
         Issue Type: Bug
         Components: HBase Handler
   Affects Versions: 0.9.0
        Environment: HBase 0.90.4, Hive 0.90 snapshot (trunk) built today
           Reporter: Tim Robertson
           Priority: Blocker

This relates to 1634.

The following work fine:

CREATE EXTERNAL TABLE tim_hbase_occurrence (
  id int,
  scientific_name string,
  data_resource_id int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
)
TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences",
  "hbase.table.default.storage.type" = "binary"
);

SELECT * FROM tim_hbase_occurrence LIMIT 3;
SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;

However, the following fails:

SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY data_resource_id;

The error given:

0 TS
2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tim_hbase_occurrence for file hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2012-04-17 16:58:45,723 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:1444,scientific_name:null,data_resource_id:1081}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
    ... 9 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
    ... 18 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information
[jira] [Updated] (HIVE-2958) GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
[ https://issues.apache.org/jira/browse/HIVE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Robertson updated HIVE-2958:
--------------------------------

    Description:

This relates to https://issues.apache.org/jira/browse/HIVE-1634.

The following work fine:

{code}
CREATE EXTERNAL TABLE tim_hbase_occurrence (
  id int,
  scientific_name string,
  data_resource_id int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
)
TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences",
  "hbase.table.default.storage.type" = "binary"
);

SELECT * FROM tim_hbase_occurrence LIMIT 3;
SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;
{code}

However, the following fails:

{code}
SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY data_resource_id;
{code}

The error given:

{code}
0 TS
2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tim_hbase_occurrence for file hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2012-04-17 16:58:45,723 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:1444,scientific_name:null,data_resource_id:1081}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
    ... 9 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
    ... 18 more
{code}

  was:

This relates to 1634.

The following work fine:

CREATE EXTERNAL TABLE tim_hbase_occurrence (
  id int,
  scientific_name string,
  data_resource_id int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
)
TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences",
  "hbase.table.default.storage.type" = "binary"
);

SELECT * FROM tim_hbase_occurrence
[jira] Commented: (HIVE-680) add user constraints
[ https://issues.apache.org/jira/browse/HIVE-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000321#comment-13000321 ]

Tim Robertson commented on HIVE-680:
------------------------------------

I have just experienced a frustrating issue caused by a few bad rows. Using Oozie, we Sqoop from MySQL to HDFS and then process with Hive and custom UDFs, UDAFs and UDTFs - nearly working, and a super clean solution.

In my situation there were 70,000+ bad rows with tab and newline characters in fields within the source DB, resulting in invalid rows with missing IDs by the time they reached Hive. During the processing we ended up with a join across 4 tables, each keyed on the ID, meaning 70k x 70k x 70k x 70k, and quite some work for the reducer dealing with the NULL id.

Perhaps it would be nice to allow basic constraints to be declared on a table, and then provide some generic sanitize() method to warn of potential issues?

add user constraints
--------------------

                Key: HIVE-680
                URL: https://issues.apache.org/jira/browse/HIVE-680
            Project: Hive
         Issue Type: New Feature
         Components: Query Processor
           Reporter: Namit Jain

Many times, because of a few bad input rows, the whole job fails and it takes a long time to debug those. It might be very useful to add some constraints, which can be checked while reading the data. An option can be added to ignore a configurable number of bad rows.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
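How a stray newline in a field can produce a row with a missing ID is easy to show in isolation. The following is a small Python sketch under stated assumptions: records are exported as tab-delimited text with one record per line, the first field is a numeric id, and a reader treats a non-numeric first field as a missing id. The function names and the sample rows are hypothetical, and this is not how Sqoop or Hive are implemented.

```python
def escape_field(value):
    """Escape characters that would break a tab-delimited, one-line record."""
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def export(rows, escape=False):
    """Serialize rows (lists of strings) as tab-delimited lines."""
    clean = escape_field if escape else (lambda value: value)
    return "\n".join("\t".join(clean(v) for v in row) for row in rows)


def parse(text):
    """Parse exported text into (id, other_fields) tuples.

    A line whose first field is not a number yields id None, standing in
    for the rows with missing IDs described in the comment.
    """
    records = []
    for line in text.split("\n"):
        fields = line.split("\t")
        row_id = int(fields[0]) if fields[0].isdigit() else None
        records.append((row_id, fields[1:]))
    return records


if __name__ == "__main__":
    rows = [["1", "Puma concolor"], ["2", "Panthera\nleo"]]
    # Unescaped: the embedded newline splits record 2 into two lines.
    print([rec[0] for rec in parse(export(rows))])
    # Escaped: each source row stays on a single line.
    print([rec[0] for rec in parse(export(rows, escape=True))])
```

With the embedded newline left in place, the two source rows become three parsed records, the last of which has no usable id. Escaping at export time keeps the count at two.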
[jira] Commented: (HIVE-680) add user constraints
[ https://issues.apache.org/jira/browse/HIVE-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000322#comment-13000322 ]

Tim Robertson commented on HIVE-680:
------------------------------------

I should add that the workaround was to add id IS NOT NULL to a few queries, and all was happy. Longer term we will address the root cause and escape those characters.

add user constraints
--------------------

                Key: HIVE-680
                URL: https://issues.apache.org/jira/browse/HIVE-680
            Project: Hive
         Issue Type: New Feature
         Components: Query Processor
           Reporter: Namit Jain

Many times, because of a few bad input rows, the whole job fails and it takes a long time to debug those. It might be very useful to add some constraints, which can be checked while reading the data. An option can be added to ignore a configurable number of bad rows.
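The effect of the id IS NOT NULL workaround on the four-way join from the earlier comment can be estimated with a few lines of arithmetic. This is a rough Python sketch that assumes all rows with a missing id end up sharing one join key, as the comment reports. Standard SQL would not match NULL keys in an inner join, so that shared-key behaviour is an assumption of the model rather than a statement about Hive. The function name and sample keys are hypothetical.

```python
from collections import Counter
from math import prod


def join_output_rows(tables, skip_missing=False):
    """Estimate rows emitted per key by an equi-join across all tables.

    `tables` is a list of key lists, one per table; None marks a missing
    id. With skip_missing=True, None keys are filtered out first, which
    models adding `id IS NOT NULL` to each input.
    """
    counts = [
        Counter(k for k in keys if not (skip_missing and k is None))
        for keys in tables
    ]
    shared = set(counts[0]).intersection(*counts[1:])
    return {key: prod(c[key] for c in counts) for key in shared}


if __name__ == "__main__":
    # Four tables, each with two good ids and three rows missing an id.
    tables = [[1, 2, None, None, None] for _ in range(4)]
    print(join_output_rows(tables))
    print(join_output_rows(tables, skip_missing=True))
    # Scale of the reported case: 70,000 bad rows in each of 4 tables.
    print(70_000 ** 4)
```

Three bad rows per table already multiply into 3^4 = 81 combinations for the shared key. At 70,000 bad rows per table the same product is about 2.4 x 10^19, which is why filtering those rows out before the join made such a difference.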