[jira] [Created] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles?

2016-04-18 Thread Tim Robertson (JIRA)
Tim Robertson created HIVE-13539:


 Summary: HiveHFileOutputFormat searching the wrong directory for 
HFiles?
 Key: HIVE-13539
 URL: https://issues.apache.org/jira/browse/HIVE-13539
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 1.1.0
 Environment: Built into CDH 5.4.7
Reporter: Tim Robertson
Assignee: Sushanth Sowmyan
Priority: Blocker


When creating HFiles for an HBase bulkload, I believe HiveHFileOutputFormat is 
looking in the wrong directory for the HFiles, resulting in the following exception:

{code}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: 
java.io.IOException: Multiple family directories found in 
hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: Multiple family directories found in 
hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more
Caused by: java.io.IOException: Multiple family directories found in 
hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
at 
org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
... 11 more
{code}

The issue is that it looks for the HFiles in 
{{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
 when I believe it should be looking in the task attempt subfolder, such as 
{{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_00_1000}}.
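For illustration, here is a minimal sketch of the kind of check that appears to be 
involved (an assumption reconstructed from the stack trace above, not the actual 
Hive source): the close path expects exactly one column-family directory under the 
path it scans, so scanning the job-level {{_temporary}} directory, where every task 
attempt has its own subfolder, trips the multiple-directories check.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FamilyDirSketch {
  /**
   * Expects exactly one column-family directory under outputDir. When
   * outputDir is the job-level _temporary directory rather than a task
   * attempt directory, each attempt_* subfolder appears as a candidate
   * and the check fails with "Multiple family directories found".
   */
  static Path findSingleFamilyDir(Configuration conf, Path outputDir) throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    Path familyDir = null;
    for (FileStatus child : fs.listStatus(outputDir)) {
      if (!child.isDirectory()) {
        continue;
      }
      if (familyDir != null) {
        throw new IOException("Multiple family directories found in " + outputDir);
      }
      familyDir = child.getPath();
    }
    return familyDir;
  }
}
{code}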

This can be reproduced in any HBase load such as:

{code:sql}
CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,o:x,o:y',
  'hbase.table.default.storage.type' = 'binary');

SET hfile.family.path=/tmp/coords_hfiles/o; 
SET hive.hbase.generatehfiles=true;

INSERT OVERWRITE TABLE coords_hbase 
SELECT id, decimalLongitude, decimalLatitude
FROM source
CLUSTER BY id; 
{code}

Any advice greatly appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7387) Guava version conflict between hadoop and spark [Spark-Branch]

2015-01-28 Thread Tim Robertson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295212#comment-14295212
 ] 

Tim Robertson commented on HIVE-7387:
-

This affects anyone trying to use a custom UDF from the Hive CLI when the UDF 
depends on newer Guava methods too.
I suggest reopening this as a valid issue.
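For illustration, a hypothetical UDF of the kind that hits this (the class and 
names are made up; the failing call mirrors the {{hashInt}} method from the stack 
trace below, which does not exist in Guava 11):

{code:java}
import org.apache.hadoop.hive.ql.exec.UDF;

import com.google.common.hash.Hashing;

// Hypothetical UDF: it compiles against a recent Guava, but when the Hive
// CLI puts Hadoop's Guava 11 first on the classpath, HashFunction.hashInt
// is missing at runtime and the call fails with NoSuchMethodError.
public class HashIdUDF extends UDF {
  public int evaluate(int id) {
    return Hashing.murmur3_32().hashInt(id).asInt();
  }
}
{code}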

 Guava version conflict between hadoop and spark [Spark-Branch]
 --

 Key: HIVE-7387
 URL: https://issues.apache.org/jira/browse/HIVE-7387
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7387-spark.patch


 The Guava conflict happens in the Hive driver compile stage, as shown in the 
 exception stacktrace below. The conflict occurs while initiating a Spark RDD in 
 SparkClient: the Hive driver has both Guava 11 (from the Hadoop classpath) and 
 the Spark assembly jar (which bundles Guava 14 classes) on its classpath. Spark 
 invokes HashFunction.hashInt, which does not exist in Guava 11; evidently the 
 Guava 11 version of HashFunction is loaded into the JVM, which leads to a 
 NoSuchMethodError while initiating the Spark RDD.
 {code}
 java.lang.NoSuchMethodError: 
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
   at 
 org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
   at 
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
   at 
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
   at 
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
   at 
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
   at 
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
   at 
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
   at 
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
   at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
   at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
   at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
   at 
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
   at 
 org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
   at 
 org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
   at 
 org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
   at 
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
   at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:112)
   at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:527)
   at 
 org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:307)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkClient.createRDD(SparkClient.java:204)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:167)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:32)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:159)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
 {code}
 NO PRECOMMIT TESTS. This is for spark branch only.
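One generic way to confirm which Guava jar actually won on the classpath (a 
debugging snippet, not part of the report above):

{code:java}
import com.google.common.hash.HashFunction;

public class WhichGuava {
  public static void main(String[] args) {
    // Prints the jar HashFunction was loaded from, e.g. Hadoop's
    // guava-11.0.2.jar versus a Spark assembly bundling Guava 14.
    System.out.println(
        HashFunction.class.getProtectionDomain().getCodeSource().getLocation());
  }
}
{code}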



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2958) GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]

2012-04-18 Thread Tim Robertson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256250#comment-13256250
 ] 

Tim Robertson commented on HIVE-2958:
-

Thanks Navis for such a quick turnaround!  I have applied the patch and confirm 
it works on our cluster (with HBase 0.90.4), but can't comment on the 
implications of the changes.
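For context, a minimal sketch of the failing pattern (an assumption reconstructed 
from the stack trace below, not the patched Hive source): with 
hbase.table.default.storage.type set to binary, the deserialized value is a 
LazyDioInteger, while the object inspector used to copy the GROUP BY key casts to 
the lazy-text LazyInteger.

{code:java}
import org.apache.hadoop.hive.serde2.lazy.LazyInteger;

public class CopyObjectSketch {
  public Object copyObject(Object o) {
    // With binary storage, o is actually a
    // org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger at runtime,
    // so this cast throws the ClassCastException seen during GROUP BY.
    return o == null ? null : new LazyInteger((LazyInteger) o);
  }
}
{code}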

 GROUP BY causing ClassCastException [LazyDioInteger cannot be cast 
 LazyInteger]
 ---

 Key: HIVE-2958
 URL: https://issues.apache.org/jira/browse/HIVE-2958
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0
 Environment: HBase 0.90.4, Hive 0.90 snapshot (trunk) built today
Reporter: Tim Robertson
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-2958.D2871.1.patch


 This relates to https://issues.apache.org/jira/browse/HIVE-1634.
 The following work fine:
 {code}
 CREATE EXTERNAL TABLE tim_hbase_occurrence ( 
   id int,
   scientific_name string,
   data_resource_id int
 ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
 SERDEPROPERTIES (
   "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
 ) TBLPROPERTIES(
   "hbase.table.name" = "mini_occurrences", 
   "hbase.table.default.storage.type" = "binary"
 );
 SELECT * FROM tim_hbase_occurrence LIMIT 3;
 SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;
 {code}
 However, the following fails:
 {code}
 SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY 
 data_resource_id;
 {code}
 The error given:
 {code}
 0 TS
 2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 Initialization Done 7 MAP
 2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 Processing alias tim_hbase_occurrence for file 
 hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
 2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 
 forwarding 1 rows
 2012-04-17 16:58:45,714 INFO 
 org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
 2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 
 forwarding 1 rows
 2012-04-17 16:58:45,723 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row {id:1444,scientific_name:null,data_resource_id:1081}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to 
 org.apache.hadoop.hive.serde2.lazy.LazyInteger
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
   ... 9 more
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to 
 org.apache.hadoop.hive.serde2.lazy.LazyInteger
   at 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150

Re: Hive 0.9 now broken on HBase 0.90 ?

2012-04-18 Thread Tim Robertson
Thanks for clarifying Ashutosh.

Looks like we'll be forking Hive for a while, while we stick with CDH3.  I
might see if the Cloudera guys are interested in helping maintain a
CDH3-HBase-compatible Hive 0.9 version - there are too many nice things in
0.9 for us not to use it, but we're kind of committed to CDH3.

Cheers,
Tim






On Wed, Apr 18, 2012 at 10:25 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

 Hi Tim,

 Sorry that it broke your setup. Decision to move to hbase-0.92 was made in
 https://issues.apache.org/jira/browse/HIVE-2748

 Thanks,
 Ashutosh

 On Wed, Apr 18, 2012 at 11:42, Tim Robertson <timrobertson...@gmail.com>
 wrote:

  Hi all,
 
  This is my first post to hive-dev so please go easy on me...
 
  I built Hive from trunk (0.90) a couple of weeks ago and have been using
 it
  against HBase, and today patched it with the offering of HIVE-2958 and it
  all worked fine.
 
  I just tried an Oozie workflow, built using Maven and the Apache snapshot
  repository to get the 0.90 snapshot.  It fails with the following:
 
  java.lang.NoSuchMethodError:
 
 
 org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(Lorg/apache/hadoop/mapred/JobConf;)V
 at
 
 org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:419)
 at
 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292)
 
 
  I believe the source of the issue could be this commit which happened
 after
  I built from trunk a couple weeks ago:
 
 
 
 http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120409202655.bdb5d2388...@eris.apache.org%3E
 
  Is there a decision to make Hive 0.9 require HBase 0.92.0+?  It would
 be
  awesome if it still worked on 0.90.4 since CDH3 uses that.
 
  Hope this makes sense,
  Tim
  (suffering classpath hell)
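
A hedged sketch of the kind of compatibility guard that would avoid the hard link 
against the 0.92-only method (illustrative only, not actual Hive code):

{code:java}
import java.lang.reflect.Method;

import org.apache.hadoop.mapred.JobConf;

public class HBaseCompat {
  // Calls TableMapReduceUtil.initCredentials(JobConf) only when the HBase
  // on the classpath provides it (it appeared in 0.92); on 0.90.x this
  // becomes a no-op instead of a NoSuchMethodError.
  static void initCredentialsIfAvailable(JobConf job) {
    try {
      Class<?> util =
          Class.forName("org.apache.hadoop.hbase.mapred.TableMapReduceUtil");
      Method m = util.getMethod("initCredentials", JobConf.class);
      m.invoke(null, job);
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // HBase < 0.92: class or method not present, skip credential setup.
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("initCredentials failed", e);
    }
  }
}
{code}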
 



[jira] [Created] (HIVE-2958) GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]

2012-04-17 Thread Tim Robertson (Created) (JIRA)
GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
---

 Key: HIVE-2958
 URL: https://issues.apache.org/jira/browse/HIVE-2958
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0
 Environment: HBase 0.90.4, Hive 0.90 snapshot (trunk) built today
Reporter: Tim Robertson
Priority: Blocker


This relates to 1634.

The following work fine:

CREATE EXTERNAL TABLE tim_hbase_occurrence ( 
  id int,
  scientific_name string,
  data_resource_id int
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
) TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences", 
  "hbase.table.default.storage.type" = "binary"
);
SELECT * FROM tim_hbase_occurrence LIMIT 3;
SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;

However, the following fails:
  SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY 
data_resource_id;

The error given:

0 TS
2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
Initialization Done 7 MAP
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
Processing alias tim_hbase_occurrence for file 
hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 
forwarding 1 rows
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarding 1 rows
2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 
forwarding 1 rows
2012-04-17 16:58:45,723 FATAL ExecMapper: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {id:1444,scientific_name:null,data_resource_id:1081}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to 
org.apache.hadoop.hive.serde2.lazy.LazyInteger
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
... 9 more
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to 
org.apache.hadoop.hive.serde2.lazy.LazyInteger
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
... 18 more




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2958) GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]

2012-04-17 Thread Tim Robertson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated HIVE-2958:


Description: 
This relates to https://issues.apache.org/jira/browse/HIVE-1634.

The following work fine:

{code}
CREATE EXTERNAL TABLE tim_hbase_occurrence ( 
  id int,
  scientific_name string,
  data_resource_id int
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
) TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences", 
  "hbase.table.default.storage.type" = "binary"
);
SELECT * FROM tim_hbase_occurrence LIMIT 3;
SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;
{code}

However, the following fails:
{code}
SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY 
data_resource_id;
{code}

The error given:
{code}
0 TS
2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
Initialization Done 7 MAP
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
Processing alias tim_hbase_occurrence for file 
hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 
forwarding 1 rows
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarding 1 rows
2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 
forwarding 1 rows
2012-04-17 16:58:45,723 FATAL ExecMapper: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {id:1444,scientific_name:null,data_resource_id:1081}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to 
org.apache.hadoop.hive.serde2.lazy.LazyInteger
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
... 9 more
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to 
org.apache.hadoop.hive.serde2.lazy.LazyInteger
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
... 18 more
{code}



  was:
This relates to 1634.

The following work fine:

CREATE EXTERNAL TABLE tim_hbase_occurrence ( 
  id int,
  scientific_name string,
  data_resource_id int
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
) TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences", 
  "hbase.table.default.storage.type" = "binary"
);
SELECT * FROM tim_hbase_occurrence

[jira] Commented: (HIVE-680) add user constraints

2011-02-28 Thread Tim Robertson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000321#comment-13000321
 ] 

Tim Robertson commented on HIVE-680:


I have just experienced a frustrating issue manifested by a few bad rows.

Using Oozie we Sqoop from MySQL to HDFS and then process with Hive and custom 
UDFs / UDAFs / UDTFs - nearly working, and a super clean solution.

What happened in my situation was that 70,000+ bad rows with tab and newline 
characters in fields in the source DB resulted in invalid rows with missing IDs 
by the time they reached the Hive work. During the processing we ended up with a 
join across 4 tables, each keyed on the ID, meaning 70k x 70k x 70k x 70k rows: 
quite some work for the reducer dealing with the NULL id.

Perhaps it would be nice to allow basic constraints to be declared on a table, 
and then provide some generic sanitize() method to warn of potential issues?
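
A hypothetical sanitize() along those lines (names made up, just to illustrate 
the idea):

{code:java}
public class RowSanitizer {
  // Flags rows whose fields contain the tab or newline characters that,
  // in our case, shifted columns and produced NULL ids downstream.
  static boolean isClean(String[] fields) {
    for (String field : fields) {
      if (field == null
          || field.indexOf('\t') >= 0
          || field.indexOf('\n') >= 0
          || field.indexOf('\r') >= 0) {
        return false;
      }
    }
    return true;
  }
}
{code}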




 add user constraints 
 -

 Key: HIVE-680
 URL: https://issues.apache.org/jira/browse/HIVE-680
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain

 Many times, because of a few bad input rows, the whole job fails and it takes 
 a long time to debug those.
 It might be very useful to add some constraints, which can be checked while 
 reading the data.
 An option can be added to ignore configurable number of bad rows.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-680) add user constraints

2011-02-28 Thread Tim Robertson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000322#comment-13000322
 ] 

Tim Robertson commented on HIVE-680:


I should add that the workaround was to add {{id IS NOT NULL}} to a few queries 
and all was happy; longer term we will address the root cause and escape those 
characters.

 add user constraints 
 -

 Key: HIVE-680
 URL: https://issues.apache.org/jira/browse/HIVE-680
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain

 Many times, because of a few bad input rows, the whole job fails and it takes 
 a long time to debug those.
 It might be very useful to add some constraints, which can be checked while 
 reading the data.
 An option can be added to ignore configurable number of bad rows.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira