[jira] [Commented] (HIVE-4975) Reading orc file throws exception after adding new column
[ https://issues.apache.org/jira/browse/HIVE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917691#comment-13917691 ]

cyril liao commented on HIVE-4975:
----------------------------------

Just returning null is not enough. Consider renaming a column, or changing the order of the columns.

Reading orc file throws exception after adding new column

Key: HIVE-4975
URL: https://issues.apache.org/jira/browse/HIVE-4975
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 0.11.0
Environment: hive 0.11.0 hadoop 1.0.0
Reporter: cyril liao
Assignee: Kevin Wilfong
Priority: Critical
Labels: orcfile
Fix For: 0.13.0
Attachments: HIVE-4975.1.patch.txt

ORC file read failure after adding a table column. Create a table with three columns (a string, b string, c string). Add a new column after c by executing ALTER TABLE table ADD COLUMNS (d string). Execute the HiveQL query "select d from table"; the following exception is thrown:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
]
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating d
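The failure described above can be reproduced with a short HiveQL session. This is a sketch of the steps from the report; the table name is illustrative (the report does not give one):

```sql
-- Reproduction sketch for HIVE-4975 (Hive 0.11, table name illustrative).
CREATE TABLE t_orc (a string, b string, c string) STORED AS ORC;
-- ... populate t_orc so at least one ORC file is written with 3 columns ...
ALTER TABLE t_orc ADD COLUMNS (d string);
-- Reading the new column fails with ArrayIndexOutOfBoundsException: 4,
-- because existing ORC files were written with the old three-field schema.
SELECT d FROM t_orc;
```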
[jira] [Commented] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862873#comment-13862873 ]

cyril liao commented on HIVE-4996:
----------------------------------

Everything works well after I changed the connection pool from BoneCP to DBCP. You can give it a try.

unbalanced calls to openTransaction/commitTransaction

Key: HIVE-4996
URL: https://issues.apache.org/jira/browse/HIVE-4996
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.10.0, 0.11.0
Environment: hiveserver1 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: wangfeng
Priority: Critical
Labels: hive, metastore
Original Estimate: 504h
Remaining Estimate: 504h

When we used hiveserver1 based on hive-0.10.0, we found the following exception thrown:

FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

help

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cyril liao updated HIVE-4996:
-----------------------------

Attachment: hive-4996.path

change connection pool from BoneCP to DBCP
[jira] [Updated] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cyril liao updated HIVE-4996:
-----------------------------

Tags: hive Metastore (was: hive hiveserver)
Affects Version/s: 0.12.0
Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863702#comment-13863702 ]

cyril liao commented on HIVE-4996:
----------------------------------

BoneCP causes communication with the DB to fail; this is why there are unbalanced transactions.
[jira] [Commented] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863708#comment-13863708 ]

cyril liao commented on HIVE-4996:
----------------------------------

Maybe there is some incorrect usage of BoneCP in Hive, but I am sure BoneCP is responsible for this issue.
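The workaround discussed in the comments (swapping the metastore connection pool from BoneCP to DBCP) is typically applied through a DataNucleus pooling property in hive-site.xml. The property name below reflects my understanding of the knob involved, not something stated in this thread, so verify it against your Hive/DataNucleus version:

```xml
<!-- hive-site.xml: use DBCP instead of BoneCP for metastore JDBC connection pooling.
     The property name is an assumption based on DataNucleus configuration docs. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>DBCP</value>
</property>
```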
[jira] [Commented] (HIVE-5235) Infinite loop with ORC file and Hive 0.11
[ https://issues.apache.org/jira/browse/HIVE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840921#comment-13840921 ]

cyril liao commented on HIVE-5235:
----------------------------------

In my environment, the problem appeared when an int-typed column contained null values. It was solved after I changed the null values to -1 as a default.

Infinite loop with ORC file and Hive 0.11

Key: HIVE-5235
URL: https://issues.apache.org/jira/browse/HIVE-5235
Project: Hive
Issue Type: Bug
Affects Versions: 0.11.0
Environment: Gentoo linux with Hortonworks Hadoop hadoop-1.1.2.23.tar.gz and Apache Hive 0.11d
Reporter: Iván de Prado
Priority: Blocker
Attachments: gendata.py

We are using Hive 0.11 with the ORC file format, and some tasks get blocked in some kind of infinite loop. They keep running indefinitely when we set a huge task expiry timeout. If we set the expiry time to 600 seconds, the tasks fail for not reporting progress, and finally the job fails. This is not consistent, and sometimes the behavior changes between job executions. It happens for different queries. We are using Hive 0.11 with Hadoop hadoop-1.1.2.23 from Hortonworks. The blocked task keeps consuming 100% CPU, and the stack trace is consistently the same. Everything points to some kind of infinite loop. My guess is that it has some relation to the ORC file. Maybe some pointer is written incorrectly, creating some kind of infinite loop when reading. Or maybe there is a bug in the reading stage. More information below.

The stack trace:
{noformat}
main prio=10 tid=0x7f20a000a800 nid=0x1ed2 runnable [0x7f20a8136000]
   java.lang.Thread.State: RUNNABLE
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:256)
    - locked <0xf42a6ca0> (a java.util.zip.ZStreamRef)
    at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:64)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:128)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:143)
    at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVulong(SerializationUtils.java:54)
    at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVslong(SerializationUtils.java:65)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.readValues(RunLengthIntegerReader.java:66)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.next(RunLengthIntegerReader.java:81)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:332)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:802)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1214)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:71)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:46)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
    - eliminated <0xe1459700> (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    - locked <0xe1459700> (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{noformat}
We have seen the same stack trace
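The workaround from the comment above (replacing NULLs in the int column with -1) can be applied when writing the ORC data. A sketch, with hypothetical table and column names:

```sql
-- Workaround sketch for the reported hang: rewrite the ORC table so the
-- int column carries -1 instead of NULL. Table/column names are hypothetical.
INSERT OVERWRITE TABLE orc_table
SELECT COALESCE(int_col, -1) AS int_col,
       other_col
FROM staging_table;
```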
[jira] [Created] (HIVE-5888) group by after join operation product no result when hive.optimize.skewjoin = true
cyril liao created HIVE-5888: Summary: group by after join operation product no result when hive.optimize.skewjoin = true Key: HIVE-5888 URL: https://issues.apache.org/jira/browse/HIVE-5888 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: cyril liao -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5888) group by after join operation product no result when hive.optimize.skewjoin = true
[ https://issues.apache.org/jira/browse/HIVE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832362#comment-13832362 ]

cyril liao commented on HIVE-5888:
----------------------------------

If hive.optimize.skewjoin is set to false, we get the right result.
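Per the comment, disabling the skew-join optimization for the session is a workaround. A sketch; the query shape is hypothetical, since the issue body does not include one:

```sql
-- Workaround: turn off runtime skew-join handling for this session.
SET hive.optimize.skewjoin=false;

-- Hypothetical shape of the failing query: a GROUP BY on top of a JOIN,
-- which produced no rows with hive.optimize.skewjoin=true.
SELECT a.k, count(*)
FROM t1 a JOIN t2 b ON a.k = b.k
GROUP BY a.k;
```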
[jira] [Commented] (HIVE-5123) group by on a same key producing wrong result
[ https://issues.apache.org/jira/browse/HIVE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770402#comment-13770402 ]

cyril liao commented on HIVE-5123:
----------------------------------

SELECT p_day, count(*)
FROM (
  SELECT p_day, uid, poid
  FROM t_app_bd_stat
  WHERE p_day >= 20130910 AND p_day <= 20130917 AND newuser = 1
  GROUP BY p_day, uid, poid
) tmp
GROUP BY p_day
ORDER BY p_day ASC

The result gets a lot of lines with the same p_day value.

group by on a same key producing wrong result

Key: HIVE-5123
URL: https://issues.apache.org/jira/browse/HIVE-5123
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: cyril liao

Grouping by the same key twice runs as a single MapReduce job, producing a wrong result.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5124) group by without map aggregation lead to mapreduce exception
[ https://issues.apache.org/jira/browse/HIVE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747102#comment-13747102 ]

cyril liao commented on HIVE-5124:
----------------------------------

The SQL is:

SELECT channeled, max(VV) AS vv, max(FUV) AS FUV, max(PV) AS PV, max(UV) AS UV
FROM (
  SELECT channeled,
    sum(CASE WHEN TYPE = 1 THEN a ELSE cast(0 AS bigint) END) AS VV,
    sum(CASE WHEN TYPE = 1 THEN b ELSE cast(0 AS bigint) END) AS FUV,
    sum(CASE WHEN TYPE = 2 THEN a ELSE cast(0 AS bigint) END) AS PV,
    sum(CASE WHEN TYPE = 2 THEN b ELSE cast(0 AS bigint) END) AS UV
  FROM (
    SELECT count(uid) AS a, count(DISTINCT uid) AS b, TYPE, channeled
    FROM (
      SELECT uid, channeled, TYPE
      FROM (
        SELECT uid, parse_url(url,'QUERY','channeled') AS channeled, 1 AS TYPE FROM t_html5_vv WHERE p_day = ${idate}
        UNION ALL
        SELECT uid, parse_url(url,'QUERY','channeled') AS channeled, 2 AS TYPE FROM t_html5_pv WHERE p_day = ${idate}
      ) tmp
      WHERE channeled IS NOT NULL AND channeled <> ''
    ) tmp2
    GROUP BY channeled, TYPE
  ) tmp3
  GROUP BY channeled
) tmp4
GROUP BY channeled

I want to get uv and fuv from different tables, t_html5_vv and t_html5_pv, and combine the results into one row. The default hive.map.aggr argument in hive-site.xml is set to true, and the SQL works perfectly. But the exception is thrown when I set hive.map.aggr=false.

group by without map aggregation lead to mapreduce exception

Key: HIVE-5124
URL: https://issues.apache.org/jira/browse/HIVE-5124
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: cyril liao
Assignee: Bing Li

On my environment, the same query, differing only in whether hive.map.aggr is set to true or false, produces different results.

With hive.map.aggr=false, the tasktracker reports the following exception:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:160)
    ... 14 more
Caused by: java.lang.RuntimeException: cannot find field value from [0:_col0, 1:_col1, 2:_col2, 3:_col3]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:143)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:299)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at
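Per the comment above, the failure only occurs with map-side aggregation disabled; the session-level setting involved is:

```sql
-- Default, and the configuration under which the query works correctly:
SET hive.map.aggr=true;

-- Setting that triggers the "cannot find field value" reduce-side
-- initialization failure described in this report:
SET hive.map.aggr=false;
```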
[jira] [Created] (HIVE-5123) group by on a same key producing wrong result
cyril liao created HIVE-5123:
-----------------------------

Summary: group by on a same key producing wrong result
Key: HIVE-5123
URL: https://issues.apache.org/jira/browse/HIVE-5123
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: cyril liao

Grouping by the same key twice runs as a single MapReduce job, producing a wrong result.
[jira] [Created] (HIVE-5124) group by without map aggregation lead to mapreduce exception
cyril liao created HIVE-5124:
-----------------------------

Summary: group by without map aggregation lead to mapreduce exception
Key: HIVE-5124
URL: https://issues.apache.org/jira/browse/HIVE-5124
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: cyril liao

On my environment, the same query, differing only in whether hive.map.aggr is set to true or false, produces different results. With hive.map.aggr=false, the tasktracker reports the following exception:

java.lang.RuntimeException: Error in configuring object
    ...
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:160)
    ... 14 more
Caused by: java.lang.RuntimeException: cannot find field value from [0:_col0, 1:_col1, 2:_col2, 3:_col3]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:143)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:299)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:153)
[jira] [Created] (HIVE-4975) ORC file read failure after add table column
cyril liao created HIVE-4975:
-----------------------------

Summary: ORC file read failure after add table column
Key: HIVE-4975
URL: https://issues.apache.org/jira/browse/HIVE-4975
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 0.11.0
Environment: hive 0.11.0 hadoop 1.0.0
Reporter: cyril liao
Priority: Critical

ORC file read failure after adding a table column. Create a table with three columns (a string, b string, c string). Add a new column after c by executing ALTER TABLE table ADD COLUMNS (d string). Execute the HiveQL query "select d from table"; the following exception is thrown:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
    ...
]
    ...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating d
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at
[jira] [Updated] (HIVE-4975) Reading orc file throws exception after adding new column
[ https://issues.apache.org/jira/browse/HIVE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cyril liao updated HIVE-4975:
-----------------------------
    Tags: ORCfile
    Labels: orcfile (was: )
    Summary: Reading orc file throws exception after adding new column (was: ORC file read failure after add table column)

Reading orc file throws exception after adding new column
---------------------------------------------------------
    Key: HIVE-4975
    URL: https://issues.apache.org/jira/browse/HIVE-4975
    Project: Hive
    Issue Type: Bug
    Components: File Formats
    Affects Versions: 0.11.0
    Environment: hive 0.11.0, hadoop 1.0.0
    Reporter: cyril liao
    Priority: Critical
    Labels: orcfile

ORC file read failure after adding a table column:

1. Create a table with three columns (a string, b string, c string).
2. Add a new column after c by executing ALTER TABLE table ADD COLUMNS (d string).
3. Execute the HiveQL "select d from table"; the following exception is thrown:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
]
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating d
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
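The failure mode above can be sketched outside Hive: the reader's schema has four columns but the old file stores only three fields per row, so index-based access at position 3 overruns the array. A defensive variant returns null for fields the writer never stored. This Python stand-in is illustrative only, not the actual OrcStruct$OrcStructInspector.getStructFieldData code:

```python
def get_struct_field_data(row, field_index):
    """Return the field at field_index, or None when the ORC file predates
    the column (index beyond the fields actually stored in the row).
    Illustrative stand-in for OrcStructInspector.getStructFieldData."""
    if row is None or field_index >= len(row):
        return None  # column added after this file was written
    return row[field_index]

# A row written under schema (a, b, c), read under schema (a, b, c, d):
old_row = ["x", "y", "z"]
print(get_struct_field_data(old_row, 2))  # z
print(get_struct_field_data(old_row, 3))  # None, instead of ArrayIndexOutOfBoundsException: 4
```

Note that returning null only covers appended columns; renamed or reordered columns would need name-based rather than index-based field resolution.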
[jira] [Commented] (HIVE-2702) listPartitionsByFilter only supports string partitions
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629724#comment-13629724 ]

cyril liao commented on HIVE-2702:
----------------------------------

Dropping a partition hits org.apache.hadoop.hive.ql.parse.SemanticException because of this design. Dropping a partition is a basic operation; why should this restriction exist if it breaks a basic operation?

listPartitionsByFilter only supports string partitions
------------------------------------------------------
    Key: HIVE-2702
    URL: https://issues.apache.org/jira/browse/HIVE-2702
    Project: Hive
    Issue Type: Bug
    Affects Versions: 0.8.1
    Reporter: Aniket Mokashi
    Assignee: Aniket Mokashi
    Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch

listPartitionsByFilter supports only string partitions. This is because the restriction is explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java:

    // Can only support partitions whose types are string
    if( ! table.getPartitionKeys().get(partitionColumnIndex)
          .getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) {
      throw new MetaException("Filtering is supported only on partition keys of type string");
    }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
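The effect of that guard can be modeled in a few lines of Python (hypothetical names; the real check lives in ExpressionTree.generateJDOFilterOverPartitions): any filter over a non-string partition key is rejected, which is why filter-driven operations such as dropping a partition can fail on, say, an int key.

```python
STRING_TYPE_NAME = "string"  # stands in for Constants.STRING_TYPE_NAME

def check_filterable(partition_key_type):
    """Model of the string-only guard: raise for any non-string partition
    key type. Illustrative sketch, not the actual metastore code."""
    if partition_key_type != STRING_TYPE_NAME:
        raise ValueError("Filtering is supported only on partition keys of type string")
    return True

check_filterable("string")   # accepted
try:
    check_filterable("int")  # e.g. a p_day int partition key
except ValueError as err:
    print(err)
```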
[jira] [Created] (HIVE-3856) Authorization reports NPE when table partition does not exist
cyril liao created HIVE-3856:
-----------------------------
    Summary: Authorization reports NPE when table partition does not exist
    Key: HIVE-3856
    URL: https://issues.apache.org/jira/browse/HIVE-3856
    Project: Hive
    Issue Type: Bug
    Components: Authorization
    Affects Versions: 0.9.0
    Environment: hadoop 0.20.205, hive 0.9.0
    Reporter: cyril liao

The following HiveQL reports an NPE:

    use app;
    select a.name
    from (select profile['net'] as name
          from app.app_profile
          where p_day = 20130103
          group by profile['net']) a
    left outer join app.app_network_mode b on a.name = b.name
    where b.name is null;

The error is:

2013-01-04 11:10:05,905 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:625)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:486)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:917)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

If I change the partition condition from p_day = 20130103 to p_day = 20121228, it works. The p_day=20121228 partition is known to exist, but the p_day=20130103 partition does not. The statement should not report an NPE!
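The shape of the missing null check can be sketched in Python (hypothetical structure, not the actual Driver.doAuthorization code): a predicate naming a partition that does not exist resolves to null, and the authorization walk must tolerate that instead of dereferencing it.

```python
def collect_partition_names(resolved_partitions):
    """Walk the partitions a query touches, skipping unresolved (None)
    entries instead of dereferencing them blindly, which is the kind of
    access that triggers the NPE above. Illustrative only."""
    names = []
    for part in resolved_partitions:
        if part is None:
            continue  # partition in the predicate does not exist; nothing to authorize
        names.append(part["name"])
    return names

# p_day=20121228 exists; p_day=20130103 does not and resolves to None:
parts = [{"name": "p_day=20121228"}, None]
print(collect_partition_names(parts))  # ['p_day=20121228']
```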
[jira] [Commented] (HIVE-3827) LATERAL VIEW doesn't work with union all statement
[ https://issues.apache.org/jira/browse/HIVE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539450#comment-13539450 ]

cyril liao commented on HIVE-3827:
----------------------------------

If I create a table named tmp_tbl as

    SELECT 1 as from_pid,
           1 as to_pid,
           cid as from_path,
           (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
           0 as status
    FROM (SELECT union_map(c_map) AS c_map
          FROM (SELECT collect_map(id, parent_id) AS c_map
                FROM wl_channels
                GROUP BY id, parent_id) tmp) tmp2
    LATERAL VIEW recursion_concat(c_map) a AS cid, pid

and then select from tmp_tbl, the result is right. I also ran the same queries under hive 0.7.1, and the result was right there too.

LATERAL VIEW doesn't work with union all statement
--------------------------------------------------
    Key: HIVE-3827
    URL: https://issues.apache.org/jira/browse/HIVE-3827
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.9.0
    Environment: hive 0.9.0, hadoop 0.20.205
    Reporter: cyril liao

LATERAL VIEW loses data when combined with union all.

Query NO.1:

    SELECT 1 as from_pid,
           1 as to_pid,
           cid as from_path,
           (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
           0 as status
    FROM (SELECT union_map(c_map) AS c_map
          FROM (SELECT collect_map(id, parent_id) AS c_map
                FROM wl_channels
                GROUP BY id, parent_id) tmp) tmp2
    LATERAL VIEW recursion_concat(c_map) a AS cid, pid

This query returns about 1 rows, and their status is 0.

Query NO.2:

    select a.from_pid as from_pid,
           a.to_pid as to_pid,
           a.from_path as from_path,
           a.to_path as to_path,
           a.status as status
    from wl_dc_channels a
    where a.status <> 0

This query returns about 100 rows, and their status is 1 or 2.

Query NO.3:

    select from_pid,
           to_pid,
           from_path,
           to_path,
           status
    from (
        SELECT 1 as from_pid,
               1 as to_pid,
               cid as from_path,
               (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
               0 as status
        FROM (SELECT union_map(c_map) AS c_map
              FROM (SELECT collect_map(id, parent_id) AS c_map
                    FROM wl_channels
                    GROUP BY id, parent_id) tmp) tmp2
        LATERAL VIEW recursion_concat(c_map) a AS cid, pid
        union all
        select a.from_pid as from_pid,
               a.to_pid as to_pid,
               a.from_path as from_path,
               a.to_path as to_path,
               a.status as status
        from wl_dc_channels a
        where a.status <> 0
    ) unin_tbl

This query has the same result as query NO.2.
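The expected semantics can be pinned down with a toy model (plain Python lists standing in for the two branches; illustrative only): UNION ALL should concatenate the rows of both branches, so query NO.3 ought to include the LATERAL VIEW branch's status-0 rows alongside the status 1/2 rows.

```python
# Toy rows: (from_pid, to_pid, from_path, to_path, status)
lateral_view_branch = [(1, 1, "c1", "c1", 0)]                    # query NO.1's shape
status_branch = [(1, 1, "c2", "c2", 1), (1, 1, "c3", "c3", 2)]   # query NO.2's shape

union_all = lateral_view_branch + status_branch  # expected UNION ALL result
print(len(union_all))  # 3 rows expected; the bug is that only the 2 status rows come back
```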
[jira] [Commented] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536934#comment-13536934 ]

cyril liao commented on HIVE-3104:
----------------------------------

LATERAL VIEW doesn't work with UNION ALL either.

Query NO.1:

    SELECT 1 as from_pid,
           1 as to_pid,
           cid as from_path,
           (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
           0 as status
    FROM (SELECT union_map(c_map) AS c_map
          FROM (SELECT collect_map(id, parent_id) AS c_map
                FROM wl_channels
                GROUP BY id, parent_id) tmp) tmp2
    LATERAL VIEW recursion_concat(c_map) a AS cid, pid

This query returns about 1 rows, and their status is 0.

Query NO.2:

    select a.from_pid as from_pid,
           a.to_pid as to_pid,
           a.from_path as from_path,
           a.to_path as to_path,
           a.status as status
    from wl_dc_channels a
    where a.status <> 0

This query returns about 100 rows, and their status is 1 or 2.

Query NO.3:

    select from_pid,
           to_pid,
           from_path,
           to_path,
           status
    from (
        SELECT 1 as from_pid,
               1 as to_pid,
               cid as from_path,
               (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
               0 as status
        FROM (SELECT union_map(c_map) AS c_map
              FROM (SELECT collect_map(id, parent_id) AS c_map
                    FROM wl_channels
                    GROUP BY id, parent_id) tmp) tmp2
        LATERAL VIEW recursion_concat(c_map) a AS cid, pid
        union all
        select a.from_pid as from_pid,
               a.to_pid as to_pid,
               a.from_path as from_path,
               a.to_path as to_path,
               a.status as status
        from wl_dc_channels a
        where a.status <> 0
    ) unin_tbl

This query has the same result as query NO.2.

Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW
-------------------------------------------------------------------------------
    Key: HIVE-3104
    URL: https://issues.apache.org/jira/browse/HIVE-3104
    Project: Hive
    Issue Type: Bug
    Components: CLI
    Affects Versions: 0.9.0
    Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0
    Reporter: Mark Grover

Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW. Here are some examples.

In the examples below, I make use of the fact that a query with no partition filtering fails when run under hive.mapred.mode=strict.

    -- Table creation and population
    DROP TABLE IF EXISTS test;
    CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
    INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test;
    INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test;

    -- Query 1
    -- This succeeds (using LATERAL VIEW with single insert)
    set hive.mapred.mode=strict;
    FROM partition_test LATERAL VIEW explode(col1) tmp AS exp_col1
    INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2);

    -- Query 2
    -- This succeeds (NOT using LATERAL VIEW with multi-insert)
    set hive.mapred.mode=strict;
    FROM partition_test
    INSERT OVERWRITE DIRECTORY '/test/1' SELECT col1 WHERE (part_col=2)
    INSERT OVERWRITE DIRECTORY '/test/2' SELECT col1 WHERE (part_col=2);

    -- Query 3
    -- This fails (using LATERAL VIEW with multi-insert)
    set hive.mapred.mode=strict;
    FROM partition_test LATERAL VIEW explode(col1) tmp AS exp_col1
    INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2)
    INSERT OVERWRITE DIRECTORY '/test/2' SELECT exp_col1 WHERE (part_col=2);
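The reporter's detection trick can be modeled in Python (hypothetical function names; a sketch, not Hive's planner): under strict mode a partition scan is only legal if a partition predicate actually reached it, so a query whose predicate failed to push down errors out, exposing the missing pushdown.

```python
def plan_partition_scan(partitions, pushed_predicate, strict=True):
    """Toy planner: prune partitions with the pushed-down predicate; in
    strict mode, refuse to plan a scan that received no predicate at all.
    Mirrors how hive.mapred.mode=strict is used above to detect missing
    pushdown. Illustrative only."""
    if pushed_predicate is None:
        if strict:
            raise RuntimeError("strict mode: no partition predicate was pushed down")
        return list(partitions)  # unrestricted full scan
    return [p for p in partitions if pushed_predicate(p)]

# Pushdown worked (Queries 1 and 2): the scan is pruned and allowed.
print(plan_partition_scan([1, 2], lambda part_col: part_col == 2))  # [2]
# Pushdown lost (Query 3): strict mode turns it into an error.
try:
    plan_partition_scan([1, 2], None)
except RuntimeError as err:
    print(err)
```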
[jira] [Commented] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537714#comment-13537714 ]

cyril liao commented on HIVE-3104:
----------------------------------

ok
[jira] [Created] (HIVE-3827) LATERAL VIEW doesn't work with union all statement
cyril liao created HIVE-3827:
-----------------------------
    Summary: LATERAL VIEW doesn't work with union all statement
    Key: HIVE-3827
    URL: https://issues.apache.org/jira/browse/HIVE-3827
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.9.0
    Environment: hive 0.9.0, hadoop 0.20.205
    Reporter: cyril liao

LATERAL VIEW loses data when combined with union all.
[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095738#comment-13095738 ]

cyril liao commented on HIVE-1545:
----------------------------------

Neither core.tar.gz nor ext.tar.gz contains a class named com.facebook.hive.udf.lib.UDFUtils, which is used by many of the UDFs. In the package com.facebook.hive.udf.lib, only Counter and SetOps are included.

Add a bunch of UDFs and UDAFs
-----------------------------
    Key: HIVE-1545
    URL: https://issues.apache.org/jira/browse/HIVE-1545
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Jonathan Chang
    Assignee: Jonathan Chang
    Priority: Minor
    Attachments: core.tar.gz, ext.tar.gz, udfs.tar.gz, udfs.tar.gz

Here are some UD(A)Fs which can be incorporated into the Hive distribution:

UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 5, 3) returns 1.
UDFBucket - Find the bucket in which the first argument belongs. e.g., BUCKET(x, b_1, b_2, b_3, ...) will return the smallest i such that x > b_{i} but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
UDFFindInArray - Finds the 1-index of the first element in the array given as the second argument. Returns 0 if not found. Returns NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
UDFGreatCircleDist - Finds the great circle distance (in km) between two lat/long coordinates (in degrees).
UDFLDA - Performs LDA inference on a vector given fixed topics.
UDFNumberRows - Numbers successive rows starting from 1. The counter resets to 1 whenever any of its parameters changes.
UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5.
UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches in an array.
UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
UDFWhich - Given a boolean array, return the indices which are TRUE.
UDFJaccard
UDAFCollect - Takes all the values associated with a row and converts it into a list. Make sure to have: set hive.map.aggr = false;
UDAFCollectMap - Like collect except that it takes tuples and generates a map.
UDAFEntropy - Compute the entropy of a column.
UDAFPearson (BROKEN!!!) - Computes the pearson correlation between two columns.
UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value of VAL.
UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated with the N (passed as the third parameter) largest values of VAL.
UDAFHistogram
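The contracts of FIND_IN_ARRAY and ARGMAX are easy to mis-read (1-based vs 0-based indexing), so here is a small Python model of the semantics exactly as described above (a sketch of the contract, not the actual UDF source):

```python
def find_in_array(value, arr):
    """FIND_IN_ARRAY: 1-based index of the first match, 0 if not found,
    None (NULL) if either argument is None."""
    if value is None or arr is None:
        return None
    for i, elem in enumerate(arr, start=1):
        if elem == value:
            return i
    return 0

def argmax(*args):
    """ARGMAX: 0-based index of the largest argument (first one on ties)."""
    return max(range(len(args)), key=lambda i: args[i])

print(find_in_array(5, [1, 2, 5]))  # 3
print(find_in_array(5, [1, 2, 3]))  # 0
print(argmax(4, 5, 3))              # 1
```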
[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095106#comment-13095106 ]

cyril liao commented on HIVE-1545:
----------------------------------

com.facebook.hive.udf.lib.UDFUtils is not included. Would you please upload it?