[jira] [Commented] (HIVE-11327) HiveQL to HBase - Predicate Pushdown for composite key not working
[ https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641834#comment-14641834 ] Swarnim Kulkarni commented on HIVE-11327: - [~yzuehlke] Thanks for logging this. This is expected behavior. Support for predicate pushdown on simple delimited composite keys is not yet in Hive. One solution is to instead treat your keys as a complex composite key and provide a custom implementation for that. That way, you should be able to take advantage of the HBase filters to make your queries run much faster. Please refer to the documentation here for further details [1]. [1] https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ComplexCompositeRowKeysandHBaseKeyFactory > HiveQL to HBase - Predicate Pushdown for composite key not working > -- > > Key: HIVE-11327 > URL: https://issues.apache.org/jira/browse/HIVE-11327 > Project: Hive > Issue Type: Bug > Components: HBase Handler, Hive >Affects Versions: 0.14.0 >Reporter: Yannik Zuehlke >Priority: Blocker > > I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL for > accessing an HBase "table". > I created a table with a complex composite rowkey: > > {quote} > CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string) > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' > COLLECTION ITEMS TERMINATED BY ';' > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = > ":key,cf:c1,cf:c2") > TBLPROPERTIES("hbase.table.name"="hbase_table"); > {quote} > > The table is created successfully, but the HiveQL query takes > forever: > > {quote} > SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz'; > {quote} > > I am working with 1 TB of data (around 1.5 bn records) and this query takes > forever (it ran overnight but did not finish by morning). 
> I changed the log4j properties to 'DEBUG' and found some interesting > information: > > {quote} > 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory > (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : > hive_hbase > 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory > (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz') > {quote} > > But some lines later: > > {quote} > 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory > (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible > for predicate: (rowkey.p1 = 'xyz') > {quote} > > So my guess is: HiveQL over HBase does not do any predicate pushdown but > starts a MapReduce job. > The normal HBase scan (via the HBase Shell) takes around 5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
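As a rough illustration of why the complex-composite-key route helps: once Hive knows the key structure, an equality predicate on the leading key component can be turned into a bounded HBase row-range scan instead of a full scan followed by MapReduce-side filtering. A minimal Python sketch of the range computation (illustrative only; the real mechanism is a Java HBaseKeyFactory implementation as described in the wiki page above):

```python
def prefix_scan_range(prefix: bytes):
    """Compute the [start, stop) row range for an HBase prefix scan.

    Assumes the last prefix byte is below 0xff, which holds for
    printable delimiters like ','.
    """
    # The scan starts at the prefix itself and stops just before the
    # prefix with its final byte incremented (exclusive upper bound).
    start = prefix
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return start, stop

# A pushed-down filter like rowkey.p1 = 'xyz' on a ','-delimited key
# would only scan rows between b'xyz,' and b'xyz-'.
start, stop = prefix_scan_range(b"xyz,")
```

With such a range, HBase touches only the region(s) holding matching rows, which is why the shell scan takes seconds while the unpushed query scans all 1.5 bn records.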
[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map
[ https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641832#comment-14641832 ] Swarnim Kulkarni commented on HIVE-11329: - [~woj_in] Thanks for the patch! I am pretty sure I am missing something here, but would you mind explaining to me with an example what problems having the prefix in the column name causes? The reason I ask is that, if needed, this would have to be applied consistently in all the other cases as well, for example when reading all columns from HBase. > Column prefix in key of hbase column prefix map > --- > > Key: HIVE-11329 > URL: https://issues.apache.org/jira/browse/HIVE-11329 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.14.0 >Reporter: Wojciech Indyk >Assignee: Wojciech Indyk >Priority: Minor > Attachments: HIVE-11329.1.patch > > > When I create a table with an hbase column prefix > (https://issues.apache.org/jira/browse/HIVE-3725), I have the prefix in the result > map in hive. > E.g. a record in HBase: > rowkey: 123 > column: tag_one, value: 0.5 > column: tag_two, value: 0.5 > representation in Hive via column prefix mapping "tag_.*": > column: tag map > key: tag_one, value: 0.5 > key: tag_two, value: 0.5 > should be: > key: one, value: 0.5 > key: two, value: 0.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
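For what it's worth, the behavior the patch appears to be after can be sketched in a few lines of Python (a hypothetical helper, not Hive's actual code path): strip the matched prefix from each HBase qualifier before using it as the Hive map key.

```python
def strip_column_prefix(cells, prefix):
    # cells: qualifier -> value pairs read from one HBase row.
    # Keep only qualifiers covered by the prefix mapping (e.g. "tag_.*")
    # and drop the prefix itself from the resulting Hive map keys.
    n = len(prefix)
    return {q[n:]: v for q, v in cells.items() if q.startswith(prefix)}

row = {"tag_one": 0.5, "tag_two": 0.5, "other": 1.0}
hive_map = strip_column_prefix(row, "tag_")  # {"one": 0.5, "two": 0.5}
```

As the comment above notes, the same stripping would have to be applied symmetrically on every read and write path for the mapping to stay consistent.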
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641830#comment-14641830 ] Lefty Leverenz commented on HIVE-11055: --- [~hoffmann99], the Hive user mailing list might be a better place to post your questions and usage problems with HPL/SQL. This JIRA issue has already been resolved, so any bugs would require new JIRA issues. Besides, other people on the u...@hive.apache.org list would probably be interested in your comments but these JIRA comments only go to the d...@hive.apache.org list. Here's information about the mailing lists: http://hive.apache.org/mailing_lists.html. > HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) > --- > > Key: HIVE-11055 > URL: https://issues.apache.org/jira/browse/HIVE-11055 > Project: Hive > Issue Type: Improvement >Reporter: Dmitry Tolpeko >Assignee: Dmitry Tolpeko > Fix For: 2.0.0 > > Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, > HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml > > > There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive > (actually any SQL-on-Hadoop implementation and any JDBC source). > Alan Gates offered to contribute it to Hive under HPL/SQL name > (org.apache.hive.hplsql package). This JIRA is to create a patch to > contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11371) Null pointer exception for nested table query when using ORC versus text
[ https://issues.apache.org/jira/browse/HIVE-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11371: --- Component/s: Vectorization > Null pointer exception for nested table query when using ORC versus text > > > Key: HIVE-11371 > URL: https://issues.apache.org/jira/browse/HIVE-11371 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.2.0 >Reporter: N Campbell > Attachments: TJOIN1, TJOIN2, TJOIN3, TJOIN4 > > > The following query will fail if the file format is ORC: > select tj1rnum, tj2rnum, tjoin3.rnum as rnumt3 from (select tjoin1.rnum > tj1rnum, tjoin2.rnum tj2rnum, tjoin2.c1 tj2c1 from tjoin1 left outer join > tjoin2 on tjoin1.c1 = tjoin2.c1 ) tj left outer join tjoin3 on tj2c1 = > tjoin3.c1 > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow$LongCopyRow.copy(VectorCopyRow.java:60) > at > org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.copyByReference(VectorCopyRow.java:260) > at > org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultMultiValue(VectorMapJoinGenerateResultOperator.java:238) > at > org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterGenerateResultOperator.finishOuter(VectorMapJoinOuterGenerateResultOperator.java:495) > at > org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterLongOperator.process(VectorMapJoinOuterLongOperator.java:430) > ... 22 more > ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:0, Vertex vertex_1437788144883_0004_2_02 [Map 1] killed/failed > due to:null]DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 > killedVertices:0 > SQLState: 08S01 > ErrorCode: 2 > getDatabaseProductName Apache Hive > getDatabaseProductVersion 1.2.1.2.3.0.0-2557 > getDriverName Hive JDBC > getDriverVersion 1.2.1.2.3.0.0-2557 > getDriverMajorVersion 1 > getDriverMinorVersion 2 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc; > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; > create table if not exists TJOIN3 (RNUM int , C1 int, C2 char(2)) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; > create table if not exists TJOIN4 (RNUM int , C1 int, C2 char(2)) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
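Since the failure is inside the vectorized map-join operators, one way to confirm the diagnosis (and to keep the query running until the bug is fixed) is to disable vectorized execution for the session. This is a standard Hive setting offered as a workaround sketch, not a fix for the underlying bug:

```sql
-- Workaround sketch: rerun the failing ORC query with vectorization off.
SET hive.vectorized.execution.enabled=false;
```

If the query then succeeds, that further points at the vectorized path rather than the ORC reader itself.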
[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation
[ https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugeny birukov updated HIVE-11373: -- Description: I try to transform a JSON string to a Map using this Python code: for d in sys.stdin: r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d)) print r.strip() echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py key1valu1key2value2 This string should be transformed to the Hive type MAP<STRING, STRING>, but the transformation result appears as {"key1":"valu1\u0003key2\u0003value2"}. With one key-value entry it works fine: hive> SELECT TRANSFORM ('{"key1":"valu1"}') USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json; ... {"key1":"valu1"} Time taken: 35.177 seconds, Fetched: 1 row(s) With multiple key-value entries it works incorrectly: hive> SELECT TRANSFORM ('{"key1":"valu1","key2":"value2"}') USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json; ... {"key1":"valu1\u0003key2\u0003value2"} Time taken: 33.486 seconds, Fetched: 1 row(s) Full steps to reproduce: echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 
100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} was: I try to transform a JSON string to a Map using this Python code: for d in sys.stdin: r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d)) print r.strip() echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py key1valu1key2value2 This string should be transformed to the Hive type MAP<STRING, STRING>, but the transformation result appears as {"key1":"valu1\u0003key2\u0003value2"}. Steps to reproduce: echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 
Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} > Incorrect (de)serialization STRING field to MAP in TRANSFORM > operation > -- > > Key: HIVE-11373 > URL: https://issues.apache.org/jira/browse/HIVE-11373 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.1, 1.0.0 > Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with > HIVE 1.0) >Reporter: eugeny birukov > > I try transform json
[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation
[ https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugeny birukov updated HIVE-11373: -- Description: I try to transform a JSON string to a Map using this Python code: for d in sys.stdin: r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d)) print r.strip() echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py key1valu1key2value2 This string should be transformed to the Hive type MAP<STRING, STRING>, but the transformation result appears as {"key1":"valu1\u0003key2\u0003value2"}. Steps to reproduce: echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} was: I try to transform a JSON 
string to Map using python code import sys,re for d in sys.stdin: r=d.replace('{','').replace('}','').replace('"','') r=re.sub('[:,]', '\003', r) print r.strip() Steps for reproduce: echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} > Incorrect (de)serialization STRING field to MAP in TRANSFORM > operation > -- > > Key: HIVE-11373 > URL: https://issues.apache.org/jira/browse/HIVE-11373 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.1, 1.0.0 > Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with > HIVE 1.0) >Reporter: eugeny 
birukov > > I try to transform a JSON string to a Map using this Python code: > for d in sys.stdin: > r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d)) > print r.strip() > echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py > key1valu1key2value2 > This string should be transformed to the Hive type MAP<STRING, STRING>, > but the transformation result appears as {"key1":"valu1\u0003key2\u0003value2"} > Steps to reproduce: > echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; > hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath > '/tmp/json.txt' overwrite into table json;" > hive -e "SELECT TRANSFORM (jsonStr) USING > 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, > STRING>) FROM json;" > converting to loca
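A plausible explanation for the single-entry result: with Hive's default LazySimpleSerDe delimiters, collection items (map entries) are separated by \002 (Ctrl-B) while map keys are separated from values by \003 (Ctrl-C). The script above emits \003 for both ':' and ',', so Hive parses exactly one map entry whose value still contains the embedded \003 bytes. A corrected transform under that assumption (still a naive regex sketch; robust code would parse the line with the json module):

```python
import re

def json2map_line(line):
    # Strip braces and quotes (naive: assumes a flat {"k":"v",...} line),
    # then emit Hive's default LazySimpleSerDe delimiters: \002 between
    # map entries and \003 between each key and its value.
    stripped = re.sub(r'[{}"]', '', line).strip()
    return stripped.replace(',', '\002').replace(':', '\003')

# 'key1\x03valu1\x02key2\x03value2' deserializes as a two-entry map.
result = json2map_line('{"key1":"valu1","key2":"value2"}')
```

With the \002 entry separator in place, the TRANSFORM output should come back as the expected {"key1":"valu1","key2":"value2"} rather than one merged entry.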
[jira] [Updated] (HIVE-11296) Merge from master to spark branch [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-11296: Attachment: (was: HIVE-11296.1-spark.patch) > Merge from master to spark branch [Spark Branch] > > > Key: HIVE-11296 > URL: https://issues.apache.org/jira/browse/HIVE-11296 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-11296-1.spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11296) Merge from master to spark branch [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-11296: Attachment: HIVE-11296-1.spark.patch > Merge from master to spark branch [Spark Branch] > > > Key: HIVE-11296 > URL: https://issues.apache.org/jira/browse/HIVE-11296 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-11296-1.spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation
[ https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugeny birukov updated HIVE-11373: -- Description: I try to transform a JSON string to a Map using this Python code: import sys,re for d in sys.stdin: r=d.replace('{','').replace('}','').replace('"','') r=re.sub('[:,]', '\003', r) print r.strip() Steps to reproduce: echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} was: I try to transform a JSON string to a Map using this Python code: import sys,re for d in sys.stdin: r=d.replace('{','').replace('}','').replace('"','') r=re.sub('[:,]', '\003', r) 
print r.strip() echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} > Incorrect (de)serialization STRING field to MAP in TRANSFORM > operation > -- > > Key: HIVE-11373 > URL: https://issues.apache.org/jira/browse/HIVE-11373 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.1, 1.0.0 > Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with > HIVE 1.0) >Reporter: eugeny birukov > > I try transform json string to Map using python code > import sys,re > for d in sys.stdin: > r=d.replace('{','').replace('}','').replace('"','') > 
r=re.sub('[:,]', '\003', r) > print r.strip() > Steps to reproduce: > echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; > hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath > '/tmp/json.txt' overwrite into table json;" > hive -e "SELECT TRANSFORM (jsonStr) USING > 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, > STRING>) FROM json;" > converting to local s3://webgames-emr/hive/restore/json2map.py > Added resources: [s3://webgames-emr/hive/restore/json2map.py] > Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > Starting Job = job_1437833808701_000
[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation
[ https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugeny birukov updated HIVE-11373: -- Description: I try to transform a JSON string to a Map using this Python code: import sys,re for d in sys.stdin: r=d.replace('{','').replace('}','').replace('"','') r=re.sub('[:,]', '\003', r) print r.strip() echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} was: I try to transform a JSON string to a Map using this Python code: import sys,re for d in sys.stdin: r=d.replace('{','').replace('}','').replace('"','') r=re.sub('[:,]', '\003', r) print r.strip() echo 
'{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;" hive -e "CREATE TABLE d(jsondata MAP); SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;" converting to local s3://webgames-emr/hive/restore/json2map.py Added resources: [s3://webgames-emr/hive/restore/json2map.py] Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0% 2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec MapReduce Total cumulative CPU time: 1 seconds 960 msec Ended Job = job_1437833808701_0006 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 960 msec OK {"key1":"valu1\u0003key2\u0003value2"} Time taken: 48.878 seconds, Fetched: 1 row(s) Expected Result {"key1":"valu1","key2":"value2"} Actual Result {"key1":"valu1\u0003key2\u0003value2"} > Incorrect (de)serialization STRING field to MAP in TRANSFORM > operation > -- > > Key: HIVE-11373 > URL: https://issues.apache.org/jira/browse/HIVE-11373 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.1, 1.0.0 > Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with > HIVE 1.0) >Reporter: eugeny birukov > > I try transform json string to Map using python code > import sys,re > for d in sys.stdin: > r=d.replace('{','').replace('}','').replace('"','') > 
r=re.sub('[:,]', '\003', r) > print r.strip() > echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt; > hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath > '/tmp/json.txt' overwrite into table json;" > hive -e "SELECT TRANSFORM (jsonStr) USING > 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, > STRING>) FROM json;" > converting to local s3://webgames-emr/hive/restore/json2map.py > Added resources: [s3://webgames-emr/hive/restore/json2map.py] > Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722 > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > Starting Job = job_1437833808701_0006, Tracking U
[jira] [Updated] (HIVE-10171) Create a storage-api module
[ https://issues.apache.org/jira/browse/HIVE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-10171: Description: To support high performance file formats, I'd like to propose that we move the minimal set of classes that are required to integrate with Hive into a new module named "storage-api". This module will include VectorizedRowBatch, the various ColumnVector classes, and the SARG classes. It will form the start of an API that high performance storage formats can use to integrate with Hive. Both ORC and Parquet can use the new API to support vectorization and SARGs without performance destroying shims. (was: To support high performance file formats, I'd like to propose that we move the minimal set of classes that are required to integrate with Hive in to a new module named "storage-api". This module will include VectorizedRowBatch, the various ColumnVector classes, and the SARG classes. It will form the start of an API that high performance storage formats can use to integrate with Hive. Both ORC and Parquet can use the new API to support vectorization and SARGs without performance destroying shims.) > Create a storage-api module > --- > > Key: HIVE-10171 > URL: https://issues.apache.org/jira/browse/HIVE-10171 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.0.0 > > > To support high performance file formats, I'd like to propose that we move > the minimal set of classes that are required to integrate with Hive into a > new module named "storage-api". This module will include VectorizedRowBatch, > the various ColumnVector classes, and the SARG classes. It will form the > start of an API that high performance storage formats can use to integrate > with Hive. Both ORC and Parquet can use the new API to support vectorization > and SARGs without performance destroying shims. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9873) Hive on MR throws DeprecatedParquetHiveInput exception
[ https://issues.apache.org/jira/browse/HIVE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavas Garg updated HIVE-9873: - Component/s: Hive > Hive on MR throws DeprecatedParquetHiveInput exception > -- > > Key: HIVE-9873 > URL: https://issues.apache.org/jira/browse/HIVE-9873 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 1.2.0 > > Attachments: HIVE-9873.1.patch > > > The following error is thrown when information about columns is changed on > {{projectionPusher.pushProjectionsAndFilters}}. > {noformat} > 2015-02-26 15:56:40,275 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.io.IOException: java.io.IOException: > java.io.IOException: DeprecatedParquetHiveInput : size of object differs. > Value size : 23, Current Object size : 29 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.io.IOException: java.io.IOException: > DeprecatedParquetHiveInput : size of object differs. Value size : 23, > Current Object size : 29 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224) > ... 11 more > Caused by: java.io.IOException: DeprecatedParquetHiveInput : size of object > differs. Value size : 23, Current Object size : 29 > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:199) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 15 more > {noformat} > The bug is in {{ParquetRecordReaderWrapper}}. We store metadata such as the > list of columns in the {{Configuration/JobConf}}. The issue is that this > metadata is incorrect until the call to > {{projectionPusher.pushProjectionsAndFilters}}.
In the current codebase we > don't use the configuration object returned from > {{projectionPusher.pushProjectionsAndFilters}} in other sections of code such > as creation and initialization of {{realReader}}. The end result is that > parquet is given an empty read schema and returns all nulls. Since the join > key is null, no records are joined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
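The root cause described above is a stale-configuration pattern that is easy to illustrate outside Hive. The sketch below is a minimal Python analogue, not the actual Hive code: `push_projections` stands in for `projectionPusher.pushProjectionsAndFilters`, which returns an updated configuration rather than mutating its argument, so a reader built from the original object sees an empty read schema.

```python
# Minimal analogue of the HIVE-9873 bug (names are illustrative, not the
# Hive API): the pusher RETURNS an updated config, and a caller that keeps
# using the original object loses the projected columns.

def push_projections(conf, columns):
    """Return a NEW config with the projected columns set; input unmodified."""
    updated = dict(conf)
    updated["read.columns"] = list(columns)
    return updated

def reader_schema(conf):
    """The columns a reader built from this config would read."""
    return conf.get("read.columns", [])  # empty schema -> all-null rows

conf = {}
returned = push_projections(conf, ["c1", "c2"])

buggy_schema = reader_schema(conf)      # built from the original object
fixed_schema = reader_schema(returned)  # built from the returned object
```

The fix implied by the report is simply to thread the returned configuration through to the creation and initialization of the real reader.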
[jira] [Updated] (HIVE-11372) join with between predicate comparing integer types returns no rows when ORC format used
[ https://issues.apache.org/jira/browse/HIVE-11372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] N Campbell updated HIVE-11372: -- Attachment: TSINT TINT > join with between predicate comparing integer types returns no rows when ORC > format used > --- > > Key: HIVE-11372 > URL: https://issues.apache.org/jira/browse/HIVE-11372 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: N Campbell > Attachments: TINT, TSINT > > > getDatabaseProductNameApache Hive > getDatabaseProductVersion 1.2.1.2.3.0.0-2557 > getDriverName Hive JDBC > getDriverVersion 1.2.1.2.3.0.0-2557 > getDriverMajorVersion 1 > getDriverMinorVersion 2 > select tint.rnum, tsint.rnum from tint , tsint where tint.cint between > tsint.csint and tsint.csint > when ORC is used, no rows are returned, versus TEXT > create table if not exists TSINT ( RNUM int , CSINT smallint ) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; > create table if not exists TINT ( RNUM int , CINT int ) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ;
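For reference, the expected semantics of the failing query are unambiguous: since both BETWEEN bounds are `tsint.csint`, the predicate degenerates to equality after the SMALLINT value is implicitly widened to INT. A minimal Python sketch with made-up rows (the real TINT/TSINT data files are attachments to the issue and are not reproduced here):

```python
# Expected rows for:
#   select tint.rnum, tsint.rnum from tint, tsint
#   where tint.cint between tsint.csint and tsint.csint
# With identical bounds, BETWEEN reduces to equality after the smallint
# is widened to int. Sample rows are illustrative only.

tint = [(1, 10), (2, 20), (3, None)]   # (rnum, cint)
tsint = [(1, 10), (2, 30)]             # (rnum, csint)

pairs = [
    (r1, r2)
    for r1, c in tint
    for r2, s in tsint
    if c is not None and s is not None and s <= c <= s  # between s and s
]
```

Whatever the TEXT format returns for this cross join should also come back when the same tables are stored as ORC.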
[jira] [Updated] (HIVE-11371) Null pointer exception for nested table query when using ORC versus text
[ https://issues.apache.org/jira/browse/HIVE-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] N Campbell updated HIVE-11371: -- Attachment: TJOIN4 TJOIN3 TJOIN2 TJOIN1 > Null pointer exception for nested table query when using ORC versus text > > > Key: HIVE-11371 > URL: https://issues.apache.org/jira/browse/HIVE-11371 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: N Campbell > Attachments: TJOIN1, TJOIN2, TJOIN3, TJOIN4 > > > The following query will fail if the file format is ORC > select tj1rnum, tj2rnum, tjoin3.rnum as rnumt3 from (select tjoin1.rnum > tj1rnum, tjoin2.rnum tj2rnum, tjoin2.c1 tj2c1 from tjoin1 left outer join > tjoin2 on tjoin1.c1 = tjoin2.c1 ) tj left outer join tjoin3 on tj2c1 = > tjoin3.c1 > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow$LongCopyRow.copy(VectorCopyRow.java:60) > at > org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.copyByReference(VectorCopyRow.java:260) > at > org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultMultiValue(VectorMapJoinGenerateResultOperator.java:238) > at > org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterGenerateResultOperator.finishOuter(VectorMapJoinOuterGenerateResultOperator.java:495) > at > org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterLongOperator.process(VectorMapJoinOuterLongOperator.java:430) > ... 22 more > ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:0, Vertex vertex_1437788144883_0004_2_02 [Map 1] killed/failed > due to:null]DAG did not succeed due to VERTEX_FAILURE.
failedVertices:1 > killedVertices:0 > SQLState: 08S01 > ErrorCode: 2 > getDatabaseProductNameApache Hive > getDatabaseProductVersion 1.2.1.2.3.0.0-2557 > getDriverName Hive JDBC > getDriverVersion 1.2.1.2.3.0.0-2557 > getDriverMajorVersion 1 > getDriverMinorVersion 2 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc; > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; > create table if not exists TJOIN3 (RNUM int , C1 int, C2 char(2)) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; > create table if not exists TJOIN4 (RNUM int , C1 int, C2 char(2)) > -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS orc ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
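To make the correctness baseline concrete: the nested query is a left outer join of tjoin1 to tjoin2 on c1, followed by a left outer join of that result to tjoin3 on the (possibly NULL) tj2c1. A minimal Python sketch with illustrative rows (the real TJOIN* data files are JIRA attachments) shows the result the TEXT format produces and the ORC path should match:

```python
# Reference semantics for the failing nested query, with illustrative rows
# (the real TJOIN* data files are JIRA attachments). Rows are (rnum, c1).

tjoin1 = [(0, 10), (1, None)]
tjoin2 = [(0, 10), (1, 15)]
tjoin3 = [(0, 10)]

def left_join(left, right, match):
    """Left outer join: unmatched left rows pair with None."""
    out = []
    for l in left:
        hits = [r for r in right if match(l, r)]
        if hits:
            out.extend((l, r) for r in hits)
        else:
            out.append((l, None))
    return out

# tj = tjoin1 LEFT OUTER JOIN tjoin2 ON tjoin1.c1 = tjoin2.c1
tj = left_join(tjoin1, tjoin2,
               lambda a, b: a[1] is not None and a[1] == b[1])

# result = tj LEFT OUTER JOIN tjoin3 ON tj2c1 = tjoin3.c1 (tj2c1 may be NULL)
result = left_join(tj, tjoin3,
                   lambda p, c: p[1] is not None and p[1][1] == c[1])

# Project (tj1rnum, tj2rnum, rnumt3), preserving NULLs.
rows = [(l[0], r[0] if r else None, c[0] if c else None)
        for (l, r), c in result]
```

The NPE occurs precisely in the vectorized outer-join path that handles such NULL-keyed rows.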
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641539#comment-14641539 ] wangchangchun commented on HIVE-11055: -- Fourth: Create procedure() begin INSERT INTO TMP_EDR_MAX_CONSPT1 SELECT STARTTIME, SERVICENAME,SUBSCRIBERSN,SUBSCRIBEDATETIME,VALIDFROMDATETIME, EXPIREDATETIME FROM TDR_PCC_SUBSCRIPTION end; Fifth: create procedure testexception() begin DECLARE booknum int; total int; percent int; SET booknum = 10; SET total = 0; SET percent = booknum / total; EXCEPTION WHEN OTHERS THEN DBMS_OUTPUT.PUT_LINE('Error'); end; CALL testexception(); > HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) > --- > > Key: HIVE-11055 > URL: https://issues.apache.org/jira/browse/HIVE-11055 > Project: Hive > Issue Type: Improvement >Reporter: Dmitry Tolpeko >Assignee: Dmitry Tolpeko > Fix For: 2.0.0 > > Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, > HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml > > > There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive > (actually any SQL-on-Hadoop implementation and any JDBC source). > Alan Gates offered to contribute it to Hive under the HPL/SQL name > (org.apache.hive.hplsql package). This JIRA is to create a patch to > contribute the PL/HQL code.
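The fifth snippet in the comment above exercises HPL/SQL's EXCEPTION block on a division by zero. The intended control flow, sketched as a Python analogue (the `messages` list standing in for DBMS_OUTPUT is illustrative), is that the error transfers control to the WHEN OTHERS handler rather than aborting the procedure:

```python
# Intended control flow of the testexception() snippet: the division by
# zero should jump to the EXCEPTION WHEN OTHERS handler instead of
# aborting. Python analogue; the messages list stands in for DBMS_OUTPUT.

def testexception():
    messages = []
    try:
        booknum = 10
        total = 0
        percent = booknum / total    # ZeroDivisionError, like 10 / 0
        messages.append(percent)     # never reached
    except Exception:                # EXCEPTION WHEN OTHERS THEN ...
        messages.append("Error")     # DBMS_OUTPUT.PUT_LINE('Error')
    return messages
```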
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641534#comment-14641534 ] wangchangchun commented on HIVE-11055: -- Hello, I tried the HPL/SQL functionality, and most of it works. I am listing some problems I found here so they can be fixed later. First: CREATE TABLE (a int, b int); create or replace testinsertinto() BEGIN INSERT INTO values (50,10); END; CALL testinsertinto(); Second: create or replace procedure testinto() BEGIN declare v_dtlTime DECIMAL(18,0); select top 1 starttime into v_dtlTime from TDR_PCC_SUBQUOTA_17000; PRINT v_dtlTime; END; CALL testinto(); Third: create procedure testwf() begin SELECT dept, userid, sal, CUME_DIST() OVER(ORDER BY sal) AS rn1, CUME_DIST() OVER(PARTITION BY dept ORDER BY sal) AS rn2 FROM lxw1234; end; call testwf(); > HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) > --- > > Key: HIVE-11055 > URL: https://issues.apache.org/jira/browse/HIVE-11055 > Project: Hive > Issue Type: Improvement >Reporter: Dmitry Tolpeko >Assignee: Dmitry Tolpeko > Fix For: 2.0.0 > > Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, > HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml > > > There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive > (actually any SQL-on-Hadoop implementation and any JDBC source). > Alan Gates offered to contribute it to Hive under the HPL/SQL name > (org.apache.hive.hplsql package). This JIRA is to create a patch to > contribute the PL/HQL code.
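The second snippet in the comment above relies on SELECT ... INTO binding a single scalar to a declared variable. A minimal Python sketch of the expected behavior (the rows are made up; the real table in the report is TDR_PCC_SUBQUOTA_17000):

```python
# Expected behavior of:  select top 1 starttime into v_dtlTime from <table>
# i.e. bind the first row's value to the variable (no binding if no rows).
# Rows are made up for illustration.

rows = [(20150701120000,), (20150701130000,)]   # (starttime,)

def select_top1_into(rows):
    return rows[0][0] if rows else None          # TOP 1 ... INTO semantics

v_dtlTime = select_top1_into(rows)               # then: PRINT v_dtlTime
```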
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641449#comment-14641449 ] Hive QA commented on HIVE-11271: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747155/HIVE-11271.4.patch {color:green}SUCCESS:{color} +1 9259 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4716/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4716/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4716/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12747155 - PreCommit-HIVE-TRUNK-Build > java.lang.IndexOutOfBoundsException when union all with if function > --- > > Key: HIVE-11271 > URL: https://issues.apache.org/jira/browse/HIVE-11271 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 0.14.0, 1.0.0, 1.2.0 >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen > Attachments: HIVE-11271.1.patch, HIVE-11271.2.patch, > HIVE-11271.3.patch, HIVE-11271.4.patch > > > Some queries with Union all as subquery fail in MapReduce task with > stacktrace: > {noformat} > 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing > operator UNION[104] > 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor > complete. 
> 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: > job_local826862759_0005 > java.lang.Exception: java.lang.RuntimeException: Error in configuring object > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) > Caused by: java.lang.RuntimeException: Error in configuring object > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) > ... 10 more > Caused by: java.lang.RuntimeException: Error in configuring object > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) > ... 
14 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) > ... 17 more > Caused by: java.lang.RuntimeException: Map operator initialization failed > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) > ... 21 more > Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) > at org.apache.hadoop.h
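The truncated trace above bottoms out in UnionOperator.initializeOp with "Index: 1, Size: 1": the union has two parent branches but only one per-parent entry was built, so indexing parent 1 fails. A minimal, hypothetical Python analogue of that mismatch (not the Hive implementation):

```python
# Hypothetical analogue of "IndexOutOfBoundsException: Index: 1, Size: 1"
# in UnionOperator.initializeOp: two parent branches, but only one
# per-parent field list was prepared, so indexing parent 1 fails.
# Purely illustrative; not the Hive implementation.

def init_union(num_parents, parent_fields):
    converted = []
    for p in range(num_parents):
        converted.append(parent_fields[p])  # IndexError when a branch is missing
    return converted

fields_per_parent = [["c0"]]   # only one branch populated

try:
    init_union(2, fields_per_parent)
    failed = False
except IndexError:             # Java's IndexOutOfBoundsException analogue
    failed = True
```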