[jira] [Commented] (HIVE-11327) HiveQL to HBase - Predicate Pushdown for composite key not working

2015-07-25 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641834#comment-14641834
 ] 

Swarnim Kulkarni commented on HIVE-11327:
-

[~yzuehlke] Thanks for logging this. This is expected behavior: Hive does not 
yet support predicate pushdown for simple delimited composite keys. One 
solution is to instead treat your keys as a complex composite key and provide a 
custom implementation for it. That way, you should be able to take advantage of 
the HBase filters to make your queries run much faster. Please refer to the 
documentation here for further details [1]

[1] 
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ComplexCompositeRowKeysandHBaseKeyFactory
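As a sketch of that approach (the factory class name below is a placeholder, and the `hbase.composite.key.factory` table property is the mechanism described in the linked documentation):

```sql
-- Sketch only: com.example.SampleKeyFactory is a hypothetical class
-- implementing org.apache.hadoop.hive.hbase.HBaseKeyFactory; the struct
-- fields mirror the reporter's table.
CREATE EXTERNAL TABLE db.hive_hbase (
  rowkey struct<p1:string, p2:string, p3:string>,
  column1 string,
  column2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2")
TBLPROPERTIES (
  "hbase.table.name" = "hbase_table",
  "hbase.composite.key.factory" = "com.example.SampleKeyFactory");
```

With such a factory in place, the handler can decompose predicates on the key's struct fields into HBase scan ranges instead of falling back to a full-table scan.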

> HiveQL to HBase - Predicate Pushdown for composite key not working
> --
>
> Key: HIVE-11327
> URL: https://issues.apache.org/jira/browse/HIVE-11327
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler, Hive
>Affects Versions: 0.14.0
>Reporter: Yannik Zuehlke
>Priority: Blocker
>
> I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL for 
> accessing an HBase "table".
> I created a table with a complex composite rowkey:
> 
> {quote}
> CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ';'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = 
> ":key,cf:c1,cf:c2")
> TBLPROPERTIES("hbase.table.name"="hbase_table");
> {quote}
> 
> The table is getting successfully created, but the HiveQL query is taking 
> forever:
> 
> {quote}
> SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz';
> {quote}
> 
> I am working with 1 TB of data (around 1.5 billion records) and this query 
> takes forever (it ran overnight but did not finish by morning).
> I changed the log4j properties to 'DEBUG' and found some interesting 
> information:
> 
> {quote}
> 2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory
> (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : 
> hive_hbase
> 2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory 
> (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz')
> {quote}
> 
> But some lines later:
> 
> {quote}
> 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory 
> (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible 
> for predicate:  (rowkey.p1 = 'xyz')
> {quote}
> 
> So my guess is: HiveQL over HBase does not perform any predicate pushdown 
> here and instead falls back to a full-scan MapReduce job.
> The normal HBase scan (via the HBase Shell) takes around 5 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map

2015-07-25 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641832#comment-14641832
 ] 

Swarnim Kulkarni commented on HIVE-11329:
-

[~woj_in] Thanks for the patch! I am pretty sure I am missing something here, 
but would you mind explaining with an example what problems keeping the prefix 
in the column name causes? I ask because, if needed, this would have to be 
applied consistently in all the other cases as well, for example when reading 
all columns from HBase.

> Column prefix in key of hbase column prefix map
> ---
>
> Key: HIVE-11329
> URL: https://issues.apache.org/jira/browse/HIVE-11329
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.14.0
>Reporter: Wojciech Indyk
>Assignee: Wojciech Indyk
>Priority: Minor
> Attachments: HIVE-11329.1.patch
>
>
> When I create a table with hbase column prefix 
> https://issues.apache.org/jira/browse/HIVE-3725 I have the prefix in result 
> map in hive. 
> E.g. record in HBase
> rowkey: 123
> column: tag_one, value: 0.5
> column: tag_two, value 0.5
> representation in Hive via column prefix mapping "tag_.*":
> column: tag map
> key: tag_one, value: 0.5
> key: tag_two, value: 0.5
> should be:
> key: one, value: 0.5
> key: two, value: 0.5
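A minimal sketch of the behavior the report asks for (all names here are illustrative, not the patch's actual code): when a column qualifier matches a configured prefix pattern, the literal prefix would be stripped before the qualifier is used as the map key.

```python
def map_key_for_qualifier(qualifier, prefix_pattern, hide_prefix=True):
    """Return the Hive map key for an HBase column qualifier.

    prefix_pattern is the column-mapping pattern, e.g. "tag_.*"; the
    literal part before ".*" is treated as the prefix to strip.  The
    hide_prefix flag is hypothetical, named here only for illustration.
    """
    prefix = prefix_pattern.split(".*")[0]
    if hide_prefix and qualifier.startswith(prefix):
        return qualifier[len(prefix):]
    return qualifier

# With prefix stripping, "tag_one" maps to key "one".
print(map_key_for_qualifier("tag_one", "tag_.*"))                     # one
print(map_key_for_qualifier("tag_two", "tag_.*", hide_prefix=False))  # tag_two
```

As the comment above notes, the same stripping would have to be applied consistently wherever qualifiers are turned into map keys.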





[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)

2015-07-25 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641830#comment-14641830
 ] 

Lefty Leverenz commented on HIVE-11055:
---

[~hoffmann99], the Hive user mailing list might be a better place to post your 
questions and usage problems with HPL/SQL.  This JIRA issue has already been 
resolved, so any bugs would require new JIRA issues.  Besides, other people on 
the u...@hive.apache.org list would probably be interested in your comments, but 
these JIRA comments only go to the d...@hive.apache.org list.

Here's information about the mailing lists:  
http://hive.apache.org/mailing_lists.html.


> HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
> ---
>
> Key: HIVE-11055
> URL: https://issues.apache.org/jira/browse/HIVE-11055
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Fix For: 2.0.0
>
> Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, 
> HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml
>
>
> There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive 
> (actually any SQL-on-Hadoop implementation and any JDBC source).
> Alan Gates offered to contribute it to Hive under HPL/SQL name 
> (org.apache.hive.hplsql package). This JIRA is to create a patch to 
> contribute  the PL/HQL code. 





[jira] [Updated] (HIVE-11371) Null pointer exception for nested table query when using ORC versus text

2015-07-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11371:
---
Component/s: Vectorization

> Null pointer exception for nested table query when using ORC versus text
> 
>
> Key: HIVE-11371
> URL: https://issues.apache.org/jira/browse/HIVE-11371
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.2.0
>Reporter: N Campbell
> Attachments: TJOIN1, TJOIN2, TJOIN3, TJOIN4
>
>
> Following query will fail if the file format is ORC 
> select tj1rnum, tj2rnum, tjoin3.rnum as rnumt3 from   (select tjoin1.rnum 
> tj1rnum, tjoin2.rnum tj2rnum, tjoin2.c1 tj2c1  from tjoin1 left outer join 
> tjoin2 on tjoin1.c1 = tjoin2.c1 ) tj  left outer join tjoin3 on tj2c1 = 
> tjoin3.c1 
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow$LongCopyRow.copy(VectorCopyRow.java:60)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.copyByReference(VectorCopyRow.java:260)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultMultiValue(VectorMapJoinGenerateResultOperator.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterGenerateResultOperator.finishOuter(VectorMapJoinOuterGenerateResultOperator.java:495)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterLongOperator.process(VectorMapJoinOuterLongOperator.java:430)
>   ... 22 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:0, Vertex vertex_1437788144883_0004_2_02 [Map 1] killed/failed 
> due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 
> killedVertices:0
> SQLState:  08S01
> ErrorCode: 2
> getDatabaseProductNameApache Hive
> getDatabaseProductVersion 1.2.1.2.3.0.0-2557
> getDriverName Hive JDBC
> getDriverVersion  1.2.1.2.3.0.0-2557
> getDriverMajorVersion 1
> getDriverMinorVersion 2
> create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc;
> create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc ;
> create table  if not exists TJOIN3 (RNUM int , C1 int, C2 char(2))
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc ;
> create table  if not exists TJOIN4 (RNUM int , C1 int, C2 char(2))
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc ;





[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation

2015-07-25 Thread eugeny birukov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugeny birukov updated HIVE-11373:
--
Description: 
I am trying to transform a JSON string into a Map using this Python code:

import sys, re

for d in sys.stdin:
 r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d))
 print r.strip()

echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py 
key1valu1key2value2

This string should be transformed into the Hive MAP<STRING,STRING> type, but the 
transformation result is rendered as {"key1":"valu1\u0003key2\u0003value2"}

With one key-value entry it works fine:

hive> SELECT TRANSFORM ('{"key1":"valu1"}') USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;
...
{"key1":"valu1"}
Time taken: 35.177 seconds, Fetched: 1 row(s)

With more than one key-value entry it works incorrectly:

hive> SELECT TRANSFORM ('{"key1":"valu1","key2":"value2"}') USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;
...
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 33.486 seconds, Fetched: 1 row(s)

Full steps to reproduce:

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}
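This looks consistent with Hive's default nested-type delimiters (an assumption about the root cause, not something confirmed in this thread): map entries are separated by \002 and a key from its value by \003, so a script that joins everything with \003 leaves Hive unable to split multiple entries. A sketch of a script emitting both delimiters:

```python
import json
import sys

def json_to_hive_map_row(line):
    """Encode a flat JSON object for Hive's default nested delimiters
    (assumed here): \x02 between map entries, \x03 between key and value."""
    obj = json.loads(line)
    return "\x02".join("%s\x03%s" % (k, v) for k, v in obj.items())

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(json_to_hive_map_row(line))
```

Fed '{"key1":"valu1","key2":"value2"}', this emits key1\u0003valu1\u0002key2\u0003value2, which Hive should then split into two map entries.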

  was:
I am trying to transform a JSON string into a Map using this Python code:

import sys, re

for d in sys.stdin:
 r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d))
 print r.strip()

echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py 
key1valu1key2value2

This string should be transformed into the Hive MAP<STRING,STRING> type, but the 
transformation result is rendered as {"key1":"valu1\u0003key2\u0003value2"}

Steps to reproduce:

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}


> Incorrect  (de)serialization STRING field to MAP in TRANSFORM 
> operation
> --
>
> Key: HIVE-11373
> URL: https://issues.apache.org/jira/browse/HIVE-11373
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.1, 1.0.0
> Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with 
> HIVE 1.0)
>Reporter: eugeny birukov
>
> I try transform json

[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation

2015-07-25 Thread eugeny birukov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugeny birukov updated HIVE-11373:
--
Description: 
I am trying to transform a JSON string into a Map using this Python code:

import sys, re

for d in sys.stdin:
 r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d))
 print r.strip()

echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py 
key1valu1key2value2

This string should be transformed into the Hive MAP<STRING,STRING> type, but the 
transformation result is rendered as {"key1":"valu1\u0003key2\u0003value2"}

Steps to reproduce:

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}

  was:
I am trying to transform a JSON string into a Map using this Python code:

import sys,re

for d in sys.stdin:
 r=d.replace('{','').replace('}','').replace('"','')
 r=re.sub('[:,]', '\003', r)
 print r.strip()

Steps to reproduce:

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}


> Incorrect  (de)serialization STRING field to MAP in TRANSFORM 
> operation
> --
>
> Key: HIVE-11373
> URL: https://issues.apache.org/jira/browse/HIVE-11373
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.1, 1.0.0
> Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with 
> HIVE 1.0)
>Reporter: eugeny birukov
>
> I am trying to transform a JSON string into a Map using this Python code:
> import sys, re
> for d in sys.stdin:
>  r=re.sub('[:,]', '\003', re.sub('[{}\"]','',d))
>  print r.strip()
> echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py 
> key1valu1key2value2
> This string should be transformed into the Hive MAP<STRING,STRING> type, but 
> the transformation result is rendered as {"key1":"valu1\u0003key2\u0003value2"}
> Steps to reproduce:
> echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
> hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
> '/tmp/json.txt' overwrite into table json;"
> hive -e "SELECT TRANSFORM (jsonStr) USING 
> 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, 
> STRING>) FROM json;"
> converting to loca

[jira] [Updated] (HIVE-11296) Merge from master to spark branch [Spark Branch]

2015-07-25 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-11296:

Attachment: (was: HIVE-11296.1-spark.patch)

> Merge from master to spark branch [Spark Branch]
> 
>
> Key: HIVE-11296
> URL: https://issues.apache.org/jira/browse/HIVE-11296
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-11296-1.spark.patch
>
>






[jira] [Updated] (HIVE-11296) Merge from master to spark branch [Spark Branch]

2015-07-25 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-11296:

Attachment: HIVE-11296-1.spark.patch

> Merge from master to spark branch [Spark Branch]
> 
>
> Key: HIVE-11296
> URL: https://issues.apache.org/jira/browse/HIVE-11296
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-11296-1.spark.patch
>
>






[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation

2015-07-25 Thread eugeny birukov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugeny birukov updated HIVE-11373:
--
Description: 
I am trying to transform a JSON string into a Map using this Python code:

import sys,re

for d in sys.stdin:
 r=d.replace('{','').replace('}','').replace('"','')
 r=re.sub('[:,]', '\003', r)
 print r.strip()

Steps to reproduce:

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}

  was:
I am trying to transform a JSON string into a Map using this Python code:

import sys,re

for d in sys.stdin:
 r=d.replace('{','').replace('}','').replace('"','')
 r=re.sub('[:,]', '\003', r)
 print r.strip()

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}


> Incorrect  (de)serialization STRING field to MAP in TRANSFORM 
> operation
> --
>
> Key: HIVE-11373
> URL: https://issues.apache.org/jira/browse/HIVE-11373
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.1, 1.0.0
> Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with 
> HIVE 1.0)
>Reporter: eugeny birukov
>
> I am trying to transform a JSON string into a Map using this Python code:
> import sys,re
> for d in sys.stdin:
>  r=d.replace('{','').replace('}','').replace('"','')
>  r=re.sub('[:,]', '\003', r)
>  print r.strip()
> Steps to reproduce:
> echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
> hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
> '/tmp/json.txt' overwrite into table json;"
> hive -e "SELECT TRANSFORM (jsonStr) USING 
> 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, 
> STRING>) FROM json;"
> converting to local s3://webgames-emr/hive/restore/json2map.py
> Added resources: [s3://webgames-emr/hive/restore/json2map.py]
> Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1437833808701_000

[jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation

2015-07-25 Thread eugeny birukov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugeny birukov updated HIVE-11373:
--
Description: 
I am trying to transform a JSON string into a Map using this Python code:

import sys,re

for d in sys.stdin:
 r=d.replace('{','').replace('}','').replace('"','')
 r=re.sub('[:,]', '\003', r)
 print r.strip()

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "SELECT TRANSFORM (jsonStr) USING 
's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}

  was:

I am trying to transform a JSON string into a Map using this Python code:

import sys,re

for d in sys.stdin:
 r=d.replace('{','').replace('}','').replace('"','')
 r=re.sub('[:,]', '\003', r)
 print r.strip()

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;

hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
'/tmp/json.txt' overwrite into table json;"

hive -e "CREATE TABLE d(jsondata MAP<STRING,STRING>); SELECT TRANSFORM 
(jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson 
MAP<STRING,STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = 
http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%,  reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 
sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 261 HDFS Write: 
25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}

Actual Result {"key1":"valu1\u0003key2\u0003value2"}


> Incorrect  (de)serialization STRING field to MAP in TRANSFORM 
> operation
> --
>
> Key: HIVE-11373
> URL: https://issues.apache.org/jira/browse/HIVE-11373
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.1, 1.0.0
> Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with 
> HIVE 1.0)
>Reporter: eugeny birukov
>
> I am trying to transform a JSON string into a Map using this Python code:
> import sys,re
> for d in sys.stdin:
>  r=d.replace('{','').replace('}','').replace('"','')
>  r=re.sub('[:,]', '\003', r)
>  print r.strip()
> echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
> hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath 
> '/tmp/json.txt' overwrite into table json;"
> hive -e "SELECT TRANSFORM (jsonStr) USING 
> 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, 
> STRING>) FROM json;"
> converting to local s3://webgames-emr/hive/restore/json2map.py
> Added resources: [s3://webgames-emr/hive/restore/json2map.py]
> Query ID = hadoop_2015072515_46c48f7d-92c6-41d7-9c54-a90d5b351722
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1437833808701_0006, Tracking U

[jira] [Updated] (HIVE-10171) Create a storage-api module

2015-07-25 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10171:

Description: To support high performance file formats, I'd like to propose 
that we move the minimal set of classes that are required to integrate with 
Hive into a new module named "storage-api". This module will include 
VectorizedRowBatch, the various ColumnVector classes, and the SARG classes. It 
will form the start of an API that high performance storage formats can use to 
integrate with Hive. Both ORC and Parquet can use the new API to support 
vectorization and SARGs without performance destroying shims.  (was: To support 
high performance file formats, I'd like to propose that we move the minimal set 
of classes that are required to integrate with Hive in to a new module named 
"storage-api". This module will include VectorizedRowBatch, the various 
ColumnVector classes, and the SARG classes. It will form the start of an API 
that high performance storage formats can use to integrate with Hive. Both ORC 
and Parquet can use the new API to support vectorization and SARGs without 
performance destroying shims.)

> Create a storage-api module
> ---
>
> Key: HIVE-10171
> URL: https://issues.apache.org/jira/browse/HIVE-10171
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.0.0
>
>
> To support high performance file formats, I'd like to propose that we move 
> the minimal set of classes that are required to integrate with Hive into a 
> new module named "storage-api". This module will include VectorizedRowBatch, 
> the various ColumnVector classes, and the SARG classes. It will form the 
> start of an API that high performance storage formats can use to integrate 
> with Hive. Both ORC and Parquet can use the new API to support vectorization 
> and SARGs without performance destroying shims.
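As a conceptual illustration only (toy code, not Hive's actual VectorizedRowBatch or ColumnVector classes), the columnar-batch layout the proposal describes lets an operator process a whole batch of rows per call instead of interpreting one row at a time:

```python
class ColumnBatch:
    """Toy columnar batch: one list per column, processed batch-at-a-time."""
    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values
        self.size = len(next(iter(columns.values())))

def add_constant(batch, col, delta):
    # A "vectorized" operator touches one column array in a tight loop,
    # rather than dispatching per row across all columns.
    batch.columns[col] = [v + delta for v in batch.columns[col]]
    return batch

b = ColumnBatch({"c1": [1, 2, 3], "c2": [10, 20, 30]})
add_constant(b, "c1", 5)
print(b.columns["c1"])  # [6, 7, 8]
```

A storage format that fills such batches directly (as ORC and Parquet would through the proposed storage-api) avoids the per-row conversion shims the description mentions.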





[jira] [Updated] (HIVE-9873) Hive on MR throws DeprecatedParquetHiveInput exception

2015-07-25 Thread Pavas Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavas Garg updated HIVE-9873:
-
Component/s: Hive

> Hive on MR throws DeprecatedParquetHiveInput exception
> --
>
> Key: HIVE-9873
> URL: https://issues.apache.org/jira/browse/HIVE-9873
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 1.2.0
>
> Attachments: HIVE-9873.1.patch
>
>
> The following error is thrown when information about columns is changed on 
> {{projectionPusher.pushProjectionsAndFilters}}. 
> {noformat}
> 2015-02-26 15:56:40,275 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.io.IOException: java.io.IOException: 
> java.io.IOException: DeprecatedParquetHiveInput : size of object differs. 
> Value size :  23, Current Object size : 29
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: java.io.IOException: 
> DeprecatedParquetHiveInput : size of object differs. Value size :  23, 
> Current Object size : 29
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224)
>   ... 11 more
> Caused by: java.io.IOException: DeprecatedParquetHiveInput : size of object 
> differs. Value size :  23, Current Object size : 29
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:199)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:52)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
>   ... 15 more
> {noformat}
> The bug is in {{ParquetRecordReaderWrapper}}. We store metastore such as the 
> list of columns in the {{Configuration/JobConf}}. The issue is that this 
> metadata is incorrect until the call to 
> {{projectionPusher.pushProjectionsAndFilters}}. In the current codebase we 
> don't use the configuration object returned from 
> {{projectionPusher.pushProjectionsAndFilters}} in other sections of code such 
> as creation and initialization of {{realReader}}. The end result is that 
> parquet is given an empty read schema and returns all nulls. Since the join 
> key is null, no records are joined.
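The stale-configuration pattern described in this report can be sketched in isolation. The sketch below uses a hypothetical plain map in place of Hive's JobConf and a hypothetical pushProjections() in place of projectionPusher.pushProjectionsAndFilters; it only illustrates the bug shape (dropping the returned, updated configuration), not the actual Hive code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates the bug pattern described above with a hypothetical config map
// standing in for Hive's JobConf: pushProjections() returns an UPDATED copy,
// and initializing the reader from the stale original yields an empty schema.
public class StaleConfSketch {

    // Hypothetical stand-in for projectionPusher.pushProjectionsAndFilters:
    // it does not mutate its argument; it returns a new config with the
    // projected read columns set.
    static Map<String, String> pushProjections(Map<String, String> conf,
                                               String columns) {
        Map<String, String> updated = new HashMap<>(conf);
        updated.put("columns.to.read", columns);
        return updated;
    }

    // The reader's view of the schema; an empty schema means all nulls.
    static String readSchema(Map<String, String> conf) {
        return conf.getOrDefault("columns.to.read", "");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Buggy pattern: the return value is dropped, so 'conf' stays stale
        // when the reader is created from it.
        pushProjections(conf, "rnum,c1");
        System.out.println("buggy schema: '" + readSchema(conf) + "'");

        // Fixed pattern: keep the returned configuration and use it for
        // reader creation and initialization.
        Map<String, String> pushed = pushProjections(conf, "rnum,c1");
        System.out.println("fixed schema: '" + readSchema(pushed) + "'");
    }
}
```

With the buggy pattern the reader sees an empty read schema and returns all nulls, which is why the join produces no matched records.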





[jira] [Updated] (HIVE-11372) join with between predicate comparing integer types returns no rows when ORC format used

2015-07-25 Thread N Campbell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

N Campbell updated HIVE-11372:
--
Attachment: TSINT
TINT

> join with between predicate comparing integer types returns no rows when ORC 
> format used
> ---
>
> Key: HIVE-11372
> URL: https://issues.apache.org/jira/browse/HIVE-11372
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: N Campbell
> Attachments: TINT, TSINT
>
>
> getDatabaseProductName  Apache Hive
> getDatabaseProductVersion 1.2.1.2.3.0.0-2557
> getDriverName Hive JDBC
> getDriverVersion  1.2.1.2.3.0.0-2557
> getDriverMajorVersion 1
> getDriverMinorVersion 2
> select tint.rnum, tsint.rnum from tint , tsint where tint.cint between 
> tsint.csint and tsint.csint
> When ORC is used, no rows are returned, versus TEXT where rows are returned.
> create table  if not exists TSINT ( RNUM int , CSINT smallint   )
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc  ;
> create table  if not exists TINT ( RNUM int , CINT int   )
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc  ;





[jira] [Updated] (HIVE-11371) Null pointer exception for nested table query when using ORC versus text

2015-07-25 Thread N Campbell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

N Campbell updated HIVE-11371:
--
Attachment: TJOIN4
TJOIN3
TJOIN2
TJOIN1

> Null pointer exception for nested table query when using ORC versus text
> 
>
> Key: HIVE-11371
> URL: https://issues.apache.org/jira/browse/HIVE-11371
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: N Campbell
> Attachments: TJOIN1, TJOIN2, TJOIN3, TJOIN4
>
>
> The following query will fail if the file format is ORC:
> select tj1rnum, tj2rnum, tjoin3.rnum as rnumt3 from   (select tjoin1.rnum 
> tj1rnum, tjoin2.rnum tj2rnum, tjoin2.c1 tj2c1  from tjoin1 left outer join 
> tjoin2 on tjoin1.c1 = tjoin2.c1 ) tj  left outer join tjoin3 on tj2c1 = 
> tjoin3.c1 
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow$LongCopyRow.copy(VectorCopyRow.java:60)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.copyByReference(VectorCopyRow.java:260)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultMultiValue(VectorMapJoinGenerateResultOperator.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterGenerateResultOperator.finishOuter(VectorMapJoinOuterGenerateResultOperator.java:495)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinOuterLongOperator.process(VectorMapJoinOuterLongOperator.java:430)
>   ... 22 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:0, Vertex vertex_1437788144883_0004_2_02 [Map 1] killed/failed 
> due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 
> killedVertices:0
> SQLState:  08S01
> ErrorCode: 2
> getDatabaseProductName  Apache Hive
> getDatabaseProductVersion 1.2.1.2.3.0.0-2557
> getDriverName Hive JDBC
> getDriverVersion  1.2.1.2.3.0.0-2557
> getDriverMajorVersion 1
> getDriverMinorVersion 2
> create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc;
> create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc ;
> create table  if not exists TJOIN3 (RNUM int , C1 int, C2 char(2))
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc ;
> create table  if not exists TJOIN4 (RNUM int , C1 int, C2 char(2))
> -- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS orc ;





[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)

2015-07-25 Thread wangchangchun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641539#comment-14641539
 ] 

wangchangchun commented on HIVE-11055:
--

Fourth:
Create procedure()
begin
INSERT INTO TMP_EDR_MAX_CONSPT1 SELECT STARTTIME, 
SERVICENAME,SUBSCRIBERSN,SUBSCRIBEDATETIME,VALIDFROMDATETIME, EXPIREDATETIME 
FROM TDR_PCC_SUBSCRIPTION
end;

Fifth:
create procedure testexception()
begin
DECLARE
   booknum int;
   total int;
   percent int;
   SET booknum = 10;
   SET total = 0;
   SET percent = booknum / total;

EXCEPTION WHEN OTHERS THEN
  DBMS_OUTPUT.PUT_LINE('Error');
end;

CALL testexception();



> HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
> ---
>
> Key: HIVE-11055
> URL: https://issues.apache.org/jira/browse/HIVE-11055
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Fix For: 2.0.0
>
> Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, 
> HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml
>
>
> There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for 
> Hive (actually for any SQL-on-Hadoop implementation and any JDBC source).
> Alan Gates offered to contribute it to Hive under the HPL/SQL name 
> (org.apache.hive.hplsql package). This JIRA is to create a patch to 
> contribute the PL/HQL code.





[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)

2015-07-25 Thread wangchangchun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641534#comment-14641534
 ] 

wangchangchun commented on HIVE-11055:
--

Hello, I tried the HPL/SQL functions, and most of them work fine.
I am listing some problems I found here; they can be fixed later.

First:
CREATE TABLE  (a int, b int);
create or replace testinsertinto()
BEGIN
INSERT INTO  values (50,10);
END;
CALL testinsertinto();

Second:

create or replace procedure  testinto()
BEGIN
declare
v_dtlTime  DECIMAL(18,0);
select top 1 starttime into v_dtlTime from TDR_PCC_SUBQUOTA_17000;
PRINT v_dtlTime;
END;
CALL testinto();

Third:
create procedure testwf()
begin
  SELECT
  dept,
  userid,
  sal,
  CUME_DIST() OVER(ORDER BY sal) AS rn1,
  CUME_DIST() OVER(PARTITION BY dept ORDER BY sal) AS rn2
  FROM lxw1234;

end;
call testwf();







> HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
> ---
>
> Key: HIVE-11055
> URL: https://issues.apache.org/jira/browse/HIVE-11055
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Fix For: 2.0.0
>
> Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, 
> HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml
>
>
> There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for 
> Hive (actually for any SQL-on-Hadoop implementation and any JDBC source).
> Alan Gates offered to contribute it to Hive under the HPL/SQL name 
> (org.apache.hive.hplsql package). This JIRA is to create a patch to 
> contribute the PL/HQL code.





[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function

2015-07-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641449#comment-14641449
 ] 

Hive QA commented on HIVE-11271:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747155/HIVE-11271.4.patch

{color:green}SUCCESS:{color} +1 9259 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4716/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4716/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4716/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747155 - PreCommit-HIVE-TRUNK-Build

> java.lang.IndexOutOfBoundsException when union all with if function
> ---
>
> Key: HIVE-11271
> URL: https://issues.apache.org/jira/browse/HIVE-11271
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11271.1.patch, HIVE-11271.2.patch, 
> HIVE-11271.3.patch, HIVE-11271.4.patch
>
>
> Some queries with Union all as subquery fail in MapReduce task with 
> stacktrace:
> {noformat}
> 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing 
> operator UNION[104]
> 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor 
> complete.
> 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: 
> job_local826862759_0005
> java.lang.Exception: java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>   ... 10 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>   at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>   ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>   ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140)
>   ... 21 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>   at org.apache.hadoop.h