[jira] [Updated] (HIVE-11735) Different results when multiple if() functions are used

2015-09-04 Thread Chetna Chaudhari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetna Chaudhari updated HIVE-11735:

Description: 
The Hive if() UDF returns different results when string equality is used as 
the condition and only the case of the string literal changes. 
Observation:
   1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
treated as equal.
   2) The result of the rightmost UDF is pushed to the predicates on the left 
side, leading to the same result for both UDFs.

How to reproduce the issue:
1) CREATE TABLE `sample`(
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  'transient_lastDdlTime'='1425075745');

2) insert into table sample values ('chetna');
3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
sample; 
This will give the result:
33
Expected result:
43
4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
sample; 
This will give the result:
44
Expected result:
34
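For reference, the expected case-sensitive behaviour can be checked outside Hive. The following is a hedged Python sketch of what step 3 should return, assuming plain case-sensitive string equality and the single-row table from the repro (not Hive's actual evaluation code):

```python
# Hedged sketch: the expected, case-sensitive semantics of
#   select min(if(name = 'chetna', 4, 3)), min(if(name = 'Chetna', 4, 3)) from sample;
# evaluated over the single row name = 'chetna' inserted in step 2.
rows = ["chetna"]

def if_udf(cond, then_val, else_val):
    # Hive's if(cond, a, b): a when cond is true, otherwise b
    return then_val if cond else else_val

col1 = min(if_udf(name == "chetna", 4, 3) for name in rows)
col2 = min(if_udf(name == "Chetna", 4, 3) for name in rows)
print(col1, col2)  # 4 3 -- the expected "43", not the "33" the bug produces
```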




  was:
Hive if() udf is returning different results when string equality is used as 
condition, with case change. 
Observation:
   1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
treated as equal.
   2) The rightmost udf result is pushed to predicates on left side. Leading to 
same result for both the udfs.

How to reproduce the issue:
1) CREATE TABLE `sample`(
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  'transient_lastDdlTime'='1425075745');

2) insert into table sample values ('chetna');
3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
sample; 
This will give result : 
33
Expected result:
43
4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
sample; 
This will give result 
44
Expected result:
34





> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Reporter: Chetna Chaudhari
>
> The Hive if() UDF returns different results when string equality is used as 
> the condition and only the case of the string literal changes. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The result of the rightmost UDF is pushed to the predicates on the left 
> side, leading to the same result for both UDFs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11707) Implement "dump metastore"

2015-09-04 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730873#comment-14730873
 ] 

Damien Carol commented on HIVE-11707:
-

This feature is a must-have. We're currently using MySQL dumps stored in HDFS; 
this feature would make things a lot better.

> Implement "dump metastore"
> --
>
> Key: HIVE-11707
> URL: https://issues.apache.org/jira/browse/HIVE-11707
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>
> In projects, we've frequently met the need to copy an existing metastore to 
> another database (for another version of Hive or for other engines like 
> impala, tajo, spark, etc.). RDBs support dumping metastore data into a series 
> of SQL statements, but these need to be translated before applying if we use 
> a different RDB, which is time-consuming, error-prone work.





[jira] [Updated] (HIVE-11696) Exception when table-level serde is Parquet while partition-level serde is JSON

2015-09-04 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11696:

Attachment: HIVE-11696.2.patch

It seems we have to deal with both ArrayWritable (for Parquet only, which 
doesn't need the conversion) and ArrayList (which needs conversion from the 
other format). 
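The dual handling described above can be sketched roughly as follows. This is a hedged Python analogue only (the actual fix is in Java, around ParquetHiveArrayInspector); the ArrayWritable class below is a hypothetical stand-in for the Hadoop type:

```python
# Hedged Python analogue of handling both input shapes in a getList()-style
# method: the Parquet-native wrapper needs unwrapping, while a plain list
# (coming from another serde, e.g. JSON) is used as-is.
class ArrayWritable:
    """Hypothetical stand-in for the Hadoop ArrayWritable wrapper."""
    def __init__(self, values):
        self._values = values
    def get(self):
        return self._values

def get_list(data):
    if data is None:
        return None
    if isinstance(data, ArrayWritable):  # Parquet path: unwrap directly
        return list(data.get())
    if isinstance(data, list):           # other formats: already a list
        return data
    raise TypeError("Cannot inspect " + type(data).__name__)

print(get_list(ArrayWritable([1, 2])))  # [1, 2]
print(get_list([3, 4]))                 # [3, 4]
```

The TypeError branch mirrors the "Cannot inspect java.util.ArrayList" failure mode quoted below: before the fix, only one shape was accepted.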

> Exception when table-level serde is Parquet while partition-level serde is 
> JSON
> ---
>
> Key: HIVE-11696
> URL: https://issues.apache.org/jira/browse/HIVE-11696
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11696.2.patch, HIVE-11696.patch
>
>
> Create a table with partitions and set the SerDe to be Json. The query 
> "select * from tbl1" works fine.
> Now set table level SerDe to be Parquet serde. Based on HIVE-6785, table 
> level and partition level can have different serdes. 
> {noformat}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:154)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1764)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:417)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> ... 13 more
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:112)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:310)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
> at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
> {noformat}





[jira] [Updated] (HIVE-11696) Exception when table-level serde is Parquet while partition-level serde is JSON

2015-09-04 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11696:

Attachment: (was: HIVE-11696.2.patch)

> Exception when table-level serde is Parquet while partition-level serde is 
> JSON
> ---
>
> Key: HIVE-11696
> URL: https://issues.apache.org/jira/browse/HIVE-11696
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11696.patch
>
>
> Create a table with partitions and set the SerDe to be Json. The query 
> "select * from tbl1" works fine.
> Now set table level SerDe to be Parquet serde. Based on HIVE-6785, table 
> level and partition level can have different serdes. 
> {noformat}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:154)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1764)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:417)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> ... 13 more
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:112)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:310)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
> at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
> {noformat}





[jira] [Commented] (HIVE-11734) Hive Server2 not impersonating HDFS for CREATE TABLE/DATABASE with KERBEROS auth

2015-09-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731005#comment-14731005
 ] 

Thejas M Nair commented on HIVE-11734:
--

Can you try starting hiveserver2 with an embedded metastore?
hiveserver2 --hiveconf hive.metastore.uris=' ' ..

(For historical reasons, at Hortonworks we have stuck to using the metastore in 
embedded mode with HS2, and our system tests are run in that mode. I haven't 
seen this issue in that mode.)

> Hive Server2 not impersonating HDFS for CREATE TABLE/DATABASE with KERBEROS 
> auth
> 
>
> Key: HIVE-11734
> URL: https://issues.apache.org/jira/browse/HIVE-11734
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.1.1
>Reporter: Jakub Pastuszek
>
> My configuration is as follows:
> {code}
> hive-site.xml:
> hive.server2.enable.doAs=true
> hive.metastore.execute.setugi=true
> hive.security.metastore.authorization.auth.reads=true
> hive.metastore.sasl.enabled=true
> hive.server2.authentication=KERBEROS
> hive.server2.thrift.sasl.qop=auth-conf
> hive.warehouse.subdir.inherit.perms=false
> ...
> hdfs-site.xml:
> dfs.block.access.token.enable=true
> fs.permissions.umask-mode=027
> ...
> core-site.xml:
> hadoop.security.authentication=kerberos
> hadoop.security.authorization=true
> hadoop.proxyuser.hive.hosts=localhost,master
> hadoop.proxyuser.hive.groups=*
> ...
> {code}
> When I create a database or a table using Kerberos authorised (kinit) user 
> account and beeline (shell) the HDFS directories created by Hive are owned by 
> 'hive' user and group is same as for parent directory ('data' in my case) 
> ('hive' user does not even belong to that group at all but it is in 
> supergroup).
> Now when I try to load the data (or do any other map-reduce) the table files 
> end up owned as the kinit'ed user and the actual user running Yarn container 
> is the kinit'ed user (not 'hive').
> This is causing permission issues when I run queries that do map-reduce, 
> since I don't own the database and table directories.
> Also this allows anybody to drop my database/table since this operation is 
> performed as 'hive' user which is in the supergroup.
> What I want to get is DDL queries to use kinit'ed user when accessing HDFS so 
> database/table directories end up being owned as that user.
> Is this a bug or configuration problem? 
> Also, the group should be the user's primary group (inherit.perms=false), not 
> the group of the parent directory. This way I can use owner/group 
> authorisation on HDFS to grant/restrict access using groups.
> As it stands it is serious security issue and also renders the whole 
> doAs/impersonation system useless for me.
> Also see my question on Serverfault:
> http://serverfault.com/questions/717483/hive-server2-not-impersonating-hdfs
> Versions:
> {code}
> hadoop-0.20-mapreduce-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-client-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-hdfs-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-hdfs-namenode-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-mapreduce-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-mapreduce-historyserver-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-yarn-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hadoop-yarn-resourcemanager-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
> hive-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
> hive-jdbc-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
> hive-metastore-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
> hive-server2-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
> {code}





[jira] [Commented] (HIVE-11646) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple window spec for PTF operator

2015-09-04 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731018#comment-14731018
 ] 

Jesus Camacho Rodriguez commented on HIVE-11646:


[~pxiong], I thought you were going to add the additional test case? Thanks

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple 
> window spec for PTF operator
> ---
>
> Key: HIVE-11646
> URL: https://issues.apache.org/jira/browse/HIVE-11646
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11646.01.patch
>
>
> Current return path only supports a single windowing spec. All the following 
> window spec will overwrite the first one.





[jira] [Commented] (HIVE-11600) Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

2015-09-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731079#comment-14731079
 ] 

Pengcheng Xiong commented on HIVE-11600:


Based on the review comments, pushed it to master. Thanks for [~jpullokkaran]'s 
review.
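The semantics of the new multi-column IN clause can be illustrated outside Hive. A hedged Python sketch over hypothetical rows and constants (not the Hive parser or evaluator):

```python
# Hedged sketch of multi-column IN:
#   ... where (col0, col1 + 3) in ((1, 5), (2, 7))
# A row qualifies when the tuple of left-hand expressions equals any
# right-hand tuple. Rows and constants here are hypothetical.
rows = [(1, 2), (2, 4), (3, 6)]  # (col0, col1)
in_list = {(1, 5), (2, 7)}

matches = [(c0, c1) for (c0, c1) in rows if (c0, c1 + 3) in in_list]
print(matches)  # [(1, 2), (2, 4)]
```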

> Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())
> 
>
> Key: HIVE-11600
> URL: https://issues.apache.org/jira/browse/HIVE-11600
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11600.01.patch, HIVE-11600.02.patch, 
> HIVE-11600.03.patch, HIVE-11600.04.patch, HIVE-11600.05.patch
>
>
> Current hive only support single column in clause, e.g., 
> {code}select * from src where  col0 in (v1,v2,v3);{code}
> We want it to support 
> {code}select * from src where (col0,col1+3) in 
> ((col0+v1,v2),(v3,v4-col1));{code}





[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap

2015-09-04 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731226#comment-14731226
 ] 

Prasanth Jayachandran commented on HIVE-9756:
-

[~sseth] Can you please review this patch? Here is the RB: 
https://reviews.apache.org/r/38132/

> LLAP: use log4j 2 for llap
> --
>
> Key: HIVE-9756
> URL: https://issues.apache.org/jira/browse/HIVE-9756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-9756.1.patch
>
>
> For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get 
> throughput friendly logging.
> http://logging.apache.org/log4j/2.0/manual/async.html#Performance





[jira] [Resolved] (HIVE-11703) Make RegExp and RLike reserved keywords

2015-09-04 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-11703.

Resolution: Fixed

> Make RegExp and RLike reserved keywords
> ---
>
> Key: HIVE-11703
> URL: https://issues.apache.org/jira/browse/HIVE-11703
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> RegExp and RLike are treated as "precedenceEqualNegatableOperator" in Hive. 
> They actually come from MySQL. Neither is a keyword in SQL:2011, but both are 
> reserved keywords in MySQL. Making them reserved eliminates the 14 
> ambiguities we currently have in the Hive grammar. Users who still want to 
> use them as identifiers/function names can "set 
> hive.support.sql11.reserved.keywords=false;"





[jira] [Commented] (HIVE-11642) LLAP: make sure tests pass #3

2015-09-04 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731156#comment-14731156
 ] 

Prasanth Jayachandran commented on HIVE-11642:
--

[~sershe] QTestUtil depends on MiniLlapCluster from the llap-server-tests jar. 
It's a compile-time dependency. We cannot shim this layer, as the shim would 
depend on llap-server and vice versa, creating a circular dependency. Since 
llap-server is compiled only for hadoop-2, I am not sure how to make itests 
compile for hadoop-1. Any thoughts/ideas?

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.





[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap

2015-09-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731320#comment-14731320
 ] 

Sergey Shelukhin commented on HIVE-9756:


How does this work for the IO elevator thread, and other potentially permanent 
threads (e.g. when the thread runner is changed to run on a threadpool rather 
than on separate threads)?
I had a problem with MDC (IIRC) in that the MDC would be set once for a 
threadpool thread and then all the fragments would log that MDC.
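The MDC concern above boils down to pooled threads retaining per-thread state across tasks. A hedged Python analogue (log4j's MDC is a Java concept; this merely mimics the thread-reuse problem with a thread-local):

```python
# Hedged analogue of the MDC concern: a pooled worker thread keeps its
# thread-local "fragment id" across tasks unless each task resets it.
import threading
from concurrent.futures import ThreadPoolExecutor

ctx = threading.local()

def run_fragment(fragment_id, set_context):
    if set_context:
        ctx.fragment = fragment_id
    # A log line would be attributed to whatever context the thread carries.
    return getattr(ctx, "fragment", None)

with ThreadPoolExecutor(max_workers=1) as pool:  # one reused worker thread
    first = pool.submit(run_fragment, "frag-1", True).result()
    stale = pool.submit(run_fragment, "frag-2", False).result()

print(first, stale)  # frag-1 frag-1 -- the second task inherits stale context
```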

> LLAP: use log4j 2 for llap
> --
>
> Key: HIVE-9756
> URL: https://issues.apache.org/jira/browse/HIVE-9756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-9756.1.patch
>
>
> For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get 
> throughput friendly logging.
> http://logging.apache.org/log4j/2.0/manual/async.html#Performance





[jira] [Assigned] (HIVE-11676) implement metastore API to do file footer PPD

2015-09-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-11676:
---

Assignee: Sergey Shelukhin

> implement metastore API to do file footer PPD
> -
>
> Key: HIVE-11676
> URL: https://issues.apache.org/jira/browse/HIVE-11676
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11676.patch
>
>
> Need to pass on the expression/sarg, extract column stats from the footer (at 
> write time?) and then apply one to the other. I may file a separate JIRA for 
> the ORC changes because that is usually a PITA.
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-4752) Add support for hs2 api to use thrift over http

2015-09-04 Thread Austin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731475#comment-14731475
 ] 

Austin Lee commented on HIVE-4752:
--

Are there any plans to add similar support for the metastore to use Thrift over 
HTTP?

> Add support for hs2 api to use thrift over http
> ---
>
> Key: HIVE-4752
> URL: https://issues.apache.org/jira/browse/HIVE-4752
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
>
> Hiveserver2 acts as a service on the cluster for external applications. One 
> way to implement access control for services on a hadoop cluster is to have 
> a gateway server authorize service requests before forwarding them to the 
> server. The [knox project | http://wiki.apache.org/incubator/knox] has taken 
> this approach to simplify cluster security management.
> Other services on the hadoop cluster, such as webhdfs and webhcat, already 
> use HTTP. Having hiveserver2 also support thrift over http transport will 
> enable securing hiveserver2 the same way.





[jira] [Commented] (HIVE-11642) LLAP: make sure tests pass #3

2015-09-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731311#comment-14731311
 ] 

Sergey Shelukhin commented on HIVE-11642:
-

Filed a separate JIRA for this

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731324#comment-14731324
 ] 

Sergey Shelukhin commented on HIVE-11587:
-

I actually have a comment :) other than that should be good

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch
>
>
> Due to legacy in the in-memory mapjoin and conservative planning, the memory 
> estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, not taking data into 
> account because the data, unlike the probe, is free to resize, so it's better 
> for perf to allocate a big probe and hope for the best with regard to future 
> data size. This is not true for the hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
> should not allocate large array in advance. Even if some estimate passes 
> 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
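Item 1.5 above amounts to a simple upper bound. A hedged Python sketch (the load factor and row count below are hypothetical, not values taken from Hive):

```python
# Hedged sketch of the bound in item 1.5: with at most one key per row,
# the probe never needs more than row_count / load_factor slots (rounded
# up) in the flat case. The numbers below are hypothetical.
import math

def max_probe_capacity(row_count, load_factor=0.75):
    return math.ceil(row_count / load_factor)

print(max_probe_capacity(3_000_000))  # 4000000
```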





[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731355#comment-14731355
 ] 

Szehon Ho commented on HIVE-11737:
--

Thanks for the fix, looks good. Wondering, is there any way to make a unit test?

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}
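The failure mode above can be sketched outside Hive: if the analyzer collects only the distinct GROUP BY expressions into a list but still indexes that list by the original (duplicated) positions, the index runs past the end. A minimal, hypothetical Python sketch (not Hive's actual code):

```python
# Hypothetical sketch of the bug, not Hive's actual code: three GROUP BY
# entries, but only two distinct expressions survive deduplication.
group_by = ["tinyint_col_7", "tinyint_col_7", "LEAST(col_a, col_b)"]

distinct = []
for expr in group_by:
    if expr not in distinct:   # keep the first occurrence only
        distinct.append(expr)

assert len(distinct) == 2

# Indexing the deduplicated list by an original position fails:
try:
    distinct[len(group_by) - 1]  # index 2 into a 2-element list
    raised = False
except IndexError:
    raised = True
assert raised
```

The fix, conceptually, is to make sure every consumer indexes the same (deduplicated) key list.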



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731418#comment-14731418
 ] 

Wei Zheng commented on HIVE-11587:
--

[~sershe] I have answered the review comments :)

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch
>
>
> Due to legacy code in the in-memory mapjoin and conservative planning, the 
> memory estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, without taking the data 
> into account, because the data, unlike the probe, is free to resize, so for 
> performance it's better to allocate a big probe and hope for the best with 
> regard to future data size. That is not true for the hybrid case.
> There's code to cap the initial allocation based on available memory 
> (the memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to the hashtable anymore (there used to be two 
> config settings: hashjoin size fraction by itself, or hashjoin size fraction 
> for the group-by case), so it no longer caps the memory below 1 GB. 
> The initial capacity is estimated from the input key count, and in the hybrid 
> join case the cache can exceed Java memory due to the number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
> should not allocate large array in advance. Even if some estimate passes 
> 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Everywhere that rounding up to a power of two is used, change it to rounding 
> down, at least for the hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
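Items 1.5 and 6 above can be made concrete with a small sketch. This assumes the formula as described (row count divided by load factor, rounded up) and a plain bit-length trick for power-of-two rounding; both are illustrative, not the actual Hive code:

```python
import math

def max_probe_capacity(row_count: int, load_factor: float = 0.75) -> int:
    """Upper bound on probe capacity in longs, per item 1.5: at most one
    key per row, divided by the load factor, rounded up."""
    return math.ceil(row_count / load_factor)

def round_down_pow2(n: int) -> int:
    """Round down to a power of two, per item 6 (instead of rounding up)."""
    return 1 << (n.bit_length() - 1)

assert max_probe_capacity(3_000_000) == 4_000_000
assert round_down_pow2(1000) == 512   # rounding up would give 1024
```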





[jira] [Updated] (HIVE-11650) Create LLAP Monitor Daemon class and launch scripts

2015-09-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11650:

Attachment: example.patch

> Create LLAP Monitor Daemon class and launch scripts
> ---
>
> Key: HIVE-11650
> URL: https://issues.apache.org/jira/browse/HIVE-11650
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
> Attachments: HIVE-11650-llap.00.patch, Screen Shot 2015-08-26 at 
> 16.54.35.png, example.patch
>
>
> This JIRA is for creating the LLAP Monitor Daemon class and related launch 
> scripts for the Slider package.





[jira] [Commented] (HIVE-11646) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple window spec for PTF operator

2015-09-04 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731315#comment-14731315
 ] 

Jesus Camacho Rodriguez commented on HIVE-11646:


+1

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple 
> window spec for PTF operator
> ---
>
> Key: HIVE-11646
> URL: https://issues.apache.org/jira/browse/HIVE-11646
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11646.01.patch, HIVE-11646.02.patch
>
>
> The current return path only supports a single windowing spec; every subsequent 
> window spec overwrites the first one.
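The overwrite behavior described above can be illustrated abstractly (a hypothetical sketch, not the actual return-path code): if window specs are collected under a single key instead of accumulated, each new spec replaces the previous one:

```python
# Hypothetical sketch: keying every windowing spec under one name
# makes later specs clobber earlier ones.
specs_broken = {}
for spec in ["w1: ROWS 2 PRECEDING", "w2: RANGE UNBOUNDED"]:
    specs_broken["window"] = spec      # single key: overwrite
assert len(specs_broken) == 1          # only the last spec survives

specs_fixed = []
for spec in ["w1: ROWS 2 PRECEDING", "w2: RANGE UNBOUNDED"]:
    specs_fixed.append(spec)           # accumulate: keep every spec
assert len(specs_fixed) == 2
```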





[jira] [Updated] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-11737:
---
Attachment: HIVE-11737.1.patch

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch
>
>





[jira] [Commented] (HIVE-11646) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple window spec for PTF operator

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731313#comment-14731313
 ] 

Hive QA commented on HIVE-11646:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754238/HIVE-11646.02.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9399 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5181/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5181/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5181/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754238 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple 
> window spec for PTF operator
> ---
>
> Key: HIVE-11646
> URL: https://issues.apache.org/jira/browse/HIVE-11646
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11646.01.patch, HIVE-11646.02.patch
>
>
> The current return path only supports a single windowing spec; every subsequent 
> window spec overwrites the first one.





[jira] [Resolved] (HIVE-11740) LLAP: hadoop-1 build is broken as MiniLlapCluster cannot compile on hadoop-1

2015-09-04 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-11740.
--
Resolution: Duplicate

Duplicate of HIVE-11732

> LLAP: hadoop-1 build is broken as MiniLlapCluster cannot compile on hadoop-1 
> -
>
> Key: HIVE-11740
> URL: https://issues.apache.org/jira/browse/HIVE-11740
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>






[jira] [Commented] (HIVE-11650) Create LLAP Monitor Daemon class and launch scripts

2015-09-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731431#comment-14731431
 ] 

Sergey Shelukhin commented on HIVE-11650:
-

You also need to add the dependency. I attached an example (for real usage, 
you'd want to define the version as a variable, as with the other dependencies, 
and remove the tabs :)), and the jar was included:
{noformat}
[INFO] Including org.eclipse.jetty:jetty-server:jar:9.3.3.v20150827 in the 
shaded jar.
{noformat}
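For reference, a Maven dependency entry of the shape being described might look like the following. This is a hypothetical fragment; the actual coordinates and property name used in the attached example.patch may differ:

```xml
<!-- Hypothetical pom.xml fragment; the version is managed as a
     property, as suggested above. -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
```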

> Create LLAP Monitor Daemon class and launch scripts
> ---
>
> Key: HIVE-11650
> URL: https://issues.apache.org/jira/browse/HIVE-11650
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
> Attachments: HIVE-11650-llap.00.patch, Screen Shot 2015-08-26 at 
> 16.54.35.png
>
>
> This JIRA is for creating the LLAP Monitor Daemon class and related launch 
> scripts for the Slider package.





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731762#comment-14731762
 ] 

Hive QA commented on HIVE-11587:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754293/HIVE-11587.08.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9393 tests executed
*Failed tests:*
{noformat}
TestContribNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5185/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5185/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5185/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754293 - PreCommit-HIVE-TRUNK-Build

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731130#comment-14731130
 ] 

Wei Zheng commented on HIVE-11587:
--

Both TestPigHBaseStorageHandler and TestStreaming pass locally.

[~sershe] Could you review the latest patch and commit it if possible? Thanks!

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch
>
>





[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11587:
-
Affects Version/s: 1.2.0
   1.2.1

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch
>
>





[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11587:
-
Component/s: Hive

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch
>
>





[jira] [Commented] (HIVE-11696) Exception when table-level serde is Parquet while partition-level serde is JSON

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731152#comment-14731152
 ] 

Hive QA commented on HIVE-11696:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754197/HIVE-11696.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9388 tests executed
*Failed tests:*
{noformat}
TestJdbcWithMiniMr - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5180/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5180/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5180/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754197 - PreCommit-HIVE-TRUNK-Build

> Exception when table-level serde is Parquet while partition-level serde is 
> JSON
> ---
>
> Key: HIVE-11696
> URL: https://issues.apache.org/jira/browse/HIVE-11696
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11696.2.patch, HIVE-11696.patch
>
>
> Create a table with partitions and set its SerDe to JSON. The query 
> "select * from tbl1" works fine.
> Now set the table-level SerDe to the Parquet serde. Based on HIVE-6785, the 
> table level and the partition level can have different serdes, but the query 
> now fails:
> {noformat}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:154)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1764)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:417)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> ... 13 more
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:112)
> at 
> 

[jira] [Updated] (HIVE-11646) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple window spec for PTF operator

2015-09-04 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11646:
---
Attachment: HIVE-11646.02.patch

New patch with test cases. [~jcamachorodriguez], could you take another look? 
Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix multiple 
> window spec for PTF operator
> ---
>
> Key: HIVE-11646
> URL: https://issues.apache.org/jira/browse/HIVE-11646
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11646.01.patch, HIVE-11646.02.patch
>
>
> The current return path only supports a single windowing spec: each 
> subsequent window spec overwrites the first one.
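The overwrite failure mode described above — collecting several window specs under one shared key so that only the last one survives — can be sketched in isolation. This is a hypothetical illustration of the bug pattern, not the actual Calcite-return-path code:

```python
# Buggy pattern: specs keyed by a shared alias, so each new spec
# replaces the previous one instead of being accumulated.
specs_by_alias = {}
for spec in ["row_number() over w1", "rank() over w2"]:
    specs_by_alias["ptf"] = spec          # only the last spec survives

# Fixed pattern: accumulate every spec for the PTF operator.
specs = []
for spec in ["row_number() over w1", "rank() over w2"]:
    specs.append(spec)

print(len(specs_by_alias), len(specs))    # 1 2
```

The fix is simply to carry a collection of window specs per operator rather than a single slot.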



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11642) LLAP: make sure tests pass #3

2015-09-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731264#comment-14731264
 ] 

Sergey Shelukhin commented on HIVE-11642:
-

Reflection :)

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.





[jira] [Updated] (HIVE-11743) HBase Port conflict for MiniHBaseCluster

2015-09-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11743:
--
Attachment: HIVE-11743.1.patch

> HBase Port conflict for MiniHBaseCluster
> 
>
> Key: HIVE-11743
> URL: https://issues.apache.org/jira/browse/HIVE-11743
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11743.1.patch
>
>
> HMaster http port conflict. Presumably a bug in HBaseTestingUtility: it is 
> not supposed to use the default port for everything.
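A common way for test harnesses to avoid this kind of port conflict is to ask the OS for an ephemeral port instead of hard-coding a default. A minimal sketch using a plain TCP socket (this is generic test-infrastructure technique, not HBaseTestingUtility's actual API):

```python
import socket

def find_free_port():
    """Bind to port 0 so the OS assigns a currently free ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))      # port 0 -> OS picks the port
        return s.getsockname()[1]     # the port actually assigned

port = find_free_port()
print(port)
```

There is still a small race between releasing the probe socket and the server binding the port, but it makes concurrent mini-cluster runs far less likely to collide than a fixed default.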





[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731520#comment-14731520
 ] 

Szehon Ho commented on HIVE-11737:
--

Cool, would you mind simplifying the test so that only the basic columns 
needed to reproduce the problem are there?

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}
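The failure pattern here — positions computed against the original group-by key list being used to index a deduplicated list — can be illustrated with a small standalone sketch (hypothetical names; this is not Hive's actual SemanticAnalyzer logic):

```python
# The query groups by the same column twice plus one expression.
group_by_keys = ["tinyint_col_7", "tinyint_col_7", "least_expr"]

# Deduplicating the keys shrinks the list...
dedup_keys = list(dict.fromkeys(group_by_keys))  # 2 distinct keys

# ...but iterating with indexes taken from the original list overflows,
# analogous to the "Index: 3, Size: 3" seen in the stack trace.
try:
    for i in range(len(group_by_keys)):          # i = 0, 1, 2
        dedup_keys[i]                            # fails at i = 2 (size 2)
except IndexError as e:
    print("IndexError:", e)
```

The robust approach is to either keep index maps consistent after deduplication or to look keys up by expression rather than position.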





[jira] [Updated] (HIVE-11743) HBase Port conflict for MiniHBaseCluster

2015-09-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11743:
--
Description: HMaster http port conflict. Presumably a bug in 
HBaseTestingUtility. It suppose not to use default port for everything.  (was: 
HMaster http port conflict. Presumably a big in HBaseTestingUtility. It suppose 
not to use default port for everything.)

> HBase Port conflict for MiniHBaseCluster
> 
>
> Key: HIVE-11743
> URL: https://issues.apache.org/jira/browse/HIVE-11743
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11743.1.patch
>
>
> HMaster http port conflict. Presumably a bug in HBaseTestingUtility: it is 
> not supposed to use the default port for everything.





[jira] [Updated] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-11737:
---
Attachment: HIVE-11737.3.patch

Patch 3 has a simplified test.

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch, 
> HIVE-11737.3.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}





[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731568#comment-14731568
 ] 

Szehon Ho commented on HIVE-11737:
--

Thanks, can we get rid of the max/least/coalesce stuff?

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch, 
> HIVE-11737.3.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}





[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731649#comment-14731649
 ] 

Pengcheng Xiong commented on HIVE-11614:


[~jpullokkaran], most of the test failures are due to changed golden files 
(the new ones look better). The only failures we need to worry about are the 
ones in TestJdbcDriver2. Could you take a look? Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch
>
>






[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731642#comment-14731642
 ] 

Hive QA commented on HIVE-11614:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754277/HIVE-11614.03.patch

{color:red}ERROR:{color} -1 due to 126 failed/errored test(s), 9399 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_translate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_innerjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_alt_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merging
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optional_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_print_header
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regex_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2

[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow

2015-09-04 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730987#comment-14730987
 ] 

Jesus Camacho Rodriguez commented on HIVE-11617:


+1

> Explain plan for multiple lateral views is very slow
> 
>
> Key: HIVE-11617
> URL: https://issues.apache.org/jira/browse/HIVE-11617
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11617.2.patch, HIVE-11617.patch, HIVE-11617.patch
>
>
> The following explain job will be very slow or never finish if there are many 
> lateral views involved. High CPU usage is also noticed.
> {noformat}
> CREATE TABLE `t1`(`pattern` array<string>);
>   
> explain select * from t1 
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1;
> {noformat}
> From jstack, the job is busy with preorder tree traverse. 
> {noformat}
> at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
> at java.util.regex.Matcher.reset(Matcher.java:308)
> at java.util.regex.Matcher.<init>(Matcher.java:228)
> at java.util.regex.Pattern.matcher(Pattern.java:1088)
> at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> 
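The deep recursion into PreOrderWalker.walk in the jstack above is what an unmemoized preorder walk of an operator DAG looks like: when nodes share children (as chained lateral views do), each shared subtree is re-walked once per path reaching it, so work grows exponentially with the number of views. A minimal illustration with a hypothetical node structure (not Hive's actual walker):

```python
class Node:
    def __init__(self, children=()):
        self.children = list(children)

def preorder_visits(node, counter):
    """Naive preorder walk with no 'already dispatched' set: every path
    into a shared child re-walks that whole subtree."""
    counter[0] += 1
    for child in node.children:
        preorder_visits(child, counter)
    return counter[0]

# Build a 20-level diamond chain: each node has two edges into the SAME child,
# roughly like 20 stacked lateral views sharing the upstream operator tree.
node = Node()
for _ in range(20):
    node = Node([node, node])

visits = [0]
preorder_visits(node, visits)
print(visits[0])    # 2**21 - 1 = 2097151 visits for only 21 distinct nodes
```

Tracking visited/dispatched nodes (as a graph walk should) brings the cost back down to one visit per distinct node.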

[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731698#comment-14731698
 ] 

Hive QA commented on HIVE-11737:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754294/HIVE-11737.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9399 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5184/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5184/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5184/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754294 - PreCommit-HIVE-TRUNK-Build

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch, 
> HIVE-11737.3.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}





[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731534#comment-14731534
 ] 

Hive QA commented on HIVE-11737:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754261/HIVE-11737.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9398 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5182/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5182/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5182/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754261 - PreCommit-HIVE-TRUNK-Build

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}





[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-04 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11587:
-
Attachment: HIVE-11587.08.patch

Uploaded patch 8.

Instead of reusing HIVEHYBRIDGRACEHASHJOINMINWBSIZE, use a constant as the 
rowCount in MapJoinOperator.

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to legacy from the in-memory mapjoin and conservative planning, the memory 
> estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe array erring on the side of more memory, ignoring the data 
> size because, unlike the probe, the data is free to resize; for the in-memory 
> case it is therefore better for perf to allocate a big probe and hope for the 
> best with regard to future data size. That is not true for the hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is that 
> you should not allocate a large array in advance. Even if some estimate passes 
> 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Everywhere rounding up to a power of two is used, change it to rounding 
> down, at least for the hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
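Item 1.5's upper bound can be sketched numerically. The class and method names below are illustrative, not Hive's actual code; the rounding-down helper corresponds to suggestion 6, under the assumption that rounding down is acceptable for the hybrid case.

```java
// Illustrative upper bound for probe capacity (item 1.5), assuming at most one
// key per row: capacity = ceil(rowCount / loadFactor).
public class ProbeSizeEstimate {
    // Maximum probe slots ever needed for rowCount rows at the given load factor.
    public static long maxProbeCapacity(long rowCount, float loadFactor) {
        return (long) Math.ceil(rowCount / (double) loadFactor);
    }

    // Round down to a power of two (item 6's suggestion for the hybrid case),
    // so per-segment allocations never overshoot the estimate.
    public static long roundDownToPowerOfTwo(long n) {
        return Long.highestOneBit(n);
    }

    public static void main(String[] args) {
        long cap = maxProbeCapacity(1_000_000L, 0.75f);
        System.out.println(cap);                        // 1333334
        System.out.println(roundDownToPowerOfTwo(cap)); // 1048576
    }
}
```

For the hybrid case the bound still holds for the total capacity across all segments, even if skew makes per-segment sizing more complex.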



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731574#comment-14731574
 ] 

Jimmy Xiang commented on HIVE-11737:


We do need some functions to reproduce the issue.

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch, 
> HIVE-11737.3.patch
>
>
> {noformat}
> SELECT
> tinyint_col_7,
> MIN(timestamp_col_1) AS timestamp_col,
> MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), 
> CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col,
> tinyint_col_7 AS int_col_1,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int)) AS int_col_2
> FROM table_3
> GROUP BY
> tinyint_col_7,
> tinyint_col_7,
> LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
> 476) AS int))
> {noformat}
> Query compilation fails:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> {noformat}
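The query above lists tinyint_col_7 twice in the GROUP BY. A hedged sketch of the obvious remedy, which may or may not match the committed patch: collapse duplicated group-by expressions up front, so every downstream pass sees a key list whose length matches the distinct-key count and column indices stay within bounds.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch only: de-duplicate group-by expressions while preserving order,
// so GROUP BY k, k, expr behaves as GROUP BY k, expr.
public class DedupGroupByKeys {
    public static List<String> dedup(List<String> groupByExprs) {
        return new ArrayList<>(new LinkedHashSet<>(groupByExprs));
    }

    public static void main(String[] args) {
        List<String> keys = List.of("tinyint_col_7", "tinyint_col_7", "LEAST(...)");
        System.out.println(dedup(keys)); // [tinyint_col_7, LEAST(...)]
    }
}
```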





[jira] [Updated] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-04 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11614:
---
Attachment: HIVE-11614.03.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch
>
>






[jira] [Commented] (HIVE-11742) last_value window specifier enforces ordering as a partition

2015-09-04 Thread Prateek Rungta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731499#comment-14731499
 ] 

Prateek Rungta commented on HIVE-11742:
---

If folks agree with the change, I'd like to use this JIRA to contribute to 
Hive. Please assign it to me.

> last_value window specifier enforces ordering as a partition
> 
>
> Key: HIVE-11742
> URL: https://issues.apache.org/jira/browse/HIVE-11742
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Prateek Rungta
>
> [HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the 
> partitioning behavior of the last_value function. For a specified 
> last_value() OVER X, the ordering spec within X is used in addition to the 
> partition spec for partitioning; i.e. last_value(a) OVER (PARTITION BY i 
> ORDER BY j) applies last_value(a) to all rows within each unique combination 
> of (i, j). The behavior I'd expect is for PARTITION BY i to define the 
> partitioning, and ORDER BY to define the ordering within the partition; i.e. 
> last_value(a) OVER (PARTITION BY i ORDER BY j) should apply last_value(a) 
> to all rows within each unique value of (i), ordered by j within the 
> partition. 
> This was changed to be consistent with how SQLServer handled such queries. 
> [SQLServer 
> Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255=-2147217396]
>  describe their example (which performs as Hive does): 
> {quote}
> The PARTITION BY clause partitions the employees by department and the 
> LAST_VALUE function is applied to each partition independently. The ORDER BY 
> clause specified in the OVER clause determines the logical order in which the 
> LAST_VALUE function is applied to the rows in each partition.
> {quote}
> To me, their behavior is inconsistent with their description. I've filed an 
> [upstream 
> bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] with 
> Microsoft for the same. 
> [Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions]
>  and 
> [Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html]
>  both exhibit the behavior I'd expect.
> Considering Hive-4262 has been in core for 2+ years, I don't think we can 
> change the behavior without potentially impacting clients. But I would like a 
> way to enable the expected behavior at the least (behind a config flag 
> maybe?). What do you think?
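The two readings of last_value(a) OVER (PARTITION BY i ORDER BY j) can be contrasted in a small standalone sketch; this models only the expected semantics described above and is not Hive or SQLServer code. Current Hive behavior (per the report) effectively groups rows by (i, j); the expected behavior groups by i alone and takes the value from the row with the greatest j.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Expected semantics: one result per distinct i, taken from the row with the
// greatest j in that partition (ORDER BY only orders, it does not partition).
public class LastValueSemantics {
    record Row(int i, int j, String a) {}

    public static Map<Integer, String> lastValueByPartition(List<Row> rows) {
        Map<Integer, String> out = new TreeMap<>();
        rows.stream()
            .sorted(Comparator.comparingInt(Row::j))
            .forEach(r -> out.put(r.i(), r.a())); // a later j overwrites an earlier one
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 1, "x"), new Row(1, 2, "y"), new Row(2, 1, "z"));
        System.out.println(lastValueByPartition(rows)); // {1=y, 2=z}
    }
}
```

Under the reported (i, j) grouping, partition i=1 would instead yield two results, x for j=1 and y for j=2.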





[jira] [Updated] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-11737:
---
Attachment: HIVE-11737.2.patch

Attached patch v2 that has a unit test to cover this scenario.

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch
>
>





[jira] [Commented] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731595#comment-14731595
 ] 

Szehon Ho commented on HIVE-11737:
--

OK +1

> IndexOutOfBounds compiling query with duplicated groupby keys
> -
>
> Key: HIVE-11737
> URL: https://issues.apache.org/jira/browse/HIVE-11737
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11737.1.patch, HIVE-11737.2.patch, 
> HIVE-11737.3.patch
>
>





[jira] [Commented] (HIVE-11593) Add aes_encrypt and aes_decrypt UDFs

2015-09-04 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731683#comment-14731683
 ] 

Jason Dere commented on HIVE-11593:
---

+1

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: HIVE-11593
> URL: https://issues.apache.org/jira/browse/HIVE-11593
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Attachments: HIVE-11593.1.patch, HIVE-11593.2.patch
>
>
> AES (Advanced Encryption Standard) algorithm.
> Oracle JRE supports AES-128 out of the box.
> AES-192 and AES-256 are supported if the Java Cryptography Extension (JCE) 
> Unlimited Strength Jurisdiction Policy Files are installed.
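The JRE capability the description refers to can be exercised directly with javax.crypto. This is an illustrative AES-128 round trip, not Hive's UDF code; the ECB/PKCS5Padding transformation is an assumption here (it is what MySQL-compatible AES UDFs conventionally use).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Illustrative AES-128 encrypt/decrypt with the stock JRE javax.crypto API.
public class AesRoundTrip {
    public static byte[] crypt(int mode, byte[] key16, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key16, "AES"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "1234567890123456".getBytes(StandardCharsets.UTF_8); // 16 bytes = AES-128
        byte[] ct = crypt(Cipher.ENCRYPT_MODE, key, "ABC".getBytes(StandardCharsets.UTF_8));
        byte[] pt = crypt(Cipher.DECRYPT_MODE, key, ct);
        System.out.println(new String(pt, StandardCharsets.UTF_8)); // ABC
    }
}
```

A 24- or 32-byte key would exercise AES-192/AES-256 and fail with InvalidKeyException on a JRE without the unlimited-strength policy files.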





[jira] [Commented] (HIVE-11705) refactor SARG stripe filtering for ORC into a method

2015-09-04 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731707#comment-14731707
 ] 

Prasanth Jayachandran commented on HIVE-11705:
--

Left some comments in RB.

> refactor SARG stripe filtering for ORC into a method
> 
>
> Key: HIVE-11705
> URL: https://issues.apache.org/jira/browse/HIVE-11705
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11705.01.patch, HIVE-11705.patch
>
>
> For footer cache PPD to metastore, we'd need a method to do the PPD. Tiny 
> item to create it on OrcInputFormat.
> For metastore path, these methods will be called from expression proxy 
> similar to current objectstore expr filtering; it will change to have 
> serialized sarg and column list to come from request instead of conf; 
> includedCols/etc. will also come from request instead of assorted java 
> objects. 
> The types and stripe stats will need to be extracted from HBase. This is a 
> little bit of a problem, since ideally we want to be inside an HBase 
> filter/coprocessor; I'd need to take a look to see if this is possible... 
> since that filter would need to either deserialize orc, or we would need to 
> store types and stats information in some other, non-ORC manner on write. The 
> latter is probably a better idea, although it's dangerous because there's no 
> sync between this code and ORC itself.
> Meanwhile minimize dependencies for stripe picking to essentials (and conf 
> which is easy to remove).





[jira] [Commented] (HIVE-11711) Merge hbase-metastore branch to trunk

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730440#comment-14730440
 ] 

Hive QA commented on HIVE-11711:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754142/HIVE-11711.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9361 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5175/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5175/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5175/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754142 - PreCommit-HIVE-TRUNK-Build

> Merge hbase-metastore branch to trunk
> -
>
> Key: HIVE-11711
> URL: https://issues.apache.org/jira/browse/HIVE-11711
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-11711.1.patch, HIVE-11711.2.patch, 
> HIVE-11711.3.patch
>
>
> Major development of hbase-metastore is done and it's time to merge the 
> branch back into master.
> Currently hbase-metastore is only invoked when running TestMiniTezCliDriver. 
> The instruction for setting up hbase-metastore is captured in 
> https://cwiki.apache.org/confluence/display/Hive/HBaseMetastoreDevelopmentGuide.





[jira] [Commented] (HIVE-11733) UDF GenericUDFReflect cannot find classes added by "ADD JAR"

2015-09-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730444#comment-14730444
 ] 

Hive QA commented on HIVE-11733:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754159/HIVE-11733.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5177/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5177/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5177/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5177/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 730a404 HIVE-11657 : HIVE-2573 introduces some issues during 
metastore init (and CLI init) (Sergey Shelukhin, reviewed by Sushanth Sowmyan)
+ git clean -f -d
Removing bin/ext/hbaseimport.cmd
Removing bin/ext/hbaseimport.sh
Removing bin/ext/hbaseschematool.sh
Removing itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/hbase/
Removing itests/util/src/main/java/org/apache/hadoop/hive/metastore/
Removing metastore/src/gen/protobuf/
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataResult.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/MetadataPpdResult.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PutFileMetadataRequest.java
Removing 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PutFileMetadataResult.java
Removing 
metastore/src/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java
Removing metastore/src/java/org/apache/hadoop/hive/metastore/hbase/
Removing metastore/src/protobuf/
Removing metastore/src/test/org/apache/hadoop/hive/metastore/hbase/
Removing 
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDeWithEndPrefix.java
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 730a404 HIVE-11657 : HIVE-2573 introduces some issues during 
metastore init (and CLI init) (Sergey Shelukhin, reviewed by Sushanth Sowmyan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754159 - PreCommit-HIVE-TRUNK-Build

> UDF GenericUDFReflect cannot find classes added by "ADD JAR"
> 

[jira] [Assigned] (HIVE-11733) UDF GenericUDFReflect cannot find classes added by "ADD JAR"

2015-09-04 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi reassigned HIVE-11733:
-

Assignee: Yibing Shi

> UDF GenericUDFReflect cannot find classes added by "ADD JAR"
> 
>
> Key: HIVE-11733
> URL: https://issues.apache.org/jira/browse/HIVE-11733
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.1
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>
> When running the command below:
> {quote}
> hive -e "add jar /root/hive/TestReflect.jar; \
> select reflect('com.yshi.hive.TestReflect', 'testReflect', code) from 
> sample_07 limit 3"
> {quote}
> the following error occurs:
> {noformat}
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> UDFReflect evaluate
> {noformat}
> The full stack trace is:
> {noformat}
> 15/09/04 07:00:37 [main]: INFO compress.CodecPool: Got brand-new decompressor 
> [.bz2]
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> UDFReflect evaluate
> 15/09/04 07:00:37 [main]: ERROR CliDriver: Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> UDFReflect evaluate
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> UDFReflect evaluate
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1657)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect 
> evaluate
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFReflect.evaluate(GenericUDFReflect.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:416)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
>   ... 13 more
> Caused by: java.lang.ClassNotFoundException: com.yshi.hive.TestReflect
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:190)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFReflect.evaluate(GenericUDFReflect.java:105)
>   ... 22 more
> {noformat}
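The trace ends in Class.forName resolving against the default classloader, which does not see jars added at session time. A common remedy for this class of bug, sketched below under that assumption and not taken from the actual patch, is to resolve through the thread context classloader, which a Hive session's ADD JAR augments.

```java
// Hedged sketch: prefer the thread context classloader (which session-level
// "ADD JAR" augments) over the classloader that loaded this class itself.
public class ReflectClassLookup {
    public static Class<?> lookup(String className) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ReflectClassLookup.class.getClassLoader();
        }
        try {
            // initialize=false: defer static initializers until first real use
            return Class.forName(className, false, cl);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found: " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("java.util.ArrayList").getName()); // java.util.ArrayList
    }
}
```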





[jira] [Updated] (HIVE-11733) UDF GenericUDFReflect cannot find classes added by "ADD JAR"

2015-09-04 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-11733:
--
Attachment: HIVE-11733.1.patch

Upload the patch.

> UDF GenericUDFReflect cannot find classes added by "ADD JAR"
> 
>
> Key: HIVE-11733
> URL: https://issues.apache.org/jira/browse/HIVE-11733
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.1
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-11733.1.patch
>
>





[jira] [Commented] (HIVE-11669) OrcFileDump service should support directories

2015-09-04 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730510#comment-14730510
 ] 

Lefty Leverenz commented on HIVE-11669:
---

Thanks [~prasanth_j].  I added a link to this JIRA issue.

I also opened ORC-26 so we won't forget to document the file dump utility when 
the time comes.



> OrcFileDump service should support directories
> --
>
> Key: HIVE-11669
> URL: https://issues.apache.org/jira/browse/HIVE-11669
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11669.1.patch
>
>
> orcfiledump service does not support directories. If a directory is specified, 
> the program should iterate through all the files in the directory and 
> perform a file dump on each.





[jira] [Updated] (HIVE-11669) OrcFileDump service should support directories

2015-09-04 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11669:
--
Labels:   (was: TODOC1.3)

> OrcFileDump service should support directories
> --
>
> Key: HIVE-11669
> URL: https://issues.apache.org/jira/browse/HIVE-11669
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11669.1.patch
>
>


