[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966850#comment-14966850
 ] 

Hive QA commented on HIVE-11901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767754/HIVE-11901.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9697 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5722/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5722/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5722/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767754 - PreCommit-HIVE-TRUNK-Build

> StorageBasedAuthorizationProvider requires write permission on table for 
> SELECT statements
> --
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HIVE-11901.01.patch, HIVE-11901.02.patch
>
>
> With HIVE-7895, write permission is required on the table directory even 
> for a SELECT statement.
> Looking at the stacktrace, it seems the method 
> {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, 
> Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats 
> a null partition as a CREATE statement, though it can also be a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first 
> in order to tell which statement it is.
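> 
> A minimal sketch of that check (illustrative only, not the actual Hive code; 
> the shape of the {{authorize(Path, ...)}} overload is assumed):
> {code}
> // Sketch: treat a null partition as CREATE-like only when write privileges
> // are actually requested; a SELECT requests read privileges only.
> if (part == null) {
>   if (writeRequiredPriv != null && writeRequiredPriv.length > 0) {
>     // CREATE/INSERT-style statement: require write on the table directory
>     authorize(table.getDataLocation(), readRequiredPriv, writeRequiredPriv);
>   } else {
>     // SELECT-style statement: read permission is sufficient
>     authorize(table.getDataLocation(), readRequiredPriv, null);
>   }
> }
> {code}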



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12200) INSERT INTO table using a select statement w/o a FROM clause fails

2015-10-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-12200:
---
Labels: TODOC1.3  (was: )

> INSERT INTO table using a select statement w/o a FROM clause fails
> --
>
> Key: HIVE-12200
> URL: https://issues.apache.org/jira/browse/HIVE-12200
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12200.1.patch
>
>
> Here is the stack trace:
> {noformat}
> FailedPredicateException(regularBody,{$s.tree.getChild(1) !=null}?)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41047)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40222)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40092)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1656)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1140)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:312)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1162)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1215)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1091)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1081)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:225)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:177)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:388)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:323)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:731)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:704)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:29 Failed to recognize predicate ''. 
> Failed rule: 'regularBody' in statement
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11973) IN operator fails when the column type is DATE

2015-10-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967317#comment-14967317
 ] 

Jimmy Xiang commented on HIVE-11973:


With the patch, the common type of date and string becomes date, which does not 
seem right.
IN is handled differently from comparison; that's why the behavior is as 
described in this JIRA.

> IN operator fails when the column type is DATE 
> ---
>
> Key: HIVE-11973
> URL: https://issues.apache.org/jira/browse/HIVE-11973
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0
>Reporter: sanjiv singh
>Assignee: Yongzhi Chen
> Attachments: HIVE-11973.1.patch
>
>
> Test DLL :
> {code}
> CREATE TABLE `date_dim`(
>   `d_date_sk` int, 
>   `d_date_id` string, 
>   `d_date` date, 
>   `d_current_week` string, 
>   `d_current_month` string, 
>   `d_current_quarter` string, 
>   `d_current_year` string) ;
> {code}
> Hive query :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN ('2000-03-22','2001-03-22')  ;
> {code}
> In 1.0.0 ,  the above query fails with:
> {code}
> FAILED: SemanticException [Error 10014]: Line 1:180 Wrong arguments 
> ''2001-03-22'': The arguments for IN should be the same type! Types are: 
> {date IN (string, string)}
> {code}
> I changed the query as given to pass the error :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN (CAST('2000-03-22' AS DATE) , CAST('2001-03-22' AS DATE) 
>  )  ;
> {code}
> But it works without casting  :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date   = '2000-03-22' ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12213) Investigating the test failure TestHCatClient.testTableSchemaPropagation

2015-10-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966827#comment-14966827
 ] 

Aihua Xu commented on HIVE-12213:
-

Got you. 


It seems you need to check whether fileStatus is empty after the 
{{DO_NOT_UPDATE_STATS}} check in the following function. The test failures seem 
to be caused by that.

{noformat}
public static boolean updateTableStatsFast(Table tbl,
  FileStatus[] fileStatus, boolean newDir, boolean forceRecompute) throws 
MetaException {
{noformat}
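
Roughly like this (a sketch only, not the actual code):
{noformat}
// Sketch: after the DO_NOT_UPDATE_STATS check, bail out early when there are
// no files to derive stats from, instead of computing from an empty array.
if (fileStatus == null || fileStatus.length == 0) {
  return false; // nothing to compute fast stats from
}
{noformat}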

BTW: I feel it's overkill to create a new isEmpty() function. 

> Investigating the test failure TestHCatClient.testTableSchemaPropagation
> 
>
> Key: HIVE-12213
> URL: https://issues.apache.org/jira/browse/HIVE-12213
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aleksei Statkevich
>Priority: Minor
> Attachments: HIVE-12213.patch, HIVE-12231.1.patch
>
>
> The test has been failing for some time with following error.
> {noformat}
> Error Message
> Table after deserialization should have been identical to sourceTable. 
> expected:<[TABLE_PROPERTIES]> but was:<[]>
> Stacktrace
> java.lang.AssertionError: Table after deserialization should have been 
> identical to sourceTable. expected:<[TABLE_PROPERTIES]> but was:<[]>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation(TestHCatClient.java:1065)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-10-21 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967237#comment-14967237
 ] 

Naveen Gangam commented on HIVE-12184:
--

Review posted to RB at https://reviews.apache.org/r/39508/

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:10000/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:10000/default> use foo;
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12200) INSERT INTO table using a select statement w/o a FROM clause fails

2015-10-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967230#comment-14967230
 ] 

Jimmy Xiang commented on HIVE-12200:


Added. For the SELECT statement, the doc for [LanguageManual Select | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select] 
probably should be amended as well.

> INSERT INTO table using a select statement w/o a FROM clause fails
> --
>
> Key: HIVE-12200
> URL: https://issues.apache.org/jira/browse/HIVE-12200
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12200.1.patch
>
>
> Here is the stack trace:
> {noformat}
> FailedPredicateException(regularBody,{$s.tree.getChild(1) !=null}?)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41047)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40222)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40092)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1656)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1140)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:312)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1162)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1215)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1091)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1081)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:225)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:177)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:388)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:323)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:731)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:704)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:29 Failed to recognize predicate ''. 
> Failed rule: 'regularBody' in statement
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968519#comment-14968519
 ] 

Sergey Shelukhin commented on HIVE-11985:
-

In the previous case, we stored whatever was in the schema; on some DBs it could 
be truncated, and on some DBs (e.g. Oracle) it would make it impossible to create 
the table at all. In the former case the truncation is not actually visible to 
the user, because the metastore values are never used, but wrong data is still 
stored in the metastore. In any case, it is not always possible to get rid of 
truncation, because varchar (e.g. in Oracle) is limited; you cannot make the 
column bigger than the limit.
I think there's also a risk in storing the values in the metastore in this case: 
if someone were actually to use the values, they would be incorrect.
For example, if the schema is changed and someone takes the type from the 
metastore, the type will be wrong; or, if someone makes changes in the metastore 
as normal from some new code in Hive, those changes will be ignored because the 
schema comes from the deserializer.
The patch changes the logic to not store the values when they should not be used.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11540) HDP 2.3 and Flume 1.6: Hive Streaming – Too many delta files during Compaction

2015-10-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11540:
--
Attachment: HIVE-11540.patch

> HDP 2.3 and Flume 1.6: Hive Streaming – Too many delta files during Compaction
> --
>
> Key: HIVE-11540
> URL: https://issues.apache.org/jira/browse/HIVE-11540
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Nivin Mathew
>Assignee: Eugene Koifman
> Attachments: HIVE-11540.patch
>
>
> Hello,
> I am streaming weblogs to Kafka and then to Flume 1.6 using a Hive sink, with 
> an average of 20 million records a day. I have 5 compactors running at 
> various times (30m/5m/5s); no matter what interval I give, the compactors seem to 
> run out of memory cleaning up a couple thousand delta files and ultimately 
> fall behind compacting/cleaning delta files. Any suggestions on what I can 
> do to improve performance? Or can Hive streaming not handle this kind of load?
> I used this post as reference: 
> http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
> 2015-08-12 15:05:01,197 FATAL [main] org.apache.hadoop.mapred.YarnChild: 
> Error running child : java.lang.OutOfMemoryError: Direct buffer memory
> Max block location exceeded for split: CompactorInputSplit{base: 
> hdfs://Dev01HWNameService/user/hive/warehouse/weblogs.db/dt=15-08-12/base_1056406,
>  bucket: 0, length: 6493042, deltas: [delta_1056407_1056408, 
> delta_1056409_1056410, delta_1056411_1056412, delta_1056413_1056414, 
> delta_1056415_1056416, delta_1056417_1056418,…
> , delta_1074039_1074040, delta_1074041_1074042, delta_1074043_1074044, 
> delta_1074045_1074046, delta_1074047_1074048, delta_1074049_1074050, 
> delta_1074051_1074052]} splitsize: 8772 maxsize: 10
> 2015-08-12 15:34:25,271 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of 
> splits:3
> 2015-08-12 15:34:25,367 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting 
> tokens for job: job_1439397150426_0068
> 2015-08-12 15:34:25,603 INFO  [upladevhwd04v.researchnow.com-18]: 
> impl.YarnClientImpl (YarnClientImpl.java:submitApplication(274)) - Submitted 
> application application_1439397150426_0068
> 2015-08-12 15:34:25,610 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:submit(1294)) - The url to track the job: 
> http://upladevhwd02v.researchnow.com:8088/proxy/application_1439397150426_0068/
> 2015-08-12 15:34:25,611 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: 
> job_1439397150426_0068
> 2015-08-12 15:34:30,170 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:33,756 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job 
> job_1439397150426_0068 running in uber mode : false
> 2015-08-12 15:34:33,757 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
> 2015-08-12 15:34:35,147 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:40,155 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:45,184 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:50,201 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:55,256 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:00,205 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:02,975 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 33% reduce 0%
> 2015-08-12 15:35:02,982 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : 
> attempt_1439397150426_0068_m_00_0, Status : FAILED
> 2015-08-12 15:35:03,000 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : 
> attempt_1439397150426_0068_m_01_0, Status : FAILED
> 2015-08-12 15:35:04,008 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job 

[jira] [Updated] (HIVE-11540) HDP 2.3 and Flume 1.6: Hive Streaming – Too many delta files during Compaction

2015-10-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11540:
--
Description: 
Hello,

I am streaming weblogs to Kafka and then to Flume 1.6 using a Hive sink, with 
an average of 20 million records a day. I have 5 compactors running at various 
times (30m/5m/5s); no matter what interval I give, the compactors seem to run out 
of memory cleaning up a couple thousand delta files and ultimately fall behind 
compacting/cleaning delta files. Any suggestions on what I can do to improve 
performance? Or can Hive streaming not handle this kind of load?

I used this post as reference: 
http://henning.kropponline.de/2015/05/19/hivesink-for-flume/

{noformat}
2015-08-12 15:05:01,197 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error 
running child : java.lang.OutOfMemoryError: Direct buffer memory

Max block location exceeded for split: CompactorInputSplit{base: 
hdfs://Dev01HWNameService/user/hive/warehouse/weblogs.db/dt=15-08-12/base_1056406,
 bucket: 0, length: 6493042, deltas: [delta_1056407_1056408, 
delta_1056409_1056410, delta_1056411_1056412, delta_1056413_1056414, 
delta_1056415_1056416, delta_1056417_1056418,…
, delta_1074039_1074040, delta_1074041_1074042, delta_1074043_1074044, 
delta_1074045_1074046, delta_1074047_1074048, delta_1074049_1074050, 
delta_1074051_1074052]} splitsize: 8772 maxsize: 10
2015-08-12 15:34:25,271 INFO  [upladevhwd04v.researchnow.com-18]: 
mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of 
splits:3
2015-08-12 15:34:25,367 INFO  [upladevhwd04v.researchnow.com-18]: 
mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens 
for job: job_1439397150426_0068
2015-08-12 15:34:25,603 INFO  [upladevhwd04v.researchnow.com-18]: 
impl.YarnClientImpl (YarnClientImpl.java:submitApplication(274)) - Submitted 
application application_1439397150426_0068
2015-08-12 15:34:25,610 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:submit(1294)) - The url to track the job: 
http://upladevhwd02v.researchnow.com:8088/proxy/application_1439397150426_0068/
2015-08-12 15:34:25,611 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:monitorAndPrintJob(1339)) - Running job: job_1439397150426_0068
2015-08-12 15:34:30,170 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:34:33,756 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:monitorAndPrintJob(1360)) - Job job_1439397150426_0068 running in 
uber mode : false
2015-08-12 15:34:33,757 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
2015-08-12 15:34:35,147 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:34:40,155 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:34:45,184 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:34:50,201 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:34:55,256 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:35:00,205 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:35:02,975 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:monitorAndPrintJob(1367)) -  map 33% reduce 0%
2015-08-12 15:35:02,982 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:printTaskEvents(1406)) - Task Id : 
attempt_1439397150426_0068_m_00_0, Status : FAILED
2015-08-12 15:35:03,000 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:printTaskEvents(1406)) - Task Id : 
attempt_1439397150426_0068_m_01_0, Status : FAILED
2015-08-12 15:35:04,008 INFO  [upladevhwd04v.researchnow.com-18]: mapreduce.Job 
(Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
2015-08-12 15:35:05,132 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:35:10,206 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:35:15,228 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 
weblogs.vop_hs.dt=15-08-12
2015-08-12 15:35:20,207 INFO  [Thread-7]: compactor.Initiator 
(Initiator.java:run(88)) - Checking to see if we should compact 

[jira] [Commented] (HIVE-11473) Upgrade Spark dependency to 1.5 [Spark Branch]

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968493#comment-14968493
 ] 

Xuefu Zhang commented on HIVE-11473:


Yes, I think that's a viable way. However, currently the two branches are out 
of sync, so we have to wait until all patches from the Spark branch are committed 
to master. For that, we need to have the branch test work first.

> Upgrade Spark dependency to 1.5 [Spark Branch]
> --
>
> Key: HIVE-11473
> URL: https://issues.apache.org/jira/browse/HIVE-11473
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Rui Li
> Attachments: HIVE-11473.1-spark.patch, HIVE-11473.2-spark.patch, 
> HIVE-11473.3-spark.patch, HIVE-11473.3-spark.patch
>
>
> In Spark 1.5, SparkListener interface is changed. So HoS may fail to create 
> the spark client if the un-implemented event callback method is invoked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12230) custom UDF configure() not called in Vectorization mode

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968498#comment-14968498
 ] 

Xuefu Zhang commented on HIVE-12230:


The code looks quite entertaining. Not sure how it made its way there.

> custom UDF configure() not called in Vectorization mode
> ---
>
> Key: HIVE-12230
> URL: https://issues.apache.org/jira/browse/HIVE-12230
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> PROBLEM:
> A custom UDF that overrides configure():
> {code}
> @Override
> public void configure(MapredContext context) {
>   greeting = "Hello ";
> }
> {code}
> In vectorization mode, it is not called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12201) Tez settings need to be shown in set -v output when execution engine is tez.

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968568#comment-14968568
 ] 

Hive QA commented on HIVE-12201:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767862/HIVE-12201.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9698 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5730/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5730/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5730/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767862 - PreCommit-HIVE-TRUNK-Build

> Tez settings need to be shown in set -v output when execution engine is tez.
> 
>
> Key: HIVE-12201
> URL: https://issues.apache.org/jira/browse/HIVE-12201
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Minor
> Attachments: HIVE-12201.1.patch, HIVE-12201.2.patch, 
> HIVE-12201.3.patch
>
>
> The set -v output currently shows configurations for yarn, hdfs etc. but does 
> not show tez settings when tez is set as the execution engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968578#comment-14968578
 ] 

Xuefu Zhang commented on HIVE-11721:


+1

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 1.1.0, 1.2.1, 2.0.0
>Reporter: Jun Yin
>Assignee: Aleksei Statkevich
> Attachments: HIVE-11721.1.patch, HIVE-11721.patch
>
>
> Hive: 1.1.0
> hive> create table char_255_noascii as select cast("Garçu 谢谢 Kôkaku 
> ありがとうございますkidôtai한국어" as char(255));
> hive> select * from char_255_noascii;
> OK
> Garçu 谢谢 Kôkaku ありがとうございますkidôtai>한국어
> it shows correctly, and it also works well with "LOAD DATA", 
> but when I try another way to insert data, as below:
> hive> create table nonascii(t1 char(255));
> OK
> Time taken: 0.125 seconds
> hive> insert into nonascii values("Garçu 谢谢 Kôkaku ありがとうございますkidôtai한국어");
> hive> select * from nonascii;
> OK
> Gar�u "" K�kaku B�LhFTVD~Ykid�tai\m� 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12080) Support auto type widening for Parquet table

2015-10-21 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-12080:
-
Attachment: HIVE-12080.3.patch

Addressed test case failures.

> Support auto type widening for Parquet table
> 
>
> Key: HIVE-12080
> URL: https://issues.apache.org/jira/browse/HIVE-12080
> Project: Hive
>  Issue Type: New Feature
>  Components: File Formats
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-12080.1.patch, HIVE-12080.2.patch, 
> HIVE-12080.3.patch
>
>
> Currently Hive+Parquet doesn't support it. It should include at least the basic 
> type promotions short->int->bigint, float->double, etc. that are already 
> supported for other file formats.
> There was a similar effort (HIVE-6784) but it was not committed. This JIRA is to 
> address the same in a different way with little (or no) performance impact.
>  
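> A sketch of the kind of check involved (illustrative only; the helper name 
> and where it plugs in are assumptions, not the patch):
> {code}
> // Sketch: permit only safe widening promotions between the Parquet file
> // schema type and the declared table schema type.
> static boolean isSafeWidening(PrimitiveCategory from, PrimitiveCategory to) {
>   switch (from) {
>     case SHORT: return to == PrimitiveCategory.INT || to == PrimitiveCategory.LONG;
>     case INT:   return to == PrimitiveCategory.LONG;
>     case FLOAT: return to == PrimitiveCategory.DOUBLE;
>     default:    return from == to;
>   }
> }
> {code}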



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11973) IN operator fails when the column type is DATE

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968583#comment-14968583
 ] 

Xuefu Zhang commented on HIVE-11973:


It seems to me that x IN (val1, val2, ..., valn) should be equivalent to x = 
val1 OR x = val2 ... OR x = valn. Could we check how "WHERE d_date = 
'2000-03-22'" works and apply the same logic here? Maybe we just need 
to add an implicit cast when the type doesn't match.
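
For instance, something along these lines in the IN type-check path (a sketch 
only; the {{children}} list and the applicability of 
{{FunctionRegistry.getCommonClassForComparison}} / 
{{ParseUtils.createConversionCast}} here are assumptions):
{code}
// Sketch: resolve one common comparison type across the IN arguments and
// wrap any mismatched argument in an implicit cast, mirroring what the
// '=' comparison path already does for date = string.
TypeInfo commonType = children.get(0).getTypeInfo();
for (int i = 1; i < children.size(); i++) {
  commonType = FunctionRegistry.getCommonClassForComparison(
      commonType, children.get(i).getTypeInfo());
}
for (int i = 0; i < children.size(); i++) {
  if (!children.get(i).getTypeInfo().equals(commonType)) {
    children.set(i, ParseUtils.createConversionCast(
        children.get(i), (PrimitiveTypeInfo) commonType));
  }
}
{code}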

> IN operator fails when the column type is DATE 
> ---
>
> Key: HIVE-11973
> URL: https://issues.apache.org/jira/browse/HIVE-11973
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0
>Reporter: sanjiv singh
>Assignee: Yongzhi Chen
> Attachments: HIVE-11973.1.patch, HIVE-11973.2.patch
>
>
> Test DLL :
> {code}
> CREATE TABLE `date_dim`(
>   `d_date_sk` int, 
>   `d_date_id` string, 
>   `d_date` date, 
>   `d_current_week` string, 
>   `d_current_month` string, 
>   `d_current_quarter` string, 
>   `d_current_year` string) ;
> {code}
> Hive query :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN ('2000-03-22','2001-03-22')  ;
> {code}
> In 1.0.0 ,  the above query fails with:
> {code}
> FAILED: SemanticException [Error 10014]: Line 1:180 Wrong arguments 
> ''2001-03-22'': The arguments for IN should be the same type! Types are: 
> {date IN (string, string)}
> {code}
> I changed the query as given to pass the error :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN (CAST('2000-03-22' AS DATE) , CAST('2001-03-22' AS DATE) 
>  )  ;
> {code}
> But it works without casting  :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date   = '2000-03-22' ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11378) Remove hadoop-1 support from master branch

2015-10-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-11378:
--
Attachment: HIVE-11378.4.patch

Rebased patch.  Also I think I found the issue that was keeping it from 
properly generating TestMiniTezCliDriver.

> Remove hadoop-1 support from master branch
> --
>
> Key: HIVE-11378
> URL: https://issues.apache.org/jira/browse/HIVE-11378
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 2.0.0
>
> Attachments: HIVE-11378.2.patch, HIVE-11378.3.patch, 
> HIVE-11378.4.patch, HIVE-11378.patch
>
>
> When we branched branch-1, one of the goals was the ability to remove hadoop1 
> support from master.  I propose to do this softly at first by removing it 
> from the poms and removing the 20S implementation of the shims.  
> I am not going to remove the shim layer.  That would be much more disruptive. 
>  Also, I haven't done the homework to see if we could, as there may still be 
> incompatibility issues between various versions of hadoop2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-21 Thread Aleksei Statkevich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968572#comment-14968572
 ] 

Aleksei Statkevich commented on HIVE-11721:
---

Is anything else needed to merge this patch? Also, related issue is solved in 
HIVE-12207.

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 1.1.0, 1.2.1, 2.0.0
>Reporter: Jun Yin
>Assignee: Aleksei Statkevich
> Attachments: HIVE-11721.1.patch, HIVE-11721.patch
>
>
> Hive: 1.1.0
> hive> create table char_255_noascii as select cast("Garçu 谢谢 Kôkaku 
> ありがとうございますkidôtai한국어" as char(255));
> hive> select * from char_255_noascii;
> OK
> Garçu 谢谢 Kôkaku ありがとうございますkidôtai>한국어
> it shows correctly, and it also works well with "LOAD DATA", 
> but when I try another way to insert data, as below:
> hive> create table nonascii(t1 char(255));
> OK
> Time taken: 0.125 seconds
> hive> insert into nonascii values("Garçu 谢谢 Kôkaku ありがとうございますkidôtai한국어");
> hive> select * from nonascii;
> OK
> Gar�u "" K�kaku B�LhFTVD~Ykid�tai\m� 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12202) NPE thrown when reading legacy ACID delta files

2015-10-21 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-12202:
---
Attachment: HIVE-12202.0.patch

Attached patch [^HIVE-12202.0.patch].

> NPE thrown when reading legacy ACID delta files
> ---
>
> Key: HIVE-12202
> URL: https://issues.apache.org/jira/browse/HIVE-12202
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: transactions
> Attachments: HIVE-12202.0.patch
>
>
> When reading legacy ACID deltas of the form {{delta_$startTxnId_$endTxnId}} a 
> {{NullPointerException}} is thrown on:
> {code:title=org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas#371}
> if(dmd.getStmtIds().isEmpty()) {
> {code}
> The older ACID data format (pre-Hive 1.3.0) does not include the 
> statement ID, and code written for that format should still be supported. 
> Therefore the above condition should also include a null check; alternatively, 
> {{AcidInputFormat.DeltaMetaData}} should never return null, and should 
> return an empty list in this specific scenario.
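> 
> A sketch of the two options (illustrative only):
> {code}
> // Option 1: null-guard at the call site in deserializeDeltas
> if (dmd.getStmtIds() == null || dmd.getStmtIds().isEmpty()) {
>   // legacy delta_$startTxnId_$endTxnId directory: no per-statement deltas
> }
> 
> // Option 2: make the getter never return null
> public List<Integer> getStmtIds() {
>   return stmtIds == null ? Collections.<Integer>emptyList() : stmtIds;
> }
> {code}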



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11565) LLAP: Some counters are incorrect

2015-10-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967408#comment-14967408
 ] 

Siddharth Seth commented on HIVE-11565:
---

I don't think the failures are related. The patch affects LLAP and the version 
of Tez being used.

> LLAP: Some counters are incorrect
> -
>
> Key: HIVE-11565
> URL: https://issues.apache.org/jira/browse/HIVE-11565
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Siddharth Seth
> Attachments: HIVE-11565.1.patch, HIVE-11565.1.patch, HIVE-11565.1.txt
>
>
> 1) Tez counters for LLAP are incorrect.
> 2) Some counters, such as cache hit ratio for a fragment, are not propagated.
> We need to make sure that Tez counters for LLAP are usable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12202) NPE thrown when reading legacy ACID delta files

2015-10-21 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967376#comment-14967376
 ] 

Elliot West commented on HIVE-12202:


I've checked to see that {{AcidUtils.serializeDeltas}} is being used correctly 
in conjunction with {{AcidUtils.deserializeDeltas}}. It appears that 
{{serializeDeltas}} does indeed create {{DeltaMetaData}} instances with an 
empty list for the statement IDs for delta paths containing only 
{{$startTxnId}} and {{$endTxnId}}. However, the deserialization process in  
{{AcidInputFormat.DeltaMetaData.readFields(DataInput)}} incorrectly sets 
{{stmtIds}} to {{null}} at line 152 if no statement count was serialized. Hence 
{{AcidUtils.deserializeDeltas}} then gets tripped up by an NPE at line 371.
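
A sketch of the corresponding fix (illustrative only; the surrounding fields and 
wire-format details are assumed): initialize {{stmtIds}} to an empty list in 
{{readFields}} instead of leaving it null when no statement IDs were serialized.
{code}
// Sketch: never leave stmtIds null after deserialization.
stmtIds = new ArrayList<Integer>();
if (numStatements > 0) {  // numStatements as read from the stream
  for (int i = 0; i < numStatements; i++) {
    stmtIds.add(in.readInt());
  }
}
{code}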

> NPE thrown when reading legacy ACID delta files
> ---
>
> Key: HIVE-12202
> URL: https://issues.apache.org/jira/browse/HIVE-12202
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: transactions
>
> When reading legacy ACID deltas of the form {{delta_$startTxnId_$endTxnId}} a 
> {{NullPointerException}} is thrown on:
> {code:title=org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas#371}
> if(dmd.getStmtIds().isEmpty()) {
> {code}
> The older ACID data format (pre-Hive 1.3.0) does not include the 
> statement ID, and code written for that format should still be supported. 
> Therefore the above condition should also include a null check; alternatively, 
> {{AcidInputFormat.DeltaMetaData}} should never return null, and should 
> return an empty list in this specific scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9582) HCatalog should use IMetaStoreClient interface

2015-10-21 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967419#comment-14967419
 ] 

Roshan Naik commented on HIVE-9582:
---

In this patch, HCatUtil.getHiveMetastoreClient() uses the double-checked locking 
pattern to implement a singleton, which is a broken pattern. 
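
For reference, the usual safe variant (a generic illustration, not the Hive 
code) makes the field volatile and reads it through a local:
{code}
// Without 'volatile', a thread can observe a non-null reference to a
// partially constructed client; the volatile field plus a local read fixes it.
private static volatile HiveMetaStoreClient client;

static HiveMetaStoreClient get(HiveConf conf) throws MetaException {
  HiveMetaStoreClient local = client;
  if (local == null) {
    synchronized (HCatUtil.class) {
      local = client;
      if (local == null) {
        client = local = new HiveMetaStoreClient(conf);
      }
    }
  }
  return local;
}
{code}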

Created HIVE-12221

> HCatalog should use IMetaStoreClient interface
> --
>
> Key: HIVE-9582
> URL: https://issues.apache.org/jira/browse/HIVE-9582
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore
>Affects Versions: 0.14.0, 0.13.1
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>  Labels: hcatalog, metastore, rolling_upgrade
> Fix For: 1.2.0
>
> Attachments: HIVE-9582.1.patch, HIVE-9582.2.patch, HIVE-9582.3.patch, 
> HIVE-9582.4.patch, HIVE-9582.5.patch, HIVE-9582.6.patch, HIVE-9582.7.patch, 
> HIVE-9582.8.patch, HIVE-9583.1.patch
>
>
> Hive uses IMetaStoreClient and it makes using RetryingMetaStoreClient easy. 
> Hence during a failure, the client retries and possibly succeeds. But 
> HCatalog has long been using HiveMetaStoreClient directly and hence failures 
> are costly, especially if they are during the commit stage of a job. Its also 
> not possible to do rolling upgrade of MetaStore Server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12202) NPE thrown when reading legacy ACID delta files

2015-10-21 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967402#comment-14967402
 ] 

Elliot West commented on HIVE-12202:


Some clarification: in my earlier comment, by 'no statement count' I mean a 
statement count < 1. Also, I believe this is a bug and a lack of resilience to 
incorrect usage of the API.

> NPE thrown when reading legacy ACID delta files
> ---
>
> Key: HIVE-12202
> URL: https://issues.apache.org/jira/browse/HIVE-12202
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: transactions
>
> When reading legacy ACID deltas of the form {{delta_$startTxnId_$endTxnId}} a 
> {{NullPointerException}} is thrown on:
> {code:title=org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas#371}
> if(dmd.getStmtIds().isEmpty()) {
> {code}
> The older ACID data format (pre-Hive 1.3.0) does not include the 
> statement ID, and code written for that format should still be supported. 
> Therefore the above condition should also include a null check; alternatively, 
> {{AcidInputFormat.DeltaMetaData}} should never return null, and should 
> return an empty list in this specific scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-10-21 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967618#comment-14967618
 ] 

Naveen Gangam commented on HIVE-12184:
--

[~ashutoshc] [~xuefuz] Thanks for pointing to these jiras. Sounds like we want 
to simplify the syntax and not make any assumptions. 
Something like 
DESCRIBE [DBNAME.]TABLENAME [COLNAME]
would be the cleanest. 
dbname (optional) and tablename are DOT-separated, while table and (optional) 
column are SPACE-separated.

Currently, the implementation already supports the above syntax. But it also 
supports additional forms, where table and column can also be DOT-separated, that 
will need to be dropped going forward.

However, I have a couple of concerns with changing the grammar.
1) The semantics of this grammar conflicts with the semantics of SELECT queries 
where (optional) table and column are DOT-separated.
2) Such a change would not be backward compatible, and any attempts to maintain 
backward compatibility will make matters worse. IMHO, even if we do this in a 
major release, it will still break user scripts when they upgrade.

Given that we already have support for all forms (pending this bug), should we 
just update Hive-->help and the documentation to provide guidance on 
recommended syntax? 

Thank you

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:10000/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:10000/default> use foo;
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967511#comment-14967511
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

Will do before commit

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.patch
>
>
> In the case of metastore footer PPD, we don't want to make a PPD call, with all 
> the attendant SARG, MS and HBase overhead, for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967512#comment-14967512
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

Will do before commit

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.patch
>
>
> In the case of metastore footer PPD, we don't want to make a PPD call, with all 
> the attendant SARG, MS and HBase overhead, for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size

2015-10-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12220:
---

Assignee: Sergey Shelukhin

> LLAP: Usability issues with hive.llap.io.cache.orc.size
> ---
>
> Key: HIVE-12220
> URL: https://issues.apache.org/jira/browse/HIVE-12220
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: llap
>Reporter: Carter Shanklin
>Assignee: Sergey Shelukhin
>
> In the llap-daemon site you need to set, among other things,
> llap.daemon.memory.per.instance.mb
> and
> hive.llap.io.cache.orc.size
> The use of hive.llap.io.cache.orc.size caused me some unnecessary problems; 
> initially I entered the value in MB rather than in bytes. Operator error, you 
> could say, but I look at this as a fraction of the other value, which is in MB.
> Second, is this really tied to ORC? E.g. when we have the vectorized text 
> reader, will this data be cached as well? Or might it be in the future?
> I would like to propose instead using hive.llap.io.cache.size.mb for this 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12202) NPE thrown when reading legacy ACID delta files

2015-10-21 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967524#comment-14967524
 ] 

Elliot West commented on HIVE-12202:


And another clarification to an earlier comment of mine: when I said 'a lack of 
resilience', I meant 'and not a lack of resilience'. Apologies, not quite with 
it today!

> NPE thrown when reading legacy ACID delta files
> ---
>
> Key: HIVE-12202
> URL: https://issues.apache.org/jira/browse/HIVE-12202
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: transactions
> Attachments: HIVE-12202.0.patch
>
>
> When reading legacy ACID deltas of the form {{delta_$startTxnId_$endTxnId}} a 
> {{NullPointerException}} is thrown on:
> {code:title=org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas#371}
> if(dmd.getStmtIds().isEmpty()) {
> {code}
> The older ACID data format (pre-Hive 1.3.0) does not include the 
> statement ID, and code written for that format should still be supported. 
> Therefore the above condition should also include a null check; alternatively, 
> {{AcidInputFormat.DeltaMetaData}} should never return null, and should 
> return an empty list in this specific scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12061) add file type support to file metadata by expr call

2015-10-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12061:

Attachment: HIVE-12061.02.patch

Not sure what is up with the build. Regenerated the thrift again; the nogen 
patch didn't change.
[~alangates] [~daijy] can you review the nogen patch?

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12061.01.nogen.patch, HIVE-12061.01.patch, 
> HIVE-12061.02.patch, HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a 
> boundary between ORC-specific and general metastore code, that could later be 
> used for other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12178) LLAP: NPE in LRFU policy

2015-10-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967552#comment-14967552
 ] 

Prasanth Jayachandran commented on HIVE-12178:
--

+1

> LLAP: NPE in LRFU policy
> 
>
> Key: HIVE-12178
> URL: https://issues.apache.org/jira/browse/HIVE-12178
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12178.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.removeFromListUnderLock(LowLevelLrfuCachePolicy.java:346)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.removeFromListAndUnlock(LowLevelLrfuCachePolicy.java:335)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.notifyUnlock(LowLevelLrfuCachePolicy.java:133)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheImpl.unlockBuffer(LowLevelCacheImpl.java:354)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheImpl.releaseBuffers(LowLevelCacheImpl.java:344)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.returnData(OrcEncodedDataReader.java:662)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.returnData(OrcEncodedDataReader.java:74)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.returnSourceData(EncodedDataConsumer.java:131)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:122)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:36)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:405)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:194)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:191)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:191)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:74)
> at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967611#comment-14967611
 ] 

Xuefu Zhang commented on HIVE-12063:


[~szehon], [~ctang.ma], and [~jxiang], can any of you review the patch here? 
Thanks.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming tailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of decimal values such as 0.0, 
> 0.00, etc. showing as 0 in query results. This causes confusion, 
> as 0.0 and 0.00 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.
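> 
> The padding itself is straightforward; an illustrative sketch using 
> java.math.BigDecimal (not the actual patch):
> {code}
> // Sketch: pad a decimal with trailing zeros to the column's scale, so a
> // stored 0 in decimal(3,2) displays as 0.00. RoundingMode.UNNECESSARY is
> // safe here because stored values already fit within the column's scale.
> static String padToScale(BigDecimal value, int columnScale) {
>   return value.setScale(columnScale, RoundingMode.UNNECESSARY).toPlainString();
> }
> {code}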



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12170) normalize HBase metastore connection configuration

2015-10-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-12170:
--
Attachment: HIVE-12170.2.patch

Second version of the patch; it adds an info message if the config has already 
been set.  I chose info instead of warning because each thread in the metastore 
or HS2 will end up calling this, so a warning would produce a lot of noise for 
nothing.

> normalize HBase metastore connection configuration
> --
>
> Key: HIVE-12170
> URL: https://issues.apache.org/jira/browse/HIVE-12170
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HIVE-12170.2.patch, HIVE-12170.patch
>
>
> Right now there are two ways to get an HBaseReadWrite instance in the metastore. 
> Both get a threadlocal instance (is there a good reason for that?).
> 1) One is w/o conf and only works if someone called the (2) before, from any 
> thread.
> 2) The other blindly sets a static conf and then gets an instance with that 
> conf, or if someone already happened to call (1) or (2) from this thread, it 
> returns the existing instance with whatever conf was set before (but still 
> resets the current conf to new conf).
> This doesn't make sense even in an already-thread-safe case (like linear 
> CLI-based tests), and can easily lead to bugs as described; the config 
> propagation logic is not good (example - HIVE-12167); some calls just reset 
> config blindly, so there's no point in setting staticConf, other than for the 
> callers of method (1) above who don't have a conf and would rely on the 
> static (which is bad design).
> Reliably having connections with different configs is not possible, and 
> multi-threaded cases would also break - you could even set a conf, have it 
> reset, and get an instance with somebody else's conf. 
> Static should definitely be removed, maybe threadlocal too (HConnection is 
> thread-safe).
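
As a rough illustration of the direction suggested above (explicit conf, no hidden static state), a per-configuration instance cache might look like the sketch below; the names are hypothetical and this is not the actual patch:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Each distinct configuration key gets exactly one shared instance; callers
// must always pass a conf, so nothing is ever picked up from static state.
public class PerConfInstanceCache<K, V> {
  private final Map<K, V> instances = new ConcurrentHashMap<>();
  private final Function<K, V> factory;

  public PerConfInstanceCache(Function<K, V> factory) {
    this.factory = factory;
  }

  public V getInstance(K confKey) {
    return instances.computeIfAbsent(confKey, factory);
  }
}
{code}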



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12202) NPE thrown when reading legacy ACID delta files

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967652#comment-14967652
 ] 

Hive QA commented on HIVE-12202:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767804/HIVE-12202.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9685 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5723/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5723/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5723/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767804 - PreCommit-HIVE-TRUNK-Build

> NPE thrown when reading legacy ACID delta files
> ---
>
> Key: HIVE-12202
> URL: https://issues.apache.org/jira/browse/HIVE-12202
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: transactions
> Attachments: HIVE-12202.0.patch
>
>
> When reading legacy ACID deltas of the form {{delta_$startTxnId_$endTxnId}} a 
> {{NullPointerException}} is thrown on:
> {code:title=org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas#371}
> if(dmd.getStmtIds().isEmpty()) {
> {code}
> The older ACID data format (pre-Hive 1.3.0) does not include the statement 
> ID, and code written for that format should still be supported. Therefore 
> the above condition should also include a null check; alternatively, 
> {{AcidInputFormat.DeltaMetaData}} should never return null, and should 
> return an empty list in this specific scenario.
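
A minimal self-contained sketch of the two options (a null check at the call site vs. never returning null), with a hypothetical stand-in for DeltaMetaData:

{code}
import java.util.Collections;
import java.util.List;

public class DeltaStmtIdSketch {
  // Stand-in for AcidInputFormat.DeltaMetaData; getStmtIds() may return null
  // for legacy delta_$startTxnId_$endTxnId directories without statement IDs.
  static class DeltaMetaData {
    private final List<Integer> stmtIds;
    DeltaMetaData(List<Integer> stmtIds) { this.stmtIds = stmtIds; }
    List<Integer> getStmtIds() { return stmtIds; }
    // Option B: normalize here so callers never see null.
    List<Integer> getStmtIdsOrEmpty() {
      return stmtIds == null ? Collections.<Integer>emptyList() : stmtIds;
    }
  }

  public static void main(String[] args) {
    DeltaMetaData legacy = new DeltaMetaData(null);
    // Option A: null-safe check at the call site instead of isEmpty() alone.
    boolean hasStmtIds =
        legacy.getStmtIds() != null && !legacy.getStmtIds().isEmpty();
    System.out.println(hasStmtIds);                           // false, no NPE
    System.out.println(legacy.getStmtIdsOrEmpty().isEmpty()); // true
  }
}
{code}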



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-10-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967538#comment-14967538
 ] 

Ashutosh Chauhan commented on HIVE-12184:
-

Duplicate of HIVE-11241 & HIVE-11261? As discussed on HIVE-11241, our current 
grammar is ambiguous. We should rather fix that ambiguity than try to guess 
what the user intended in their script. cc: [~xuefuz]

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12223) Filter on Grouping__ID does not work properly

2015-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12223:
---
Description: 
Consider the following query:

{noformat}
SELECT key, value, GROUPING__ID, count(*)
FROM T1
GROUP BY key, value
GROUPING SETS ((), (key))
HAVING GROUPING__ID = 1
{noformat}

This query will not return results. The reason is that a "constant" placeholder 
is introduced by SemanticAnalyzer for the GROUPING\__ID column. At execution 
time, this placeholder is replaced by the actual value of the GROUPING\__ID. As 
it is a constant, the Hive optimizer will evaluate statically whether the 
condition is met or not, leading to incorrect results. A possible solution is 
to transform the placeholder constant into a function over the grouping keys.

  was:
Consider the following query:

{noformat}
SELECT key, value, GROUPING__ID, count(*)
FROM T1
GROUP BY key, value
GROUPING SETS ((), (key))
HAVING GROUPING__ID = 1
{noformat}

This query will not return results. The reason is that a "constant" placeholder 
is introduced by SemanticAnalyzer for the GROUPING__ID column. At execution 
time, this placeholder is replaced by the actual value of the GROUPING__ID. As 
it is a constant, the Hive optimizer will evaluate statically whether the 
condition is met or not, leading to incorrect results. A possible solution is 
to transform the placeholder constant into a function over the grouping keys.


> Filter on Grouping__ID does not work properly
> -
>
> Key: HIVE-12223
> URL: https://issues.apache.org/jira/browse/HIVE-12223
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
>
> Consider the following query:
> {noformat}
> SELECT key, value, GROUPING__ID, count(*)
> FROM T1
> GROUP BY key, value
> GROUPING SETS ((), (key))
> HAVING GROUPING__ID = 1
> {noformat}
> This query will not return results. The reason is that a "constant" 
> placeholder is introduced by SemanticAnalyzer for the GROUPING\__ID column. 
> At execution time, this placeholder is replaced by the actual value of the 
> GROUPING\__ID. As it is a constant, the Hive optimizer will evaluate 
> statically whether the condition is met or not, leading to incorrect results. 
> A possible solution is to transform the placeholder constant into a function 
> over the grouping keys.
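
A toy sketch of the fix suggested above, computing GROUPING__ID from which keys are active in the current grouping set rather than from a compile-time constant; the bit order and polarity here are illustrative only:

{code}
public class GroupingIdSketch {
  // One bit per grouping key; a bit is set when the key is aggregated away
  // in the current grouping set.
  static long groupingId(boolean[] keyIsGrouped) {
    long id = 0;
    for (int i = 0; i < keyIsGrouped.length; i++) {
      if (!keyIsGrouped[i]) {
        id |= 1L << i;
      }
    }
    return id;
  }

  public static void main(String[] args) {
    // GROUP BY key, value GROUPING SETS ((), (key)):
    System.out.println(groupingId(new boolean[]{true, false}));  // 2: set (key)
    System.out.println(groupingId(new boolean[]{false, false})); // 3: set ()
  }
}
{code}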



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967554#comment-14967554
 ] 

Xuefu Zhang commented on HIVE-12184:


Yes, it does sound like a dupe. The real problem is that "A.B" can either mean 
"DB.TABLE" or "TABLE.COLUMN", and there is no way to differentiate them.

The fix is to change the grammar so that we don't have such an ambiguity.

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11591) upgrade thrift to 0.9.3 and change generation to use undated annotations

2015-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967604#comment-14967604
 ] 

Alan Gates commented on HIVE-11591:
---

+1

> upgrade thrift to 0.9.3 and change generation to use undated annotations
> 
>
> Key: HIVE-11591
> URL: https://issues.apache.org/jira/browse/HIVE-11591
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11591.WIP.patch, HIVE-11591.nogen.patch, 
> HIVE-11591.patch
>
>
> Thrift has added class annotations to generated classes; these contain the 
> generation date. Because of this, all the Java thrift files change on every 
> re-gen, even if you only make a small change that should not affect a 
> bazillion files. We should use undated annotations to avoid this problem.
> This depends on upgrading to Thrift 0.9.3, -which doesn't exist yet-.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12061) add file type support to file metadata by expr call

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967660#comment-14967660
 ] 

Hive QA commented on HIVE-12061:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767824/HIVE-12061.02.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5724/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5724/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5724/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseConnection.java:[68,13]
 error: cannot find symbol
[ERROR] interface HBaseConnection
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseConnection.java:[84,2]
 error: cannot find symbol
[ERROR] interface HBaseConnection
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseConnection.java:[94,2]
 error: cannot find symbol
[ERROR] interface HBaseConnection
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[29,30]
 error: package org.apache.hadoop.hbase does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[30,30]
 error: package org.apache.hadoop.hbase does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[31,30]
 error: package org.apache.hadoop.hbase does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[32,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[33,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[34,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[35,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[36,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[37,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[38,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[39,37]
 error: package org.apache.hadoop.hbase.client does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[40,37]
 error: package org.apache.hadoop.hbase.filter does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[41,37]
 error: package org.apache.hadoop.hbase.filter does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[42,37]
 error: package org.apache.hadoop.hbase.filter does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[43,37]
 error: package org.apache.hadoop.hbase.filter does not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseReadWrite.java:[44,62]
 error: package org.apache.hadoop.hbase.protobuf.generated.ClientProtos does 
not exist
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/hbase/PartitionKeyComparator.java:[30,37]
 

[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968510#comment-14968510
 ] 

Xuefu Zhang commented on HIVE-11985:


Let me try to summarize what we get from this: with this patch we error out in 
case of long type names (>2000c), while previously we just stored whatever the 
user gave, which might end up truncated. In the previous case, the user might 
have been able to avoid the truncation by manipulating the DB configuration, 
but now with this patch such manipulation will not work any more.

Correct me if I'm off.
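
If that summary is right, the behavior change amounts to an up-front validation, roughly like this hypothetical sketch (the limit constant and exception type are illustrative):

{code}
public class TypeNameCheckSketch {
  static final int MAX_TYPE_NAME_LEN = 2000; // matches the column width

  // Fail fast instead of letting the backing DB silently truncate the value.
  static void checkTypeName(String typeName) {
    if (typeName.length() > MAX_TYPE_NAME_LEN) {
      throw new IllegalArgumentException("Type name is " + typeName.length()
          + " characters; the metastore limit is " + MAX_TYPE_NAME_LEN);
    }
  }

  public static void main(String[] args) {
    checkTypeName("int"); // fine, no exception
    System.out.println("short type name accepted");
  }
}
{code}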

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12170) normalize HBase metastore connection configuration

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967763#comment-14967763
 ] 

Sergey Shelukhin commented on HIVE-12170:
-

+1

> normalize HBase metastore connection configuration
> --
>
> Key: HIVE-12170
> URL: https://issues.apache.org/jira/browse/HIVE-12170
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HIVE-12170.2.patch, HIVE-12170.patch
>
>
> Right now there are two ways to get an HBaseReadWrite instance in the metastore. 
> Both get a threadlocal instance (is there a good reason for that?).
> 1) One is w/o conf and only works if someone called the (2) before, from any 
> thread.
> 2) The other blindly sets a static conf and then gets an instance with that 
> conf, or if someone already happened to call (1) or (2) from this thread, it 
> returns the existing instance with whatever conf was set before (but still 
> resets the current conf to new conf).
> This doesn't make sense even in an already-thread-safe case (like linear 
> CLI-based tests), and can easily lead to bugs as described; the config 
> propagation logic is not good (example - HIVE-12167); some calls just reset 
> config blindly, so there's no point in setting staticConf, other than for the 
> callers of method (1) above who don't have a conf and would rely on the 
> static (which is bad design).
> Reliably having connections with different configs is not possible, and 
> multi-threaded cases would also break - you could even set a conf, have it 
> reset, and get an instance with somebody else's conf. 
> Static should definitely be removed, maybe threadlocal too (HConnection is 
> thread-safe).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-10-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967779#comment-14967779
 ] 

Gopal V commented on HIVE-11306:


[~wzheng]: LGTM - +1.

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Wei Zheng
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented spillable joins for Tez, which suffer from a 
> corner-case performance issue when joining wide small tables against a 
> narrow big table (like a user-info table joined against an event stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would add a massive performance gain for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.
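
For reference, a bloom-1 filter in this sense is a one-hash Bloom filter: a single bit array probed once per key, so a zero bit proves the key was never inserted and the spilled partition need not be touched. A minimal sketch (sizing and hashing illustrative):

{code}
public class Bloom1Sketch {
  private final long[] bits;
  private final int mask;

  // log2Bits >= 6 so the bit array is at least one long word.
  Bloom1Sketch(int log2Bits) {
    bits = new long[1 << (log2Bits - 6)];
    mask = (1 << log2Bits) - 1;
  }

  void add(long keyHash) {
    int bit = (int) (keyHash & mask);
    bits[bit >>> 6] |= 1L << (bit & 63);
  }

  // false => definitely absent, so the spill probe can be skipped entirely.
  boolean mightContain(long keyHash) {
    int bit = (int) (keyHash & mask);
    return (bits[bit >>> 6] & (1L << (bit & 63))) != 0;
  }

  public static void main(String[] args) {
    Bloom1Sketch f = new Bloom1Sketch(20); // 1M bits = 128KB
    f.add(42L);
    System.out.println(f.mightContain(42L)); // true
    System.out.println(f.mightContain(43L)); // almost certainly false
  }
}
{code}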



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12201) Tez settings need to be shown in set -v output when execution engine is tez.

2015-10-21 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-12201:
--
Attachment: HIVE-12201.3.patch

Fix failing tests.

> Tez settings need to be shown in set -v output when execution engine is tez.
> 
>
> Key: HIVE-12201
> URL: https://issues.apache.org/jira/browse/HIVE-12201
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Minor
> Attachments: HIVE-12201.1.patch, HIVE-12201.2.patch, 
> HIVE-12201.3.patch
>
>
> The set -v output currently shows configurations for yarn, hdfs etc. but does 
> not show tez settings when tez is set as the execution engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967804#comment-14967804
 ] 

Ashutosh Chauhan commented on HIVE-12189:
-

[~jcamachorodriguez] This one might interest you.

> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow 
> very large
> 
>
> Key: HIVE-12189
> URL: https://issues.apache.org/jira/browse/HIVE-12189
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12189.1.patch
>
>
> Some queries are very slow to compile; for example, the following query
> {noformat}
> select * from tt1 nf 
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) 
> join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and 
> a2.hdp_databaseid = nf.hdp_databaseid) 
> join tt4 a3 on  (a3.col4 = a2.col4 and a3.col3 = a2.col3) 
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = 
> a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) 
> join tt6 a5 on  (a5.col3 = a2.col3 and a5.col2 = a2.col2 and 
> a5.hdp_databaseid = nf.hdp_databaseid) 
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid 
> = nf.hdp_databaseid) 
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid 
> = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and hive is not in test mode.
> All the above tables are partitioned on a single column, but all of them are 
> empty. If the tables are not empty, it is reported that the compile is so 
> slow that it looks like Hive is hanging. 
> In Hive 2.0, the compile is much faster; explain takes 6.6 seconds, but that 
> is still a lot of time. One of the problems that slows PPD down is that the 
> list in pushdownPreds can grow very large, which gives extractPushdownPreds 
> bad performance:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
>     Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, at the following breakpoint preds has a size 
> of 12051, and most entries of the list are: 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> The following code in extractPushdownPreds will clone all the nodes in preds 
> and do the walk. Hive 2.0 is faster because HIVE-11652 (and other jiras) 
> makes startWalking much faster, but we still clone thousands of nodes with 
> the same expression. Should we store so many identical predicates in the 
> list, or is just one good enough?  
> {noformat}
> List<ExprNodeDesc> startNodes = new ArrayList<ExprNodeDesc>();
> List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
> for (ExprNodeDesc node : preds) {
>   ExprNodeDesc clone = node.clone();
>   clonedPreds.add(clone);
>   exprContext.getNewToOldExprMap().put(clone, node);
> }
> startNodes.addAll(clonedPreds);
> egw.startWalking(startNodes, null);
> {noformat}
> Should we change java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
> methods 
> public void addFinalCandidate(String alias, ExprNodeDesc expr) 
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) 
> to only add an expr which is not already in the pushdown list for an alias?
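
A small self-contained sketch of the deduplication idea; in Hive the elements would be ExprNodeDesc, keyed by something like their expression string, but the names here are hypothetical:

{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PredDedupSketch {
  // Keep only structurally distinct predicates, preserving first-seen order,
  // so the walker clones each unique expression once instead of thousands
  // of times.
  static List<String> dedup(List<String> predExprStrings) {
    Map<String, String> unique = new LinkedHashMap<>();
    for (String expr : predExprStrings) {
      unique.putIfAbsent(expr, expr);
    }
    return new ArrayList<>(unique.keySet());
  }

  public static void main(String[] args) {
    List<String> preds = List.of(
        "(hdp_databaseid = 102)",
        "(hdp_databaseid = 102)",
        "(col1 = col2)");
    System.out.println(dedup(preds).size()); // 2 instead of 3
  }
}
{code}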



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12225) LineageCtx should release all resources at clear

2015-10-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-12225:
---
Attachment: HIVE-12225.1.patch

> LineageCtx should release all resources at clear
> 
>
> Key: HIVE-12225
> URL: https://issues.apache.org/jira/browse/HIVE-12225
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12225.1.patch
>
>
> Some maps are not released in the clear() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12084) Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java heap space

2015-10-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12084:
-
Attachment: HIVE-12084.4.patch

[~jpullokkaran] Can you please review patch #4?

> Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java 
> heap space
> --
>
> Key: HIVE-12084
> URL: https://issues.apache.org/jira/browse/HIVE-12084
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12084.1.patch, HIVE-12084.2.patch, 
> HIVE-12084.3.patch, HIVE-12084.4.patch
>
>
> STEPS TO REPRODUCE:
> {code}
> CREATE TABLE `sample_07` ( `code` string , `description` string , `total_emp` 
> int , `salary` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS 
> TextFile;
> load data local inpath 'sample_07.csv'  into table sample_07;
> set hive.limit.pushdown.memory.usage=0.;
> select * from sample_07 order by salary LIMIT 9;
> {code}
> This will result in 
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
> {code}
> The basic issue lies with the top-n optimization: we need a cap on it. 
> Ideally we would detect that the bytes to be allocated will be bigger than 
> the "limit.pushdown.memory.usage" budget without actually trying to 
> allocate them.
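
A hedged sketch of that pre-check: estimate the top-n structure's footprint from the limit before allocating, and fall back to a plain sort when it exceeds the configured heap fraction. The method names and per-entry estimate are illustrative, not the actual patch:

{code}
public class TopNPrecheckSketch {
  // True only if `limit` entries of roughly bytesPerEntry each fit within
  // memUsageFraction of the max heap; dividing instead of multiplying
  // avoids long overflow from limit * bytesPerEntry.
  static boolean topNFits(long limit, long bytesPerEntry, float memUsageFraction) {
    long budget = (long) (Runtime.getRuntime().maxMemory() * memUsageFraction);
    return limit <= budget / Math.max(1, bytesPerEntry);
  }

  public static void main(String[] args) {
    System.out.println(topNFits(900_000_000L, 24, 0.1f)); // likely false
    System.out.println(topNFits(10_000L, 24, 0.1f));      // likely true
  }
}
{code}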



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967665#comment-14967665
 ] 

Xuefu Zhang commented on HIVE-12184:


Since Hive currently assumes that "A.B" represents "TABLE.COLUMN" in this case, 
we should stick to that without complicating it further. This grammar can be 
deprecated and replaced with the "[DB.]TABLE[ COLUMN]" syntax.

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967713#comment-14967713
 ] 

Chao Sun commented on HIVE-12189:
-

Sorry, didn't see this. I'll take a look at the patch today.

> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow 
> very large
> 
>
> Key: HIVE-12189
> URL: https://issues.apache.org/jira/browse/HIVE-12189
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12189.1.patch
>
>
> Some queries are very slow to compile; for example, the following query
> {noformat}
> select * from tt1 nf 
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) 
> join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and 
> a2.hdp_databaseid = nf.hdp_databaseid) 
> join tt4 a3 on  (a3.col4 = a2.col4 and a3.col3 = a2.col3) 
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = 
> a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) 
> join tt6 a5 on  (a5.col3 = a2.col3 and a5.col2 = a2.col2 and 
> a5.hdp_databaseid = nf.hdp_databaseid) 
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid 
> = nf.hdp_databaseid) 
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid 
> = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and hive is not in test mode.
> All the above tables are partitioned on a single column, but all of them are 
> empty. If the tables are not empty, it is reported that the compile is so 
> slow that it looks like Hive is hanging. 
> In Hive 2.0, the compile is much faster; explain takes 6.6 seconds, but that 
> is still a lot of time. One of the problems that slows PPD down is that the 
> list in pushdownPreds can grow very large, which gives extractPushdownPreds 
> bad performance:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
>     Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, at the following breakpoint preds has a size 
> of 12051, and most entries of the list are: 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> The following code in extractPushdownPreds will clone all the nodes in preds 
> and do the walk. Hive 2.0 is faster because HIVE-11652 (and other jiras) 
> makes startWalking much faster, but we still clone thousands of nodes with 
> the same expression. Should we store so many identical predicates in the 
> list, or is just one good enough?  
> {noformat}
> List<ExprNodeDesc> startNodes = new ArrayList<ExprNodeDesc>();
> List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
> for (ExprNodeDesc node : preds) {
>   ExprNodeDesc clone = node.clone();
>   clonedPreds.add(clone);
>   exprContext.getNewToOldExprMap().put(clone, node);
> }
> startNodes.addAll(clonedPreds);
> egw.startWalking(startNodes, null);
> {noformat}
> Should we change java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
> methods 
> public void addFinalCandidate(String alias, ExprNodeDesc expr) 
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) 
> to only add an expr which is not already in the pushdown list for an alias?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-12171) LLAP: BuddyAllocator failures when querying uncompressed data

2015-10-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reopened HIVE-12171:
-

Apparently it happens even in unreasonable cases

> LLAP: BuddyAllocator failures when querying uncompressed data
> -
>
> Key: HIVE-12171
> URL: https://issues.apache.org/jira/browse/HIVE-12171
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>
> {code}
> hive> select sum(l_extendedprice * l_discount) as revenue from 
> testing.lineitem where l_shipdate >= '1993-01-01' and l_shipdate < 
> '1994-01-01' ;
> Caused by: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 492; at 0 out of 1
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:176)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.preReadUncompressedStream(EncodedReaderImpl.java:882)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:319)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:194)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:191)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:191)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:74)
> at 
> org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12187) Release plan once a query is executed

2015-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967791#comment-14967791
 ] 

Alan Gates commented on HIVE-12187:
---

The change looks good to me from a locks perspective.  [~ekoifman] may want to 
take a look too. 

> Release plan once a query is executed 
> --
>
> Key: HIVE-12187
> URL: https://issues.apache.org/jira/browse/HIVE-12187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12187.1.patch, HIVE-12187.2.patch
>
>
> Some clients leave query operations open for a while so that they can 
> retrieve the query results later. That means the allocated memory will be 
> kept around too. We should release the resources that are no longer needed 
> once the query has finished executing.
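
Conceptually, the change looks like this hypothetical sketch (field names illustrative): drop compile-time state after execution while keeping what the fetch path needs:

{code}
public class QueryOperationSketch {
  private Object compiledPlan = new Object(); // stand-in for the query plan
  private Object fetchTask = new Object();    // still needed to serve results

  // Called once execution finishes; the operation may stay open for fetching.
  void releaseExecutionResources() {
    compiledPlan = null; // let the plan (and its memory) be garbage collected
    // fetchTask is intentionally retained until the client closes the op
  }

  public static void main(String[] args) {
    QueryOperationSketch op = new QueryOperationSketch();
    op.releaseExecutionResources();
    System.out.println("fetch task kept: " + (op.fetchTask != null));
  }
}
{code}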



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12157) Support unicode for table/column names

2015-10-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12157:
---
Summary:  Support unicode for table/column names  (was: select-clause 
doesn't support unicode alias)

>  Support unicode for table/column names
> ---
>
> Key: HIVE-12157
> URL: https://issues.apache.org/jira/browse/HIVE-12157
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 1.2.1
>Reporter: richard du
>Assignee: richard du
>Priority: Minor
> Attachments: HIVE-12157.01.patch, HIVE-12157.patch
>
>
> The parser will throw an exception when I use a Unicode alias:
> hive> desc test;
> OK
> a   int 
> b   string  
> Time taken: 0.135 seconds, Fetched: 2 row(s)
> hive> select a as 行1 from test limit 10;
> NoViableAltException(302@[134:7: ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN 
> identifier ( COMMA identifier )* RPAREN ) )?])
> at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
> at org.antlr.runtime.DFA.predict(DFA.java:116)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2915)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1373)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:45827)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:41495)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41402)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40413)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40283)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1590)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:396)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:13 cannot recognize input near 'as' '1' 'from' 
> in selection target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11807) Set ORC buffer size in relation to set stripe size

2015-10-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967823#comment-14967823
 ] 

Prasanth Jayachandran commented on HIVE-11807:
--

[~owen.omalley] Can you please rebase your patch to trunk?

> Set ORC buffer size in relation to set stripe size
> --
>
> Key: HIVE-11807
> URL: https://issues.apache.org/jira/browse/HIVE-11807
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11807.patch, HIVE-11807.patch
>
>
> A customer produced ORC files with very small stripe sizes (10k rows/stripe) 
> by setting a small 64MB stripe size and a 256K buffer size for a 54-column 
> table. At that size, each of the streams only gets a buffer or two before the 
> stripe size is reached. The current code uses the available memory instead of 
> the stripe size and thus doesn't shrink the buffer size if the JVM has much 
> more memory than the stripe size.
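
A rough sketch of sizing from the stripe rather than from available memory; the heuristic, names, and the "4 buffers per stream" target are illustrative, not the actual patch:

{code}
public class OrcBufferSizeSketch {
  // Derive a per-stream compression buffer size from the stripe size and the
  // number of streams, so small stripes get proportionally small buffers.
  static int pickBufferSize(long stripeSize, int numStreams, int requestedSize) {
    // aim for at least ~4 buffers per stream within one stripe
    long perStream = Math.max(1, stripeSize / (Math.max(1, numStreams) * 4L));
    int size = (int) Math.min(requestedSize, perStream);
    size = Integer.highestOneBit(size); // round down to a power of two
    return Math.max(4 * 1024, size);    // keep a sane floor of 4K
  }

  public static void main(String[] args) {
    // 64MB stripe, ~216 streams, 256K requested: shrinks to 64K.
    System.out.println(pickBufferSize(64L << 20, 216, 256 * 1024)); // 65536
  }
}
{code}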



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size

2015-10-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12220:

Attachment: HIVE-12220.patch

The patch, with some backward compat so we don't break existing scripts and tools.

> LLAP: Usability issues with hive.llap.io.cache.orc.size
> ---
>
> Key: HIVE-12220
> URL: https://issues.apache.org/jira/browse/HIVE-12220
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: llap
>Reporter: Carter Shanklin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12220.patch
>
>
> In the llap-daemon site you need to set, among other things,
> llap.daemon.memory.per.instance.mb
> and
> hive.llap.io.cache.orc.size
> The use of hive.llap.io.cache.orc.size caused me some unnecessary problems: 
> initially I entered the value in MB rather than in bytes. Operator error, you 
> could say, but I think of this as a fraction of the other value, which is in 
> MB.
> Second, is this really tied to ORC? E.g. when we have the vectorized text 
> reader, will this data be cached as well? Or might it be in the future?
> I would like to propose instead using hive.llap.io.cache.size.mb for this 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967775#comment-14967775
 ] 

Sushanth Sowmyan commented on HIVE-9013:


Hi [~decster], please let me know if you're planning on updating this jira per 
[~thejas]'s suggestions above - if you don't mind, I can help update this patch 
to get it in. I think this will be a very useful patch to have in.

Thanks!

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list); this exposes values like 
> "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the 
> dump-vars command.
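
The proposed filtering is straightforward; a self-contained sketch (names illustrative) of excluding restrict-list entries from the dump:

{code}
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SetDumpSketch {
  public static void main(String[] args) {
    Map<String, String> conf = new TreeMap<>();
    conf.put("mapreduce.job.queuename", "default");
    conf.put("javax.jdo.option.ConnectionPassword", "secret");
    Set<String> restrictList = Set.of("javax.jdo.option.ConnectionPassword");

    // Dump every variable except those on the restrict list, so restricted
    // values never reach the console.
    conf.forEach((k, v) -> {
      if (!restrictList.contains(k)) {
        System.out.println(k + "=" + v);
      }
    });
  }
}
{code}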



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12221) Concurrency issue in HCatUtil.getHiveMetastoreClient()

2015-10-21 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967869#comment-14967869
 ] 

Thiruvel Thirumoolan commented on HIVE-12221:
-

Good catch, thanks for raising this!

May I know the problem you ran into? I guess it's multiple threads trying to get 
objects before the cache initialization was complete?

> Concurrency issue in HCatUtil.getHiveMetastoreClient() 
> ---
>
> Key: HIVE-12221
> URL: https://issues.apache.org/jira/browse/HIVE-12221
> Project: Hive
>  Issue Type: Bug
>Reporter: Roshan Naik
>
> HCatUtil.getHiveMetastoreClient() uses the double-checked locking pattern
> to implement a singleton, which is a broken pattern.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11216) UDF GenericUDFMapKeys throws NPE when a null map value is passed in

2015-10-21 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11216:
--
Fix Version/s: 1.0.2

> UDF GenericUDFMapKeys throws NPE when a null map value is passed in
> ---
>
> Key: HIVE-11216
> URL: https://issues.apache.org/jira/browse/HIVE-11216
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.0
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Fix For: 1.3.0, 2.0.0, 1.0.2
>
> Attachments: HIVE-11216.1.patch, HIVE-11216.patch
>
>
> We can reproduce the problem as below:
> {noformat}
> hive> show create table map_txt;
> OK
> CREATE  TABLE `map_txt`(
>   `id` int,
>   `content` map)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> ...
> Time taken: 0.233 seconds, Fetched: 18 row(s)
> hive> select * from map_txt;
> OK
> 1   NULL
> Time taken: 0.679 seconds, Fetched: 1 row(s)
> hive> select id, map_keys(content) from map_txt;
> 
> Error during job, obtaining debugging information...
> Examining task ID: task_1435534231122_0025_m_00 (and more) from job 
> job_1435534231122_0025
> Task with the most failures(4):
> -
> Task ID:
>   task_1435534231122_0025_m_00
> URL:
>   
> http://host-10-17-80-40.coe.cloudera.com:8088/taskdetails.jsp?jobid=job_1435534231122_0025=task_1435534231122_0025_m_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"id":1,"content":null}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"id":1,"content":null}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:559)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
> map_keys(content)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
> ... 9 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapKeys.evaluate(GenericUDFMapKeys.java:64)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
> ... 13 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
> hive>
> {noformat}
> The error is as below (in mappers):
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapKeys.evaluate(GenericUDFMapKeys.java:64)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> at 
> org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.getNewKey(KeyWrapperFactory.java:113)
> at 
> 

[jira] [Commented] (HIVE-12225) LineageCtx should release all resources at clear

2015-10-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968015#comment-14968015
 ] 

Szehon Ho commented on HIVE-12225:
--

Simple, +1

> LineageCtx should release all resources at clear
> 
>
> Key: HIVE-12225
> URL: https://issues.apache.org/jira/browse/HIVE-12225
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12225.1.patch
>
>
> Some maps are not released in the clear() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12221) Concurrency issue in HCatUtil.getHiveMetastoreClient()

2015-10-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967832#comment-14967832
 ] 

Sushanth Sowmyan commented on HIVE-12221:
-

Per Roshan's mail to me, adding a reference: 
https://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java

> Concurrency issue in HCatUtil.getHiveMetastoreClient() 
> ---
>
> Key: HIVE-12221
> URL: https://issues.apache.org/jira/browse/HIVE-12221
> Project: Hive
>  Issue Type: Bug
>Reporter: Roshan Naik
>
> HCatUtil.getHiveMetastoreClient() uses the double-checked locking pattern
> to implement a singleton, which is a broken pattern.
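
For reference, a standard safe alternative is the initialization-on-demand holder idiom, sketched below with hypothetical names (another fix is declaring the checked field volatile, which makes double-checked locking valid on Java 5+):

{code}
public class MetastoreClientHolderSketch {
  private MetastoreClientHolderSketch() {}

  // The JVM guarantees Holder's static initializer runs exactly once, on
  // first use, with proper happens-before semantics -- no explicit locking.
  private static class Holder {
    static final MetastoreClientHolderSketch INSTANCE =
        new MetastoreClientHolderSketch();
  }

  public static MetastoreClientHolderSketch get() {
    return Holder.INSTANCE;
  }

  public static void main(String[] args) {
    System.out.println(get() == get()); // true: a single instance
  }
}
{code}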



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12223) Filter on Grouping__ID does not work properly

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967851#comment-14967851
 ] 

Hive QA commented on HIVE-12223:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767828/HIVE-12223.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9684 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_window
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_grouping_operators
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_grouping_sets
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_grouping_sets
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_rollup1
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5725/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5725/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5725/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767828 - PreCommit-HIVE-TRUNK-Build

> Filter on Grouping__ID does not work properly
> -
>
> Key: HIVE-12223
> URL: https://issues.apache.org/jira/browse/HIVE-12223
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12223.patch
>
>
> Consider the following query:
> {noformat}
> SELECT key, value, GROUPING__ID, count(*)
> FROM T1
> GROUP BY key, value
> GROUPING SETS ((), (key))
> HAVING GROUPING__ID = 1
> {noformat}
> This query will not return results. The reason is that a "constant" 
> placeholder is introduced by SemanticAnalyzer for the GROUPING\__ID column. 
> At execution time, this placeholder is replaced by the actual value of the 
> GROUPING\__ID. As it is a constant, the Hive optimizer will evaluate 
> statically whether the condition is met or not, leading to incorrect results. 
> A possible solution is to transform the placeholder constant into a function 
> over the grouping keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12061) add file type support to file metadata by expr call

2015-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967862#comment-14967862
 ] 

Alan Gates commented on HIVE-12061:
---

I definitely like decoupling this from ORC.  The only concern I have is that I 
never intended HBaseReadWrite to be available outside the package, which this 
forces it to be (I think it already was for some tool or another, but that's 
only because Java's packaging is borked).  But I don't see a way around it, so, 
oh well.

+1

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12061.01.nogen.patch, HIVE-12061.01.patch, 
> HIVE-12061.02.patch, HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a 
> boundary between ORC-specific and general metastore code, that could later be 
> used for other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12222) Define port range in property for RPCServer

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967996#comment-14967996
 ] 

Xuefu Zhang edited comment on HIVE-1 at 10/21/15 9:38 PM:
--

[~alee526], are you interested in contributing on this?


was (Author: xuefuz):
[~alee526], are you going to contribute on this?

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> The port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons.
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), and this may cause the HS2 process to run into OOME, choke and die, 
> plus various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are in a different location, and monitoring and auditing are easier 
> when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence all the firewalling and 
> auditing.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down ports makes it easier, since we can 
> focus on a range to monitor and audit.
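
To illustrate the requested behavior, here is a minimal Java sketch of range-based binding, assuming a hypothetical property such as hive.spark.client.rpc.server.port.range supplying an inclusive range; this is an illustration, not the actual RpcServer change. The same loop could wrap the Netty bind, retrying the next port in the range on a bind failure.

{code}
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: bind to the first free port within a configured range instead of
// letting the OS pick a random port (port 0).
public final class PortRangeBind {
  static ServerSocket bindWithinRange(int lo, int hi) throws IOException {
    for (int port = lo; port <= hi; port++) {
      try {
        return new ServerSocket(port); // first free port in the range wins
      } catch (IOException busy) {
        // port taken; try the next one
      }
    }
    throw new IOException("No free port in range " + lo + "-" + hi);
  }

  public static void main(String[] args) throws IOException {
    // hypothetical range, e.g. from hive.spark.client.rpc.server.port.range
    try (ServerSocket s = bindWithinRange(50000, 50100)) {
      System.out.println("Bound to " + s.getLocalPort());
    }
  }
}
{code}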



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11807) Set ORC buffer size in relation to set stripe size

2015-10-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967878#comment-14967878
 ] 

Prasanth Jayachandran commented on HIVE-11807:
--

Also patch for branch-1

> Set ORC buffer size in relation to set stripe size
> --
>
> Key: HIVE-11807
> URL: https://issues.apache.org/jira/browse/HIVE-11807
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11807.patch, HIVE-11807.patch
>
>
> A customer produced ORC files with very small stripe sizes (10k rows/stripe) 
> by setting a small 64MB stripe size and 256K buffer size for a 54 column 
> table. At that size, each of the streams only get a buffer or two before the 
> stripe size is reached. The current code uses the available memory instead of 
> the stripe size and thus doesn't shrink the buffer size if the JVM has much 
> more memory than the stripe size.
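
As a rough illustration of the sizing idea, tying the buffer size to the stripe size rather than to available JVM memory might look like the following sketch; the formula, names, and streams-per-column estimate are assumptions, not the actual patch.

{code}
// Sketch: shrink the per-stream buffer so each stream can flush several
// buffers before the stripe fills.
public final class OrcBufferSizing {
  static int bufferSize(long stripeSize, int numStreams, int defaultBuffer) {
    // aim for roughly a handful of buffers per stream per stripe
    long perStream = stripeSize / Math.max(1, numStreams * 5);
    int size = (int) Math.min(defaultBuffer, perStream);
    return Math.max(4 * 1024, Integer.highestOneBit(size)); // power of two, >= 4K
  }

  public static void main(String[] args) {
    // 64MB stripe, 54 columns with ~2 streams each, 256K default buffer
    System.out.println(bufferSize(64L << 20, 54 * 2, 256 * 1024)); // 65536
  }
}
{code}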



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12224) Remove HOLD_DDLTIME

2015-10-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12224:

Attachment: HIVE-12224.patch

> Remove HOLD_DDLTIME
> ---
>
> Key: HIVE-12224
> URL: https://issues.apache.org/jira/browse/HIVE-12224
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12224.patch
>
>
> This arcane feature was introduced long ago via HIVE-1394. It was broken as 
> soon as it landed (HIVE-1442) and is thus useless. The fact that no one has 
> fixed it since suggests that it is not really used by anyone. Better to 
> remove it so no one hits the bug of HIVE-1442.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967996#comment-14967996
 ] 

Xuefu Zhang commented on HIVE-1:


[~alee526], are you going to contribute on this?

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> The port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons.
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), and this may cause the HS2 process to run into OOME, choke and die, 
> plus various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are in a different location, and monitoring and auditing are easier 
> when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence all the firewalling and 
> auditing.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11216) UDF GenericUDFMapKeys throws NPE when a null map value is passed in

2015-10-21 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11216:
--
Fix Version/s: 1.2.2

> UDF GenericUDFMapKeys throws NPE when a null map value is passed in
> ---
>
> Key: HIVE-11216
> URL: https://issues.apache.org/jira/browse/HIVE-11216
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.0
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Fix For: 1.3.0, 2.0.0, 1.0.2, 1.2.2
>
> Attachments: HIVE-11216.1.patch, HIVE-11216.patch
>
>
> We can reproduce the problem as below:
> {noformat}
> hive> show create table map_txt;
> OK
> CREATE  TABLE `map_txt`(
>   `id` int,
>   `content` map)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> ...
> Time taken: 0.233 seconds, Fetched: 18 row(s)
> hive> select * from map_txt;
> OK
> 1   NULL
> Time taken: 0.679 seconds, Fetched: 1 row(s)
> hive> select id, map_keys(content) from map_txt;
> 
> Error during job, obtaining debugging information...
> Examining task ID: task_1435534231122_0025_m_00 (and more) from job 
> job_1435534231122_0025
> Task with the most failures(4):
> -
> Task ID:
>   task_1435534231122_0025_m_00
> URL:
>   
> http://host-10-17-80-40.coe.cloudera.com:8088/taskdetails.jsp?jobid=job_1435534231122_0025=task_1435534231122_0025_m_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"id":1,"content":null}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"id":1,"content":null}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:559)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
> map_keys(content)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
> ... 9 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapKeys.evaluate(GenericUDFMapKeys.java:64)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
> ... 13 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
> hive>
> {noformat}
> The error is as below (in mappers):
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapKeys.evaluate(GenericUDFMapKeys.java:64)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> at 
> org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.getNewKey(KeyWrapperFactory.java:113)
> at 
> 
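
(The stack trace above is truncated in the archive.) The NPE comes from evaluating map_keys on a null map; the following is a hedged sketch of the kind of null guard the fix presumably adds, simplified away from Hive's ObjectInspector plumbing.

{code}
import java.util.ArrayList;
import java.util.Map;

// Sketch: return an empty key list when the map value itself is null,
// instead of dereferencing it. Not the actual GenericUDFMapKeys patch.
public final class MapKeysGuard {
  static ArrayList<Object> mapKeys(Map<?, ?> map) {
    ArrayList<Object> keys = new ArrayList<>();
    if (map == null) {
      return keys; // previously: map.keySet() -> NullPointerException
    }
    keys.addAll(map.keySet());
    return keys;
  }
}
{code}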

[jira] [Commented] (HIVE-12170) normalize HBase metastore connection configuration

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968077#comment-14968077
 ] 

Hive QA commented on HIVE-12170:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767843/HIVE-12170.2.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9697 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5726/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5726/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5726/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767843 - PreCommit-HIVE-TRUNK-Build

> normalize HBase metastore connection configuration
> --
>
> Key: HIVE-12170
> URL: https://issues.apache.org/jira/browse/HIVE-12170
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HIVE-12170.2.patch, HIVE-12170.patch
>
>
> Right now there are two ways to get an HBaseReadWrite instance in the 
> metastore. Both get a threadlocal instance (is there a good reason for that?).
> 1) One is w/o conf and only works if someone called the (2) before, from any 
> thread.
> 2) The other blindly sets a static conf and then gets an instance with that 
> conf, or if someone already happened to call (1) or (2) from this thread, it 
> returns the existing instance with whatever conf was set before (but still 
> resets the current conf to new conf).
> This doesn't make sense even in an already-thread-safe case (like linear 
> CLI-based tests), and can easily lead to bugs as described; the config 
> propagation logic is not good (example - HIVE-12167); some calls just reset 
> config blindly, so there's no point in setting staticConf, other than for the 
> callers of method (1) above who don't have a conf and would rely on the 
> static (which is bad design).
> Having connections with different configs reliably is not possible, and 
> multi-threaded cases would also break - you could even set a conf, have it 
> reset, and get an instance with somebody else's conf. 
> Static should definitely be removed, maybe threadlocal too (HConnection is 
> thread-safe).
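
A minimal sketch of the safer shape argued for here: drop the static conf, have callers pass their configuration explicitly, and key cached connections on it. The class and method names are illustrative, not the actual HBaseReadWrite API.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: no hidden static state; each caller supplies its own conf key and
// gets the connection built for that conf.
public final class ConnectionCache {
  private final Map<String, Object> connections = new ConcurrentHashMap<>();

  Object getInstance(String confKey) {
    return connections.computeIfAbsent(confKey, k -> newConnection(k));
  }

  private Object newConnection(String confKey) {
    return new Object(); // stand-in for an HConnection built from confKey
  }
}
{code}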



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968082#comment-14968082
 ] 

Xuefu Zhang commented on HIVE-12063:


Thanks, Szehon. Please note, this is actually not that far from my original 
thought in HIVE-7373. My point there was that we shouldn't append zeros or 
trim trailing zeros. The patch here doesn't append zeros internally; it mainly 
formats output according to the output schema. (HIVE-7373 failed in this 
because it changed the internal representation.) This is in line with other 
DBs, though I'm not aware of any SQL standard on this. Yes, I said that the 
practice of outputting with appended zeros was questionable, but it makes 
sense in Hive's case, as Hive aggressively trims 0.0, 0.00, and so on all the 
way to 0, which is too confusing.

BTW, all vectorization tests passed. [~jdere] or [~hagleitn], please review and 
comment. Thanks.
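
A minimal sketch of the output-formatting idea, assuming a decimal column scale of 2 for illustration: pad only the displayed value to the column's scale, leaving the stored representation untouched.

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch: format the display value to the column's scale; increasing the
// scale never rounds, so RoundingMode.UNNECESSARY is safe here.
public final class DecimalPad {
  static String display(BigDecimal stored, int columnScale) {
    return stored.setScale(columnScale, RoundingMode.UNNECESSARY).toPlainString();
  }

  public static void main(String[] args) {
    System.out.println(display(new BigDecimal("0"), 2));   // 0.00
    System.out.println(display(new BigDecimal("1.5"), 2)); // 1.50
  }
}
{code}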

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems, including treating 0.0, 0.00 and so on as 0, 
> which have different precision/scale. Please refer to the HIVE-7373 
> description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion, 
> as 0.0 and 0.00 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968111#comment-14968111
 ] 

Xuefu Zhang commented on HIVE-11985:


[~sershe], could you explain a little about your new approach? I cannot follow 
the patch well enough to fully understand it. It would be nice if an RB entry 
could be provided, as the changes have become non-trivial.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12218) Unable to create a like table for an hbase backed table

2015-10-21 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968025#comment-14968025
 ] 

Chaoyu Tang commented on HIVE-12218:


Thanks [~xuefuz] for reviewing and committing the patch.

> Unable to create a like table for an hbase backed table
> ---
>
> Key: HIVE-12218
> URL: https://issues.apache.org/jira/browse/HIVE-12218
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.0.0
>
> Attachments: HIVE-12218.patch
>
>
> For an HBase backed table:
> {code}
> CREATE TABLE hbasetbl (key string, state string, country string, country_id 
> int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "info:state,info:country,info:country_id"
> );
> {code}
> Creating its like table with a query such as 
> create table hbasetbl_like like hbasetbl;
> fails with the error:
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> org.apache.hadoop.hive.ql.metadata.HiveException: must specify an InputFormat 
> class



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968041#comment-14968041
 ] 

Szehon Ho commented on HIVE-12063:
--

Looks like it goes against your original thought in HIVE-7373, but makes sense 
to me. I assume there's no SQL-standard rule about this, and it's just a db 
implementation detail?

I don't know too much about the vectorization part; you might also want to 
check with some of those authors.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems, including treating 0.0, 0.00 and so on as 0, 
> which have different precision/scale. Please refer to the HIVE-7373 
> description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion, 
> as 0.0 and 0.00 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12187) Release plan once a query is executed

2015-10-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968114#comment-14968114
 ] 

Jimmy Xiang commented on HIVE-12187:


The change to the locks is actually to avoid some lock leakage. So my question 
is really to confirm that the leakage is not intentional.

The lock releasing behavior is not changed at all. When a new query is 
executed, a new Context object is created. At this moment, without the patch, 
if the locks in the old Context are not released, we can no longer access them 
from the Context. With the patch, we save the locks, if any, in a new list, so 
that we can release them later.
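
A hedged sketch of the behavior described above, with illustrative names rather than Hive's actual Driver/Context code: before the old Context is replaced, its unreleased locks are carried into a list owned by the driver so they can still be released later.

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch of carrying locks across Context replacement.
class DriverSketch {
  private final List<Object> carriedLocks = new ArrayList<>();
  private ContextSketch ctx;

  void newQuery() {
    if (ctx != null) {
      carriedLocks.addAll(ctx.takeLocks()); // don't lose unreleased locks
    }
    ctx = new ContextSketch();
  }

  void releaseAll() {
    carriedLocks.clear(); // stand-in for actually unlocking each entry
  }
}

class ContextSketch {
  private final List<Object> locks = new ArrayList<>();

  List<Object> takeLocks() {
    List<Object> out = new ArrayList<>(locks);
    locks.clear();
    return out;
  }
}
{code}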

> Release plan once a query is executed 
> --
>
> Key: HIVE-12187
> URL: https://issues.apache.org/jira/browse/HIVE-12187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12187.1.patch, HIVE-12187.2.patch
>
>
> Some clients leave query operations open for a while so that they can 
> retrieve the query results later. That means the allocated memory will be 
> kept around too. We should release the resources that are no longer needed 
> for query execution once the query has been executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12228) Hive Error When query nested query with UDF returns Struct type

2015-10-21 Thread Wenlei Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenlei Xie updated HIVE-12228:
--
Description: 
The following simple nested query with a UDF returning Struct fails on Hive 
0.13.1. The UDF java code is attached as {{SimpleStruct.java}}.

{noformat}
ADD JAR simplestruct.jar;
CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';

SELECT *
  FROM (
SELECT *
from mytest
 ) subquery
WHERE simplestruct(subquery.testStr).first
{noformat}

The error message is 
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: java.lang.RuntimeException: cannot find field teststr from [0:_col0, 
1:_col1, 2:_col2]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
..
{noformat}

The query works fine if we replace the UDF with one returning Boolean. By 
comparing the query plans, we note that when using the {{SimpleStruct}} UDF, 
the query plan is 
{noformat}
  TableScan
Select Operator
  Filter Operator
Select Operator
{noformat}
The first Select Operator renames the columns to {{col_k}}, which causes this 
trouble. If we use a UDF returning Boolean, the query plan becomes 
{noformat}
  TableScan
Filter Operator
  Select Operator
{noformat}

It looks like the Query Planner fails to push down the Filter Operator when 
the predicate is based on a UDF returning Struct. 

This bug appears to be fixed in Hive 1.2.1, but we cannot find the ticket that 
fixed it.


Appendix: 
The table {{mytest}} is created in the following way
{noformat}
CREATE TABLE mytest(testInt INT, testName STRING, testStr STRING) ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
{noformat}
The file {{test.txt}} is a simple CSV file.
{noformat}
1,haha,hehe
2,my,test
{noformat}

  was:
The following simple nested query with a UDF returning Struct fails on Hive 
0.13.1. The UDF java code is attached.

{noformat}
ADD JAR simplestruct.jar;
CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';

SELECT *
  FROM (
SELECT *
from mytest
 ) subquery
WHERE simplestruct(subquery.testStr).first
{noformat}

The error message is 
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: java.lang.RuntimeException: cannot find field teststr from [0:_col0, 
1:_col1, 2:_col2]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
..
{noformat}

The query works fine if we replace the UDF with one returning Boolean. By 
comparing the query plans, we note that when using the {{SimpleStruct}} UDF, 
the query plan is 
{noformat}
  TableScan
Select Operator
  Filter Operator
Select Operator
{noformat}
The first Select Operator renames the columns to {{col_k}}, which causes this 
trouble. If we use a UDF returning Boolean, the query plan becomes 
{noformat}
  TableScan
Filter Operator
  Select Operator
{noformat}

It looks like the Query Planner fails to push down the Filter Operator when 
the predicate is based on a UDF returning Struct. 

This bug appears to be fixed in Hive 1.2.1, but we cannot find the ticket that 
fixed it.


Appendix: 
The table {{mytest}} is created in the following way
{noformat}
CREATE TABLE mytest(testInt INT, testName STRING, testStr STRING) ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
{noformat}
The file {{test.txt}} is a simple CSV file.
{noformat}
1,haha,hehe
2,my,test
{noformat}


> Hive Error When query nested query with UDF returns Struct type
> ---
>
> Key: HIVE-12228
> URL: https://issues.apache.org/jira/browse/HIVE-12228
> Project: Hive
>  Issue Type: 

[jira] [Commented] (HIVE-12187) Release plan once a query is executed

2015-10-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968088#comment-14968088
 ] 

Eugene Koifman commented on HIVE-12187:
---

[~jxiang], could you elaborate a little on what the intent is?
It seems like it will release locks before the cursor that this query produced 
has been consumed/closed.  That is contrary to how most DB access layers 
behave.  

> Release plan once a query is executed 
> --
>
> Key: HIVE-12187
> URL: https://issues.apache.org/jira/browse/HIVE-12187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12187.1.patch, HIVE-12187.2.patch
>
>
> Some clients leave query operations open for a while so that they can 
> retrieve the query results later. That means the allocated memory will be 
> kept around too. We should release the resources that are no longer needed 
> for query execution once the query has been executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968117#comment-14968117
 ] 

Sergey Shelukhin commented on HIVE-11985:
-

RB is available at https://reviews.apache.org/r/38862/.
The recent changes only fix corner cases like alter table add serde; 
otherwise the approach is the same: when the serde is deserializer-based for 
schema, it doesn't store columns in the Metastore. For Avro, depending on how 
the table is set up, both storing and not storing the schema are supported.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12228) Hive Error When query nested query with UDF returns Struct type

2015-10-21 Thread Wenlei Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenlei Xie updated HIVE-12228:
--
Attachment: SimpleStruct.java

The {{SimpleStruct}} UDF java code to reproduce the issue.
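
For readers without the attachment, the following is a hypothetical reconstruction (not the actual attached file) of a UDF returning a struct with a boolean field named {{first}}; it is enough to reproduce the plan shape described in the issue.

{code}
import java.util.Arrays;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Hypothetical UDF returning struct<first:boolean>; the argument is ignored,
// which is enough to exercise the planner behavior in question.
public class SimpleStruct extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("first"),
        Arrays.<ObjectInspector>asList(
            PrimitiveObjectInspectorFactory.javaBooleanObjectInspector));
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    return new Object[] { Boolean.TRUE }; // struct value as positional fields
  }

  @Override
  public String getDisplayString(String[] children) {
    return "simplestruct(" + Arrays.toString(children) + ")";
  }
}
{code}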

> Hive Error When query nested query with UDF returns Struct type
> ---
>
> Key: HIVE-12228
> URL: https://issues.apache.org/jira/browse/HIVE-12228
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Planning, UDF
>Affects Versions: 0.13.1
>Reporter: Wenlei Xie
> Attachments: SimpleStruct.java
>
>
> The following simple nested query with a UDF returning Struct fails on Hive 
> 0.13.1. The UDF java code is attached.
> {noformat}
> ADD JAR simplestruct.jar;
> CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';
> SELECT *
>   FROM (
> SELECT *
> from mytest
>  ) subquery
> WHERE simplestruct(subquery.testStr).first
> {noformat}
> The error message is 
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> ... 8 more
> Caused by: java.lang.RuntimeException: cannot find field teststr from 
> [0:_col0, 1:_col1, 2:_col2]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
> ..
> {noformat}
> The query works fine if we replace the UDF with one returning Boolean. By 
> comparing the query plans, we note that when using the {{SimpleStruct}} UDF, 
> the query plan is 
> {noformat}
>   TableScan
> Select Operator
>   Filter Operator
> Select Operator
> {noformat}
> The first Select Operator renames the columns to {{col_k}}, which causes 
> this trouble. If we use a UDF returning Boolean, the query plan becomes 
> {noformat}
>   TableScan
> Filter Operator
>   Select Operator
> {noformat}
> It looks like the Query Planner fails to push down the Filter Operator when 
> the predicate is based on a UDF returning Struct. 
> This bug appears to be fixed in Hive 1.2.1, but we cannot find the ticket 
> that fixed it.
> Appendix: 
> The table {{mytest}} is created in the following way
> {noformat}
> CREATE TABLE mytest(testInt INT, testName STRING, testStr STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
> {noformat}
> The file {{test.txt}} is a simple CSV file.
> {noformat}
> 1,haha,hehe
> 2,my,test
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12228) Hive Error for nested query with UDF returns Struct type

2015-10-21 Thread Wenlei Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenlei Xie updated HIVE-12228:
--
Summary: Hive Error for nested query with UDF returns Struct type  (was: 
Hive Error When query nested query with UDF returns Struct type)

> Hive Error for nested query with UDF returns Struct type
> 
>
> Key: HIVE-12228
> URL: https://issues.apache.org/jira/browse/HIVE-12228
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Planning, UDF
>Affects Versions: 0.13.1
>Reporter: Wenlei Xie
> Attachments: SimpleStruct.java
>
>
> The following simple nested query with a UDF returning Struct fails on Hive 
> 0.13.1. The UDF java code is attached as {{SimpleStruct.java}}.
> {noformat}
> ADD JAR simplestruct.jar;
> CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';
> SELECT *
>   FROM (
> SELECT *
> from mytest
>  ) subquery
> WHERE simplestruct(subquery.testStr).first
> {noformat}
> The error message is 
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> ... 8 more
> Caused by: java.lang.RuntimeException: cannot find field teststr from 
> [0:_col0, 1:_col1, 2:_col2]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
> ..
> {noformat}
> The query works fine if we replace the UDF with one returning Boolean. By 
> comparing the query plans, we note that when using the {{SimpleStruct}} UDF, 
> the query plan is 
> {noformat}
>   TableScan
> Select Operator
>   Filter Operator
> Select Operator
> {noformat}
> The first Select Operator renames the columns to {{col_k}}, which causes 
> this trouble. If we use a UDF returning Boolean, the query plan becomes 
> {noformat}
>   TableScan
> Filter Operator
>   Select Operator
> {noformat}
> It looks like the Query Planner fails to push down the Filter Operator when 
> the predicate is based on a UDF returning Struct. 
> This bug appears to be fixed in Hive 1.2.1, but we cannot find the ticket 
> that fixed it.
> Appendix: 
> The table {{mytest}} is created in the following way
> {noformat}
> CREATE TABLE mytest(testInt INT, testName STRING, testStr STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
> {noformat}
> The file {{test.txt}} is a simple CSV file.
> {noformat}
> 1,haha,hehe
> 2,my,test
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11973) IN operator fails when the column type is DATE

2015-10-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968344#comment-14968344
 ] 

Yongzhi Chen commented on HIVE-11973:
-

IN uses ReturnObjectInspectorResolver.update, which ends up using the 
FunctionRegistry.getCommonClass method. This method is too strict. The elem in 
(val1, val2...) is more like the union all style of type conversion (the types 
of elem, val1, val2... could all be used in a union all). I will try the 
approach of letting the IN UDF use conversionHelper.updateForUnionAll.
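
In plain Java, the conversion being argued for looks like this: the string literals in the IN list are converted to DATE before comparison, the way union-all-style type resolution would allow. This is an illustration only, not Hive internals.

{code}
import java.sql.Date;

// Sketch: convert string literals to DATE, then compare.
public final class InDateSketch {
  public static void main(String[] args) {
    Date col = Date.valueOf("2000-03-22");
    String[] literals = { "2000-03-22", "2001-03-22" };
    boolean in = false;
    for (String lit : literals) {
      in |= col.equals(Date.valueOf(lit)); // string -> date, then compare
    }
    System.out.println(in); // true
  }
}
{code}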

> IN operator fails when the column type is DATE 
> ---
>
> Key: HIVE-11973
> URL: https://issues.apache.org/jira/browse/HIVE-11973
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0
>Reporter: sanjiv singh
>Assignee: Yongzhi Chen
> Attachments: HIVE-11973.1.patch
>
>
> Test DLL :
> {code}
> CREATE TABLE `date_dim`(
>   `d_date_sk` int, 
>   `d_date_id` string, 
>   `d_date` date, 
>   `d_current_week` string, 
>   `d_current_month` string, 
>   `d_current_quarter` string, 
>   `d_current_year` string) ;
> {code}
> Hive query :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN ('2000-03-22','2001-03-22')  ;
> {code}
> In 1.0.0, the above query fails with:
> {code}
> FAILED: SemanticException [Error 10014]: Line 1:180 Wrong arguments 
> ''2001-03-22'': The arguments for IN should be the same type! Types are: 
> {date IN (string, string)}
> {code}
> I changed the query as follows to get past the error:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN (CAST('2000-03-22' AS DATE) , CAST('2001-03-22' AS DATE) 
>  )  ;
> {code}
> But it works without casting:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date   = '2000-03-22' ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12225) LineageCtx should release all resources at clear

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968346#comment-14968346
 ] 

Hive QA commented on HIVE-12225:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767858/HIVE-12225.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9698 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5728/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5728/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5728/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767858 - PreCommit-HIVE-TRUNK-Build

> LineageCtx should release all resources at clear
> 
>
> Key: HIVE-12225
> URL: https://issues.apache.org/jira/browse/HIVE-12225
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12225.1.patch
>
>
> Some maps are not released in the clear() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-10-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10807:

Attachment: HIVE-10807.7.patch

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, 
> HIVE-10807.4.patch, HIVE-10807.5.patch, HIVE-10807.6.patch, 
> HIVE-10807.7.patch, HIVE-10807.patch
>
>
> stats.autogather=false leads to incorrect basic stats in the case of insert 
> statements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11973) IN operator fails when the column type is DATE

2015-10-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11973:

Attachment: HIVE-11973.2.patch

> IN operator fails when the column type is DATE 
> ---
>
> Key: HIVE-11973
> URL: https://issues.apache.org/jira/browse/HIVE-11973
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0
>Reporter: sanjiv singh
>Assignee: Yongzhi Chen
> Attachments: HIVE-11973.1.patch, HIVE-11973.2.patch
>
>
> Test DLL :
> {code}
> CREATE TABLE `date_dim`(
>   `d_date_sk` int, 
>   `d_date_id` string, 
>   `d_date` date, 
>   `d_current_week` string, 
>   `d_current_month` string, 
>   `d_current_quarter` string, 
>   `d_current_year` string) ;
> {code}
> Hive query :
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN ('2000-03-22','2001-03-22')  ;
> {code}
> In 1.0.0, the above query fails with:
> {code}
> FAILED: SemanticException [Error 10014]: Line 1:180 Wrong arguments 
> ''2001-03-22'': The arguments for IN should be the same type! Types are: 
> {date IN (string, string)}
> {code}
> I changed the query as follows to get past the error:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date  IN (CAST('2000-03-22' AS DATE) , CAST('2001-03-22' AS DATE) 
>  )  ;
> {code}
> But it works without casting:
> {code}
> SELECT *  
> FROM   date_dim 
> WHERE d_date   = '2000-03-22' ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements

2015-10-21 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968361#comment-14968361
 ] 

Chengbing Liu commented on HIVE-11901:
--

Failed tests are not related.

> StorageBasedAuthorizationProvider requires write permission on table for 
> SELECT statements
> --
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HIVE-11901.01.patch, HIVE-11901.02.patch
>
>
> With HIVE-7895, it will require write permission on the table directory even 
> for a SELECT statement.
> Looking at the stacktrace, it seems the method 
> {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, 
> Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats 
> a null partition as a CREATE statement, which can also be a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first   
> in order to tell which statement it is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements

2015-10-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968373#comment-14968373
 ] 

Thejas M Nair commented on HIVE-11901:
--

[~chengbing.liu] Thanks for adding the tests for the case where 
StorageBasedAuthorization is used on the client side.
Can you also please add a test case for StorageBasedAuthorization when used in 
the metastore server, as that is the recommended mode for 
StorageBasedAuthorization?

A quick way would be to add this to 
TestStorageBasedMetastoreAuthorizationReads.java - 
{code}
  @Test
  public void testReadTableSuccessWithReadOnly() throws Exception {
readTableByOtherUser("-r--r--r--", true);
  }
{code}



> StorageBasedAuthorizationProvider requires write permission on table for 
> SELECT statements
> --
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HIVE-11901.01.patch, HIVE-11901.02.patch
>
>
> With HIVE-7895, it will require write permission on the table directory even 
> for a SELECT statement.
> Looking at the stacktrace, it seems the method 
> {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, 
> Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats 
> a null partition as a CREATE statement, which can also be a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first   
> in order to tell which statement it is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967498#comment-14967498
 ] 

Sergey Shelukhin commented on HIVE-11985:
-

The HCatalog test failed due to some unrelated cluster setup issue:
{noformat}
Caused by: java.io.FileNotFoundException: File 
file:/tmp/hadoop-yarn/staging/history/done does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:376)
at 
org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:149)
{noformat}
It passes locally. The others were broken on master at the time.
[~xuefuz] [~alangates] [~ashutoshc] can you take a look at the latest patch?


> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12223) Filter on Grouping__ID does not work properly

2015-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12223:
---
Attachment: HIVE-12223.patch

> Filter on Grouping__ID does not work properly
> -
>
> Key: HIVE-12223
> URL: https://issues.apache.org/jira/browse/HIVE-12223
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12223.patch
>
>
> Consider the following query:
> {noformat}
> SELECT key, value, GROUPING__ID, count(*)
> FROM T1
> GROUP BY key, value
> GROUPING SETS ((), (key))
> HAVING GROUPING__ID = 1
> {noformat}
> This query will not return results. The reason is that a "constant" 
> placeholder is introduced by SemanticAnalyzer for the GROUPING\__ID column. 
> At execution time, this placeholder is replaced by the actual value of the 
> GROUPING\__ID. As it is a constant, the Hive optimizer will evaluate 
> statically whether the condition is met or not, leading to incorrect results. 
> A possible solution is to transform the placeholder constant into a function 
> over the grouping keys.
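
To make the proposal concrete: instead of folding a constant, GROUPING__ID would be computed at run time from which grouping keys are active. Below is a minimal Java sketch of that idea; the class, method, and bit order are illustrative assumptions, not Hive's actual code. Evaluated this way, HAVING GROUPING__ID = 1 keeps the (key) rows instead of being constant-folded away.

{code}
// Sketch: GROUPING__ID as a function of the grouping keys.
public final class GroupingIdSketch {
  // groupedByKey[i] is true when grouping key i participates in the
  // current grouping set (i.e. it is not rolled up).
  static long groupingId(boolean[] groupedByKey) {
    long id = 0;
    for (int i = 0; i < groupedByKey.length; i++) {
      if (groupedByKey[i]) {
        id |= 1L << i; // bit order is illustrative only
      }
    }
    return id;
  }

  public static void main(String[] args) {
    // GROUPING SETS ((), (key)) over GROUP BY key, value:
    System.out.println(groupingId(new boolean[] {false, false})); // () -> 0
    System.out.println(groupingId(new boolean[] {true, false}));  // (key) -> 1
  }
}
{code}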



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968388#comment-14968388
 ] 

Xuefu Zhang commented on HIVE-11100:


What makes ';' start appearing in the connection string, and did that happen 
before this one? Since ';' is reserved as the query terminator, it seems 
reasonable to require escaping it all the time.

> Beeline should escape semi-colon in queries
> ---
>
> Key: HIVE-11100
> URL: https://issues.apache.org/jira/browse/HIVE-11100
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11100.patch
>
>
> Beeline should escape the semicolon in queries. For example, queries like 
> the following:
> CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ';' LINES TERMINATED BY '\n';
> or 
> CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY '\;' LINES TERMINATED BY '\n';
> both failed.
> But the 2nd query, with the semicolon escaped with "\", works in the CLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements

2015-10-21 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HIVE-11901:
-
Attachment: HIVE-11901.03.patch

Patch updated.

> StorageBasedAuthorizationProvider requires write permission on table for 
> SELECT statements
> --
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HIVE-11901.01.patch, HIVE-11901.02.patch, 
> HIVE-11901.03.patch
>
>
> With HIVE-7895, it will require write permission on the table directory even 
> for a SELECT statement.
> Looking at the stacktrace, it seems the method 
> {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, 
> Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats 
> a null partition as a CREATE statement, which can also be a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first   
> in order to tell which statement it is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968542#comment-14968542
 ] 

Chao Sun commented on HIVE-12189:
-

Patch looks good to me. +1. I don't think we should add predicates that are 
semantically the same.


> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow 
> very large
> 
>
> Key: HIVE-12189
> URL: https://issues.apache.org/jira/browse/HIVE-12189
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12189.1.patch
>
>
> Some queries are very slow to compile, for example the following query
> {noformat}
> select * from tt1 nf 
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) 
> join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and 
> a2.hdp_databaseid = nf.hdp_databaseid) 
> join tt4 a3 on  (a3.col4 = a2.col4 and a3.col3 = a2.col3) 
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = 
> a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) 
> join tt6 a5 on  (a5.col3 = a2.col3 and a5.col2 = a2.col2 and 
> a5.hdp_databaseid = nf.hdp_databaseid) 
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid 
> = nf.hdp_databaseid) 
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid 
> = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and hive is not in test mode.
> All the above tables are partitioned by one column, and all of them are 
> empty. If the tables are not empty, it is reported that compilation is so 
> slow that it looks like hive is hanging. 
> In hive 2.0, compilation is much faster; explain takes 6.6 seconds. But that 
> is still a lot of time. One of the problems slowing ppd down is that the 
> list in pushdownPreds can grow very large, which gives extractPushdownPreds 
> bad performance:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext, 
> Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, at the following breakpoint preds has a size 
> of 12051, and most entries of the list are: 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> The following code in extractPushdownPreds clones all the nodes in preds and 
> does the walk. Hive 2.0 is faster because HIVE-11652 (and other jiras) makes 
> startWalking much faster, but we still clone thousands of nodes with the 
> same expression. Should we store so many identical predicates in the list, 
> or is just one good enough?  
> {noformat}
> List<Node> startNodes = new ArrayList<Node>();
> List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
> for (ExprNodeDesc node : preds) {
>   ExprNodeDesc clone = node.clone();
>   clonedPreds.add(clone);
>   exprContext.getNewToOldExprMap().put(clone, node);
> }
> startNodes.addAll(clonedPreds);
> egw.startWalking(startNodes, null);
> {noformat}
> Should we change java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
> methods 
> public void addFinalCandidate(String alias, ExprNodeDesc expr) 
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) 
> so that they only add an expr that is not already in the pushdown list for the alias?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12213) Investigating the test failure TestHCatClient.testTableSchemaPropagation

2015-10-21 Thread Aleksei Statkevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei Statkevich updated HIVE-12213:
--
Attachment: HIVE-12213.2.patch

Tests failed because fast stats for empty tables are now not populated, as you 
suggested. I re-generated the outputs for the failed tests. I also moved the 
check after the "do_not_update_stats" check because otherwise that marker 
property might not be removed as planned.


> Investigating the test failure TestHCatClient.testTableSchemaPropagation
> 
>
> Key: HIVE-12213
> URL: https://issues.apache.org/jira/browse/HIVE-12213
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aleksei Statkevich
>Priority: Minor
> Attachments: HIVE-12213.2.patch, HIVE-12213.patch, HIVE-12231.1.patch
>
>
> The test has been failing for some time with following error.
> {noformat}
> Error Message
> Table after deserialization should have been identical to sourceTable. 
> expected:<[TABLE_PROPERTIES]> but was:<[]>
> Stacktrace
> java.lang.AssertionError: Table after deserialization should have been 
> identical to sourceTable. expected:<[TABLE_PROPERTIES]> but was:<[]>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation(TestHCatClient.java:1065)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11718) JDBC ResultSet.setFetchSize(0) returns no results

2015-10-21 Thread Aleksei Statkevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei Statkevich reassigned HIVE-11718:
-

Assignee: Aleksei Statkevich

> JDBC ResultSet.setFetchSize(0) returns no results
> -
>
> Key: HIVE-11718
> URL: https://issues.apache.org/jira/browse/HIVE-11718
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1
>Reporter: Son Nguyen
>Assignee: Aleksei Statkevich
> Attachments: HIVE-11718.patch
>
>
> Hi,
> According to the JDBC documentation, the driver should ignore setFetchSize(0), 
> but the Hive JDBC driver returns no results.
> Our product uses setFetchSize to fine-tune performance; sometimes we would 
> like to leave setFetchSize(0) up to the driver so it makes its best guess at 
> the fetch size.
> Thanks
> Son Nguyen
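> A minimal sketch of the intended usage, per the java.sql.Statement contract 
> (the URL and query here are illustrative):
> {code}
> // setFetchSize(0) means "let the driver decide", so rows should still be
> // returned; with this bug, the loop below never runs.
> try (Connection conn = DriverManager.getConnection(
>          "jdbc:hive2://localhost:10000/default");
>      Statement stmt = conn.createStatement()) {
>   stmt.setFetchSize(0);        // should be ignored, i.e. driver's default
>   try (ResultSet rs = stmt.executeQuery("SELECT * FROM src")) {
>     while (rs.next()) {
>       System.out.println(rs.getString(1));
>     }
>   }
> }
> {code}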



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12230) custom UDF configure() not called in Vectorization mode

2015-10-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12230:

Attachment: HIVE-12230.01.patch

> custom UDF configure() not called in Vectorization mode
> ---
>
> Key: HIVE-12230
> URL: https://issues.apache.org/jira/browse/HIVE-12230
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12230.01.patch
>
>
> PROBLEM:
> A custom UDF that overrides configure():
> {code}
> @Override
> public void configure(MapredContext context) {
>   greeting = "Hello ";
> }
> {code}
> In vectorization mode, configure() is not called.
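> For context, a hedged sketch of such a UDF (class name, field, and return 
> values are illustrative; only the configure() override matters here):
> {code}
> import org.apache.hadoop.hive.ql.exec.MapredContext;
> import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
> import org.apache.hadoop.hive.ql.metadata.HiveException;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
>
> public class GenericUDFHello extends GenericUDF {
>   private String greeting = "";
>
>   @Override
>   public void configure(MapredContext context) {
>     // Called once per task in row-mode execution; with this bug the
>     // vectorized path skips it, so greeting stays empty.
>     greeting = "Hello ";
>   }
>
>   @Override
>   public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
>     return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
>   }
>
>   @Override
>   public Object evaluate(DeferredObject[] args) throws HiveException {
>     return greeting + "world";
>   }
>
>   @Override
>   public String getDisplayString(String[] children) {
>     return "hello()";
>   }
> }
> {code}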



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12084) Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java heap space

2015-10-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12084:
-
Attachment: (was: HIVE-12084.4.patch)

> Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java 
> heap space
> --
>
> Key: HIVE-12084
> URL: https://issues.apache.org/jira/browse/HIVE-12084
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12084.1.patch, HIVE-12084.2.patch, 
> HIVE-12084.3.patch, HIVE-12084.4.patch
>
>
> STEPS TO REPRODUCE:
> {code}
> CREATE TABLE `sample_07` ( `code` string , `description` string , `total_emp` 
> int , `salary` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS 
> TextFile;
> load data local inpath 'sample_07.csv'  into table sample_07;
> set hive.limit.pushdown.memory.usage=0.;
> select * from sample_07 order by salary LIMIT 9;
> {code}
> This will result in 
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
> {code}
> The basic issue lies with the top-n optimization: we need a cap for it. Ideally 
> we would detect that the bytes to be allocated exceed what 
> "limit.pushdown.memory.usage" allows, without actually trying to allocate them.
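> A hedged sketch of such a pre-allocation check (ESTIMATED_BYTES_PER_ROW and 
> isEnabled are hypothetical names, not the actual patch; memUsage stands for 
> the hive.limit.pushdown.memory.usage fraction):
> {code}
> // Estimate the top-n allocation up front and disable the optimization
> // instead of OOM-ing inside TopNHash.initialize().
> long usableBytes = (long) (Runtime.getRuntime().maxMemory() * memUsage);
> long neededBytes = limit * ESTIMATED_BYTES_PER_ROW;  // hypothetical constant
> if (neededBytes > usableBytes) {
>   isEnabled = false;  // fall back to a plain sort without top-n pushdown
>   return;
> }
> {code}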



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12084) Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java heap space

2015-10-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12084:
-
Attachment: HIVE-12084.4.patch

> Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java 
> heap space
> --
>
> Key: HIVE-12084
> URL: https://issues.apache.org/jira/browse/HIVE-12084
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12084.1.patch, HIVE-12084.2.patch, 
> HIVE-12084.3.patch, HIVE-12084.4.patch
>
>
> STEPS TO REPRODUCE:
> {code}
> CREATE TABLE `sample_07` ( `code` string , `description` string , `total_emp` 
> int , `salary` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS 
> TextFile;
> load data local inpath 'sample_07.csv'  into table sample_07;
> set hive.limit.pushdown.memory.usage=0.;
> select * from sample_07 order by salary LIMIT 9;
> {code}
> This will result in 
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
> {code}
> The basic issue lies with the top-n optimization: we need a cap for it. Ideally 
> we would detect that the bytes to be allocated exceed what 
> "limit.pushdown.memory.usage" allows, without actually trying to allocate them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-21 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968303#comment-14968303
 ] 

Jason Dere commented on HIVE-12063:
---

I'm off the next 2 days; I'll try to take a look next week.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems caused by Hive trimming trailing zeros, 
> including treating 0.0, 0.00, and so on as 0, which has a different 
> precision/scale. Please refer to the HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of them: 0.0, 0.00, 
> and so on could not be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of values such as 0.0 and 0.00 
> showing as 0 in query results. This causes confusion, as 0.0 and 0.00 have a 
> different precision/scale than 0.
> The proposal here is to pad query results with zeros to the type's scale. This 
> not only removes the confusion described above but also aligns with many 
> other DBs. The internal decimal representation doesn't change, however.
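> A small illustration of the proposed display behavior with 
> java.math.BigDecimal (the values are illustrative):
> {code}
> import java.math.BigDecimal;
>
> public class PadScaleDemo {
>   public static void main(String[] args) {
>     // Padding is display-side only: the value is unchanged, but the
>     // rendered text carries the column's declared scale.
>     BigDecimal v = new BigDecimal("0.1");               // column decimal(3,3)
>     System.out.println(v.toPlainString());              // 0.1   (today)
>     System.out.println(v.setScale(3).toPlainString());  // 0.100 (proposed)
>   }
> }
> {code}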



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12206) ClassNotFound Exception during query compilation with Tez and Union query and GenericUDFs

2015-10-21 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-12206:
--
Attachment: HIVE-12206.1.patch

Initial patch with test.

> ClassNotFound Exception during query compilation with Tez and Union query and 
> GenericUDFs
> -
>
> Key: HIVE-12206
> URL: https://issues.apache.org/jira/browse/HIVE-12206
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-12206.1.patch
>
>
> {noformat}
> -- union query without UDF
> explain
> select * from (select key + key from src limit 1) a
> union all
> select * from (select key + key from src limit 1) b;
> add jar /tmp/udf-2.2.0-snapshot.jar;
> create temporary function myudf as 
> 'com.aginity.amp.hive.udf.UniqueNumberGenerator';
> -- Now try the query with the UDF
> explain
> select myudf() from (select key from src limit 1) a
> union all
> select myudf() from (select key from src limit 1) a;
> {noformat}
> Got error:
> {noformat}
> 2015-10-16 17:00:55,557 ERROR ql.Driver (SessionState.java:printError(963)) - 
> FAILED: KryoException Unable to find class: 
> com.aginity.amp.hive.udf.UniqueNumberGenerator
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
> parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.LimitOperator)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: com.aginity.amp.hive.udf.UniqueNumberGenerator
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
> parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.LimitOperator)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> 

[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries

2015-10-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968308#comment-14968308
 ] 

Szehon Ho commented on HIVE-11100:
--

Hey [~ctang.ma] [~xuefuz], wondering if you have checked whether this is 
backward compatible for connect strings that involve a semicolon, for example 
when connecting with security or dynamic service discovery?

e.g. beeline -u 'jdbc:hive2://localhost:1/default;principal=hive/host@realm'
beeline -u 
"jdbc:hive2://localhost:2181/\;serviceDiscoveryMode=zooKeeper\;zooKeeperNamespace=hiveserver2"

> Beeline should escape semi-colon in queries
> ---
>
> Key: HIVE-11100
> URL: https://issues.apache.org/jira/browse/HIVE-11100
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11100.patch
>
>
> Beeline should escape the semicolon in queries. For example, queries like the 
> following:
> CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ';' LINES TERMINATED BY '\n';
> or 
> CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY '\;' LINES TERMINATED BY '\n';
> both fail.
> But the 2nd query, with the semicolon escaped with "\", works in the CLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11895) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix udaf_percentile_approx_23.q

2015-10-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968201#comment-14968201
 ] 

Ashutosh Chauhan commented on HIVE-11895:
-

+1

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix 
> udaf_percentile_approx_23.q
> -
>
> Key: HIVE-11895
> URL: https://issues.apache.org/jira/browse/HIVE-11895
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11895.01.patch, HIVE-11895.02.patch, 
> HIVE-11895.03.patch
>
>
> Due to a type conversion problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11895) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix udaf_percentile_approx_23.q

2015-10-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11895:
---
Attachment: HIVE-11895.03.patch

Address [~ashutoshc]'s comments.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix 
> udaf_percentile_approx_23.q
> -
>
> Key: HIVE-11895
> URL: https://issues.apache.org/jira/browse/HIVE-11895
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11895.01.patch, HIVE-11895.02.patch, 
> HIVE-11895.03.patch
>
>
> Due to a type conversion problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12186) Upgrade Hive to Calcite 1.5

2015-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12186:
---
Attachment: HIVE-12186.01.patch

> Upgrade Hive to Calcite 1.5
> ---
>
> Key: HIVE-12186
> URL: https://issues.apache.org/jira/browse/HIVE-12186
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.0.0
>
> Attachments: HIVE-12186.01.patch, HIVE-12186.patch
>
>
> CLEAR LIBRARY CACHE
> Upgrade Hive to Calcite 1.5.0-incubating.
> There is currently a snapshot release, which is close to what will be in 1.5. 
> First, we will test and check for any possible issues against the snapshot, so 
> we can upgrade more quickly once the release is out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size

2015-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968211#comment-14968211
 ] 

Hive QA commented on HIVE-12220:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767855/HIVE-12220.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9661 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError
org.apache.hadoop.hive.llap.cache.TestLowLevelLrfuCachePolicy.testLfuExtreme
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5727/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5727/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5727/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767855 - PreCommit-HIVE-TRUNK-Build

> LLAP: Usability issues with hive.llap.io.cache.orc.size
> ---
>
> Key: HIVE-12220
> URL: https://issues.apache.org/jira/browse/HIVE-12220
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Carter Shanklin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12220.patch
>
>
> In the llap-daemon site you need to set, among other things,
> llap.daemon.memory.per.instance.mb
> and
> hive.llap.io.cache.orc.size
> The use of hive.llap.io.cache.orc.size caused me some unnecessary problems: 
> initially I entered the value in MB rather than in bytes. Operator error, you 
> could say, but I think of this value relative to the other one, which is in MB.
> Second, is this really tied to ORC? E.g. when we have the vectorized text 
> reader, will this data be cached as well? Or might it be in the future?
> I would like to propose using hive.llap.io.cache.size.mb for this 
> setting instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-10-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11981:

Attachment: HIVE-11981.08.patch

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, ORC Schema Evolution Issues.docx
>
>
> High-priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns, and other schema evolution 
> changes were not pursued due to lack of importance and lack of time. Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns to ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements

2015-10-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968397#comment-14968397
 ] 

Thejas M Nair commented on HIVE-11901:
--

Thanks for the update!
+1 pending tests.


> StorageBasedAuthorizationProvider requires write permission on table for 
> SELECT statements
> --
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HIVE-11901.01.patch, HIVE-11901.02.patch, 
> HIVE-11901.03.patch
>
>
> With HIVE-7895, write permission on the table directory is required even 
> for a SELECT statement.
> Looking at the stack trace, it seems the method 
> {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, 
> Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats 
> a null partition as a CREATE statement, even though it can also be a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first 
> in order to tell which statement it is.
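> A hedged sketch of that check (the method shape is from the description; 
> checkPermissions() is a hypothetical stand-in for the existing FS-permission 
> check, not the actual patch):
> {code}
> // Decide read vs. write from the requested privileges instead of assuming
> // a null partition implies CREATE.
> public void authorize(Table table, Partition part,
>     Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)
>     throws HiveException, AuthorizationException {
>   boolean needsWrite = writeRequiredPriv != null && writeRequiredPriv.length > 0;
>   if (part == null && !needsWrite) {
>     // Pure read such as SELECT: only read access on the table dir is needed.
>     checkPermissions(table.getDataLocation(), readRequiredPriv);
>     return;
>   }
>   // Otherwise keep the existing write-oriented handling (CREATE, INSERT, ...).
>   checkPermissions(table.getDataLocation(), writeRequiredPriv);
> }
> {code}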



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11473) Upgrade Spark dependency to 1.5 [Spark Branch]

2015-10-21 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968406#comment-14968406
 ] 

Rui Li commented on HIVE-11473:
---

Hi Xuefu, any progress on this one?

> Upgrade Spark dependency to 1.5 [Spark Branch]
> --
>
> Key: HIVE-11473
> URL: https://issues.apache.org/jira/browse/HIVE-11473
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Rui Li
> Attachments: HIVE-11473.1-spark.patch, HIVE-11473.2-spark.patch, 
> HIVE-11473.3-spark.patch, HIVE-11473.3-spark.patch
>
>
> In Spark 1.5, the SparkListener interface changed, so HoS may fail to create 
> the Spark client if an unimplemented event callback method is invoked.
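> A hedged sketch of the usual guard for this (assuming Spark's 
> JavaSparkListener adapter is available; the class name below is illustrative):
> {code}
> import org.apache.spark.JavaSparkListener;
> import org.apache.spark.scheduler.SparkListenerJobEnd;
>
> // Extending the adapter instead of implementing SparkListener directly means
> // callbacks added in Spark 1.5 default to no-ops rather than breaking HoS.
> public class ClientListener extends JavaSparkListener {
>   @Override
>   public void onJobEnd(SparkListenerJobEnd jobEnd) {
>     // handle only the events we care about; new/unknown events are ignored
>   }
> }
> {code}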



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

