[jira] [Updated] (HIVE-15647) Combination of a boolean condition and null-safe comparison leads to NPE

2017-01-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-15647:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Combination of a boolean condition and null-safe comparison leads to NPE
> 
>
> Key: HIVE-15647
> URL: https://issues.apache.org/jira/browse/HIVE-15647
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Carter Shanklin
>Assignee: Remus Rusanu
>Priority: Minor
> Attachments: HIVE-15647.01.patch, HIVE-15647.02.patch, 
> HIVE-15647.03.patch, HIVE-15647.04.patch
>
>
> Here's a simple example with the foodmart database:
> {code}
> hive> explain select count(*) from
> > sales_fact_1997 join store on sales_fact_1997.store_id = store.store_id
> > where ((store.salad_bar)) and ((store_number) <=> (customer_id));
> FAILED: NullPointerException null
> {code}
> This happens on trunk and on HDP 2.5.3 / Hive 2. If you use = the NPE doesn't 
> happen. If you remove the boolean condition the NPE doesn't happen.
> {code}
> FAILED: NullPointerException null
> 2016-12-13T18:23:33,604 ERROR [c4b7242e-1252-4709-8adf-22f631af75e8 main] 
> ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateFilterProc.process(ConstantPropagateProcFactory.java:1047)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConstantPropagate$ConstantPropagateWalker.walk(ConstantPropagate.java:151)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConstantPropagate.transform(ConstantPropagate.java:120)
>   at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10913)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:75)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:435)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {code}
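For context, {{<=>}} is Hive's null-safe equality operator (IS NOT DISTINCT FROM): unlike {{=}} it never yields NULL, which is why the constant folder has to treat it as a separate case. A minimal sketch of the two semantics in plain Java (illustrative only, not Hive's internal expression code):

```java
public class NullSafeEq {
    // Plain equality (=): any NULL operand makes the result NULL (unknown).
    static Boolean eq(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    // Null-safe equality (<=>): never NULL; NULL <=> NULL is TRUE.
    static boolean nullSafeEq(Integer a, Integer b) {
        if (a == null || b == null) {
            return a == b; // true only when both operands are NULL
        }
        return a.equals(b);
    }
}
```

A constant folder that assumes every comparison can be folded like {{=}} (three-valued, NULL-propagating) can mishandle the extra case {{<=>}} introduces.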



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15579) Support HADOOP_PROXY_USER for secure impersonation in hive metastore client

2017-01-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837341#comment-15837341
 ] 

Lefty Leverenz commented on HIVE-15579:
---

Does this need to be documented in the wiki?

> Support HADOOP_PROXY_USER for secure impersonation in hive metastore client
> ---
>
> Key: HIVE-15579
> URL: https://issues.apache.org/jira/browse/HIVE-15579
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Nandakumar
> Fix For: 2.2.0
>
> Attachments: HIVE-15579.000.patch, HIVE-15579.001.patch, 
> HIVE-15579.002.patch, HIVE-15579.003.patch, HIVE-15579.003.patch
>
>
> Hadoop clients support HADOOP_PROXY_USER for secure impersonation. It would 
> be useful to have a similar feature for the Hive metastore client.
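In Hadoop clients the mechanism boils down to checking the HADOOP_PROXY_USER environment variable and, when it is set, wrapping the login user in a proxy UGI via {{UserGroupInformation.createProxyUser}}. A hedged sketch of just the user-selection step; {{effectiveUser}} is a hypothetical helper, not part of any Hive or Hadoop API:

```java
public class ProxyUserSelect {
    // Hypothetical helper: pick the user to act as. A real client would turn
    // the proxy case into UserGroupInformation.createProxyUser(proxy, loginUgi).
    static String effectiveUser(String loginUser, String proxyEnv) {
        if (proxyEnv != null && !proxyEnv.isEmpty()) {
            return proxyEnv; // impersonate the requested user
        }
        return loginUser;    // no impersonation requested
    }
}
```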





[jira] [Commented] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837339#comment-15837339
 ] 

Hive QA commented on HIVE-15718:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849206/HIVE-15718.001.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10996 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3165/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3165/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3165/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849206 - PreCommit-HIVE-Build

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-15718.001.patch
>
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.
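A hedged sketch of the kind of guard the description suggests; names are illustrative (the real split type is a Hadoop/Parquet InputSplit, not a String), and this is not the actual VectorizedParquetRecordReader code:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitFilter {
    // Drop null splits up front so a later initialize() call never
    // dereferences a null split and hits a NullPointerException.
    static List<String> nonNullSplits(List<String> splits) {
        List<String> kept = new ArrayList<>();
        for (String s : splits) {
            if (s != null) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```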





[jira] [Commented] (HIVE-15587) Using ChangeManager to copy files in ReplCopyTask

2017-01-24 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837334#comment-15837334
 ] 

Vaibhav Gumashta commented on HIVE-15587:
-

+1. My only small comment is to use the transactional event listener in 
TestReplicationScenarios. I'll make that change in HIVE-15642.

> Using ChangeManager to copy files in ReplCopyTask 
> --
>
> Key: HIVE-15587
> URL: https://issues.apache.org/jira/browse/HIVE-15587
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15587.1.patch, HIVE-15587.2.patch
>
>
> Currently ReplCopyTask copies files directly from the source repo. The files 
> in the source repo may have been dropped or changed since the event was 
> recorded. We shall use the checksum transferred to ReplCopyTask to verify 
> them; if it differs, retrieve the file from cmroot instead.
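The proposed flow can be sketched as follows. {{chooseCopySource}}, {{cmRootCopy}}, and {{checksumOf}} are hypothetical names for illustration, not Hive's actual ReplCopyTask API:

```java
import java.util.function.Function;

public class ReplCopySketch {
    // Decide where to copy a replicated file from: prefer the live source
    // file if it still matches the checksum captured with the event;
    // otherwise fall back to the archived copy under cmroot.
    static String chooseCopySource(String source, String cmRootCopy,
                                   String expectedChecksum,
                                   Function<String, String> checksumOf) {
        if (expectedChecksum.equals(checksumOf.apply(source))) {
            return source;     // file unchanged since the event was recorded
        }
        return cmRootCopy;     // file dropped/changed: use the CM archive
    }
}
```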





[jira] [Updated] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error

2017-01-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15722:
--
Status: Patch Available  (was: Open)

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> 
>
> Key: HIVE-15722
> URL: https://issues.apache.org/jira/browse/HIVE-15722
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15722.01.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag 
> query16 already complete. Rejecting fragment [Map 7, 29, 0]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}





[jira] [Updated] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error

2017-01-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15722:
--
Attachment: HIVE-15722.01.patch

[~prasanth_j] - could you please review.

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> 
>
> Key: HIVE-15722
> URL: https://issues.apache.org/jira/browse/HIVE-15722
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15722.01.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag 
> query16 already complete. Rejecting fragment [Map 7, 29, 0]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}





[jira] [Commented] (HIVE-15714) backport HIVE-11985 (and HIVE-12601) to branch-1

2017-01-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837318#comment-15837318
 ] 

Ashutosh Chauhan commented on HIVE-15714:
-

+1

For the next release we should get rid of the currently hackish solution of a 
conf variable and instead have a marker interface on the serde to 
differentiate which serdes use the metastore to store their schemas. Can you 
create a follow-up jira for that?

> backport HIVE-11985 (and HIVE-12601) to branch-1
> 
>
> Key: HIVE-15714
> URL: https://issues.apache.org/jira/browse/HIVE-15714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15714-branch-1.patch
>
>
> Backport HIVE-11985 (and HIVE-12601) to branch-1





[jira] [Commented] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837311#comment-15837311
 ] 

Ashutosh Chauhan commented on HIVE-15721:
-

[~vgarg] Can you create a RB request for it?

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were 
> disabled, since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur when the subquery's input has zero rows: an 
> aggregate still always produces a result, so lowering such a query into a 
> LEFT JOIN or SEMI JOIN would not take this case into consideration.
> We propose to allow such queries with an added run time check which will 
> throw an error/exception if the subquery produces zero rows.
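The zero-row corner case can be made concrete with plain Java standing in for the two plans (illustrative only, not Hive's rewrite code): an aggregate like {{max}} over an empty input still produces one row (NULL), so {{x IN (subquery)}} evaluates to unknown, while a join-based rewrite simply finds no match and says false.

```java
import java.util.List;

public class ZeroRowAgg {
    // max() over zero input rows still yields one (NULL) result row.
    static Integer maxOrNull(List<Integer> rows) {
        return rows.stream().max(Integer::compare).orElse(null);
    }

    // Semantics of "x IN (SELECT max(...) ...)": a NULL subquery result
    // makes the predicate unknown (modeled here as null), not false.
    static Boolean inPredicate(int x, Integer subqueryValue) {
        if (subqueryValue == null) {
            return null;
        }
        return subqueryValue.equals(x);
    }

    // A naive join rewrite just probes for a matching row: no row, no match.
    static boolean joinRewrite(int x, List<Integer> buildSide) {
        return buildSide.contains(x);
    }
}
```

The mismatch (unknown vs. false) is exactly the wrong-result risk the description refers to.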





[jira] [Commented] (HIVE-15709) Vectorization: Fix performance issue with using LazyBinaryUtils.writeVInt and locking / thread local storage

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837292#comment-15837292
 ] 

Hive QA commented on HIVE-15709:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849203/HIVE-15709.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 60 failed/errored test(s), 10996 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_llap] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_vector_dynpart_hashjoin_1]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_vector_dynpart_hashjoin_2]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_adaptor_usage_mode]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_in]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_mapjoin1]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce_2]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce_3]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_3]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_5]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_aggregate]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_mapjoin]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_distinct_2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_3]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_reduce]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_inner_join]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_mapjoin]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_filters]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_nulls]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_left_outer_join2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_nullsafe_join]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_orderby_5]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join0]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join1]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join3]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join4]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join6]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce_groupby_decimal]
 (batchId=144)

[jira] [Commented] (HIVE-15655) Optimizer: Allow config option to disable n-way JOIN merging

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837239#comment-15837239
 ] 

Hive QA commented on HIVE-15655:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849199/HIVE-15655.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10982 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=116)

[groupby_map_ppr_multi_distinct.q,vectorization_13.q,mapjoin_mapjoin.q,union2.q,join41.q,groupby8_map.q,cbo_subq_not_in.q,identity_project_remove_skip.q,stats5.q,groupby8_map_skew.q,nullgroup2.q,mapjoin_subquery.q,bucket2.q,smb_mapjoin_1.q,union_remove_8.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3163/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3163/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3163/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849199 - PreCommit-HIVE-Build

> Optimizer: Allow config option to disable n-way JOIN merging 
> -
>
> Key: HIVE-15655
> URL: https://issues.apache.org/jira/browse/HIVE-15655
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15655.1.patch, HIVE-15655.2.patch, 
> HIVE-15655.2.patch, HIVE-15655.3.patch
>
>
> N-way Joins in Tez produce bad runtime plans whenever they are left-outer 
> joins with map-joins.
> This is something which should have a safety setting.





[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Attachment: HIVE-15721.1.patch

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were 
> disabled, since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur when the subquery's input has zero rows: an 
> aggregate still always produces a result, so lowering such a query into a 
> LEFT JOIN or SEMI JOIN would not take this case into consideration.
> We propose to allow such queries with an added run time check which will 
> throw an error/exception if the subquery produces zero rows.





[jira] [Commented] (HIVE-15650) LLAP: Set perflogger to DEBUG level for llap daemons

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837189#comment-15837189
 ] 

Hive QA commented on HIVE-15650:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12847911/HIVE-15650.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10981 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=97)

[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,groupby3.q,bucketsortoptimize_insert_8.q,union20.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3162/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3162/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3162/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12847911 - PreCommit-HIVE-Build

> LLAP: Set perflogger to DEBUG level for llap daemons
> 
>
> Key: HIVE-15650
> URL: https://issues.apache.org/jira/browse/HIVE-15650
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Logging
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15650.1.patch
>
>
> During Hive2 dev, the PerfLogger was moved to DEBUG levels only making it 
> impossible to debug timings from LLAP logs without manually editing 
> log4j2.properties and redeploying LLAP.
> Enable PerfLogger by default on LLAP.
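Until the patch lands, the manual workaround the description mentions is a log4j2.properties override along these lines; the logger name matches Hive 2's PerfLogger class, but treat the exact property keys as an illustrative sketch rather than a verified config:

```properties
# Emit PerfLogger timings even when the root logger level is INFO
logger.perf.name = org.apache.hadoop.hive.ql.log.PerfLogger
logger.perf.level = DEBUG
```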





[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Status: Patch Available  (was: Open)

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were 
> disabled, since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur when the subquery's input has zero rows: an 
> aggregate still always produces a result, so lowering such a query into a 
> LEFT JOIN or SEMI JOIN would not take this case into consideration.
> We propose to allow such queries with an added run time check which will 
> throw an error/exception if the subquery produces zero rows.





[jira] [Updated] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14949:
--
Attachment: HIVE-14949.05.patch

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch, HIVE-14949.03.patch, HIVE-14949.04.patch, 
> HIVE-14949.05.patch
>
>
> If more than one row on the source side matches the same row on the target 
> side, that means we are forced to update (or delete) the same row in the 
> target more than once as part of the same SQL statement.  This should raise 
> an error per the SQL spec
> ISO/IEC 9075-2:2011(E)
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B
> There is no sure way to do this via static analysis of the query.
> Can we add something to the ROJ operator to pay attention to the ROW__ID of 
> the target-side row and compare it with the ROW__ID of the target side of 
> the previous output row?  If they are the same, that means more than one 
> source row matched.
> Or perhaps just mark each row in the hash table when it matches, and if it 
> matches again, throw an error.
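The "mark each row in the hash table" idea can be sketched like this (illustrative Java, not the actual ROJ operator; the long key stands in for a target ROW__ID):

```java
import java.util.HashMap;
import java.util.Map;

public class MatchOnce {
    // Tracks which target ROW__IDs have already matched a source row.
    private final Map<Long, Boolean> matched = new HashMap<>();

    // Returns normally on the first match for a target row and throws on
    // any later match, mirroring the SQL cardinality-violation rule.
    void onMatch(long targetRowId) {
        if (matched.put(targetRowId, Boolean.TRUE) != null) {
            throw new IllegalStateException(
                "More than one source row matched target ROW__ID " + targetRowId);
        }
    }
}
```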





[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Labels: sub-query  (was: )

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were 
> disabled, since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur when the subquery's input has zero rows: an 
> aggregate still always produces a result, so lowering such a query into a 
> LEFT JOIN or SEMI JOIN would not take this case into consideration.
> We propose to allow such queries with an added run time check which will 
> throw an error/exception if the subquery produces zero rows.





[jira] [Updated] (HIVE-15485) Investigate the DoAs failure in HoS

2017-01-24 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15485:
---
Status: Patch Available  (was: Open)

> Investigate the DoAs failure in HoS
> ---
>
> Key: HIVE-15485
> URL: https://issues.apache.org/jira/browse/HIVE-15485
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-15485.patch
>
>
> With DoAs enabled, HoS failed with the following errors:
> {code}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> systest tries to renew a token with renewer hive
>   at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
> {code}
> It is related to the change from HIVE-14383. It looks like SparkSubmit 
> logs in to Kerberos with the passed-in hive principal/keytab and then tries to 
> create an HDFS delegation token for user systest with renewer hive.





[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Attachment: HIVE-15672.WIP.patch

Need to sort out the includes and propagate them to the point where async data is 
cached; make sure cleanup on error makes sense; and clean up the code.

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.WIP.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Updated] (HIVE-15485) Investigate the DoAs failure in HoS

2017-01-24 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15485:
---
Attachment: HIVE-15485.patch

Given that we are not able to support both doAs and delegation token renewal in 
Spark at this moment (see comments in SPARK-5493, SPARK-19143, etc.), and doAs is 
the more common case in Hive, when doAs is enabled we will use kinit instead of 
passing the principal/keytab to Spark. I could not think of other ways to 
make both work. [~xuefuz], do you have any thoughts? If you agree with that, could 
you help review the patch? Thanks. 

> Investigate the DoAs failure in HoS
> ---
>
> Key: HIVE-15485
> URL: https://issues.apache.org/jira/browse/HIVE-15485
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-15485.patch
>
>
> With DoAs enabled, HoS failed with the following errors:
> {code}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> systest tries to renew a token with renewer hive
>   at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
> {code}
> It is related to the change from HIVE-14383. It looks like SparkSubmit 
> logs in to Kerberos with the passed-in hive principal/keytab and then tries to 
> create an HDFS delegation token for user systest with renewer hive.





[jira] [Commented] (HIVE-15698) Vectorization support for min/max/bloomfilter runtime filtering

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837136#comment-15837136
 ] 

Hive QA commented on HIVE-15698:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849190/HIVE-15698.1.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11002 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction]
 (batchId=4)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_llap] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3161/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3161/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3161/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849190 - PreCommit-HIVE-Build

> Vectorization support for min/max/bloomfilter runtime filtering
> ---
>
> Key: HIVE-15698
> URL: https://issues.apache.org/jira/browse/HIVE-15698
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15698.1.patch
>
>
> Enable vectorized execution for HIVE-15269.





[jira] [Updated] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15664:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.





[jira] [Commented] (HIVE-15653) Some ALTER TABLE commands drop table stats

2017-01-24 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837089#comment-15837089
 ] 

Pengcheng Xiong commented on HIVE-15653:


LGTM +1 pending tests. Thanks [~ctang.ma] for the patch.

> Some ALTER TABLE commands drop table stats
> --
>
> Key: HIVE-15653
> URL: https://issues.apache.org/jira/browse/HIVE-15653
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>Assignee: Chaoyu Tang
>Priority: Critical
> Attachments: HIVE-15653.1.patch, HIVE-15653.2.patch, HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_name            data_type           comment
>
> i                     int
>
> # Detailed Table Information
> Database:             default
> Owner:                abehm
> CreateTime:           Tue Jan 17 18:13:34 PST 2017
> LastAccessTime:       UNKNOWN
> Protect Mode:         None
> Retention:            0
> Location:             hdfs://localhost:20500/test-warehouse/t
> Table Type:           MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by        abehm
>   last_modified_time      1484705748
>   numFiles                1
>   numRows                 -1
>   rawDataSize             -1
>   test                    test
>   totalSize               2
>   transient_lastDdlTime   1484705748
>
> # Storage Information
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No
> Num Buckets:          -1
> Bucket Columns:       []
> Sort Columns:         []
> Storage Desc Params:
>   serialization.format    1
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.





[jira] [Updated] (HIVE-15653) Some ALTER TABLE commands drop table stats

2017-01-24 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15653:
---
Attachment: HIVE-15653.2.patch

Revised the patch based on the review.

> Some ALTER TABLE commands drop table stats
> --
>
> Key: HIVE-15653
> URL: https://issues.apache.org/jira/browse/HIVE-15653
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>Assignee: Chaoyu Tang
>Priority: Critical
> Attachments: HIVE-15653.1.patch, HIVE-15653.2.patch, HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_name            data_type           comment
>
> i                     int
>
> # Detailed Table Information
> Database:             default
> Owner:                abehm
> CreateTime:           Tue Jan 17 18:13:34 PST 2017
> LastAccessTime:       UNKNOWN
> Protect Mode:         None
> Retention:            0
> Location:             hdfs://localhost:20500/test-warehouse/t
> Table Type:           MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by        abehm
>   last_modified_time      1484705748
>   numFiles                1
>   numRows                 -1
>   rawDataSize             -1
>   test                    test
>   totalSize               2
>   transient_lastDdlTime   1484705748
>
> # Storage Information
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No
> Num Buckets:          -1
> Bucket Columns:       []
> Sort Columns:         []
> Storage Desc Params:
>   serialization.format    1
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.





[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837079#comment-15837079
 ] 

Hive QA commented on HIVE-14949:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849184/HIVE-14949.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10982 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=125)
[table_access_keys_stats.q,bucketmapjoin11.q,auto_join4.q,mapjoin_decimal.q,join34.q,nullgroup.q,mergejoins_mixed.q,sort.q,stats8.q,auto_join28.q,join17.q,union17.q,skewjoinopt11.q,groupby1_map.q,load_dyn_part11.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3160/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3160/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3160/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849184 - PreCommit-HIVE-Build

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch, HIVE-14949.03.patch, HIVE-14949.04.patch
>
>
> If > 1 row on the source side matches the same row on the target side, that means 
>  we are forced to update (or delete) the same row in the target more than once as 
> part of the same SQL statement.  This should raise an error per the SQL Spec,
> ISO/IEC 9075-2:2011(E),
> Section 14.2 under "General Rules", Item 6/Subitem a/Subitem 2/Subitem B.
> There is no sure way to do this via static analysis of the query.
> Can we add something to the ROJ operator to pay attention to the ROW__ID of the 
> target-side row and compare it with the ROW__ID of the target side of the previous 
> output row?  If they are the same, that means more than one source row matched.
> Or perhaps just mark each row in the hash table when it matches, and if it 
> matches again, throw an error.
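The hash-table marking idea in the last paragraph can be sketched as follows. This is a minimal illustration under the assumption that each target-side row is identified by a numeric ROW__ID; the class and method names are hypothetical, not Hive's actual ROJ operator code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Hive's ROJ operator): remember which target-side
// ROW__IDs have already matched a source row. A second match means the same
// target row would be updated/deleted twice by one statement, which the
// SQL Spec (ISO/IEC 9075-2:2011(E), 14.2 GR 6.a.2.B) says must be an error.
public class MatchOnceCheck {
    private final Map<Long, Boolean> matchedByRowId = new HashMap<>();

    // Returns normally on the first match for a ROW__ID, throws on a repeat.
    public void recordMatch(long targetRowId) {
        if (Boolean.TRUE.equals(matchedByRowId.put(targetRowId, Boolean.TRUE))) {
            throw new IllegalStateException(
                "Cardinality violation: target ROW__ID " + targetRowId
                + " matched more than one source row");
        }
    }

    public static void main(String[] args) {
        MatchOnceCheck check = new MatchOnceCheck();
        check.recordMatch(1L);     // first match for row 1: OK
        check.recordMatch(2L);     // different target row: OK
        try {
            check.recordMatch(1L); // row 1 matched again: error
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A real implementation would keep the matched flag inside the existing hash-table entry rather than in a side map, so the check adds no extra lookups.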





[jira] [Updated] (HIVE-15573) Vectorization: ACID shuffle ReduceSink is not specialized

2017-01-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15573:
---
Attachment: acid-test.svg

> Vectorization: ACID shuffle ReduceSink is not specialized 
> --
>
> Key: HIVE-15573
> URL: https://issues.apache.org/jira/browse/HIVE-15573
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions, Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: acid-test.svg, HIVE-15573.01.patch, HIVE-15573.02.patch, 
> HIVE-15573.03.patch, screenshot-1.png
>
>
> The ACID shuffle disables murmur hash, because the bucketing requirements 
> demand the writable hashcode for the shuffles.
> {code}
> boolean useUniformHash = desc.getReducerTraits().contains(UNIFORM);
> if (!useUniformHash) {
>   return false;
> }
> {code}
> This check protects the fast ReduceSink ops from being used in ACID inserts.
> A specialized case for the following pattern will make ACID insert much 
> faster.
> {code}
> Reduce Output Operator
>   sort order: 
>   Map-reduce partition columns: _col0 (type: bigint)
>   value expressions:  
> {code}
> !screenshot-1.png!





[jira] [Commented] (HIVE-15573) Vectorization: ACID shuffle ReduceSink is not specialized

2017-01-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837074#comment-15837074
 ] 

Gopal V commented on HIVE-15573:


[~mmccline]: LGTM - +1, tests pending. 

Nits on the LOG.debug() calls: wrap the ones that do Arrays. calls in an 
isDebugEnabled() check.

There needs to be a guard-rail that checks the two enums together, in one place. Not 
all combinations in the {{BucketNumKind}} x {{PartitionHashCodeKind}} matrix are 
valid.

Also, final variables in the loop are very useful for catching issues ahead of time. 
Moving these into the loop and making them final means the compiler ensures there is 
no leftover state from a previous row and that all branches assign all of the 
variables.

{code}
+  int batchIndex;
+  int bucketNum;
+  int hashCode;
+  int keyLength;
{code}
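The per-row finals suggestion can be illustrated with a toy loop (the variable name is borrowed from the snippet above, but the surrounding logic is invented for illustration): declaring the variable final inside the loop makes the compiler enforce definite assignment on every branch and rules out stale state from the previous row.

```java
// Toy illustration of per-row final locals; not Hive's actual operator code.
public class FinalLoopVars {
    public static void main(String[] args) {
        int[] rows = {3, -1, 7};
        for (int row : rows) {
            // Declared fresh (and final) for every row: the compiler rejects
            // any branch structure that leaves hashCode unassigned, and no
            // value can leak over from the previous iteration.
            final int hashCode;
            if (row >= 0) {
                hashCode = row * 31;
            } else {
                hashCode = 0;
            }
            // Dropping the else branch above would fail to compile with
            // "variable hashCode might not have been initialized".
            System.out.println(hashCode);
        }
    }
}
```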

> Vectorization: ACID shuffle ReduceSink is not specialized 
> --
>
> Key: HIVE-15573
> URL: https://issues.apache.org/jira/browse/HIVE-15573
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions, Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: HIVE-15573.01.patch, HIVE-15573.02.patch, 
> HIVE-15573.03.patch, screenshot-1.png
>
>
> The ACID shuffle disables murmur hash, because the bucketing requirements 
> demand the writable hashcode for the shuffles.
> {code}
> boolean useUniformHash = desc.getReducerTraits().contains(UNIFORM);
> if (!useUniformHash) {
>   return false;
> }
> {code}
> This check protects the fast ReduceSink ops from being used in ACID inserts.
> A specialized case for the following pattern will make ACID insert much 
> faster.
> {code}
> Reduce Output Operator
>   sort order: 
>   Map-reduce partition columns: _col0 (type: bigint)
>   value expressions:  
> {code}
> !screenshot-1.png!





[jira] [Updated] (HIVE-15719) Hive cast decimal to int is not consistent with postgres or oracle.

2017-01-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15719:
---
Description: 
drop table dt;
create table dt (c decimal(38,25));
insert into dt values (-999.61000), 
(-999.24000);
select cast(c as int) from dt;


Hive will return {code}
+-------+
|   c   |
+-------+
| -999  |
| -999  |
+-------+
{code}

Others will return {code}
   c
-------
 -1000
  -999
(2 rows)
{code}

> Hive cast decimal to int is not consistent with postgres or oracle.
> ---
>
> Key: HIVE-15719
> URL: https://issues.apache.org/jira/browse/HIVE-15719
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> drop table dt;
> create table dt (c decimal(38,25));
> insert into dt values (-999.61000), 
> (-999.24000);
> select cast(c as int) from dt;
> Hive will return {code}
> +-------+
> |   c   |
> +-------+
> | -999  |
> | -999  |
> +-------+
> {code}
> Others will return {code}
>    c
> -------
>  -1000
>   -999
> (2 rows)
> {code}
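The two behaviors can be reproduced with java.math.BigDecimal. This is an illustration of the semantics described in the report, not Hive's actual cast implementation: Hive truncates the fractional part toward zero, while Postgres/Oracle round half away from zero.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Compare truncation toward zero (Hive's observed behavior) with
// round-half-up (what Postgres/Oracle return for these values).
public class DecimalCastDemo {
    public static void main(String[] args) {
        BigDecimal[] values = {new BigDecimal("-999.61"), new BigDecimal("-999.24")};
        for (BigDecimal v : values) {
            int truncated = v.toBigInteger().intValue();                  // drops the fraction
            int rounded = v.setScale(0, RoundingMode.HALF_UP).intValue(); // rounds half away from zero
            System.out.println(v + " -> truncate=" + truncated + ", round=" + rounded);
        }
        // -999.61 -> truncate=-999, round=-1000
        // -999.24 -> truncate=-999, round=-999
    }
}
```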





[jira] [Commented] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837026#comment-15837026
 ] 

Ferdinand Xu commented on HIVE-15718:
-

Thanks, Colin, for the patch. Could you please add some unit tests with this patch?

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-15718.001.patch
>
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.





[jira] [Commented] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837021#comment-15837021
 ] 

Colin Ma commented on HIVE-15718:
-

[~Ferd], can you help review the patch? Thanks.

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-15718.001.patch
>
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.





[jira] [Updated] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-15718:

Attachment: HIVE-15718.001.patch

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-15718.001.patch
>
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.





[jira] [Updated] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-15718:

Status: Patch Available  (was: Open)

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-15718.001.patch
>
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.





[jira] [Updated] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-15718:

Attachment: (was: HIVE-15718.001.patch)

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.





[jira] [Updated] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-01-24 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-15718:

Attachment: HIVE-15718.001.patch

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException 
> because the input split is null. Such a split should be ignored.





[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837011#comment-15837011
 ] 

Hive QA commented on HIVE-15664:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849183/HIVE-15664.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10996 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3159/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3159/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3159/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849183 - PreCommit-HIVE-Build

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.





[jira] [Updated] (HIVE-15709) Vectorization: Fix performance issue with using LazyBinaryUtils.writeVInt and locking / thread local storage

2017-01-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15709:

Status: Patch Available  (was: Open)

> Vectorization: Fix performance issue with using LazyBinaryUtils.writeVInt and 
> locking / thread local storage
> 
>
> Key: HIVE-15709
> URL: https://issues.apache.org/jira/browse/HIVE-15709
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-15709.01.patch
>
>
> Showed up in performance analysis.  Easy solution: allocate a temp VInt and reuse 
> it each time.
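The fix idea, one scratch object per writer instance instead of locking or ThreadLocal lookups on the hot serialization path, can be sketched generically. The VInt scratch class and the encoding below are illustrative assumptions (a generic zig-zag varint), not Hive's actual LazyBinaryUtils format:

```java
// Hedged sketch of the optimization: a per-instance scratch buffer avoids
// ThreadLocal lookups and locking on the hot serialization path. Safe as long
// as each writer instance is used by a single thread.
public class VIntWriter {
    // Hypothetical scratch holder, analogous in spirit to LazyBinaryUtils.VInt.
    static final class VInt {
        byte[] bytes = new byte[5]; // 5 bytes is enough for any 32-bit varint
        int length;
    }

    // Allocated once per writer and reused on every call.
    private final VInt temp = new VInt();

    // Zig-zag + varint encode 'value' into the scratch buffer, copy it into
    // 'out' at 'offset', and return the number of bytes written.
    public int writeVInt(byte[] out, int offset, int value) {
        int v = (value << 1) ^ (value >> 31); // zig-zag encode the sign
        temp.length = 0;
        do {
            byte b = (byte) (v & 0x7f);
            v >>>= 7;
            if (v != 0) {
                b |= (byte) 0x80; // continuation bit
            }
            temp.bytes[temp.length++] = b;
        } while (v != 0);
        System.arraycopy(temp.bytes, 0, out, offset, temp.length);
        return temp.length;
    }

    public static void main(String[] args) {
        VIntWriter w = new VIntWriter();
        byte[] out = new byte[5];
        int n = w.writeVInt(out, 0, -1); // -1 zig-zags to 1 and fits in one byte
        System.out.println("bytes written: " + n); // prints "bytes written: 1"
    }
}
```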





[jira] [Updated] (HIVE-15709) Vectorization: Fix performance issue with using LazyBinaryUtils.writeVInt and locking / thread local storage

2017-01-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15709:

Attachment: HIVE-15709.01.patch

> Vectorization: Fix performance issue with using LazyBinaryUtils.writeVInt and 
> locking / thread local storage
> 
>
> Key: HIVE-15709
> URL: https://issues.apache.org/jira/browse/HIVE-15709
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-15709.01.patch
>
>
> Showed up in performance analysis.  Easy solution: allocate temp VInt and use 
> it each time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15655) Optimizer: Allow config option to disable n-way JOIN merging

2017-01-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15655:
---
Attachment: HIVE-15655.3.patch

> Optimizer: Allow config option to disable n-way JOIN merging 
> -
>
> Key: HIVE-15655
> URL: https://issues.apache.org/jira/browse/HIVE-15655
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15655.1.patch, HIVE-15655.2.patch, 
> HIVE-15655.2.patch, HIVE-15655.3.patch
>
>
> N-way Joins in Tez produce bad runtime plans whenever they are left-outer 
> joins with map-joins.
> This is something which should have a safety setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15717) Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to exception handling

2017-01-24 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-15717:
--
Attachment: Screen Shot 2017-01-24 at 3.11.09 PM.png

> Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to 
> exception handling
> ---
>
> Key: HIVE-15717
> URL: https://issues.apache.org/jira/browse/HIVE-15717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: Screen Shot 2017-01-24 at 3.11.09 PM.png, Screen Shot 
> 2017-01-24 at 3.15.14 PM.png
>
>
> The exception handling comes from 3 method calls (rowDeleted, rowInserted and 
> rowUpdated). The implementation of these methods in the 
> org.apache.hive.jdbc.HiveBaseResultSet class just throws 
> SQLException("Method not supported"), i.e. there are no real implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15655) Optimizer: Allow config option to disable n-way JOIN merging

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836928#comment-15836928
 ] 

Hive QA commented on HIVE-15655:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849176/HIVE-15655.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 219 failed/errored test(s), 10975 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=117)

[join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,bucketmapjoin10.q,join11.q,union13.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_nway_join]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.ql.TestTxnCommands.exchangePartition (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testDelete (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testDeleteIn (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testErrors (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testExplicitRollback (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testImplicitRollback (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testInsertOverwrite (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeDeleteUpdate (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeNegative (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeNegative2 (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeOnTezEdges (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeType2SCD01 (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeType2SCD02 (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeUpdateDelete (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMultipleDelete (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testMultipleInserts (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testQuotedIdentifier (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testQuotedIdentifier2 (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testReadMyOwnInsert (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testSetClauseFakeColumn (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testTimeOutReaper (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testUpdateDeleteOfInserts 
(batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands.testUpdateOfInserts (batchId=275)
org.apache.hadoop.hive.ql.TestTxnCommands2.testACIDwithSchemaEvolutionAndCompaction
 (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testAcidWithSchemaEvolution 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testAlterTable (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testCompactWithDelete (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge2 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testFailHeartbeater (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testFileSystemUnCaching (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions
 (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testInsertOverwriteWithSelfJoin 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge2 (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge3 (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMergeWithPredicate (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMultiInsertStatement 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNoHistory (batchId=263)

[jira] [Commented] (HIVE-15700) BytesColumnVector can get stuck trying to resize byte buffer

2017-01-24 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836894#comment-15836894
 ] 

Jason Dere commented on HIVE-15700:
---

Test failures are all previously existing failures.

> BytesColumnVector can get stuck trying to resize byte buffer
> 
>
> Key: HIVE-15700
> URL: https://issues.apache.org/jira/browse/HIVE-15700
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15700.1.patch
>
>
> While looking at HIVE-15698, hit an issue where one of the reducers was stuck 
> in the following stack trace:
> {noformat}
> Thread 12735: (state = IN_JAVA)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(int)
>  @bci=22, line=245 (Compiled frame; information may be imprecise)
>  - org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(int, 
> byte[], int, int) @bci=18, line=150 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int, int, boolean) @bci=536, line=442 (Compiled frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int) @bci=110, line=761 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(org.apache.hadoop.io.BytesWritable,
>  java.lang.Iterable, byte) @bci=184, line=444 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector() 
> @bci=119, line=388 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=8, 
> line=239 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=124, 
> line=319 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map,
>  java.util.Map) @bci=30, line=185 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, 
> java.util.Map) @bci=159, line=168 (Interpreted frame)
>  - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65, 
> line=370 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=133, line=73 
> (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=1, line=61 
> (Interpreted frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1724 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=38, 
> line=61 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=1, 
> line=37 (Interpreted frame)
>  - org.apache.tez.common.CallableWithNdc.call() @bci=8, line=36 (Interpreted 
> frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {noformat}
> The reducer's input was 167 9MB binary values coming from the previous map 
> job. Per [~gopalv] the BytesColumnVector is stuck trying to reallocate/copy 
> all of these values into the same memory buffer.
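One way to avoid that pathology, sketched here as an illustration rather than the actual approach of HIVE-15700.1.patch, is to stop funneling oversized values through the shared buffer at all. The class name and the 1 MiB threshold below are hypothetical:

```java
import java.util.Arrays;

// Illustrative sketch of the pathology and a mitigation. Appending many large
// values to one shared buffer forces repeated reallocate-and-copy cycles as it
// keeps doubling; giving values above a cutoff their own allocation keeps the
// shared buffer bounded. Names and threshold are assumptions, not Hive's fix.
class SharedBytesBuffer {
    static final int LARGE_THRESHOLD = 1 << 20; // 1 MiB, assumed cutoff
    private byte[] shared = new byte[1024];
    private int used = 0;

    /** Copies value in and returns the backing array it landed in. */
    byte[] setVal(byte[] value) {
        if (value.length > LARGE_THRESHOLD) {
            return Arrays.copyOf(value, value.length); // dedicated buffer
        }
        while (shared.length - used < value.length) {
            shared = Arrays.copyOf(shared, shared.length * 2); // full copy each grow
        }
        System.arraycopy(value, 0, shared, used, value.length);
        used += value.length;
        return shared;
    }
}
```

With 167 values of ~9 MB each, the shared-buffer path would reallocate and copy the accumulated data over and over; the threshold check sidesteps that entirely for the large values.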



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15698) Vectorization support for min/max/bloomfilter runtime filtering

2017-01-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15698:
--
Attachment: HIVE-15698.1.patch

Adds vectorized support for ExprNodeDynamicValue, BETWEEN() with DynamicValue, 
the bloom_filter() aggregation function, and in_bloom_filter().

> Vectorization support for min/max/bloomfilter runtime filtering
> ---
>
> Key: HIVE-15698
> URL: https://issues.apache.org/jira/browse/HIVE-15698
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15698.1.patch
>
>
> Enable vectorized execution for HIVE-15269.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15698) Vectorization support for min/max/bloomfilter runtime filtering

2017-01-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15698:
--
Status: Patch Available  (was: Open)

> Vectorization support for min/max/bloomfilter runtime filtering
> ---
>
> Key: HIVE-15698
> URL: https://issues.apache.org/jira/browse/HIVE-15698
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15698.1.patch
>
>
> Enable vectorized execution for HIVE-15269.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15717) Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to exception handling

2017-01-24 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836884#comment-15836884
 ] 

Tao Li commented on HIVE-15717:
---

I have attached the screenshot of the CPU profile. Instantiating the exception 
instances is CPU intensive, which causes extra latency when beeline is used to 
display results (up to tens of seconds when displaying 1 million rows). 
According to the CPU profile, this exception-related cost is roughly half of 
the total CPU cost from beeline, so reducing it should significantly improve 
the user experience.

[~thejas] Please let me know your thoughts.
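A common way to remove a per-row exception cost like this, sketched generically below, is to probe the unsupported capability once and cache the answer instead of catching a fresh exception for every row. The class and the use of `UnsupportedOperationException` (rather than the checked `SQLException` the real driver throws) are illustrative assumptions, not the patch's actual approach:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: probe an optional capability once and remember the
// answer, so the cost of constructing and throwing an exception is paid at
// most once instead of once per row.
class CachedCapability {
    private Boolean supported; // null = not yet probed

    boolean isSupported(BooleanSupplier probe) {
        if (supported == null) {
            try {
                supported = probe.getAsBoolean();
            } catch (UnsupportedOperationException e) {
                supported = false; // exception cost paid exactly once
            }
        }
        return supported;
    }
}
```

Applied to the Rows.Row constructor, the rowDeleted/rowInserted/rowUpdated calls would be guarded by a cached flag, so rendering a million rows triggers at most one exception per capability rather than millions.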

> Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to 
> exception handling
> ---
>
> Key: HIVE-15717
> URL: https://issues.apache.org/jira/browse/HIVE-15717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: Screen Shot 2017-01-24 at 3.15.14 PM.png
>
>
> The exception handling comes from 3 method calls (rowDeleted, rowInserted and 
> rowUpdated). The implementation of these methods in the 
> org.apache.hive.jdbc.HiveBaseResultSet class just throws 
> SQLException("Method not supported"), i.e. there are no real implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15717) Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to exception handling

2017-01-24 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-15717:
--
Attachment: Screen Shot 2017-01-24 at 3.15.14 PM.png

> Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to 
> exception handling
> ---
>
> Key: HIVE-15717
> URL: https://issues.apache.org/jira/browse/HIVE-15717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: Screen Shot 2017-01-24 at 3.15.14 PM.png
>
>
> The exception handling comes from 3 method calls (rowDeleted, rowInserted and 
> rowUpdated). The implementation of these methods in the 
> org.apache.hive.jdbc.HiveBaseResultSet class just throws 
> SQLException("Method not supported"), i.e. there are no real implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15578) Simplify IdentifiersParser

2017-01-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836871#comment-15836871
 ] 

Ashutosh Chauhan commented on HIVE-15578:
-

+1

> Simplify IdentifiersParser
> --
>
> Key: HIVE-15578
> URL: https://issues.apache.org/jira/browse/HIVE-15578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15578.01.patch, HIVE-15578.02.patch, 
> HIVE-15578.03.patch, HIVE-15578.04.patch, HIVE-15578.05.patch
>
>
> before: 1.72M LOC in IdentifiersParser, after: 1.41M



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14949:
--
Attachment: HIVE-14949.04.patch

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch, HIVE-14949.03.patch, HIVE-14949.04.patch
>
>
> If > 1 row on the source side matches the same row on the target side, that 
> means we are forced to update (or delete) the same row in the target more 
> than once as part of the same SQL statement.  This should raise an error per 
> the SQL Spec
> ISO/IEC 9075-2:2011(E)
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B
> There is no sure way to do this via static analysis of the query.
> Can we add something to ROJ operator to pay attention to ROW__ID of target 
> side row and compare it with ROW__ID of target side of previous row output?  
> If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table that it matched.  And if it 
> matches again, throw an error.
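The "mark each row in the hash table" idea can be sketched like this. The sketch is illustrative only: ROW__ID is simplified to a `long` and the class name is hypothetical, but the mechanism is the same, fail fast the second time any target row matches.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the "mark each row that it matched" idea: remember each target
// ROW__ID that has matched once, and raise an error the second time the same
// target row is hit, per the SQL spec's cardinality rule for MERGE.
class MergeCardinalityCheck {
    private final Set<Long> matchedTargetRows = new HashSet<>();

    void onMatch(long targetRowId) {
        if (!matchedTargetRows.add(targetRowId)) {
            throw new IllegalStateException(
                "MERGE cardinality violation: target row " + targetRowId
                + " matched more than one source row");
        }
    }
}
```

The alternative described above, comparing each output row's target-side ROW__ID with the previous one, avoids the set but only works if rows with equal ROW__ID are guaranteed to be emitted consecutively.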



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15578) Simplify IdentifiersParser

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836837#comment-15836837
 ] 

Hive QA commented on HIVE-15578:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849174/HIVE-15578.05.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10983 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file (likely timed out) 
(batchId=229)
TestThriftHttpCLIServiceFeatures - did not produce a TEST-*.xml file (likely 
timed out) (batchId=211)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3157/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3157/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3157/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849174 - PreCommit-HIVE-Build

> Simplify IdentifiersParser
> --
>
> Key: HIVE-15578
> URL: https://issues.apache.org/jira/browse/HIVE-15578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15578.01.patch, HIVE-15578.02.patch, 
> HIVE-15578.03.patch, HIVE-15578.04.patch, HIVE-15578.05.patch
>
>
> before: 1.72M LOC in IdentifiersParser, after: 1.41M



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-24 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836839#comment-15836839
 ] 

Prasanth Jayachandran commented on HIVE-15664:
--

lgtm, +1. Pending tests

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15664:

Attachment: HIVE-15664.04.patch

Updated the config setting. Will rename the class in another change; it would 
be a pain to merge otherwise.

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836792#comment-15836792
 ] 

Sahil Takiar commented on HIVE-15546:
-

Thanks Sergio!

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.2.0
>
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls
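Option (2) above can be sketched with a fixed-size thread pool. This is an illustrative shape only: `isEmpty` stands in for the real `FileSystem.listStatus()`-based check, and the class and method names are hypothetical, not Hive's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Sketch of option (2): issue the per-path emptiness checks concurrently
// instead of sequentially, which matters when each check is a slow blobstore
// metadata call. `isEmpty` stands in for the real listStatus-based check.
class InputPathScanner {
    static List<String> findEmptyPaths(List<String> paths,
                                       Predicate<String> isEmpty,
                                       int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> checks = new ArrayList<>();
            for (String p : paths) {
                checks.add(pool.submit(() -> isEmpty.test(p))); // one check per path
            }
            List<String> empty = new ArrayList<>();
            for (int i = 0; i < paths.size(); i++) {
                if (checks.get(i).get()) { // result list preserves input order
                    empty.add(paths.get(i));
                }
            }
            return empty;
        } finally {
            pool.shutdown();
        }
    }
}
```

With N partitions and a pool of size T, total wall time drops from roughly N sequential round trips to about N/T, while the returned list of empty paths (the ones that need a dummy file) keeps the original ordering.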



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15546:
---
Resolution: Fixed
Fix Version/s: 2.2.0
Status: Resolved  (was: Patch Available)

Thanks [~stakiar]. I committed this to master.

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.2.0
>
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15655) Optimizer: Allow config option to disable n-way JOIN merging

2017-01-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15655:
---
Attachment: HIVE-15655.2.patch

Reup

> Optimizer: Allow config option to disable n-way JOIN merging 
> -
>
> Key: HIVE-15655
> URL: https://issues.apache.org/jira/browse/HIVE-15655
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15655.1.patch, HIVE-15655.2.patch, 
> HIVE-15655.2.patch
>
>
> N-way Joins in Tez produce bad runtime plans whenever they are left-outer 
> joins with map-joins.
> This is something which should have a safety setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15578) Simplify IdentifiersParser

2017-01-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15578:
---
Status: Patch Available  (was: Open)

> Simplify IdentifiersParser
> --
>
> Key: HIVE-15578
> URL: https://issues.apache.org/jira/browse/HIVE-15578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15578.01.patch, HIVE-15578.02.patch, 
> HIVE-15578.03.patch, HIVE-15578.04.patch, HIVE-15578.05.patch
>
>
> before: 1.72M LOC in IdentifiersParser, after: 1.41M



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15578) Simplify IdentifiersParser

2017-01-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15578:
---
Attachment: HIVE-15578.05.patch

Dropping 04.patch; 05.patch has minor changes relative to 03.patch. cc 
[~ashutoshc]

> Simplify IdentifiersParser
> --
>
> Key: HIVE-15578
> URL: https://issues.apache.org/jira/browse/HIVE-15578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15578.01.patch, HIVE-15578.02.patch, 
> HIVE-15578.03.patch, HIVE-15578.04.patch, HIVE-15578.05.patch
>
>
> before: 1.72M LOC in IdentifiersParser, after: 1.41M



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-24 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836776#comment-15836776
 ] 

Sergio Peña commented on HIVE-15546:


Yes. Those tests are not related to this patch.

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15578) Simplify IdentifiersParser

2017-01-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15578:
---
Status: Open  (was: Patch Available)

> Simplify IdentifiersParser
> --
>
> Key: HIVE-15578
> URL: https://issues.apache.org/jira/browse/HIVE-15578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15578.01.patch, HIVE-15578.02.patch, 
> HIVE-15578.03.patch, HIVE-15578.04.patch, HIVE-15578.05.patch
>
>
> before: 1.72M LOC in IdentifiersParser, after: 1.41M



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836769#comment-15836769
 ] 

Sahil Takiar commented on HIVE-15546:
-

[~spena] can this be merged?

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836764#comment-15836764
 ] 

Hive QA commented on HIVE-15667:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849164/HIVE-15667.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket5] 
(batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[list_bucket_dml_10]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[reduce_deduplicate]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3155/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3155/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3155/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849164 - PreCommit-HIVE-Build

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch, HIVE-15667.2.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Commented] (HIVE-15716) Add TPCDS query14.q to HivePerfCliDriver

2017-01-24 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836705#comment-15836705
 ] 

Pengcheng Xiong commented on HIVE-15716:


most complicated query so far.

> Add TPCDS query14.q to HivePerfCliDriver
> 
>
> Key: HIVE-15716
> URL: https://issues.apache.org/jira/browse/HIVE-15716
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
>






[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836670#comment-15836670
 ] 

Hive QA commented on HIVE-14949:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849162/HIVE-14949.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 10992 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas] 
(batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_directory]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore]
 (batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket5] 
(batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[list_bucket_dml_10]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[reduce_deduplicate]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeOnTezEdges (batchId=275)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3154/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3154/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3154/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849162 - PreCommit-HIVE-Build

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch, HIVE-14949.03.patch
>
>
> If > 1 row on the source side matches the same row on the target side, that 
> means we are forced to update (or delete) the same row in the target more 
> than once as part of the same SQL statement.  This should raise an error per 
> the SQL Spec ISO/IEC 9075-2:2011(E),
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B.
> There is no sure way to do this via static analysis of the query.
> Can we add something to the ROJ operator to pay attention to the ROW__ID of 
> the target-side row and compare it with the ROW__ID of the target side of the 
> previous output row?  If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table once it matches, and if it 
> matches again, throw an error.
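The "mark each row in the hash table" idea can be sketched as a small guard. This is a minimal illustration, assuming ROW__ID can be modeled as a plain {{long}}; the real ROW__ID is a struct.

```java
import java.util.*;

public class OneToNMatchGuard {
    // Records each target ROW__ID the first time a source row matches it.
    // A second match for the same ROW__ID means the target:source
    // relationship is 1:N, which must raise an error per the SQL spec.
    private final Set<Long> matchedTargetRowIds = new HashSet<>();

    void onMatch(long targetRowId) {
        if (!matchedTargetRowIds.add(targetRowId)) {
            throw new IllegalStateException(
                "MERGE: more than one source row matched target ROW__ID " + targetRowId);
        }
    }
}
```

Unlike the "compare with the previous output row" variant, this catches duplicate matches even when they are not adjacent in the join output, at the cost of keeping the set of matched ids.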





[jira] [Updated] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15667:
---
Attachment: HIVE-15667.2.patch

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch, HIVE-15667.2.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Updated] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15667:
---
Status: Patch Available  (was: Reopened)

Attaching a new patch with the tests updated to reflect the changes from 
HIVE-15591

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch, HIVE-15667.2.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Reopened] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reopened HIVE-15667:


> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch, HIVE-15667.2.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Updated] (HIVE-13014) RetryingMetaStoreClient is retrying acid related calls too aggressievley

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13014:
--
  Resolution: Fixed
   Fix Version/s: 2.2.0
Target Version/s: 2.2.0  (was: 1.3.0, 2.2.0)
  Status: Resolved  (was: Patch Available)

committed to master 
thanks Alan for the review

> RetryingMetaStoreClient is retrying acid related calls too aggressievley
> 
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If the network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetastoreClient will retry the operation, causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded.  Thus the caller thinks the commit failed and will 
> likely attempt to redo the transaction - not what we want in most cases.
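The fix amounts to retrying only when an operation is declared safe. A minimal sketch, where a boolean flag stands in for Hive's RetrySemantics annotations:

```java
import java.util.concurrent.Callable;

public class SelectiveRetry {
    // Retries an operation only when it is declared idempotent; a
    // commit_txn-style call fails fast after a broken connection instead of
    // being re-sent and failing confusingly on the second attempt.
    static <T> T call(Callable<T> op, boolean isIdempotent, int maxAttempts) throws Exception {
        int attempts = isIdempotent ? maxAttempts : 1;
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;  // network-style failure: loop retries only if safe
            }
        }
        throw last;
    }
}
```

With this shape, a transient failure of an idempotent read is absorbed by the retry loop, while a non-idempotent write surfaces the original error to the caller immediately.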





[jira] [Updated] (HIVE-15715) RetryingMetaStoreClient is retrying operations too aggressievley

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15715:
--
Summary: RetryingMetaStoreClient is retrying operations too aggressievley  
(was: RetryingMetaStoreClient is retrying operations too aggresievley)

> RetryingMetaStoreClient is retrying operations too aggressievley
> 
>
> Key: HIVE-15715
> URL: https://issues.apache.org/jira/browse/HIVE-15715
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>
> RetryingMetaStoreClient is retrying operations w/o paying attention to 
> whether the operation is safe to retry.
> common/src/java/org/apache/hadoop/hive/common/classification/RetrySemantics.java
>  should be used to annotate metastore ops with their retry semantics.





[jira] [Updated] (HIVE-13014) RetryingMetaStoreClient is retrying acid related calls too aggressievley

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13014:
--
Summary: RetryingMetaStoreClient is retrying acid related calls too 
aggressievley  (was: RetryingMetaStoreClient is retrying acid related calls too 
aggresievley)

> RetryingMetaStoreClient is retrying acid related calls too aggressievley
> 
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If the network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetastoreClient will retry the operation, causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded.  Thus the caller thinks the commit failed and will 
> likely attempt to redo the transaction - not what we want in most cases.





[jira] [Commented] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836625#comment-15836625
 ] 

Sergio Peña commented on HIVE-15667:


Thanks [~taoli-hwx]. The output just needs to be updated. The tests were 
working for a few test runs before, and I see they started to fail due to 
HIVE-15591. This is the new output that differs:

{noformat}
102d101
<   column.name.delimiter ,
121d119
< column.name.delimiter ,
367d364
<   column.name.delimiter ,
388d384
< column.name.delimiter ,
642d637
<   column.name.delimiter ,
661d655
< column.name.delimiter ,
{noformat}

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Assigned] (HIVE-15698) Vectorization support for min/max/bloomfilter runtime filtering

2017-01-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-15698:
-

Assignee: Jason Dere

> Vectorization support for min/max/bloomfilter runtime filtering
> ---
>
> Key: HIVE-15698
> URL: https://issues.apache.org/jira/browse/HIVE-15698
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Enable vectorized execution for HIVE-15269.





[jira] [Commented] (HIVE-15269) Dynamic Min-Max/BloomFilter runtime-filtering for Tez

2017-01-24 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836607#comment-15836607
 ] 

Jason Dere commented on HIVE-15269:
---

Note: RuntimeValuesInfo.java was missing the Apache header, I added this when I 
committed the patch.

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -
>
> Key: HIVE-15269
> URL: https://issues.apache.org/jira/browse/HIVE-15269
> Project: Hive
>  Issue Type: New Feature
>  Components: Tez
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Fix For: 2.2.0
>
> Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, 
> HIVE-15269.12.patch, HIVE-15269.13.patch, HIVE-15269.14.patch, 
> HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch, 
> HIVE-15269.18.patch, HIVE-15269.19.patch, HIVE-15269.1.patch, 
> HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, 
> HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, 
> HIVE-15269.8.patch, HIVE-15269.9.patch
>
>
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on (store.id = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that 
> come out of the scan/filter of the store table, and send this min/max value 
> (via Tez edge) to the task which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where 
> this predicate can be pushed down to the storage handler (for example for ORC 
> formats). Pushing a min/max predicate to the ORC reader would allow us to 
> avoid having to read entire row groups during the table scan.
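The optimization can be illustrated with synthetic data: compute min/max from the filtered dimension side, then apply the resulting BETWEEN predicate on the fact side. The ids and the "rows surviving the store filter" are invented for the example.

```java
import java.util.*;
import java.util.stream.*;

public class MinMaxRuntimeFilter {
    public static void main(String[] args) {
        // store ids that pass the dimension-side filter (hypothetical values)
        List<Integer> filteredStoreIds = List.of(41, 40, 42);
        IntSummaryStatistics stats = filteredStoreIds.stream()
                .mapToInt(Integer::intValue).summaryStatistics();
        int min = stats.getMin(), max = stats.getMax();

        // fact-side scan: BETWEEN(min, max) prunes rows (and, pushed into the
        // ORC reader, whole row groups) before the join ever runs
        List<Integer> factStoreIds = List.of(7, 40, 41, 99, 42, 500);
        List<Integer> survivors = factStoreIds.stream()
                .filter(id -> id >= min && id <= max)
                .collect(Collectors.toList());
        System.out.println(survivors); // [40, 41, 42]
    }
}
```

In the real feature the min/max (and bloom filter) values travel over a Tez edge from the dimension-side task to the fact-side scan rather than living in one process.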





[jira] [Updated] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14949:
--
Attachment: HIVE-14949.03.patch

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch, HIVE-14949.03.patch
>
>
> If > 1 row on the source side matches the same row on the target side, that 
> means we are forced to update (or delete) the same row in the target more 
> than once as part of the same SQL statement.  This should raise an error per 
> the SQL Spec ISO/IEC 9075-2:2011(E),
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B.
> There is no sure way to do this via static analysis of the query.
> Can we add something to the ROJ operator to pay attention to the ROW__ID of 
> the target-side row and compare it with the ROW__ID of the target side of the 
> previous output row?  If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table once it matches, and if it 
> matches again, throw an error.





[jira] [Updated] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14949:
--
Status: Patch Available  (was: Open)

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch
>
>
> If > 1 row on the source side matches the same row on the target side, that 
> means we are forced to update (or delete) the same row in the target more 
> than once as part of the same SQL statement.  This should raise an error per 
> the SQL Spec ISO/IEC 9075-2:2011(E),
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B.
> There is no sure way to do this via static analysis of the query.
> Can we add something to the ROJ operator to pay attention to the ROW__ID of 
> the target-side row and compare it with the ROW__ID of the target side of the 
> previous output row?  If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table once it matches, and if it 
> matches again, throw an error.





[jira] [Updated] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14949:
--
Attachment: HIVE-14949.03.patch

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, 
> HIVE-14949.03.patch
>
>
> If > 1 row on the source side matches the same row on the target side, that 
> means we are forced to update (or delete) the same row in the target more 
> than once as part of the same SQL statement.  This should raise an error per 
> the SQL Spec ISO/IEC 9075-2:2011(E),
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B.
> There is no sure way to do this via static analysis of the query.
> Can we add something to the ROJ operator to pay attention to the ROW__ID of 
> the target-side row and compare it with the ROW__ID of the target side of the 
> previous output row?  If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table once it matches, and if it 
> matches again, throw an error.





[jira] [Assigned] (HIVE-14016) Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator

2017-01-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-14016:
---

Assignee: Matt McCline  (was: Gopal V)

> Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
>
> Rollup and Cube queries are not vectorized today because grouping sets are 
> not supported inside vectorized group-by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single-row writer into a multiple-row writer.
> The corresponding non-vectorized loop is as follows:
> {code}
> if (groupingSetsPresent) {
>   Object[] newKeysArray = newKeys.getKeyArray();
>   Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>     cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>   }
>   for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       newKeysArray[keyPos] = null;
>     }
>     FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>     // Some keys need to be left null corresponding to that grouping set.
>     for (int keyPos = bitset.nextSetBit(0); keyPos >= 0; keyPos = bitset.nextSetBit(keyPos + 1)) {
>       newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>     }
>     newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>     processKey(row, rowInspector);
>   }
> }
> {code}





[jira] [Work started] (HIVE-14016) Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator

2017-01-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14016 started by Matt McCline.
---
> Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
>
> Rollup and Cube queries are not vectorized today because grouping sets are 
> not supported inside vectorized group-by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single-row writer into a multiple-row writer.
> The corresponding non-vectorized loop is as follows:
> {code}
> if (groupingSetsPresent) {
>   Object[] newKeysArray = newKeys.getKeyArray();
>   Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>     cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>   }
>   for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       newKeysArray[keyPos] = null;
>     }
>     FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>     // Some keys need to be left null corresponding to that grouping set.
>     for (int keyPos = bitset.nextSetBit(0); keyPos >= 0; keyPos = bitset.nextSetBit(keyPos + 1)) {
>       newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>     }
>     newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>     processKey(row, rowInspector);
>   }
> }
> {code}





[jira] [Updated] (HIVE-15269) Dynamic Min-Max/BloomFilter runtime-filtering for Tez

2017-01-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15269:
--
Component/s: Tez

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -
>
> Key: HIVE-15269
> URL: https://issues.apache.org/jira/browse/HIVE-15269
> Project: Hive
>  Issue Type: New Feature
>  Components: Tez
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Fix For: 2.2.0
>
> Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, 
> HIVE-15269.12.patch, HIVE-15269.13.patch, HIVE-15269.14.patch, 
> HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch, 
> HIVE-15269.18.patch, HIVE-15269.19.patch, HIVE-15269.1.patch, 
> HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, 
> HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, 
> HIVE-15269.8.patch, HIVE-15269.9.patch
>
>
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on (store.id = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that 
> come out of the scan/filter of the store table, and send this min/max value 
> (via Tez edge) to the task which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where 
> this predicate can be pushed down to the storage handler (for example for ORC 
> formats). Pushing a min/max predicate to the ORC reader would allow us to 
> avoid having to read entire row groups during the table scan.





[jira] [Updated] (HIVE-15269) Dynamic Min-Max/BloomFilter runtime-filtering for Tez

2017-01-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15269:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~djaiswal].

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -
>
> Key: HIVE-15269
> URL: https://issues.apache.org/jira/browse/HIVE-15269
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Fix For: 2.2.0
>
> Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, 
> HIVE-15269.12.patch, HIVE-15269.13.patch, HIVE-15269.14.patch, 
> HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch, 
> HIVE-15269.18.patch, HIVE-15269.19.patch, HIVE-15269.1.patch, 
> HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, 
> HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, 
> HIVE-15269.8.patch, HIVE-15269.9.patch
>
>
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on (store.id = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that 
> come out of the scan/filter of the store table, and send this min/max value 
> (via Tez edge) to the task which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where 
> this predicate can be pushed down to the storage handler (for example for ORC 
> formats). Pushing a min/max predicate to the ORC reader would allow us to 
> avoid having to read entire row groups during the table scan.





[jira] [Assigned] (HIVE-15708) Upgrade calcite version to 1.11

2017-01-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-15708:
---

Assignee: Remus Rusanu

> Upgrade calcite version to 1.11
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
>
> Currently we are on 1.10. Need to upgrade the Calcite version to 1.11.





[jira] [Updated] (HIVE-15472) JDBC: Standalone jar is missing ZK dependencies

2017-01-24 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15472:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Patch committed to master.
Thanks Tao!


> JDBC: Standalone jar is missing ZK dependencies
> ---
>
> Key: HIVE-15472
> URL: https://issues.apache.org/jira/browse/HIVE-15472
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Tao Li
> Fix For: 2.2.0
>
> Attachments: HIVE-15472.1.patch, HIVE-15472.2.patch
>
>
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/curator/RetryPolicy
>   at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:514)
>   at org.apache.hive.jdbc.Utils.parseURL(Utils.java:434)
>   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:132)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at JDBCExecutor.getConnection(JDBCExecutor.java:65)
>   at JDBCExecutor.executeStatement(JDBCExecutor.java:104)
>   at JDBCExecutor.executeSQLFile(JDBCExecutor.java:81)
>   at JDBCExecutor.main(JDBCExecutor.java:183)
> Caused by: java.lang.ClassNotFoundException: org.apache.curator.RetryPolicy
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}





[jira] [Updated] (HIVE-15541) Hive OOM when ATSHook enabled and ATS goes down

2017-01-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15541:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Hive OOM when ATSHook enabled and ATS goes down
> ---
>
> Key: HIVE-15541
> URL: https://issues.apache.org/jira/browse/HIVE-15541
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 2.2.0
>
> Attachments: HIVE-15541.1.patch, HIVE-15541.2.patch, 
> HIVE-15541.3.patch, HIVE-15541.4.patch
>
>
> The ATS API used by the Hive ATSHook is a blocking call; if ATS goes down, 
> this can block the ATSHook executor while the hook continues to submit work 
> to the executor with each query.
> Over time the buildup of queued items can cause an OOM.
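One common fix for this pattern, independent of what the attached patches actually do, is to give the hook executor a bounded queue with a discard policy, so a hung downstream service cannot queue events without limit. A minimal sketch (the capacity and class name are assumptions, not taken from the patch):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedHookExecutor {
    // Assumed limit for illustration; the real patch may choose differently.
    static final int QUEUE_CAPACITY = 64;

    // A bounded work queue plus a discard policy means a hook whose downstream
    // service (e.g. ATS) blocks cannot accumulate events without bound;
    // excess events are dropped instead of growing the heap.
    static ThreadPoolExecutor createExecutor() {
        return new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(QUEUE_CAPACITY),
                new ThreadPoolExecutor.DiscardPolicy()); // silently drop when full
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executor = createExecutor();
        CountDownLatch blocker = new CountDownLatch(1);
        // Simulate a hung ATS call occupying the single worker thread.
        executor.execute(() -> {
            try { blocker.await(); } catch (InterruptedException ignored) { }
        });
        // Submit far more events than the queue can hold; extras are discarded.
        for (int i = 0; i < 10_000; i++) {
            executor.execute(() -> { });
        }
        System.out.println(executor.getQueue().size()); // prints 64: full, rest dropped
        blocker.countDown();
        executor.shutdown();
    }
}
```

The trade-off is losing events under backpressure, which for a telemetry hook is usually preferable to taking down HiveServer2.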





[jira] [Commented] (HIVE-15472) JDBC: Standalone jar is missing ZK dependencies

2017-01-24 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836434#comment-15836434
 ] 

Tao Li commented on HIVE-15472:
---

[~thejas] Test failures are unrelated and being tracked in HIVE-15058 and 
HIVE-15667.



> JDBC: Standalone jar is missing ZK dependencies
> ---
>
> Key: HIVE-15472
> URL: https://issues.apache.org/jira/browse/HIVE-15472
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Tao Li
> Attachments: HIVE-15472.1.patch, HIVE-15472.2.patch
>
>
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/curator/RetryPolicy
>   at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:514)
>   at org.apache.hive.jdbc.Utils.parseURL(Utils.java:434)
>   at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:132)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at JDBCExecutor.getConnection(JDBCExecutor.java:65)
>   at JDBCExecutor.executeStatement(JDBCExecutor.java:104)
>   at JDBCExecutor.executeSQLFile(JDBCExecutor.java:81)
>   at JDBCExecutor.main(JDBCExecutor.java:183)
> Caused by: java.lang.ClassNotFoundException: org.apache.curator.RetryPolicy
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}





[jira] [Commented] (HIVE-15541) Hive OOM when ATSHook enabled and ATS goes down

2017-01-24 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836467#comment-15836467
 ] 

Jason Dere commented on HIVE-15541:
---

Tests were broken by HIVE-15297.
About fireAndForget() and conf: that variable is not final, so it does not get 
used by the Runnable. Will remove it.

> Hive OOM when ATSHook enabled and ATS goes down
> ---
>
> Key: HIVE-15541
> URL: https://issues.apache.org/jira/browse/HIVE-15541
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15541.1.patch, HIVE-15541.2.patch, 
> HIVE-15541.3.patch, HIVE-15541.4.patch
>
>
> The ATS API used by the Hive ATSHook is a blocking call; if ATS goes down, 
> this can block the ATSHook executor while the hook continues to submit work 
> to the executor with each query.
> Over time the buildup of queued items can cause an OOM.





[jira] [Commented] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-24 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836435#comment-15836435
 ] 

Tao Li commented on HIVE-15667:
---

[~spena] I am still seeing test failures related to TestBlobstoreCliDriver. 
Please see HIVE-15472 for details. Here are some examples:

TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas] 
(batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=231)

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Commented] (HIVE-15714) backport HIVE-11985 (and HIVE-12601) to branch-1

2017-01-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836451#comment-15836451
 ] 

Sergey Shelukhin commented on HIVE-15714:
-

[~ashutoshc] can you take a look?

> backport HIVE-11985 (and HIVE-12601) to branch-1
> 
>
> Key: HIVE-15714
> URL: https://issues.apache.org/jira/browse/HIVE-15714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15714-branch-1.patch
>
>
> Backport HIVE-11985 (and HIVE-12601) to branch-1





[jira] [Commented] (HIVE-13014) RetryingMetaStoreClient is retrying acid related calls too aggressively

2017-01-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836396#comment-15836396
 ] 

Alan Gates commented on HIVE-13014:
---

+1 for the patch.

On extending this to non-ACID calls, I think it would be good to file a JIRA 
for it, even if we can't get to it right away.

> RetryingMetaStoreClient is retrying acid related calls too aggressively
> ---
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If the network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetastoreClient will retry the operation, causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded. Thus the caller thinks the commit failed and will likely 
> attempt to redo the transaction - not what we want in most cases.
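The safe pattern here, sketched below with hypothetical names rather than the actual RetryingMetaStoreClient code, is for the retry wrapper to distinguish idempotent calls from ones like commit_txn that must not be replayed:

```java
import java.util.concurrent.Callable;

public class RetryingClient {
    // Hypothetical sketch: retry only calls declared idempotent. A commit that
    // succeeded on the server but whose ack was lost must NOT be replayed,
    // since the second attempt would fail and mask the earlier success.
    static <T> T invoke(Callable<T> call, boolean idempotent, int maxRetries) throws Exception {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                return call.call();
            } catch (Exception e) {
                // Non-idempotent calls (e.g. commit_txn) surface the error
                // immediately; the caller must resolve the txn state itself.
                if (!idempotent || attempts > maxRetries) {
                    throw e;
                }
            }
        }
    }
}
```

With this shape, an idempotent read such as get_table can be retried up to the limit, while a lost ack on a commit surfaces immediately so the caller can check the txn state instead of blindly recommitting.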





[jira] [Updated] (HIVE-12269) workaround for the issue with schema-based serdes storing types in metastore

2017-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12269:

Resolution: Invalid
Status: Resolved  (was: Patch Available)

> workaround for the issue with schema-based serdes storing types in metastore
> 
>
> Key: HIVE-12269
> URL: https://issues.apache.org/jira/browse/HIVE-12269
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12269.patch
>
>
> Since HIVE-11985 is not getting any traction, I will put up a hack patch that 
> extends the current broken behavior from databases that silently truncate 
> the type name to the databases that don't. Then we'd either wait for the HBase 
> metastore, for HIVE-11985, or cross our fingers and hope for the best!





[jira] [Commented] (HIVE-15580) Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-24 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836414#comment-15836414
 ] 

Xuefu Zhang commented on HIVE-15580:


Hi [~Ferd] and [~dapengsun], I'm wondering if you guys could help measure the 
performance impact of the patch here? We at Uber don't have a dedicated 
environment, so getting accurate measurements is challenging. It would be great 
if you guys can help. Based on the results, we may have some follow-up work to 
do. Thanks.

> Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
> -
>
> Key: HIVE-15580
> URL: https://issues.apache.org/jira/browse/HIVE-15580
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15580.1.patch, HIVE-15580.1.patch, 
> HIVE-15580.2.patch, HIVE-15580.2.patch, HIVE-15580.3.patch, 
> HIVE-15580.4.patch, HIVE-15580.5.patch, HIVE-15580.patch
>
>
> Currently, orderBy (sortBy) and groupBy in Hive on Spark use unbounded 
> memory. For orderBy, Hive accumulates key groups using an ArrayList (described 
> in HIVE-15527). For groupBy, Hive currently uses Spark's groupByKey operator, 
> which has the shortcoming of not being able to spill to disk within a key 
> group. Thus, for a large key group, memory usage is also unbounded.
> It's likely that this will impact performance. We will profile and optimize 
> afterwards. We could also make this change configurable.
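Why reduceByKey-style combining bounds memory where groupByKey does not can be shown with a plain-Java sketch (illustrative only, not the Hive-on-Spark code): incremental combining keeps one running accumulator per key, O(#keys), instead of buffering every value of a key group, O(#rows).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BoundedGroupBy {
    // Sketch: incremental combining folds each value into a per-key accumulator
    // instead of materializing the whole key group (the groupByKey shortcoming
    // the issue describes). Spark's reduceByKey applies the same idea map-side.
    static Map<String, Long> sumByKey(Iterable<Map.Entry<String, Long>> rows) {
        Map<String, Long> acc = new HashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            // merge() folds each value into the running sum; memory is O(#keys),
            // not O(#rows), even for a very large key group.
            acc.merge(row.getKey(), row.getValue(), Long::sum);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> rows = Arrays.asList(
                Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L));
        System.out.println(sumByKey(rows)); // {a=4, b=2}
    }
}
```

This only works for aggregations that can be expressed as an associative fold; operations that truly need all values of a group at once still require spilling support, which is the gap the issue calls out.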





[jira] [Updated] (HIVE-15714) backport HIVE-11985 (and HIVE-12601) to branch-1

2017-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15714:

Attachment: HIVE-15714-branch-1.patch

> backport HIVE-11985 (and HIVE-12601) to branch-1
> 
>
> Key: HIVE-15714
> URL: https://issues.apache.org/jira/browse/HIVE-15714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15714-branch-1.patch
>
>
> Backport HIVE-11985 (and HIVE-12601) to branch-1


