[jira] [Commented] (HIVE-13866) flatten callstack for directSQL errors

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325707#comment-15325707
 ] 

Hive QA commented on HIVE-13866:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809339/HIVE-13866.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/81/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/81/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-81/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809339 - PreCommit-HIVE-MASTER-Build

> flatten callstack for directSQL errors
> --
>
> Key: HIVE-13866
> URL: https://issues.apache.org/jira/browse/HIVE-13866
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13866.01.patch, HIVE-13866.patch
>
>
> These errors look like final errors and confuse people. The callstack may be 
> useful if it's some datanucleus/db issue, but it needs to be flattened and 
> logged with a warning that this is not a final query error and that there's a 
> fallback.
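
For illustration, a minimal sketch of the flattening described above; the
class, method, and message wording are assumptions, not the actual patch:
{noformat}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: collapse the exception cause chain into a single
// warning line instead of logging a full stack trace that looks fatal.
public class DirectSqlFallbackLogger {
  private static final Logger LOG = LoggerFactory.getLogger(DirectSqlFallbackLogger.class);

  public static void warnFallback(Throwable t) {
    StringBuilder sb = new StringBuilder(
        "Direct SQL failed (this is NOT the final query error; falling back to ORM): ");
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      sb.append(cur.getClass().getSimpleName()).append(": ").append(cur.getMessage());
      if (cur.getCause() != null) {
        sb.append(" -> ");
      }
    }
    LOG.warn(sb.toString());
  }
}
{noformat}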



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325650#comment-15325650
 ] 

Hive QA commented on HIVE-13913:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809343/HIVE-13913.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_uncompressed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_join_part_col_char
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/80/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/80/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-80/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809343 - PreCommit-HIVE-MASTER-Build

> LLAP: introduce backpressure to recordreader
> 
>
> Key: HIVE-13913
> URL: https://issues.apache.org/jira/browse/HIVE-13913
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, 
> HIVE-13913.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13921) Fix spark on yarn tests for HoS

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325631#comment-15325631
 ] 

Ashutosh Chauhan commented on HIVE-13921:
-

I see. Let's use this jira for the golden file update. The {{INSERT OVERWRITE 
DIRECTORY}} bug should be easily reproducible outside of this in TestCliDriver. 
Let's come up with a standalone test case for it and track it in a separate jira.

> Fix spark on yarn tests for HoS
> ---
>
> Key: HIVE-13921
> URL: https://issues.apache.org/jira/browse/HIVE-13921
> Project: Hive
>  Issue Type: Test
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13921.1.patch
>
>
> {{index_bitmap3}} and {{constprog_partitioner}} have been failing. Let's fix 
> them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13970) refactor LLAPIF splits - get rid of SubmitWorkInfo

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13970:

Assignee: Sergey Shelukhin
  Status: Patch Available  (was: Open)

> refactor LLAPIF splits - get rid of SubmitWorkInfo
> --
>
> Key: HIVE-13970
> URL: https://issues.apache.org/jira/browse/HIVE-13970
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13970.only.patch, HIVE-13970.patch
>
>
> First we build the signable vertex spec, convert it into bytes (as we 
> should), and put it inside SubmitWorkInfo. Then we serialize that into byte[] 
> and put it into LlapInputSplit. Then we serialize that to return... We should 
> get rid of one of the steps.
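
For illustration, a hedged sketch of the layering being removed; the class
shapes and field layout below are assumptions, not Hive's actual classes:
{noformat}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Assumed-shape sketch of the double wrapping described above: vertex spec
// bytes -> SubmitWorkInfo -> byte[] -> LlapInputSplit -> serialized again.
public class SplitNestingSketch {
  static byte[] signableVertexSpecBytes() { return new byte[] {1, 2, 3}; } // stand-in

  public static void main(String[] args) throws IOException {
    byte[] vertexSpec = signableVertexSpecBytes();

    // Step 1: SubmitWorkInfo wraps the already-serialized vertex spec...
    ByteArrayOutputStream submitWorkInfo = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(submitWorkInfo);
    out.writeInt(vertexSpec.length);
    out.write(vertexSpec);
    out.flush();

    // Step 2: ...and is itself turned into byte[] to be embedded in the
    // split, which is serialized once more when returned to the caller.
    byte[] splitPayload = submitWorkInfo.toByteArray();
    // Dropping SubmitWorkInfo would let the split carry vertexSpec directly,
    // removing one serialize/deserialize round trip.
    System.out.println("payload bytes: " + splitPayload.length);
  }
}
{noformat}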



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13970) refactor LLAPIF splits - get rid of SubmitWorkInfo

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13970:

Attachment: HIVE-13970.patch
HIVE-13970.only.patch

A simple patch to remove SWI, merging it with the LLAP split (plus the same 
combined with some other patch, for Hive QA).
I wonder if we should make it protobuf instead of writable...

[~hagleitn] [~sseth] fyi

> refactor LLAPIF splits - get rid of SubmitWorkInfo
> --
>
> Key: HIVE-13970
> URL: https://issues.apache.org/jira/browse/HIVE-13970
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Attachments: HIVE-13970.only.patch, HIVE-13970.patch
>
>
> First we build the signable vertex spec, convert it into bytes (as we 
> should), and put it inside SubmitWorkInfo. Then we serialize that into byte[] 
> and put it into LlapInputSplit. Then we serialize that to return... We should 
> get rid of one of the steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325627#comment-15325627
 ] 

Ashutosh Chauhan commented on HIVE-13990:
-

The usual practice is to first commit on master and then do backports. Would 
you like to put up a patch against master?

> Client should not check dfs.namenode.acls.enabled to determine if extended 
> ACLs are supported
> -
>
> Key: HIVE-13990
> URL: https://issues.apache.org/jira/browse/HIVE-13990
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Chris Drome
> Attachments: HIVE-13990-branch-1.patch
>
>
> dfs.namenode.acls.enabled is a server side configuration and the client 
> should not presume to know how the server is configured. Barring a method for 
> querying the NN whether ACLs are supported, the client should try and catch 
> the appropriate exception.
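
For illustration, a minimal sketch of the try-and-catch approach suggested
above (not the attached patch; the exception handling is an assumption):
{noformat}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;

// Hedged sketch: attempt the ACL call and treat failure as "ACLs
// unsupported" instead of consulting dfs.namenode.acls.enabled.
public class AclProbe {
  static boolean trySetAcl(FileSystem fs, Path path, List<AclEntry> acl) {
    try {
      fs.setAcl(path, acl);
      return true;
    } catch (UnsupportedOperationException e) {
      // The FS implementation has no ACL support at all.
      return false;
    } catch (IOException e) {
      // HDFS surfaces "ACLs disabled on the NameNode" as an IOException
      // (AclException); exact matching is left to the real implementation.
      return false;
    }
  }
}
{noformat}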



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325602#comment-15325602
 ] 

Sergey Shelukhin commented on HIVE-13901:
-

+1 pending tests and [~ashutoshc] feedback

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch
>
>
> Depending on FS, creating external tables & adding partitions can be 
> expensive (e.g. msck, which adds all partitions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325604#comment-15325604
 ] 

Sergey Shelukhin commented on HIVE-13986:
-

Test failures are unrelated.

> LLAP: kill Tez AM on token errors from plugin
> -
>
> Key: HIVE-13986
> URL: https://issues.apache.org/jira/browse/HIVE-13986
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13986.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc

2016-06-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-13971.

Resolution: Fixed

> Address testcase failures of acid_globallimit.q and etc
> ---
>
> Key: HIVE-13971
> URL: https://issues.apache.org/jira/browse/HIVE-13971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc

2016-06-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325599#comment-15325599
 ] 

Pengcheng Xiong commented on HIVE-13971:


list_bucket_dml_12.q, list_bucket_dml_13.q

> Address testcase failures of acid_globallimit.q and etc
> ---
>
> Key: HIVE-13971
> URL: https://issues.apache.org/jira/browse/HIVE-13971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc

2016-06-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325594#comment-15325594
 ] 

Pengcheng Xiong commented on HIVE-13971:


update test cases using java8

> Address testcase failures of acid_globallimit.q and etc
> ---
>
> Key: HIVE-13971
> URL: https://issues.apache.org/jira/browse/HIVE-13971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc

2016-06-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325594#comment-15325594
 ] 

Pengcheng Xiong edited comment on HIVE-13971 at 6/11/16 1:14 AM:
-

update test case golden files using java8


was (Author: pxiong):
update test cases using java8

> Address testcase failures of acid_globallimit.q and etc
> ---
>
> Key: HIVE-13971
> URL: https://issues.apache.org/jira/browse/HIVE-13971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc

2016-06-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13971:
---
Summary: Address testcase failures of acid_globallimit.q and etc  (was: 
Address testcase failures of acid_globallimit.q and acid_table_stats.q)

> Address testcase failures of acid_globallimit.q and etc
> ---
>
> Key: HIVE-13971
> URL: https://issues.apache.org/jira/browse/HIVE-13971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-13971) Address testcase failures of acid_globallimit.q and acid_table_stats.q

2016-06-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reopened HIVE-13971:


> Address testcase failures of acid_globallimit.q and acid_table_stats.q
> --
>
> Key: HIVE-13971
> URL: https://issues.apache.org/jira/browse/HIVE-13971
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13827) LLAPIF: authentication on the output channel

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13827:

Attachment: HIVE-13827.01.patch

Finished and rebased the patch

> LLAPIF: authentication on the output channel
> 
>
> Key: HIVE-13827
> URL: https://issues.apache.org/jira/browse/HIVE-13827
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13827.01.patch, HIVE-13827.patch
>
>
> The current thinking is that we'd send the token. There's no protocol on the 
> channel right now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13988) zero length file is being created for empty bucket in tez mode

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325553#comment-15325553
 ] 

Hive QA commented on HIVE-13988:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809325/HIVE-13988.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_SortUnionTransposeRule
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_join_transpose
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_ppd_optimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/79/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/79/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-79/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809325 - PreCommit-HIVE-MASTER-Build

> zero length file is being created for empty bucket in tez mode
> --
>
> Key: HIVE-13988
> URL: https://issues.apache.org/jira/browse/HIVE-13988
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13988.01.patch
>
>
> Even though the bucket is empty, a zero-length file is being created in tez 
> mode. Steps to reproduce the issue:
> {noformat}
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> hive> drop table if exists emptybucket_orc;
> OK
> Time taken: 5.416 seconds
> hive> create table emptybucket_orc(age int) clustered by (age) sorted by 
> (age) into 99 buckets stored as orc;
> OK
> Time taken: 0.493 seconds
> hive> insert into table emptybucket_orc select distinct(age) from 
> studenttab10k limit 0;
> Query ID = hrt_qa_20160523231955_8b981be7-68c4-4416-8a48-5f8c7ff551c3
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1464045121842_0002)
> ----------------------------------------------------------------------------------------------
> VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ..  llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ..  llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 4 ..  llap     SUCCEEDED     99         99        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 11.00 s
> ----------------------------------------------------------------------------------------------
> Loading data to table default.emptybucket_orc
> OK
> Time taken: 16.907 seconds
> hive> dfs -ls /apps/hive/warehouse/emptybucket_orc;
> Found 99 items
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/000000_0
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/000001_0
> ..
> {noformat}
> Expected behavior:
> In tez mode, a zero-length file shouldn't get created on hdfs if the bucket is 
> empty.



--

[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13984:
---
Status: Patch Available  (was: Open)

address [~hsubramaniyan]'s comments.

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325534#comment-15325534
 ] 

Pengcheng Xiong commented on HIVE-13984:


The test results are good.

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325535#comment-15325535
 ] 

Ashutosh Chauhan commented on HIVE-13984:
-

[~prasanth_j] You are familiar with the multi-threaded listStatus code in ORC. 
This is also very similar. Can you help review this?
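
For illustration, a minimal sketch of the multi-threaded listing pattern under
review (assumed shape only, not the patch itself):
{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed-shape sketch: list many partition directories in parallel instead
// of sequentially, with the pool size driven by a config setting.
public class ParallelPartitionLister {
  public static List<FileStatus[]> listAll(FileSystem fs, List<Path> partDirs, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<FileStatus[]>> futures = new ArrayList<>();
      for (Path dir : partDirs) {
        futures.add(pool.submit(() -> fs.listStatus(dir)));
      }
      List<FileStatus[]> results = new ArrayList<>();
      for (Future<FileStatus[]> f : futures) {
        results.add(f.get()); // propagates the first listing failure
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
{noformat}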

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13984:
---
Attachment: HIVE-13984.02.patch

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13984:
---
Status: Open  (was: Patch Available)

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325520#comment-15325520
 ] 

Ashutosh Chauhan commented on HIVE-13901:
-

* Are any of the failures related?
* I think we should pick a different name for the config, like 
hive.metastore.fshandler.threads or something similar.
* [~sershe] Can you take another look at the patch?

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch
>
>
> Depending on FS, creating external tables & adding partitions can be 
> expensive (e.g. msck, which adds all partitions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13771:

Attachment: HIVE-13771.01.patch

Rebase (noop)


> LLAPIF: generate app ID
> ---
>
> Key: HIVE-13771
> URL: https://issues.apache.org/jira/browse/HIVE-13771
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13771.01.patch, HIVE-13771.patch
>
>
> See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the 
> user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for 
> ease of tracking.
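
For illustration, a hypothetical sketch of such ID generation (names and
format are assumptions, not the patch):
{noformat}
import java.util.UUID;

// Hypothetical sketch: build a unique app ID, optionally namespaced by a
// caller-supplied prefix such as a YARN application id.
public class LlapAppIdGenerator {
  public static String generate(String userPrefix) {
    String unique = UUID.randomUUID().toString(); // ensures uniqueness
    return (userPrefix == null || userPrefix.isEmpty())
        ? "llap-" + unique
        : userPrefix + "-" + unique;
  }

  public static void main(String[] args) {
    System.out.println(generate("application_1464045121842_0002"));
  }
}
{noformat}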



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13771:

Attachment: (was: HIVE-13771.01.wo.13731.patch)

> LLAPIF: generate app ID
> ---
>
> Key: HIVE-13771
> URL: https://issues.apache.org/jira/browse/HIVE-13771
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13771.01.patch, HIVE-13771.patch
>
>
> See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the 
> user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for 
> ease of tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13771:

Attachment: (was: HIVE-13771.01.patch)

> LLAPIF: generate app ID
> ---
>
> Key: HIVE-13771
> URL: https://issues.apache.org/jira/browse/HIVE-13771
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13771.01.patch, HIVE-13771.patch
>
>
> See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the 
> user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for 
> ease of tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13731) LLAP: return LLAP token with the splits

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13731:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> LLAP: return LLAP token with the splits
> ---
>
> Key: HIVE-13731
> URL: https://issues.apache.org/jira/browse/HIVE-13731
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, 
> HIVE-13731.02.patch, HIVE-13731.03.patch, HIVE-13731.patch, 
> HIVE-13731.wo.13444-13675-13443.patch
>
>
> Need to return the token with the splits, then take it in LLAPIF and make 
> sure it's used when talking to LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-06-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan reassigned HIVE-13995:


Assignee: Hari Sankar Sivarama Subramaniyan

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the metastore queries 
> generated have a large IN clause listing all the partition names. Most RDBMS 
> systems have issues optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing to 1800+ string values will not lead to the best 
> execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the 
> hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. Following is the output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> The functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the hive query are mentioned here. Not sure if statistics 
> of these columns are required for hive query optimization.
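
For illustration, a hedged sketch of building the range predicate from the
ordered partition name list (helper names are hypothetical; as the description
notes, this is only valid when every partition in the range is selected):
{noformat}
import java.util.List;

// Illustrative SQL-fragment building only: because the metastore receives an
// ordered list of partition names, a huge IN list can be replaced by a range
// predicate on the first and last names.
public class PartitionFilterSketch {
  public static String rangeClause(List<String> orderedPartNames) {
    if (orderedPartNames.isEmpty()) {
      return "";
    }
    String first = orderedPartNames.get(0);
    String last = orderedPartNames.get(orderedPartNames.size() - 1);
    // Correct only when all partitions between first and last are selected.
    return " and \"PARTITION_NAME\" >= '" + first
        + "' and \"PARTITION_NAME\" <= '" + last + "'";
  }

  public static void main(String[] args) {
    System.out.println(rangeClause(List.of(
        "cs_sold_date_sk=2450815", "cs_sold_date_sk=2452654")));
  }
}
{noformat}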



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13617:

Attachment: HIVE-13617.06.patch

More q file updates

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, 
> HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, 
> HIVE-13617.05.patch, HIVE-13617.06.patch, HIVE-13617.patch, HIVE-13617.patch, 
> HIVE-15396-with-oi.patch
>
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in the non-vectorized case is not the best anyway, so it's better to make do 
> with much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy-to-reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.
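
For illustration, a hedged sketch of the VRB-to-row direction (long columns
only; the real conversion must handle all column types and object inspectors):
{noformat}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

// Assumed-scope sketch: materialize a VectorizedRowBatch into plain rows,
// honoring the selected-rows indirection, repeating values, and nulls.
public class VrbToRowSketch {
  public static Object[][] toRows(VectorizedRowBatch batch) {
    Object[][] rows = new Object[batch.size][batch.cols.length];
    for (int i = 0; i < batch.size; i++) {
      // If selectedInUse is set, logical row i maps to selected[i].
      int row = batch.selectedInUse ? batch.selected[i] : i;
      for (int c = 0; c < batch.cols.length; c++) {
        LongColumnVector col = (LongColumnVector) batch.cols[c]; // long-only sketch
        int valueRow = col.isRepeating ? 0 : row;
        boolean isNull = !col.noNulls && col.isNull[valueRow];
        rows[i][c] = isNull ? null : col.vector[valueRow];
      }
    }
    return rows;
  }
}
{noformat}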



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325434#comment-15325434
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Test failures are unrelated.

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.
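
For illustration only, a plain-Java analogy of how the two cast orders can
diverge; the values and comparison semantics here are assumptions, not Hive's
exact behavior:
{noformat}
import java.math.BigDecimal;
import java.util.List;

// Illustrative sketch: comparing a decimal "column" against a string IN-list
// gives different answers depending on which side gets cast.
public class InCastSketch {
  public static void main(String[] args) {
    BigDecimal column = new BigDecimal("1.0");
    List<String> inList = List.of("1");

    // Cast the column to string, compare as strings: "1.0" vs "1" -> no match.
    boolean castColumn = inList.contains(column.toPlainString());

    // Cast the list to decimal, compare numerically: 1.0 == 1 -> match.
    boolean castList = inList.stream()
        .map(BigDecimal::new)
        .anyMatch(v -> v.compareTo(column) == 0);

    System.out.println(castColumn + " vs " + castList); // false vs true
  }
}
{noformat}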



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325427#comment-15325427
 ] 

Sergey Shelukhin commented on HIVE-13617:
-

[~spena] I have a question; I added a test (orc_llap_nonvector) to a separate 
minillap.query.files variable, and added that to excludeQueryFile for the 
standard CLI tests (it's ok to run that test in the regular CliDriver, but it's 
pretty useless). However, the test has been run by HiveQA in the CliDriver 
anyway... does the configuration only propagate on commit? I can add the out 
file now and remove it after commit.

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, 
> HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, 
> HIVE-13617.05.patch, HIVE-13617.patch, HIVE-13617.patch, 
> HIVE-15396-with-oi.patch
>
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in the non-vectorized case is not the best anyway, so it's better to make do 
> with much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy-to-reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13731) LLAP: return LLAP token with the splits

2016-06-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325418#comment-15325418
 ] 

Jason Dere commented on HIVE-13731:
---

+1

> LLAP: return LLAP token with the splits
> ---
>
> Key: HIVE-13731
> URL: https://issues.apache.org/jira/browse/HIVE-13731
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, 
> HIVE-13731.02.patch, HIVE-13731.03.patch, HIVE-13731.patch, 
> HIVE-13731.wo.13444-13675-13443.patch
>
>
> Need to return the token with the splits, then take it in LLAPIF and make 
> sure it's used when talking to LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325406#comment-15325406
 ] 

Hive QA commented on HIVE-13984:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809322/HIVE-13984.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/78/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/78/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-78/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809322 - PreCommit-HIVE-MASTER-Build

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325394#comment-15325394
 ] 

Vaibhav Gumashta commented on HIVE-13725:
-

[~ekoifman] Sorry, should've done that before. Just made it patch available.

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.
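
For illustration, a minimal sketch of the synchronization being proposed
(stand-in types; not the attached patch):
{noformat}
// Hedged sketch: serialize all uses of the shared, non-thread-safe client so
// that responses cannot cross between threads ("out of sequence response").
public class EndpointSketch {
  private final Object clientLock = new Object();
  private final MetaStoreClientStub client = new MetaStoreClientStub(); // stand-in type

  public void openTxnBatch() {
    synchronized (clientLock) { // e.g. thread 1
      client.call("openTxns");
    }
  }

  public void heartbeat() {
    synchronized (clientLock) { // e.g. thread 2 can no longer interleave RPCs
      client.call("heartbeat");
    }
  }

  // Stand-in for the real metastore client, which is not internally thread safe.
  static class MetaStoreClientStub {
    void call(String rpc) { System.out.println("rpc: " + rpc); }
  }
}
{noformat}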



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13725:

Status: Patch Available  (was: Open)

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325387#comment-15325387
 ] 

Eugene Koifman commented on HIVE-13725:
---

[~vgumashta] should this be Patch Available? For some reason I don't see the 
Submit Patch button in this ticket?!

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13725:
-

Assignee: Eugene Koifman  (was: Vaibhav Gumashta)

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Eugene Koifman
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13725:
--
Assignee: Vaibhav Gumashta  (was: Eugene Koifman)

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325344#comment-15325344
 ] 

Ashutosh Chauhan commented on HIVE-13788:
-

+1

> hive msck listpartitions need to make use of directSQL instead of datanucleus
> -
>
> Key: HIVE-13788
> URL: https://issues.apache.org/jira/browse/HIVE-13788
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Minor
> Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, 
> msck_call_stack_with_fix.png, msck_stack_trace.png
>
>
> Currently, for tables having 1000s of partitions, too many DB calls are made 
> via datanucleus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus

2016-06-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13788:
-
Attachment: HIVE-13788.2.patch

> hive msck listpartitions need to make use of directSQL instead of datanucleus
> -
>
> Key: HIVE-13788
> URL: https://issues.apache.org/jira/browse/HIVE-13788
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Minor
> Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, 
> msck_call_stack_with_fix.png, msck_stack_trace.png
>
>
> Currently, for tables having 1000s of partitions, too many DB calls are made 
> via datanucleus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

Attachment: HIVE-13930.01.patch

Trying again, with new dependencies for some Hive classes. Spark tests still 
failed for me locally due to a CNF, but that CNF was in Hadoop for a class that 
hasn't moved to a different package, so it might just be a local issue.

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

Attachment: (was: HIVE-13930.01.patch)

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

Attachment: HIVE-13930.01.patch

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus

2016-06-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13788:
-
Status: Open  (was: Patch Available)

> hive msck listpartitions need to make use of directSQL instead of datanucleus
> -
>
> Key: HIVE-13788
> URL: https://issues.apache.org/jira/browse/HIVE-13788
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Minor
> Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, 
> msck_call_stack_with_fix.png, msck_stack_trace.png
>
>
> Currently, for tables having 1000s of partitions, too many DB calls are made 
> via datanucleus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus

2016-06-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13788:
-
Status: Patch Available  (was: Open)

> hive msck listpartitions need to make use of directSQL instead of datanucleus
> -
>
> Key: HIVE-13788
> URL: https://issues.apache.org/jira/browse/HIVE-13788
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Minor
> Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, 
> msck_call_stack_with_fix.png, msck_stack_trace.png
>
>
> Currently, for tables having 1000s of partitions, too many DB calls are made 
> via datanucleus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13833) Add an initial delay when starting the heartbeat

2016-06-10 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13833:
-
Attachment: HIVE-13833.3.patch

> Add an initial delay when starting the heartbeat
> 
>
> Key: HIVE-13833
> URL: https://issues.apache.org/jira/browse/HIVE-13833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Minor
> Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch, 
> HIVE-13833.3.patch
>
>
> Since the scheduling of the heartbeat happens immediately after lock 
> acquisition, it's unnecessary to send a heartbeat at the time the locks are 
> acquired. Add an initial delay to skip this.
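
For illustration, a minimal sketch of the scheduling change (assumed scheduler
shape, not the patch):
{noformat}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hedged sketch: start the periodic heartbeat with an initial delay of one
// interval instead of firing immediately after the locks are acquired.
public class HeartbeatSchedulingSketch {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    long intervalMs = 1000;
    // initialDelay == intervalMs skips the redundant heartbeat at acquisition time.
    scheduler.scheduleAtFixedRate(
        () -> System.out.println("heartbeat"), intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }
}
{noformat}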



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325252#comment-15325252
 ] 

Abdullah Yousufi commented on HIVE-13964:
-

I attached a new patch addressing the exit error issue. You should also no 
longer get the "No such file or directory" error.

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, 
> HIVE-13964.03.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.
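
For illustration, the kind of invocation such a feature would restore; the
file name and property keys below are hypothetical:
{noformat}
$ cat beeline.properties
url=jdbc:hive2://localhost:10000
user=hive
password=hive

$ beeline --property-file beeline.properties
{noformat}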



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-13964:

Attachment: HIVE-13964.03.patch

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, 
> HIVE-13964.03.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325233#comment-15325233
 ] 

Hive QA commented on HIVE-13986:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809296/HIVE-13986.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10221 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/76/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/76/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-76/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809296 - PreCommit-HIVE-MASTER-Build

> LLAP: kill Tez AM on token errors from plugin
> -
>
> Key: HIVE-13986
> URL: https://issues.apache.org/jira/browse/HIVE-13986
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13986.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13989) Extended ACLs are not handled according to specification

2016-06-10 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13989:
---
Attachment: HIVE-13989-branch-1.patch

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported

2016-06-10 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13990:
---
Attachment: HIVE-13990-branch-1.patch

> Client should not check dfs.namenode.acls.enabled to determine if extended 
> ACLs are supported
> -
>
> Key: HIVE-13990
> URL: https://issues.apache.org/jira/browse/HIVE-13990
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Chris Drome
> Attachments: HIVE-13990-branch-1.patch
>
>
> dfs.namenode.acls.enabled is a server-side configuration, and the client 
> should not presume to know how the server is configured. Barring a method for 
> querying the NN about whether ACLs are supported, the client should try the 
> operation and catch the appropriate exception.
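
A rough sketch of that try-and-catch approach, assuming the client goes through 
the Hadoop FileSystem API (the exact exception the NN raises when ACLs are 
disabled varies by Hadoop version, so a real implementation should inspect it):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AclProbe {
  /** Best-effort probe: can this filesystem serve ACL operations on path? */
  static boolean aclsSupported(FileSystem fs, Path path) {
    try {
      fs.getAclStatus(path);            // cheap, read-only ACL call
      return true;
    } catch (UnsupportedOperationException e) {
      return false;                     // filesystem has no ACL support at all
    } catch (IOException e) {
      // HDFS rejects ACL calls when dfs.namenode.acls.enabled=false; a real
      // implementation would inspect the exception instead of swallowing all
      // IO errors here.
      return false;
    }
  }
}
{code}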



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13994) increase large varchar limits in db scripts to be db-specific

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325208#comment-15325208
 ] 

Ashutosh Chauhan commented on HIVE-13994:
-

sounds good to me.

> increase large varchar limits in db scripts to be db-specific
> -
>
> Key: HIVE-13994
> URL: https://issues.apache.org/jira/browse/HIVE-13994
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Right now all our max varchar limits are 4k, presumably due to Oracle 
> limitations. All other dbs support larger values and/or MAX; given that we 
> moved away from schema auto-creation and towards db-specific scripts, we can 
> increase these limits per database to the maximum allowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-13864) Beeline ignores the command that follows a semicolon and comment

2016-06-10 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13864 started by Reuben Kuhnert.
-
> Beeline ignores the command that follows a semicolon and comment
> 
>
> Key: HIVE-13864
> URL: https://issues.apache.org/jira/browse/HIVE-13864
> Project: Hive
>  Issue Type: Bug
>Reporter: Muthu Manickam
>Assignee: Reuben Kuhnert
> Attachments: HIVE-13864.01.patch
>
>
> Beeline ignores the next line/command that follows a command with semicolon 
> and comments.
> Example 1:
> select *
> from table1; -- comments
> select * from table2;
> In this case, only the first command is executed.. second command "select * 
> from table2" is not executed.
> --
> Example 2:
> select *
> from table1; -- comments
> select * from table2;
> select * from table3;
> In this case, first command and third command is executed. second command 
> "select * from table2" is not executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13864) Beeline ignores the command that follows a semicolon and comment

2016-06-10 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-13864:
--
Attachment: HIVE-13864.01.patch

> Beeline ignores the command that follows a semicolon and comment
> 
>
> Key: HIVE-13864
> URL: https://issues.apache.org/jira/browse/HIVE-13864
> Project: Hive
>  Issue Type: Bug
>Reporter: Muthu Manickam
>Assignee: Reuben Kuhnert
> Attachments: HIVE-13864.01.patch
>
>
> Beeline ignores the next line/command that follows a command with semicolon 
> and comments.
> Example 1:
> select *
> from table1; -- comments
> select * from table2;
> In this case, only the first command is executed.. second command "select * 
> from table2" is not executed.
> --
> Example 2:
> select *
> from table1; -- comments
> select * from table2;
> select * from table3;
> In this case, first command and third command is executed. second command 
> "select * from table2" is not executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13864) Beeline ignores the command that follows a semicolon and comment

2016-06-10 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-13864:
--
Status: Patch Available  (was: In Progress)

> Beeline ignores the command that follows a semicolon and comment
> 
>
> Key: HIVE-13864
> URL: https://issues.apache.org/jira/browse/HIVE-13864
> Project: Hive
>  Issue Type: Bug
>Reporter: Muthu Manickam
>Assignee: Reuben Kuhnert
> Attachments: HIVE-13864.01.patch
>
>
> Beeline ignores the next line/command that follows a command with semicolon 
> and comments.
> Example 1:
> select *
> from table1; -- comments
> select * from table2;
> In this case, only the first command is executed.. second command "select * 
> from table2" is not executed.
> --
> Example 2:
> select *
> from table1; -- comments
> select * from table2;
> select * from table3;
> In this case, first command and third command is executed. second command 
> "select * from table2" is not executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13731) LLAP: return LLAP token with the splits

2016-06-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13731:

Attachment: HIVE-13731.03.patch

> LLAP: return LLAP token with the splits
> ---
>
> Key: HIVE-13731
> URL: https://issues.apache.org/jira/browse/HIVE-13731
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, 
> HIVE-13731.02.patch, HIVE-13731.03.patch, HIVE-13731.patch, 
> HIVE-13731.wo.13444-13675-13443.patch
>
>
> Need to return the token with the splits, then take it in LLAPIF and make 
> sure it's used when talking to LLAP



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13731) LLAP: return LLAP token with the splits

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325098#comment-15325098
 ] 

Sergey Shelukhin commented on HIVE-13731:
-

Test failures are either known or caused by a namenode in safe mode

> LLAP: return LLAP token with the splits
> ---
>
> Key: HIVE-13731
> URL: https://issues.apache.org/jira/browse/HIVE-13731
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, 
> HIVE-13731.02.patch, HIVE-13731.patch, HIVE-13731.wo.13444-13675-13443.patch
>
>
> Need to return the token with the splits, then take it in LLAPIF and make 
> sure it's used when talking to LLAP



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13994) increase large varchar limits in db scripts to be db-specific

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325086#comment-15325086
 ] 

Sergey Shelukhin commented on HIVE-13994:
-

[~ashutoshc] [~sushanth] thoughts/objections?

> increase large varchar limits in db scripts to be db-specific
> -
>
> Key: HIVE-13994
> URL: https://issues.apache.org/jira/browse/HIVE-13994
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Right now all our max varchar limits are 4k, presumably due to Oracle 
> limitations. All other dbs support larger values and/or MAX; given that we 
> moved away from schema auto-creation and towards db-specific scripts, we can 
> increase these limits per database to the maximum allowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325076#comment-15325076
 ] 

Abdullah Yousufi commented on HIVE-13964:
-

It shouldn't be displaying that error. Could you possibly retry and see if you 
get the error again?

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13392) disable speculative execution for ACID Compactor

2016-06-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325068#comment-15325068
 ] 

Alan Gates commented on HIVE-13392:
---

Patch seems fine, though it seems to contain some stuff unrelated to the 
stated purpose of the JIRA (e.g. moving ValidCompactorTxnList around).

+1

> disable speculative execution for ACID Compactor
> 
>
> Key: HIVE-13392
> URL: https://issues.apache.org/jira/browse/HIVE-13392
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, 
> HIVE-13392.4.patch, HIVE-13392.patch
>
>
> https://developer.yahoo.com/hadoop/tutorial/module4.html
> Speculative execution is enabled by default. You can disable speculative 
> execution for the mappers and reducers by setting the 
> mapred.map.tasks.speculative.execution and 
> mapred.reduce.tasks.speculative.execution JobConf options to false, 
> respectively.
> CompactorMR is currently not set up to handle speculative execution and may 
> lead to something like
> {code}
> 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to CREATE_FILE 
> /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4
>  for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on 
> 172.18.129.12 because this file lease is currently owned by 
> DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on 
> 172.18.129.18
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> {code}
> Short term: disable speculative execution for this job
> Longer term perhaps make each task write to dir with UUID...
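
For reference, the short-term fix boils down to two settings on the compactor's 
JobConf; a minimal sketch (JobConf exposes typed setters equivalent to the two 
properties quoted above):

{code}
import org.apache.hadoop.mapred.JobConf;

public class CompactorJobSketch {
  static void disableSpeculation(JobConf job) {
    // Same as mapred.map.tasks.speculative.execution=false
    job.setMapSpeculativeExecution(false);
    // Same as mapred.reduce.tasks.speculative.execution=false
    job.setReduceSpeculativeExecution(false);
  }
}
{code}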



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13966) DbNotificationListener: can lose DDL operation notifications

2016-06-10 Thread Sravya Tirukkovalur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325043#comment-15325043
 ] 

Sravya Tirukkovalur commented on HIVE-13966:


What we really need here is to make DbNotificationListener part of the 
transaction, without requiring all post-event listeners to be part of the 
transaction, as their contracts can be different. Afaict, all listeners are 
synchronous, so we should think of a better name?

> DbNotificationListener: can lose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Priority: Critical
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation --
> 3. commit() or rollback() based on the result of the operation.
> 4. add an entry to the notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> That case is still ok, since it only produces a false positive.
> If the operation succeeds but adding to the notification log fails, the 
> user will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we will not have false 
> negatives.
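
A skeleton of the reordering the description implies (illustrative only; 
openTransaction/commit/rollback and the notification-log call are stand-ins for 
the real HiveMetaStore methods): writing the log entry inside the transaction 
window makes it share the operation's fate, removing both failure modes.

{code}
public class MetastoreApiSketch {
  interface Store {
    void openTransaction();
    void operation() throws Exception;
    void addNotificationLogEntry();    // moved before commit()
    void commit();
    void rollback();
  }

  static void apiCall(Store store) throws Exception {
    store.openTransaction();
    try {
      store.operation();
      store.addNotificationLogEntry(); // inside the txn: no false positives...
      store.commit();                  // ...and no false negatives if commit fails
    } catch (Exception e) {
      store.rollback();                // rolls back the operation and the log entry together
      throw e;
    }
  }
}
{code}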



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325038#comment-15325038
 ] 

Sergio Peña commented on HIVE-13964:


We should not display the 'No such file or directory' error. I think that if an 
unknown parameter is passed, then beeline can continue.

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13833) Add an initial delay when starting the heartbeat

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325036#comment-15325036
 ] 

Hive QA commented on HIVE-13833:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809297/HIVE-13833.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testHeartbeater
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.lockConflictDbTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking10
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking11
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking3
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking5
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking7
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking8
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking9
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/75/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/75/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-75/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809297 - PreCommit-HIVE-MASTER-Build

> Add an initial delay when starting the heartbeat
> 
>
> Key: HIVE-13833
> URL: https://issues.apache.org/jira/browse/HIVE-13833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Minor
> Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch
>
>
> Since the scheduling of heartbeat happens immediately after lock acquisition, 
> it's unnecessary to send a heartbeat at the time when locks are acquired. Add an 
> initial delay to skip this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session

2016-06-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13264:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~nithinmahesh] for the work.

> JDBC driver makes 2 Open Session Calls for every open session
> -
>
> Key: HIVE-13264
> URL: https://issues.apache.org/jira/browse/HIVE-13264
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1, 2.0.1
>Reporter: NITHIN MAHESH
>Assignee: NITHIN MAHESH
>  Labels: jdbc
> Fix For: 2.2.0
>
> Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, 
> HIVE-13264.3.patch, HIVE-13264.4.patch, HIVE-13264.5.patch, 
> HIVE-13264.6.patch, HIVE-13264.6.patch, HIVE-13264.7.patch, 
> HIVE-13264.8.patch, HIVE-13264.9.patch, HIVE-13264.patch
>
>
> When HTTP is used as the transport mode by the Hive JDBC driver, we noticed 
> that there is an additional open/close session just to validate the 
> connection. 
>  
> TCLIService.Iface client = new TCLIService.Client(new TBinaryProtocol(transport));
> TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq());
> if (openResp != null) {
>   client.CloseSession(new TCloseSessionReq(openResp.getSessionHandle()));
> }
>  
> The open session call is a costly one and should not be used to test 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session

2016-06-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13264:

Affects Version/s: 1.2.1
   2.0.1

> JDBC driver makes 2 Open Session Calls for every open session
> -
>
> Key: HIVE-13264
> URL: https://issues.apache.org/jira/browse/HIVE-13264
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1, 2.0.1
>Reporter: NITHIN MAHESH
>Assignee: NITHIN MAHESH
>  Labels: jdbc
> Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, 
> HIVE-13264.3.patch, HIVE-13264.4.patch, HIVE-13264.5.patch, 
> HIVE-13264.6.patch, HIVE-13264.6.patch, HIVE-13264.7.patch, 
> HIVE-13264.8.patch, HIVE-13264.9.patch, HIVE-13264.patch
>
>
> When HTTP is used as the transport mode by the Hive JDBC driver, we noticed 
> that there is an additional open/close session just to validate the 
> connection. 
>  
> TCLIService.Iface client = new TCLIService.Client(new TBinaryProtocol(transport));
> TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq());
> if (openResp != null) {
>   client.CloseSession(new TCloseSessionReq(openResp.getSessionHandle()));
> }
>  
> The open session call is a costly one and should not be used to test 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324967#comment-15324967
 ] 

Abdullah Yousufi edited comment on HIVE-13964 at 6/10/16 6:22 PM:
--

I resolved the first issue and now the error is set to 1 in that case.

The second issue is pretty important though, because that was the issue 
HIVE-6652 addressed, so it's not good if it still exists. However, I checked 
and it seems that's the current behavior, so my patch doesn't reintroduce that 
issue. This is what I get if I do this in the repo (or with my patch):
{code}
$ ./beeline BLA
Beeline version 2.2.0-SNAPSHOT by Apache Hive
beeline>
{code}
Note how I don't get the 'No such file or directory' error. What 
behavior do we want here? It seems that the fix from HIVE-6652 was reverted at 
some point. [~xuefuz]


was (Author: ayousufi):
I resolved the first issue and now the error is set to 1 in that case.

The second issue is pretty important though, because that was the issue 
HIVE-6652 addressed, so it's not good if it still exists. However, I checked 
and it seems that's the current behavior in upstream currently, so my patch 
doesn't reintroduce that issue. This is what I get if I do this in upstream (or 
with my patch):
{code}
$ ./beeline BLA
Beeline version 2.2.0-SNAPSHOT by Apache Hive
beeline>
{code}
Note, how I don't get the 'No such file or directory' statement error. What 
behavior do we want here? It seems that the fix from HIVE-6652 was reverted at 
some point. [~xuefuz]

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by

2016-06-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Description: 
Pointed out by [~gopalv].

Queries which follow the format are not optimal with map-side aggregation, 
because the Map 1 does not have TopN in the reduce sink.

These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
does not kick in. 

{code}
select state, city, sum(sales) from table
group by state, city
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state desc, city
limit 10;
{code}

  was:
Pointed out by [~gopalv].

Queries which follow the format are not optimal with map-side aggregation, 
because the Map 1 does not have TopN in the reduce sink.

These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
does not kick in. 

As input data grows, it falls off a cliff of performance after 4 reducers.

{code}
select state, city, sum(sales) from table
group by state, city
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state desc, city
limit 10;
{code}


> Extension to limit push down through order by & group by
> 
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> Queries which follow the format are not optimal with map-side aggregation, 
> because the Map 1 does not have TopN in the reduce sink.
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
> does not kick in. 
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session

2016-06-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325009#comment-15325009
 ] 

Vaibhav Gumashta commented on HIVE-13264:
-

Failures look unrelated. Will commit shortly.

> JDBC driver makes 2 Open Session Calls for every open session
> -
>
> Key: HIVE-13264
> URL: https://issues.apache.org/jira/browse/HIVE-13264
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: NITHIN MAHESH
>Assignee: NITHIN MAHESH
>  Labels: jdbc
> Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, 
> HIVE-13264.3.patch, HIVE-13264.4.patch, HIVE-13264.5.patch, 
> HIVE-13264.6.patch, HIVE-13264.6.patch, HIVE-13264.7.patch, 
> HIVE-13264.8.patch, HIVE-13264.9.patch, HIVE-13264.patch
>
>
> When HTTP is used as the transport mode by the Hive JDBC driver, we noticed 
> that there is an additional open/close session just to validate the 
> connection. 
>  
> TCLIService.Iface client = new TCLIService.Client(new TBinaryProtocol(transport));
> TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq());
> if (openResp != null) {
>   client.CloseSession(new TCloseSessionReq(openResp.getSessionHandle()));
> }
>  
> The open session call is a costly one and should not be used to test 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13993) Hive should provide built-in UDF that can apply another UDF to each element of an array

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324998#comment-15324998
 ] 

Sergey Shelukhin commented on HIVE-13993:
-

Then we also need fold, and we can have map-reduce on top of Hive ;)

> Hive should provide built-in UDF that can apply another UDF to each element 
> of an array
> ---
>
> Key: HIVE-13993
> URL: https://issues.apache.org/jira/browse/HIVE-13993
> Project: Hive
>  Issue Type: New Feature
>Reporter: Anthony Hsu
>
> There is currently no simple way to take an array field and apply a UDF on 
> each element of the array, returning a new array. This is a basic use case 
> that Hive should provide a built-in UDF for. More motivation: 
> http://stackoverflow.com/questions/27722493/how-to-invoke-udf-for-each-element-in-an-array-in-hive



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13954) Parquet logs should go to STDERR

2016-06-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13954:
-
Reporter: Takahiko Saito  (was: Prasanth Jayachandran)

> Parquet logs should go to STDERR
> 
>
> Key: HIVE-13954
> URL: https://issues.apache.org/jira/browse/HIVE-13954
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Takahiko Saito
>Assignee: Prasanth Jayachandran
>  Labels: TODOC1.3, TODOC2.1
> Fix For: 1.3.0, 2.1.0, 2.2.0
>
> Attachments: HIVE-13954-branch-1.patch, HIVE-13954.1.patch
>
>
> Parquet uses java util logging. When java logging is not configured using a 
> default logging.properties file, parquet's default fallback handler writes to 
> STDOUT at INFO level. Hive writes all logging to STDERR and writes only the 
> query output to STDOUT. Writing logs to STDOUT may cause issues when 
> comparing query results. 
> If we provide a default logging.properties for parquet, then we can configure 
> it to write to a file or to stderr.
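
As an illustration of the mechanism, not the actual patch: with 
java.util.logging, an explicitly installed ConsoleHandler publishes to 
System.err, so routing parquet's loggers through one keeps STDOUT clean. The 
logger name below is an assumption (parquet's JUL loggers live under its 
package name).

{code}
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ParquetJulSketch {
  static void routeParquetLogsToStderr() {
    Logger parquet = Logger.getLogger("parquet");        // assumed root of parquet's loggers
    parquet.setUseParentHandlers(false);                 // bypass the stdout fallback handler
    ConsoleHandler stderrHandler = new ConsoleHandler(); // ConsoleHandler writes to System.err
    stderrHandler.setLevel(Level.INFO);
    parquet.addHandler(stderrHandler);
  }
}
{code}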



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324967#comment-15324967
 ] 

Abdullah Yousufi commented on HIVE-13964:
-

I resolved the first issue and now the error is set to 1 in that case.

The second issue is pretty important though, because that was the issue 
HIVE-6652 addressed, so it's not good if it still exists. However, I checked 
and it seems that's the current behavior upstream, so my patch 
doesn't reintroduce that issue. This is what I get if I do this in upstream (or 
with my patch):
{code}
$ ./beeline BLA
Beeline version 2.2.0-SNAPSHOT by Apache Hive
beeline>
{code}
Note how I don't get the 'No such file or directory' error. What 
behavior do we want here? It seems that the fix from HIVE-6652 was reverted at 
some point. [~xuefuz]

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by

2016-06-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Attachment: HIVE-13982.2.patch

> Extension to limit push down through order by & group by
> 
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> Queries which follow the format are not optimal with map-side aggregation, 
> because the Map 1 does not have TopN in the reduce sink.
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
> does not kick in. 
> As input data grows, it falls off a cliff of performance after 4 reducers.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-13982) Extension to limit push down through order by & group by

2016-06-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13982 started by Jesus Camacho Rodriguez.
--
> Extension to limit push down through order by & group by
> 
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> Queries which follow the format are not optimal with map-side aggregation, 
> because the Map 1 does not have TopN in the reduce sink.
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
> does not kick in. 
> As input data grows, it falls off a cliff of performance after 4 reducers.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by

2016-06-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Status: Patch Available  (was: In Progress)

> Extension to limit push down through order by & group by
> 
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> Queries which follow the format are not optimal with map-side aggregation, 
> because the Map 1 does not have TopN in the reduce sink.
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
> does not kick in. 
> As input data grows, it falls off a cliff of performance after 4 reducers.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by

2016-06-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Status: Open  (was: Patch Available)

> Extension to limit push down through order by & group by
> 
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> Queries which follow the format are not optimal with map-side aggregation, 
> because the Map 1 does not have TopN in the reduce sink.
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
> does not kick in. 
> As input data grows, it falls off a cliff of performance after 4 reducers.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13392) disable speculative execution for ACID Compactor

2016-06-10 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324924#comment-15324924
 ] 

Wei Zheng commented on HIVE-13392:
--

+1

> disable speculative execution for ACID Compactor
> 
>
> Key: HIVE-13392
> URL: https://issues.apache.org/jira/browse/HIVE-13392
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, 
> HIVE-13392.4.patch, HIVE-13392.patch
>
>
> https://developer.yahoo.com/hadoop/tutorial/module4.html
> Speculative execution is enabled by default. You can disable speculative 
> execution for the mappers and reducers by setting the 
> mapred.map.tasks.speculative.execution and 
> mapred.reduce.tasks.speculative.execution JobConf options to false, 
> respectively.
> CompactorMR is currently not set up to handle speculative execution and may 
> lead to something like
> {code}
> 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to CREATE_FILE 
> /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4
>  for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on 
> 172.18.129.12 because this file lease is currently owned by 
> DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on 
> 172.18.129.18
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> {code}
> Short term: disable speculative execution for this job
> Longer term perhaps make each task write to dir with UUID...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13833) Add an initial delay when starting the heartbeat

2016-06-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324846#comment-15324846
 ] 

Eugene Koifman commented on HIVE-13833:
---

+1 pending tests


> Add an initial delay when starting the heartbeat
> 
>
> Key: HIVE-13833
> URL: https://issues.apache.org/jira/browse/HIVE-13833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Minor
> Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch
>
>
> Since the scheduling of heartbeat happens immediately after lock acquisition, 
> it's unnecessary to send a heartbeat at the time when locks are acquired. Add an 
> initial delay to skip this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324820#comment-15324820
 ] 

Sergio Peña commented on HIVE-13964:


#2 is ok. It is displayed now. Maybe I wasn't noticing it.

#3 It does exit.

There are other problems.

If the file passed does not exist, beeline exits (OK), but the exit code is 0. We 
should return an error code higher than 0. Sometimes users use this number in 
their scripts to check whether beeline ran correctly or not.
 
{noformat}
# beeline --property-file /tmp/a
/tmp/a (No such file or directory)
Beeline version 2.2.0-SNAPSHOT by Apache Hive

# echo $?
0
{noformat}

If I pass a different argument, Beeline only displays 'No such file or 
directory' and continues anyway.

{noformat}
# beeline BLA
Beeline version 2.2.0-SNAPSHOT by Apache Hive
No such file or directory
beeline>
{noformat}

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None

2016-06-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-13159:
--
Status: Patch Available  (was: Open)

> TxnHandler should support datanucleus.connectionPoolingType = None
> --
>
> Key: HIVE-13159
> URL: https://issues.apache.org/jira/browse/HIVE-13159
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
> Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch
>
>
> Right now, one has to choose bonecp or dbcp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None

2016-06-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-13159:
--
Attachment: HIVE-13159.3.patch

New version of the patch updated to match current master.

> TxnHandler should support datanucleus.connectionPoolingType = None
> --
>
> Key: HIVE-13159
> URL: https://issues.apache.org/jira/browse/HIVE-13159
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
> Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch
>
>
> Right now, one has to choose bonecp or dbcp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13725:

Attachment: (was: HIVE-13725.1.patch)

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.
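
A minimal sketch of the kind of synchronization the description calls for 
(names are illustrative, not the actual streaming API): every RPC that goes 
through the shared, non-thread-safe client is serialized on one lock, so a 
heartbeat from thread 2 can never interleave with a commit from thread 1.

{code}
public class EndpointSketch {
  interface MetaClient {                 // stand-in for the metastore client
    void commitTxn(long txnId) throws Exception;
    void heartbeat(long txnId) throws Exception;
  }

  private final MetaClient client;
  private final Object clientLock = new Object(); // guards all RPCs on the shared client

  EndpointSketch(MetaClient client) { this.client = client; }

  void commit(long txnId) throws Exception {
    synchronized (clientLock) { client.commitTxn(txnId); }   // e.g. thread 1
  }

  void heartbeat(long txnId) throws Exception {
    synchronized (clientLock) { client.heartbeat(txnId); }   // e.g. thread 2
  }
}
{code}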



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13968) CombineHiveInputFormat does not honor InputFormat that implements AvoidSplitCombination

2016-06-10 Thread Prasanna Rajaperumal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Rajaperumal updated HIVE-13968:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for reviewing the change [~ruili]

> CombineHiveInputFormat does not honor InputFormat that implements 
> AvoidSplitCombination
> ---
>
> Key: HIVE-13968
> URL: https://issues.apache.org/jira/browse/HIVE-13968
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanna Rajaperumal
>Assignee: Prasanna Rajaperumal
> Attachments: HIVE-13968.1.patch, HIVE-13968.2.patch, 
> HIVE-13968.3.patch
>
>
> If I have 100 entries in path[], nonCombinablePaths will contain only 
> paths[0-9] and the rest of the paths will be in combinablePaths, even if the 
> InputFormat returns false from AvoidSplitCombination.shouldSkipCombine() for 
> all the paths. 
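
The expected behavior, sketched below under the assumption that 
shouldSkipCombine() returning true means "keep this path out of split 
combination" (the callback is simplified; the real Hive interface also takes a 
Configuration): the check must be consulted for every path, not just the first 
ten.

{code}
import java.util.List;

public class PathPartitionSketch {
  interface AvoidSplitCombination {          // simplified mirror of the Hive callback
    boolean shouldSkipCombine(String path);
  }

  static void partition(String[] paths, AvoidSplitCombination check,
                        List<String> nonCombinable, List<String> combinable) {
    for (String p : paths) {                 // consult the check for *all* paths
      if (check.shouldSkipCombine(p)) {
        nonCombinable.add(p);
      } else {
        combinable.add(p);
      }
    }
  }
}
{code}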



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13725:

Attachment: HIVE-13725.1.patch

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13725:

Attachment: HIVE-13725.1.patch

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread1 might end up going to thread2 and 
> vice-versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by

2016-06-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Description: 
Pointed out by [~gopalv].

Queries which follow the format are not optimal with map-side aggregation, 
because the Map 1 does not have TopN in the reduce sink.

These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
does not kick in. 

As input data grows, it falls off a cliff of performance after 4 reducers.

{code}
select state, city, sum(sales) from table
group by state, city
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state desc, city
limit 10;
{code}

  was:
Queries which follow the format are not optimal with map-side aggregation, 
because the Map 1 does not have TopN in the reduce sink.

These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
does not kick in. 

As input data grows, it falls off a cliff of performance after 4 reducers.

{code}
select state, city, sum(sales) from table
group by state, city
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state, city
limit 10;
{code}

{code}
select state, city, sum(sales) from table
group by city, state
order by state desc, city
limit 10;
{code}


> Extension to limit push down through order by & group by
> 
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> Queries which follow the format are not optimal with map-side aggregation, 
> because the Map 1 does not have TopN in the reduce sink.
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup 
> does not kick in. 
> As input data grows, it falls off a cliff of performance after 4 reducers.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324750#comment-15324750
 ] 

Hive QA commented on HIVE-13964:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809293/HIVE-13964.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10225 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/74/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/74/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-74/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809293 - PreCommit-HIVE-MASTER-Build

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It would be a useful feature to be able to pass the file in as a 
> parameter, such as --property-file.
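
A sketch of the proposed usage, assuming the flag name suggested above and an 
illustrative properties file path:

{noformat}
$ beeline --property-file /tmp/beeline.properties
{noformat}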



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13838) Set basic stats as inaccurate for all ACID tables

2016-06-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324748#comment-15324748
 ] 

Pengcheng Xiong commented on HIVE-13838:


Thanks [~ekoifman], I will take a look again. Thanks for finding this!

> Set basic stats as inaccurate for all ACID tables
> -
>
> Key: HIVE-13838
> URL: https://issues.apache.org/jira/browse/HIVE-13838
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13838.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13908) Beeline adds extra fractional digits when you insert values to table with float data type

2016-06-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324738#comment-15324738
 ] 

Sergio Peña commented on HIVE-13908:


This issue is related to Thrift communication.

The FLOAT data is loaded correctly on HS2, but in order to send it to Beeline, 
HS2 must cast it to DOUBLE, because Thrift does not support a FLOAT data type. 
This is where the extra decimal digits get introduced.

This is the code where it happens:
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/thrift/ColumnBuffer.java#L368

{noformat}
case FLOAT_TYPE:
  nulls.set(size, field == null);
  doubleVars()[size] = field == null ? 0 : new Double(field.toString());
  break;
{noformat}
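
As a standalone illustration (plain Java, not Hive code; the literal is taken 
from this report): -35664.76 has no exact binary32 representation, so the 
nearest float is exactly -35664.76171875. Widening that float to double 
preserves the exact binary value, which is where the extra digits become 
visible, while round-tripping through the float's shortest decimal string 
keeps the short form.

{code}
public class FloatWideningDemo {
  public static void main(String[] args) {
    float f = -35664.76f;
    // Direct widening exposes the exact binary value of the float:
    System.out.println((double) f);                            // -35664.76171875
    // Parsing the float's shortest string form keeps the displayed value:
    System.out.println(Double.parseDouble(Float.toString(f))); // -35664.76
  }
}
{code}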

> Beeline adds extra fractional digits when you insert values to table with 
> float data type 
> --
>
> Key: HIVE-13908
> URL: https://issues.apache.org/jira/browse/HIVE-13908
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Takahiko Saito
>
> Via beeline, although -35664.76 is inserted,  -35664.76171875 is displayed
> {noformat}
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table test;
> No rows affected (0.067 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test(f float);
> No rows affected (0.248 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test 
> values(-35664.76),(29497.34);
> INFO  : Tez session hasn't been created yet. Opening session
> INFO  : Dag name: insert into table tes...35664.76),(29497.34)(Stage-1)
> INFO  :
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1464727816747_0019)
> INFO  : Map 1: -/-
> INFO  : Map 1: 0/1
> INFO  : Map 1: 0/1
> INFO  : Map 1: 0(+1)/1
> INFO  : Map 1: 1/1
> INFO  : Loading data to table default.test from 
> hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-01_20-16-32_885_9161749848563358684-1/-ext-1
> INFO  : Table default.test stats: [numFiles=1, numRows=2, totalSize=19, 
> rawDataSize=17]
> No rows affected (31.725 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
> +--+--+
> |  test.f  |
> +--+--+
> | -35664.76171875  |
> | 29497.33984375   |
> +--+--+
> 2 rows selected (0.143 seconds)
> {noformat}
> The issue is not seen via Hive CLI:
> {noformat}
> hive> create table test(f float);
> OK
> Time taken: 0.32 seconds
> hive> insert into table test values(-35664.76),(29497.34);
> Query ID = hrt_qa_20160601202446_75f38c5d-f52b-45b3-b67a-8a8b0a194305
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1464727816747_0020)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 7.66 s
> 
> Loading data to table default.test
> Table default.test stats: [numFiles=1, numRows=2, totalSize=19, 
> rawDataSize=17]
> OK
> Time taken: 11.477 seconds
> hive> select * from test;
> OK
> -35664.76
> 29497.34
> Time taken: 0.144 seconds, Fetched: 2 row(s)
> {noformat}
> hdfs file shows expected value:
> {noformat}
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> dfs -cat 
> hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/00_0
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> ;
> +-+--+
> | DFS Output  |
> +-+--+
> | -35664.76   |
> | 29497.34|
> +-+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus

2016-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324659#comment-15324659
 ] 

Ashutosh Chauhan commented on HIVE-13788:
-

Let's keep the changes minimal and scoped to msck. Can you please remove the 
changes from the other code paths until we have a better understanding of 
getPartsWithAuthInfo()? Second, to retrieve all partitions, please use the 
existing PartitionPruner::prune() method instead of adding a new method.

> hive msck listpartitions need to make use of directSQL instead of datanucleus
> -
>
> Key: HIVE-13788
> URL: https://issues.apache.org/jira/browse/HIVE-13788
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Minor
> Attachments: HIVE-13788.1.patch, msck_call_stack_with_fix.png, 
> msck_stack_trace.png
>
>
> Currently, for tables having 1000s of partitions, too many DB calls are made 
> via datanucleus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13838) Set basic stats as inaccurate for all ACID tables

2016-06-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324589#comment-15324589
 ] 

Eugene Koifman commented on HIVE-13838:
---

[~pxiong] As far as I can tell this is still not fixed. Please see
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/70/testReport/
or
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/73/testReport/

The same set of tests keeps failing.

> Set basic stats as inaccurate for all ACID tables
> -
>
> Key: HIVE-13838
> URL: https://issues.apache.org/jira/browse/HIVE-13838
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13838.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13392) disable speculative execution for ACID Compactor

2016-06-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324582#comment-15324582
 ] 

Eugene Koifman commented on HIVE-13392:
---

All failures have age > 1.

[~wzheng] or [~alangates] could you review please?

> disable speculative execution for ACID Compactor
> 
>
> Key: HIVE-13392
> URL: https://issues.apache.org/jira/browse/HIVE-13392
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, 
> HIVE-13392.4.patch, HIVE-13392.patch
>
>
> https://developer.yahoo.com/hadoop/tutorial/module4.html
> Speculative execution is enabled by default. You can disable speculative 
> execution for the mappers and reducers by setting the 
> mapred.map.tasks.speculative.execution and 
> mapred.reduce.tasks.speculative.execution JobConf options to false, 
> respectively.
> CompactorMR is currently not set up to handle speculative execution and may 
> lead to something like
> {code}
> 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to CREATE_FILE 
> /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4
>  for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on 
> 172.18.129.12 because this file lease is currently owned by 
> DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on 
> 172.18.129.18
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> {code}
> Short term: disable speculative execution for this job
> Longer term perhaps make each task write to dir with UUID...
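
A minimal sketch of the short-term fix, assuming the JobConf option names 
quoted in the description (illustrative, not the committed patch):

{code}
import org.apache.hadoop.mapred.JobConf;

public class CompactorSpeculationSketch {
  // Disable speculative execution for the compactor job using the
  // JobConf options named in the description above.
  static JobConf withoutSpeculation(JobConf job) {
    job.setBoolean("mapred.map.tasks.speculative.execution", false);
    job.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    return job;
  }
}
{code}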



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13968) CombineHiveInputFormat does not honor InputFormat that implements AvoidSplitCombination

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324529#comment-15324529
 ] 

Hive QA commented on HIVE-13968:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809280/HIVE-13968.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10224 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/73/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/73/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-73/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809280 - PreCommit-HIVE-MASTER-Build

> CombineHiveInputFormat does not honor InputFormat that implements 
> AvoidSplitCombination
> ---
>
> Key: HIVE-13968
> URL: https://issues.apache.org/jira/browse/HIVE-13968
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanna Rajaperumal
>Assignee: Prasanna Rajaperumal
> Attachments: HIVE-13968.1.patch, HIVE-13968.2.patch, 
> HIVE-13968.3.patch
>
>
> If I have 100 paths in path[], nonCombinablePaths will contain only 
> paths[0-9] and the rest of the paths will be in combinablePaths, even if 
> the InputFormat returns false for AvoidSplitCombination.shouldSkipCombine() 
> for all the paths. 
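
A hedged illustration of the kind of indexing bug consistent with this report 
(checkBatch and the batch size of 10 are assumptions, not the actual Hive 
code): if paths are tested in batches and the batch-relative indices are not 
offset by the batch start, only paths[0-9] can ever be marked non-combinable.

{code}
import java.util.HashSet;
import java.util.Set;

public class BatchOffsetSketch {
  // Hypothetical per-batch check; returns indices relative to batchStart.
  static Set<Integer> checkBatch(String[] paths, int batchStart, int size) {
    return new HashSet<>(); // stand-in for calling shouldSkipCombine()
  }

  static Set<Integer> nonCombinableIndices(String[] paths) {
    Set<Integer> nonCombinable = new HashSet<>();
    for (int batchStart = 0; batchStart < paths.length; batchStart += 10) {
      for (int i : checkBatch(paths, batchStart, 10)) {
        nonCombinable.add(batchStart + i); // offsetting avoids the bug
      }
    }
    return nonCombinable;
  }
}
{code}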



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-06-10 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324449#comment-15324449
 ] 

BELUGA BEHR commented on HIVE-13278:


This problem does not seem specific to Spark. I believe it happens when Hive 
starts a map-only MapReduce job: no reduce.xml is generated because there is 
no reduce phase, but the job later tries to put reduce.xml in the distributed 
cache at submission time, which causes this error.
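
A hedged sketch of the guard this suggests (the method, path, and variable 
names are assumptions): only register reduce.xml with the distributed cache 
when the plan file actually exists, i.e. when the job has a reduce phase.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReducePlanGuardSketch {
  static void maybeAddReducePlan(Path planDir, Configuration conf)
      throws IOException {
    Path reducePlan = new Path(planDir, "reduce.xml");
    FileSystem fs = reducePlan.getFileSystem(conf);
    if (fs.exists(reducePlan)) {   // map-only jobs have no reduce.xml
      DistributedCache.addCacheFile(reducePlan.toUri(), conf);
    }
  }
}
{code}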

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Priority: Minor
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully. So mark it 
> as Minor currently.
> Error message example:
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-10 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-13903:

Attachment: HIVE-13903.02.patch

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, 
> HIVE-13903.02.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, with each call originating from Registry.getFunctionInfo. 
> This increases the time for the query, especially if that query is just an 
> explain query. The jar should be downloaded once, and not downloaded again 
> if the UDF class is accessible in the current thread. 
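
A hedged sketch of the guard the description suggests (the method name is an 
assumption, not taken from the patch): skip the download when the UDF class 
already resolves in the current thread.

{code}
public class UdfClassLoadCheckSketch {
  // Returns true if the class is already loadable in the current thread,
  // in which case re-downloading the UDF's jar can be skipped.
  static boolean isClassLoadable(String udfClassName) {
    try {
      Class.forName(udfClassName, false,
          Thread.currentThread().getContextClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
{code}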



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-10 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324416#comment-15324416
 ] 

Rajat Khandelwal commented on HIVE-13903:
-

Taking patch from reviewboard and attaching

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, 
> HIVE-13903.02.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, with each call originating from Registry.getFunctionInfo. 
> This increases the time for the query, especially if that query is just an 
> explain query. The jar should be downloaded once, and not downloaded again 
> if the UDF class is accessible in the current thread. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13731) LLAP: return LLAP token with the splits

2016-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324376#comment-15324376
 ] 

Hive QA commented on HIVE-13731:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809265/HIVE-13731.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10193 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-vectorized_distinct_gby.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_13.q-schema_evol_text_nonvec_mapwork_part_all_primitive.q-bucket3.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/72/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/72/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-72/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809265 - PreCommit-HIVE-MASTER-Build

> LLAP: return LLAP token with the splits
> ---
>
> Key: HIVE-13731
> URL: https://issues.apache.org/jira/browse/HIVE-13731
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, 
> HIVE-13731.02.patch, HIVE-13731.patch, HIVE-13731.wo.13444-13675-13443.patch
>
>
> Need to return the token with the splits, then take it in LLAPIF and make 
> sure it's used when talking to LLAP



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13968) CombineHiveInputFormat does not honor InputFormat that implements AvoidSplitCombination

2016-06-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324340#comment-15324340
 ] 

Rui Li commented on HIVE-13968:
---

+1

> CombineHiveInputFormat does not honor InputFormat that implements 
> AvoidSplitCombination
> ---
>
> Key: HIVE-13968
> URL: https://issues.apache.org/jira/browse/HIVE-13968
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanna Rajaperumal
>Assignee: Prasanna Rajaperumal
> Attachments: HIVE-13968.1.patch, HIVE-13968.2.patch, 
> HIVE-13968.3.patch
>
>
> If I have 100 paths in path[], nonCombinablePaths will contain only 
> paths[0-9] and the rest of the paths will be in combinablePaths, even if 
> the InputFormat returns false for AvoidSplitCombination.shouldSkipCombine() 
> for all the paths. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-10 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324336#comment-15324336
 ] 

Rajat Khandelwal commented on HIVE-13903:
-

Created https://reviews.apache.org/r/48544/

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, with each call originating from Registry.getFunctionInfo. 
> This increases the time for the query, especially if that query is just an 
> explain query. The jar should be downloaded once, and not downloaded again 
> if the UDF class is accessible in the current thread. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-10 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324338#comment-15324338
 ] 

Rajat Khandelwal commented on HIVE-13903:
-

Taking patch from reviewboard and attaching

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, with each call originating from Registry.getFunctionInfo. 
> This increases the time for the query, especially if that query is just an 
> explain query. The jar should be downloaded once, and not downloaded again 
> if the UDF class is accessible in the current thread. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-10 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-13903:

Status: Patch Available  (was: In Progress)

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, with each call originating from Registry.getFunctionInfo. 
> This increases the time for the query, especially if that query is just an 
> explain query. The jar should be downloaded once, and not downloaded again 
> if the UDF class is accessible in the current thread. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-10 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-13903:

Attachment: HIVE-13903.01.patch

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, with each call originating from Registry.getFunctionInfo. 
> This increases the time for the query, especially if that query is just an 
> explain query. The jar should be downloaded once, and not downloaded again 
> if the UDF class is accessible in the current thread. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

