date:20171009

[jira] [Updated] (HIVE-17752) dynamic_semijoin_reduction_sw test should use partition with some rows.

2017-10-09 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17752:
--
Summary: dynamic_semijoin_reduction_sw test should use partition with some 
rows.  (was: dynamic_semijoin_reduction_sw test should use partition with some 
rows to get some non-zero results.)

> dynamic_semijoin_reduction_sw test should use partition with some rows.
> ---
>
> Key: HIVE-17752
> URL: https://issues.apache.org/jira/browse/HIVE-17752
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17752) dynamic_semijoin_reduction_sw test should use partition with some rows.

2017-10-09 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17752:
--
Description: dynamic_semijoin_reduction_sw test should use partition with 
some rows.  (was: dynamic_semijoin_reduction_sw test should use partition with 
some rows to get some non-zero results.)

> dynamic_semijoin_reduction_sw test should use partition with some rows.
> ---
>
> Key: HIVE-17752
> URL: https://issues.apache.org/jira/browse/HIVE-17752
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> dynamic_semijoin_reduction_sw test should use partition with some rows.




[jira] [Work started] (HIVE-17752) dynamic_semijoin_reduction_sw test should use partition with some rows to get some non-zero results.

2017-10-09 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17752 started by Deepak Jaiswal.
-
> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.
> 
>
> Key: HIVE-17752
> URL: https://issues.apache.org/jira/browse/HIVE-17752
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.




[jira] [Updated] (HIVE-17752) dynamic_semijoin_reduction_sw test should use partition with some rows to get some non-zero results.

2017-10-09 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17752:
--
Status: Patch Available  (was: In Progress)

> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.
> 
>
> Key: HIVE-17752
> URL: https://issues.apache.org/jira/browse/HIVE-17752
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.




[jira] [Assigned] (HIVE-17752) dynamic_semijoin_reduction_sw test should use partition with some rows to get some non-zero results.

2017-10-09 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-17752:
-


> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.
> 
>
> Key: HIVE-17752
> URL: https://issues.apache.org/jira/browse/HIVE-17752
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> dynamic_semijoin_reduction_sw test should use partition with some rows to get 
> some non-zero results.




[jira] [Updated] (HIVE-17631) upgrade orc to 1.4.0

2017-10-09 Thread Saijin Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-17631:

Status: Open  (was: Patch Available)

> upgrade orc to 1.4.0
> 
>
> Key: HIVE-17631
> URL: https://issues.apache.org/jira/browse/HIVE-17631
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
>
> It seems that ORC 1.4.0 is the latest stable version:
> https://orc.apache.org/docs/releases.html




[jira] [Updated] (HIVE-17631) upgrade orc to 1.4.0

2017-10-09 Thread Saijin Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-17631:

Attachment: (was: HIVE-17631.1.patch)

> upgrade orc to 1.4.0
> 
>
> Key: HIVE-17631
> URL: https://issues.apache.org/jira/browse/HIVE-17631
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
>
> It seems that ORC 1.4.0 is the latest stable version:
> https://orc.apache.org/docs/releases.html




[jira] [Commented] (HIVE-17725) Fix misnamed tests which are not run during precommit runs.

2017-10-09 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198188#comment-16198188
 ] 

Zoltan Haindrich commented on HIVE-17725:
-

I see :)
+1

> Fix misnamed tests which are not run during precommit runs. 
> 
>
> Key: HIVE-17725
> URL: https://issues.apache.org/jira/browse/HIVE-17725
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Daniel Voros
> Attachments: HIVE-17725.1.patch, HIVE-17725.2.patch
>
>
> I've just seen a test failure for JDK 9, but the test is not even executed 
> during precommit runs:
> {{TaskTrackerTest}}
> I think a test class's name should match {{\*\*/Test\*}} to be executed 
> during test runs. However, there seem to be quite a few that do not: 
> {{find . -name '*Test.java'}} returns a few abstract classes, but there are 
> also real tests which are just misnamed.
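
The naming mismatch described above can be checked with a small sketch. It assumes the include pattern behaves like a Java NIO glob and appends `.java` to the `**/Test*` pattern from the description; the `module/src/...` paths and the class name `TestNameCheck` are hypothetical and only serve as illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class TestNameCheck {
    // Assumed include pattern: the description's **/Test* applied to .java sources.
    private static final PathMatcher INCLUDE =
        FileSystems.getDefault().getPathMatcher("glob:**/Test*.java");

    /** Returns true if a source file would be picked up by the assumed pattern. */
    static boolean isIncluded(String relativePath) {
        Path path = Paths.get(relativePath);
        return INCLUDE.matches(path);
    }

    public static void main(String[] args) {
        // Hypothetical locations; only the file names matter here.
        System.out.println(isIncluded("module/src/TestTaskTracker.java"));
        System.out.println(isIncluded("module/src/TaskTrackerTest.java"));
    }
}
```

Under this assumption a class named with a `Test` suffix, such as `TaskTrackerTest`, is silently skipped, while the same class renamed with a `Test` prefix is picked up.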




[jira] [Updated] (HIVE-17634) Estimate the column stats even not retrieve columns from metastore(hive.stats.fetch.column.stats as false)

2017-10-09 Thread liyunzhang_intel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-17634:

Description:  In the statistics 
estimation([StatsRulesProcFactory|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L134]),
 we do not estimate the column stats when hive.stats.fetch.column.stats is set 
to false. I suggest estimating the data size by column type when 
{{hive.stats.fetch.column.stats}} is false, as HIVE-17634.1.patch does.  (was: 
 In the statistics estimation([StatsRulesProcFactory|), we do not estimate the 
column stats when hive.stats.fetch.column.stats is set to false. I suggest 
estimating the data size by column type when {{hive.stats.fetch.column.stats}} is 
false, as HIVE-17634.1.patch does.)

> Estimate the column stats even not retrieve columns from 
> metastore(hive.stats.fetch.column.stats as false)
> --
>
> Key: HIVE-17634
> URL: https://issues.apache.org/jira/browse/HIVE-17634
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17634.1.patch, HIVE-17634.patch
>
>
>  In the statistics 
> estimation([StatsRulesProcFactory|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L134]),
>  we do not estimate the column stats when 
> hive.stats.fetch.column.stats is set to false. I suggest estimating the data 
> size by column type when {{hive.stats.fetch.column.stats}} is false, as 
> HIVE-17634.1.patch does.
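
As a rough illustration of estimating data size from column types alone, a minimal sketch follows. It is not taken from HIVE-17634.1.patch: the class name `ColumnSizeEstimate`, the per-type byte sizes, and the fallback size are all hypothetical assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColumnSizeEstimate {
    // Hypothetical per-value sizes in bytes; real values would come from Hive's own data model.
    private static final Map<String, Long> AVG_BYTES = new HashMap<>();
    static {
        AVG_BYTES.put("int", 4L);
        AVG_BYTES.put("bigint", 8L);
        AVG_BYTES.put("double", 8L);
        AVG_BYTES.put("string", 100L); // assumed average length for variable-width data
    }

    // Assumed fallback for types not listed above.
    private static final long DEFAULT_BYTES = 8L;

    /** Estimates total data size as row count times the summed per-column sizes. */
    static long estimateDataSize(long numRows, List<String> columnTypes) {
        long bytesPerRow = 0;
        for (String type : columnTypes) {
            bytesPerRow += AVG_BYTES.getOrDefault(type, DEFAULT_BYTES);
        }
        return numRows * bytesPerRow;
    }

    public static void main(String[] args) {
        System.out.println(estimateDataSize(1000L, Arrays.asList("int", "string", "double")));
    }
}
```

The point of the sketch is only that a type-based estimate needs the row count and the schema, not column statistics from the metastore.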




[jira] [Updated] (HIVE-17634) Estimate the column stats even not retrieve columns from metastore(hive.stats.fetch.column.stats as false)

2017-10-09 Thread liyunzhang_intel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-17634:

Description:  In the statistics estimation([StatsRulesProcFactory|), we do 
not estimate the column stats when hive.stats.fetch.column.stats is set to 
false. I suggest estimating the data size by column type when 
{{hive.stats.fetch.column.stats}} is false, as HIVE-17634.1.patch does.  (was: 
In 
[RelOptHiveTable#updateColStats|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L309],
 we set {{fetchColStats}} and {{fetchPartStats}} to true when calling 
{{StatsUtils.collectStatistics}}:
{code}

if (!hiveTblMetadata.isPartitioned()) {
  // 2.1 Handle the case for unpartitioned table.
  try {
    Statistics stats = StatsUtils.collectStatistics(hiveConf, null,
        hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats,
        colStatsCached, nonPartColNamesThatRqrStats, true, true);
    ...
{code}

This causes column statistics to be queried from the metastore even when 
{{hive.stats.fetch.column.stats}} and {{hive.stats.fetch.partition.stats}} are 
set to false in HiveConf.  If these two properties are false, no column 
statistics should be fetched from the metastore.  I suggest taking the values 
from HiveConf.)

> Estimate the column stats even not retrieve columns from 
> metastore(hive.stats.fetch.column.stats as false)
> --
>
> Key: HIVE-17634
> URL: https://issues.apache.org/jira/browse/HIVE-17634
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17634.1.patch, HIVE-17634.patch
>
>
>  In the statistics estimation([StatsRulesProcFactory|), we do not estimate 
> the column stats when hive.stats.fetch.column.stats is set to false. I suggest 
> estimating the data size by column type when 
> {{hive.stats.fetch.column.stats}} is false, as HIVE-17634.1.patch does.




[jira] [Commented] (HIVE-17620) Use the default MR scratch directory (HDFS) in the only case when hive.blobstore.optimizations.enabled=true AND isFinalJob=true

2017-10-09 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198174#comment-16198174
 ] 

Rajesh Balamohan commented on HIVE-17620:
-

LGTM. +1

> Use the default MR scratch directory (HDFS) in the only case when 
> hive.blobstore.optimizations.enabled=true AND isFinalJob=true
> ---
>
> Key: HIVE-17620
> URL: https://issues.apache.org/jira/browse/HIVE-17620
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0, 2.3.0, 3.0.0
>Reporter: Gergely Hajós
>Assignee: Gergely Hajós
> Attachments: HIVE-17620.1.patch
>
>
> Introduced in HIVE-15121. Context::getTempDirForPath tries to use temporary 
> MR directory instead of blobstore directory in three cases:
> {code}
> if (!isFinalJob && BlobStorageUtils.areOptimizationsEnabled(conf)) {
> {code}
> while the only valid case for using a temporary MR dir is when optimization 
> is enabled and the job is not final:
> {code}
> if (BlobStorageUtils.areOptimizationsEnabled(conf) && !isFinalJob) {
> {code}




[jira] [Commented] (HIVE-15212) merge branch into master

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198172#comment-16198172
 ] 

Hive QA commented on HIVE-15212:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891175/HIVE-15212.20.patch

{color:green}SUCCESS:{color} +1 due to 20 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11205 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_exim] (batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[spark_local_queries] (batchId=64)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=171)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[mm_bucket_convert] (batchId=91)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=239)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7204/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7204/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7204/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891175 - PreCommit-HIVE-Build

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch, HIVE-15212.11.patch, 
> HIVE-15212.12.patch, HIVE-15212.12.patch, HIVE-15212.13.patch, 
> HIVE-15212.13.patch, HIVE-15212.14.patch, HIVE-15212.15.patch, 
> HIVE-15212.16.patch, HIVE-15212.17.patch, HIVE-15212.18.patch, 
> HIVE-15212.19.patch, HIVE-15212.20.patch
>
>





[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.6-branch-2.2.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, 
> HIVE-11548.6-branch-2.2.patch, HIVE-11548.6-branch-2.patch, HIVE-11548.6.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.6-branch-2.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, 
> HIVE-11548.6-branch-2.patch, HIVE-11548.6.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Updated] (HIVE-17751) Separate HMS Client and HMS server into separate sub-modules

2017-10-09 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17751:
---
Description: external applications which are interfacing with HMS should 
ideally only include HMSClient library instead of one big library containing 
server as well. We should ideally have a thin client library so that cross 
version support for external applications is easier. We should sub-divide the 
standalone module into possibly 3 modules (one for common classes, one for 
client classes and one for server) or 2 sub-modules (one for client and one for 
server) so that we can generate separate jars for HMS client and server.  (was: 
For external applications which are interfacing with HMS should ideally only 
include HMSClient library instead of one big library containing server as well. 
We should ideally have a thin client library so that cross version support for 
external applications is easier. We should sub-divide the standalone module 
into possibly 3 modules (one for common classes, one for client classes and one 
for server) or 2 sub-modules (one for client and one for server) so that we can 
generate separate jars for HMS client and server.)

> Separate HMS Client and HMS server into separate sub-modules
> 
>
> Key: HIVE-17751
> URL: https://issues.apache.org/jira/browse/HIVE-17751
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> external applications which are interfacing with HMS should ideally only 
> include HMSClient library instead of one big library containing server as 
> well. We should ideally have a thin client library so that cross version 
> support for external applications is easier. We should sub-divide the 
> standalone module into possibly 3 modules (one for common classes, one for 
> client classes and one for server) or 2 sub-modules (one for client and one 
> for server) so that we can generate separate jars for HMS client and server.




[jira] [Assigned] (HIVE-17751) Separate HMS Client and HMS server into separate sub-modules

2017-10-09 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17751:
--


> Separate HMS Client and HMS server into separate sub-modules
> 
>
> Key: HIVE-17751
> URL: https://issues.apache.org/jira/browse/HIVE-17751
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> For external applications which are interfacing with HMS should ideally only 
> include HMSClient library instead of one big library containing server as 
> well. We should ideally have a thin client library so that cross version 
> support for external applications is easier. We should sub-divide the 
> standalone module into possibly 3 modules (one for common classes, one for 
> client classes and one for server) or 2 sub-modules (one for client and one 
> for server) so that we can generate separate jars for HMS client and server.




[jira] [Commented] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198122#comment-16198122
 ] 

Hive QA commented on HIVE-11548:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891166/HIVE-11548.6.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11197 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[spark_local_queries] (batchId=64)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=171)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=240)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7203/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7203/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7203/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891166 - PreCommit-HIVE-Build

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, HIVE-11548.6.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Commented] (HIVE-17659) get_token thrift call fails for DBTokenStore in remote HMS mode

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198054#comment-16198054
 ] 

Hive QA commented on HIVE-17659:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891153/HIVE-17659.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11205 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[spark_local_queries] (batchId=64)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=171)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=239)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate (batchId=184)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteSmallint (batchId=184)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7202/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7202/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7202/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891153 - PreCommit-HIVE-Build

> get_token thrift call fails for DBTokenStore in remote HMS mode
> ---
>
> Key: HIVE-17659
> URL: https://issues.apache.org/jira/browse/HIVE-17659
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17659.01-branch-2.patch, HIVE-17659.01.patch, 
> HIVE-17659.02-branch-2.patch, HIVE-17659.02.patch
>
>
> The {{get_token(String tokenIdentifier)}} HMS thrift API fails when HMS is 
> deployed in remote mode and no token is found for that 
> tokenIdentifier. This could happen when an application calls 
> renewDelegationToken on an expired/cancelled delegation token. The issue is 
> that get_token tries to return a null result value, which cannot be done in 
> Thrift. The API call errors out with an 
> {{org.apache.thrift.TApplicationException unknown result}} exception, which is 
> uncaught, and the HS2 thrift server closes the client transport. No further 
> calls from that connection can be accepted until the client reconnects to 
> HS2.
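
One general way to avoid handing a null result to an RPC layer is to turn the missing value into a declared exception. The sketch below shows that idea in plain Java without any Thrift dependency; it is an assumption for illustration and not necessarily what the attached patches do. `TokenLookupSketch` and `TokenNotFoundException` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class TokenLookupSketch {
    /** Hypothetical stand-in for an exception declared in the service definition. */
    static class TokenNotFoundException extends Exception {
        TokenNotFoundException(String message) {
            super(message);
        }
    }

    private final Map<String, String> tokens = new HashMap<>();

    void addToken(String tokenIdentifier, String token) {
        tokens.put(tokenIdentifier, token);
    }

    // Never returns null: a missing token becomes a declared exception instead.
    String getToken(String tokenIdentifier) throws TokenNotFoundException {
        String token = tokens.get(tokenIdentifier);
        if (token == null) {
            throw new TokenNotFoundException("No token for identifier: " + tokenIdentifier);
        }
        return token;
    }

    public static void main(String[] args) {
        TokenLookupSketch store = new TokenLookupSketch();
        store.addToken("id-1", "token-1");
        try {
            System.out.println(store.getToken("id-1"));
            store.getToken("expired-id");
        } catch (TokenNotFoundException e) {
            System.out.println("declared error: " + e.getMessage());
        }
    }
}
```

With a declared exception the caller receives an error it can handle, rather than an unexpected transport-level failure.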




[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-10-09 Thread Ke Jia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198042#comment-16198042
 ] 

Ke Jia commented on HIVE-17139:
---

[~Ferd], thanks for your review.  The failed tests do not seem to be related to the patch.

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.10.patch, 
> HIVE-17139.11.patch, HIVE-17139.12.patch, HIVE-17139.13.patch, 
> HIVE-17139.13.patch, HIVE-17139.14.patch, HIVE-17139.15.patch, 
> HIVE-17139.16.patch, HIVE-17139.17.patch, HIVE-17139.18.patch, 
> HIVE-17139.18.patch, HIVE-17139.19.patch, HIVE-17139.2.patch, 
> HIVE-17139.20.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, 
> HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, 
> HIVE-17139.8.patch, HIVE-17139.9.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.
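
The selected-array idea can be sketched with plain arrays. This is a simplified illustration, not Hive's vectorization code: the class `SelectedRowsSketch`, the method `evaluateIf`, and the two branch expressions (`input * 2` and `input + 100`) are hypothetical.

```java
import java.util.Arrays;

class SelectedRowsSketch {
    /**
     * Evaluates IF(cond[i], input[i] * 2, input[i] + 100) for a batch,
     * running each branch expression only on the rows selected for it.
     */
    static long[] evaluateIf(boolean[] cond, long[] input) {
        int n = input.length;
        int[] thenSelected = new int[n];
        int[] elseSelected = new int[n];
        int thenSize = 0;
        int elseSize = 0;

        // Step 1: evaluate the condition once and split row indices into two selected arrays.
        for (int i = 0; i < n; i++) {
            if (cond[i]) {
                thenSelected[thenSize++] = i;
            } else {
                elseSelected[elseSize++] = i;
            }
        }

        long[] out = new long[n];
        // Step 2: the "then" expression touches only its selected rows.
        for (int j = 0; j < thenSize; j++) {
            int row = thenSelected[j];
            out[row] = input[row] * 2;
        }
        // Step 3: the "else" expression touches only the remaining rows.
        for (int j = 0; j < elseSize; j++) {
            int row = elseSelected[j];
            out[row] = input[row] + 100;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] result = evaluateIf(new boolean[] {true, false, true}, new long[] {1, 2, 3});
        System.out.println(Arrays.toString(result));
    }
}
```

Each branch loop runs over at most the rows selected for it, so an expensive else expression is not computed for rows where the condition already holds.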




[jira] [Commented] (HIVE-17371) Move tokenstores to metastore module

2017-10-09 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198026#comment-16198026
 ] 

Vihang Karajgaonkar commented on HIVE-17371:


Hi [~thejas] Can you please take a look at this patch? Thanks!

> Move tokenstores to metastore module
> 
>
> Key: HIVE-17371
> URL: https://issues.apache.org/jira/browse/HIVE-17371
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17371.01.patch, HIVE-17371.02.patch, 
> HIVE-17371.03.patch, HIVE-17371.04.patch
>
>
> The {{getTokenStore}} method will not work for the {{DBTokenStore}} and 
> {{ZKTokenStore}} since they implement 
> {{org.apache.hadoop.hive.thrift.DelegationTokenStore}} instead of  
> {{org.apache.hadoop.hive.metastore.security.DelegationTokenStore}}
> {code}
> private DelegationTokenStore getTokenStore(Configuration conf) throws IOException {
>   String tokenStoreClassName =
>       MetastoreConf.getVar(conf, MetastoreConf.ConfVars.DELEGATION_TOKEN_STORE_CLS, "");
>   // The second half of this if is to catch cases where users are passing in a HiveConf for
>   // configuration.  It will have set the default value of
>   // "hive.cluster.delegation.token.store.class" to
>   // "org.apache.hadoop.hive.thrift.MemoryTokenStore" as part of its construction.  But this is
>   // the hive-shims version of the memory store.  We want to convert this to our default value.
>   if (StringUtils.isBlank(tokenStoreClassName) ||
>       "org.apache.hadoop.hive.thrift.MemoryTokenStore".equals(tokenStoreClassName)) {
>     return new MemoryTokenStore();
>   }
>   try {
>     Class<? extends DelegationTokenStore> storeClass =
>         Class.forName(tokenStoreClassName).asSubclass(DelegationTokenStore.class);
>     return ReflectionUtils.newInstance(storeClass, conf);
>   } catch (ClassNotFoundException e) {
>     throw new IOException("Error initializing delegation token store: " + tokenStoreClassName, e);
>   }
> }
> {code}
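
The failure mode described above comes from `Class.asSubclass`, which rejects a class that implements a different interface even when the interfaces share a simple name. A self-contained sketch, using hypothetical nested interfaces as stand-ins for the two `DelegationTokenStore` interfaces:

```java
class TokenStoreLookupSketch {
    /** Hypothetical stand-in for the older interface. */
    interface OldDelegationTokenStore {}

    /** Hypothetical stand-in for the interface the lookup expects. */
    interface NewDelegationTokenStore {}

    static class OldStyleStore implements OldDelegationTokenStore {}

    static class NewStyleStore implements NewDelegationTokenStore {}

    /** Mirrors the Class.forName(...).asSubclass(...) step; returns false if the cast is rejected. */
    static boolean loadsAsNewStore(String className) {
        try {
            Class.forName(className).asSubclass(NewDelegationTokenStore.class);
            return true;
        } catch (ClassCastException e) {
            // The class exists but does not implement the expected interface.
            return false;
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadsAsNewStore(NewStyleStore.class.getName()));
        System.out.println(loadsAsNewStore(OldStyleStore.class.getName()));
    }
}
```

Note that in the quoted method only `ClassNotFoundException` is caught, so a `ClassCastException` from `asSubclass` would propagate to the caller.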




[jira] [Commented] (HIVE-16677) CTAS with no data fails in Druid

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198014#comment-16198014
 ] 

Jesus Camacho Rodriguez commented on HIVE-16677:


Cc [~ashutoshc], [~bslim]

> CTAS with no data fails in Druid
> 
>
> Key: HIVE-16677
> URL: https://issues.apache.org/jira/browse/HIVE-16677
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16677.patch
>
>
> If we create a table in Druid using a CTAS statement and the query executed 
> to create the table produces no data, we fail with the following exception:
> {noformat}
> druid.DruidStorageHandler: Exception while commit
> java.io.FileNotFoundException: File /tmp/workingDirectory/.staging-jcamachorodriguez_20170515053123_835c394b-2157-4f6b-bfed-a2753acd568e/segmentsDescriptorDir does not exist.
> ...
> {noformat}




[jira] [Updated] (HIVE-16677) CTAS with no data fails in Druid

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16677:
---
Attachment: HIVE-16677.patch

> CTAS with no data fails in Druid
> 
>
> Key: HIVE-16677
> URL: https://issues.apache.org/jira/browse/HIVE-16677
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16677.patch
>
>
> If we create a table in Druid using a CTAS statement and the query executed 
> to create the table produces no data, we fail with the following exception:
> {noformat}
> druid.DruidStorageHandler: Exception while commit
> java.io.FileNotFoundException: File 
> /tmp/workingDirectory/.staging-jcamachorodriguez_20170515053123_835c394b-2157-4f6b-bfed-a2753acd568e/segmentsDescriptorDir
>  does not exist.
> ...
> {noformat}




[jira] [Updated] (HIVE-16677) CTAS with no data fails in Druid

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16677:
---
Status: Patch Available  (was: In Progress)

> CTAS with no data fails in Druid
> 
>
> Key: HIVE-16677
> URL: https://issues.apache.org/jira/browse/HIVE-16677
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16677.patch
>
>
> If we create a table in Druid using a CTAS statement and the query executed 
> to create the table produces no data, we fail with the following exception:
> {noformat}
> druid.DruidStorageHandler: Exception while commit
> java.io.FileNotFoundException: File 
> /tmp/workingDirectory/.staging-jcamachorodriguez_20170515053123_835c394b-2157-4f6b-bfed-a2753acd568e/segmentsDescriptorDir
>  does not exist.
> ...
> {noformat}




[jira] [Commented] (HIVE-17747) HMS DropTableMessage should include the full table object

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198006#comment-16198006
 ] 

Hive QA commented on HIVE-17747:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891147/HIVE-17747.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11193 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[spark_local_queries] 
(batchId=64)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=239)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7201/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7201/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7201/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891147 - PreCommit-HIVE-Build

> HMS DropTableMessage should include the full table object
> -
>
> Key: HIVE-17747
> URL: https://issues.apache.org/jira/browse/HIVE-17747
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 2.3.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17747.0.patch
>
>
> I have a notification log follower use-case which requires accessing the 
> parameters of dropped tables, so it would be useful if the {{DROP_TABLE}} 
> events in the notification log included the full table object, as the create 
> and alter events do.




[jira] [Assigned] (HIVE-17750) add a flag to automatically create most tables as MM

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17750:
---


> add a flag to automatically create most tables as MM 
> -
>
> Key: HIVE-17750
> URL: https://issues.apache.org/jira/browse/HIVE-17750
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> After merge we are going to do another round of gap identification... similar 
> to HIVE-14990.
> However the approach used there is a huge PITA. It'd be much better to make 
> tables MM by default at create time, not pretend they are MM at check time, 
> from the perspective of spurious error elimination.




[jira] [Updated] (HIVE-17749) Multiple class have missed the ASF header

2017-10-09 Thread Saijin Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-17749:

Assignee: Saijin Huang
  Status: Patch Available  (was: Open)

> Multiple class have missed the ASF header
> -
>
> Key: HIVE-17749
> URL: https://issues.apache.org/jira/browse/HIVE-17749
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
>Priority: Minor
> Attachments: HIVE-17749.1.patch
>
>
> Multiple class have missed the ASF header




[jira] [Updated] (HIVE-17749) Multiple class have missed the ASF header

2017-10-09 Thread Saijin Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-17749:

Attachment: HIVE-17749.1.patch

> Multiple class have missed the ASF header
> -
>
> Key: HIVE-17749
> URL: https://issues.apache.org/jira/browse/HIVE-17749
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Priority: Minor
> Attachments: HIVE-17749.1.patch
>
>
> Multiple class have missed the ASF header




[jira] [Updated] (HIVE-17749) Multiple class have missed the ASF header

2017-10-09 Thread Saijin Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-17749:

Description: Multiple class have missed the ASF header

> Multiple class have missed the ASF header
> -
>
> Key: HIVE-17749
> URL: https://issues.apache.org/jira/browse/HIVE-17749
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Priority: Minor
> Attachments: HIVE-17749.1.patch
>
>
> Multiple class have missed the ASF header




[jira] [Updated] (HIVE-17691) Miscellaneous List

2017-10-09 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17691:
--
Description: 
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)
# Remove MoveWork().setNoop(boolean) and usages per todo in 
_GenMapRedUtils.createMRWorkForMergingFiles (FileSinkOperator fsInput, Path 
finalName, DependencyCollectionTask dependencyTask,   List 
mvTasks, HiveConf conf,   Task currTask)_
# PartialScanWork.tblDesc - unused
# _Partition.getBucketPath(int bucketNum)_ has "// Note: this makes assumptions 
that won't work with MM tables, unions, etc.".  File Jira?
# _PartitionDesc.LOG_ is unused
# 




  was:
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)

Already done 
# Remove MoveWork().setNoop(boolean) and usages per todo in 
_GenMapRedUtils.createMRWorkForMergingFiles (FileSinkOperator fsInput, Path 
finalName, DependencyCollectionTask dependencyTask,   List 
mvTasks, HiveConf conf,   Task currTask)





> Miscellaneous List
> --
>
> Key: HIVE-17691
> URL: https://issues.apache.org/jira/browse/HIVE-17691
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> # DDLSemanticAnalyzer.alterTableOutput is unused
> # DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
> TransactionManager
> # DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
> crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
> place..
> #

[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-10-09 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197967#comment-16197967
 ] 

Ferdinand Xu commented on HIVE-17139:
-

[~Jk_Self], can you please take a look at the failed qtest cases?

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.10.patch, 
> HIVE-17139.11.patch, HIVE-17139.12.patch, HIVE-17139.13.patch, 
> HIVE-17139.13.patch, HIVE-17139.14.patch, HIVE-17139.15.patch, 
> HIVE-17139.16.patch, HIVE-17139.17.patch, HIVE-17139.18.patch, 
> HIVE-17139.18.patch, HIVE-17139.19.patch, HIVE-17139.2.patch, 
> HIVE-17139.20.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, 
> HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, 
> HIVE-17139.8.patch, HIVE-17139.9.patch
>
>
> The execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: in the current implementation, all of the conditional and else 
> expressions are evaluated for every row. The optimized approach is to update 
> the selected array of the batch parameter after the conditional expression is 
> executed, so that the else expression is evaluated only for the selected rows 
> instead of all rows.
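
The selected-array idea described above can be sketched roughly as follows. This is a toy model under stated assumptions: the class, the RowExpr interface and the evalIf method are hypothetical and do not use Hive's VectorizedRowBatch or its expression classes.

```java
import java.util.Arrays;

/** Toy model of a vectorized IF(cond, thenExpr, elseExpr); not Hive code. */
final class SelectedRowsSketch {

  /** Hypothetical per-row expression used only for this illustration. */
  interface RowExpr {
    long eval(int row);
  }

  /**
   * Splits the row indices into two "selected" arrays based on the condition,
   * then evaluates each branch only on the rows that actually need it.
   */
  static long[] evalIf(boolean[] cond, RowExpr thenExpr, RowExpr elseExpr) {
    int n = cond.length;
    int[] thenSel = new int[n];
    int[] elseSel = new int[n];
    int thenCount = 0;
    int elseCount = 0;
    for (int i = 0; i < n; i++) {
      if (cond[i]) {
        thenSel[thenCount++] = i;
      } else {
        elseSel[elseCount++] = i;
      }
    }
    long[] out = new long[n];
    // THEN branch touches only rows where the condition held.
    for (int j = 0; j < thenCount; j++) {
      out[thenSel[j]] = thenExpr.eval(thenSel[j]);
    }
    // ELSE branch touches only the remaining rows.
    for (int j = 0; j < elseCount; j++) {
      out[elseSel[j]] = elseExpr.eval(elseSel[j]);
    }
    return out;
  }

  public static void main(String[] args) {
    boolean[] cond = {true, false, true, false};
    long[] out = evalIf(cond, row -> 10L * row, row -> -1L);
    System.out.println(Arrays.toString(out)); // prints [0, -1, 20, -1]
  }
}
```

In this sketch the else expression runs twice for a four-row batch rather than four times, which is the saving the issue is after.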




[jira] [Updated] (HIVE-15212) merge branch into master

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15212:

Attachment: HIVE-15212.20.patch

Another merge and one more jira

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch, HIVE-15212.11.patch, 
> HIVE-15212.12.patch, HIVE-15212.12.patch, HIVE-15212.13.patch, 
> HIVE-15212.13.patch, HIVE-15212.14.patch, HIVE-15212.15.patch, 
> HIVE-15212.16.patch, HIVE-15212.17.patch, HIVE-15212.18.patch, 
> HIVE-15212.19.patch, HIVE-15212.20.patch
>
>





[jira] [Commented] (HIVE-17691) Miscellaneous List

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197960#comment-16197960
 ] 

Sergey Shelukhin commented on HIVE-17691:
-

Most of these look like they could be fixed after the merge

> Miscellaneous List
> --
>
> Key: HIVE-17691
> URL: https://issues.apache.org/jira/browse/HIVE-17691
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> # DDLSemanticAnalyzer.alterTableOutput is unused
> # DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
> TransactionManager
> # DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
> crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
> place..
> # FileSinkOperator has multiple places that look like _conf.getWriteType() == 
> AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
> MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() 
> != AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
> MoveTask.handleStaticParts() call to Hive.loadPartition()
> # HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
> obsolete
> # Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
> info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
> either because DbTxnManager.acquireLocks() does  
> _compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
> non-acid tables
> # In general integration with full Acid seems confused wrt MM and seems to 
> treat MM as special table type rather than subtype of Acid table.  (mostly, 
> but not always).
> # LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
> than from TM
> # ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
> Need to verify we properly commit the txn in the Driver
> # As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
> MR.  This doesn't exercise some code specifically for dealing with writes 
> from Union All queries (CTAS, Insert into).  On MR this requires 
> "hive.optimize.union.remove=true" (false by default)
> Already done 
> # Remove MoveWork().setNoop(boolean) and usages per todo in 
> _GenMapRedUtils.createMRWorkForMergingFiles (FileSinkOperator fsInput, Path 
> finalName, DependencyCollectionTask dependencyTask,   List 
> mvTasks, HiveConf conf,   Task currTask)




[jira] [Updated] (HIVE-17691) Miscellaneous List

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17691:

Description: 
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)

Already done 
# Remove MoveWork().setNoop(boolean) and usages per todo in 
_GenMapRedUtils.createMRWorkForMergingFiles (FileSinkOperator fsInput, Path 
finalName, DependencyCollectionTask dependencyTask,   List 
mvTasks, HiveConf conf,   Task currTask)




  was:
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)
# Remove MoveWork().setNoop(boolean) and usages per todo in 
_GenMapRedUtils.createMRWorkForMergingFiles (FileSinkOperator fsInput, Path 
finalName, DependencyCollectionTask dependencyTask,   List 
mvTasks, HiveConf conf,   Task currTask)_





> Miscellaneous List
> --
>
> Key: HIVE-17691
> URL: https://issues.apache.org/jira/browse/HIVE-17691
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> # DDLSemanticAnalyzer.alterTableOutput is unused
> # DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
> TransactionManager
> # DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
> crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
> place..
> # FileSinkOperator has multiple places that look like _conf.getWriteType() == 
> AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
> MM tables?  Seems that Wei opted for

[jira] [Resolved] (HIVE-17693) remove the logic to convert from MM to plain hive table

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-17693.
-
   Resolution: Fixed
 Assignee: Sergey Shelukhin
Fix Version/s: hive-14535

> remove the logic to convert from MM to plain hive table
> ---
>
> Key: HIVE-17693
> URL: https://issues.apache.org/jira/browse/HIVE-17693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
>
> I would argue this is not necessary and just adds more complexity.
> doing "insert ... select..." seems much safer (not as fast obviously)
> _DDLTask.generateRemoveMmTasks()_




[jira] [Updated] (HIVE-17693) remove the logic to convert from MM to plain hive table

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17693:

Summary: remove the logic to convert from MM to plain hive table  (was: 
Conversion from MM to plain hive table)

> remove the logic to convert from MM to plain hive table
> ---
>
> Key: HIVE-17693
> URL: https://issues.apache.org/jira/browse/HIVE-17693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> I would argue this is not necessary and just adds more complexity.
> doing "insert ... select..." seems much safer (not as fast obviously)
> _DDLTask.generateRemoveMmTasks()_




[jira] [Commented] (HIVE-17693) Conversion from MM to plain hive table

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197945#comment-16197945
 ] 

Sergey Shelukhin commented on HIVE-17693:
-

Actually it does handle buckets, but we can remove it anyway.

> Conversion from MM to plain hive table
> --
>
> Key: HIVE-17693
> URL: https://issues.apache.org/jira/browse/HIVE-17693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> I would argue this is not necessary and just adds more complexity.
> doing "insert ... select..." seems much safer (not as fast obviously)
> _DDLTask.generateRemoveMmTasks()_




[jira] [Commented] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) should not start transactions

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197935#comment-16197935
 ] 

Sergey Shelukhin commented on HIVE-17647:
-

Can be handled after merge. Yes, lock is probably needed...

> DDLTask.generateAddMmTasks(Table tbl) should not start transactions
> ---
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> This method has 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)
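
The "throw if there is no open transaction" behaviour requested above could look roughly like the sketch below. The TxnView interface and requireOpenTxnId are hypothetical stand-ins for illustration only; they mirror just the two calls quoted in the description and are not Hive's transaction manager API.

```java
/** Hypothetical guard; shows fail-fast instead of opening a transaction. */
final class MmWriteIdGuard {

  /** Minimal view of a transaction manager, assumed for this sketch. */
  interface TxnView {
    boolean isTxnOpen();
    long getCurrentTxnId();
  }

  /**
   * Returns the id of the transaction that is already open. If none is open,
   * the method throws rather than opening and immediately committing one.
   */
  static long requireOpenTxnId(TxnView txnManager) {
    if (!txnManager.isTxnOpen()) {
      throw new IllegalStateException(
          "Expected an open transaction; it must be opened by the caller");
    }
    return txnManager.getCurrentTxnId();
  }

  public static void main(String[] args) {
    TxnView open = new TxnView() {
      @Override public boolean isTxnOpen() { return true; }
      @Override public long getCurrentTxnId() { return 42L; }
    };
    System.out.println(requireOpenTxnId(open)); // prints 42
  }
}
```

Lock acquisition is deliberately left out of the sketch, since the description argues it belongs in Driver.acquireLocks() rather than in this method.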




[jira] [Commented] (HIVE-17646) MetaStoreUtils.isToInsertOnlyTable(Map<String, String> props) is not needed

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197932#comment-16197932
 ] 

Sergey Shelukhin commented on HIVE-17646:
-

The logic in this method is needed. How does TransactionValidationListener 
apply here? 

> MetaStoreUtils.isToInsertOnlyTable(Map<String, String> props) is not needed
> ---
>
> Key: HIVE-17646
> URL: https://issues.apache.org/jira/browse/HIVE-17646
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> TransactionValidationListener is where all the logic to verify
> "transactional" & "transactional_properties" should be
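
For readers unfamiliar with the two properties named above, here is a rough sketch of the kind of check under discussion. It assumes that an insert-only (MM) table is marked by "transactional"="true" together with "transactional_properties"="insert_only"; the class and method are hypothetical and are neither MetaStoreUtils.isToInsertOnlyTable nor TransactionValidationListener.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical property check; assumes the "insert_only" property value. */
final class InsertOnlyPropsSketch {

  /** Returns true when the table properties describe an insert-only table. */
  static boolean isInsertOnly(Map<String, String> props) {
    if (props == null) {
      return false;
    }
    boolean transactional =
        "true".equalsIgnoreCase(props.get("transactional"));
    boolean insertOnly =
        "insert_only".equalsIgnoreCase(props.get("transactional_properties"));
    return transactional && insertOnly;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("transactional", "true");
    props.put("transactional_properties", "insert_only");
    System.out.println(isInsertOnly(props)); // prints true
  }
}
```

Wherever the check ends up living, keeping it in a single place avoids the two code paths drifting apart.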




[jira] [Commented] (HIVE-17733) Move RawStore to standalone metastore

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197930#comment-16197930
 ] 

Hive QA commented on HIVE-17733:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891144/HIVE-17733.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7200/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7200/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7200/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-10-10 00:25:08.262
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7200/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-10-10 00:25:08.264
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at c254557 HIVE-17111: Add TestLocalSparkCliDriver (Sahil Takiar, 
reviewed by Aihua Xu, Peter Vary, Xuefu Zhang)
+ git clean -f -d
Removing standalone-metastore/src/gen/org/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at c254557 HIVE-17111: Add TestLocalSparkCliDriver (Sahil Takiar, 
reviewed by Aihua Xu, Peter Vary, Xuefu Zhang)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-10-10 00:25:09.384
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java:93
error: 
metastore/src/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java: 
patch does not apply
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java:32
error: 
metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java: 
patch does not apply
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java:41
error: 
metastore/src/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java: 
patch does not apply
error: patch failed: 
metastore/src/test/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java:527
error: 
metastore/src/test/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891144 - PreCommit-HIVE-Build

> Move RawStore to standalone metastore
> -
>
> Key: HIVE-17733
> URL: https://issues.apache.org/jira/browse/HIVE-17733
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>  Labels: pull-request-available
> Attachments: HIVE-17733.2.patch, HIVE-17733.patch
>
>
> This includes moving implementations of RawStore (like ObjectStore), 
> MetastoreDirectSql, and stats related classes like ColumnStatsAggregator and 
> the NDV classes.




[jira] [Commented] (HIVE-16677) CTAS with no data fails in Druid

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197929#comment-16197929
 ] 

Jesus Camacho Rodriguez commented on HIVE-16677:


This will make CREATE TABLE statements work, as well as INSERT / INSERT 
OVERWRITE statements that produce no data.

https://cwiki.apache.org/confluence/display/Hive/Druid+Integration needs to be 
updated.

> CTAS with no data fails in Druid
> 
>
> Key: HIVE-16677
> URL: https://issues.apache.org/jira/browse/HIVE-16677
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> If we create a table in Druid using a CTAS statement and the query executed 
> to create the table produces no data, we fail with the following exception:
> {noformat}
> druid.DruidStorageHandler: Exception while commit
> java.io.FileNotFoundException: File 
> /tmp/workingDirectory/.staging-jcamachorodriguez_20170515053123_835c394b-2157-4f6b-bfed-a2753acd568e/segmentsDescriptorDir
>  does not exist.
> ...
> {noformat}




[jira] [Commented] (HIVE-17726) Using exists may lead to incorrect results

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197928#comment-16197928
 ] 

Hive QA commented on HIVE-17726:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891137/HIVE-17726.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 11193 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[spark_local_queries] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_notexists] 
(batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_notexists_having]
 (batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_notin_having] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_unqualcolumnrefs]
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] 
(batchId=76)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_partitioner]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_exists] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=137)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query16] 
(batchId=241)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query94] 
(batchId=241)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=239)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] 
(batchId=239)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=239)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query94] 
(batchId=239)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7199/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7199/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7199/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 29 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891137 - PreCommit-HIVE-Build

> Using exists may lead to incorrect results
> --
>
> Key: HIVE-17726
> URL: https://issues.apache.org/jira/browse/HIVE-17726
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Zoltan Haindrich
>Assignee: Vineet Garg
> Attachments: HIVE-17726.1.patch
>
>
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer);
> insert into tx1   values  (1, 1),
> (1, 2),
> (1, 3);
> select count(*) as result,3 as expected from tx1 u
> where exists (select * from tx1 v where u.a=v.a and u.b <> v.b);
> select count(*) as result,3 as expected from tx1 u
> where exists (select * from tx1 v where u.a=v.a and u.b <> v.b limit 1);
> {code}
> current results are 6 and 2.
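
For reference, the expected value of 3 can be checked by hand. The following is a minimal Python sketch, not Hive code, that evaluates the EXISTS predicate row by row over the three rows from the report. The helper names `count_exists` and `count_join_pairs` are made up for illustration. Under standard SQL semantics a {{limit 1}} inside EXISTS does not change whether a matching row exists, so both queries should return 3.

```python
# Rows of tx1 from the report, as (a, b) pairs.
tx1 = [(1, 1), (1, 2), (1, 3)]


def count_exists(rows):
    """Count outer rows u for which at least one row v has u.a = v.a and u.b <> v.b."""
    return sum(
        1
        for (ua, ub) in rows
        if any(ua == va and ub != vb for (va, vb) in rows)
    )


def count_join_pairs(rows):
    """Count every matching (u, v) pair, i.e. what a plain inner join would return."""
    return sum(
        1
        for (ua, ub) in rows
        for (va, vb) in rows
        if ua == va and ub != vb
    )


print(count_exists(tx1))      # EXISTS semantics: each outer row counted at most once
print(count_join_pairs(tx1))  # join semantics: one count per matching pair
```

The pair count equals the 6 reported for the first query. That is only an observation about the numbers and not a diagnosis of the planner.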




[jira] [Resolved] (HIVE-17650) DDLTask.handleRemoveMm() assumes locks not present

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-17650.
-
Resolution: Duplicate

We will remove the ability to directly convert tables from MM to non-MM.

> DDLTask.handleRemoveMm() assumes locks not present
> --
>
> Key: HIVE-17650
> URL: https://issues.apache.org/jira/browse/HIVE-17650
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> This moves every file in the table from under delta_x_x/ to root of the 
> table/partition
> How would this work for bucketed tables?  Will it create bucket_x_copy_N 
> files?
> This could create 1000s of copy_N files - this will likely break something
> The comments in the method assume locks are present - this would imply that 
> there are appropriate Read/WriteEntity objects already created - I doubt this 
> is the case for a table property change.
> It seems like this kind of op should require an Exclusive lock at table level 
> to prevent concurrent inserts (into new delta_x_x/)




[jira] [Work started] (HIVE-16677) CTAS with no data fails in Druid

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16677 started by Jesus Camacho Rodriguez.
--
> CTAS with no data fails in Druid
> 
>
> Key: HIVE-16677
> URL: https://issues.apache.org/jira/browse/HIVE-16677
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> If we create a table in Druid using a CTAS statement and the query executed 
> to create the table produces no data, we fail with the following exception:
> {noformat}
> druid.DruidStorageHandler: Exception while commit
> java.io.FileNotFoundException: File 
> /tmp/workingDirectory/.staging-jcamachorodriguez_20170515053123_835c394b-2157-4f6b-bfed-a2753acd568e/segmentsDescriptorDir
>  does not exist.
> ...
> {noformat}




[jira] [Commented] (HIVE-17745) FileSinkOperator: do not invoke updateProgress for every record

2017-10-09 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197925#comment-16197925
 ] 

Rajesh Balamohan commented on HIVE-17745:
-

The attached image is from the correct profiler snapshot. The ThreadLocal is 
from LazyBinaryUtils->writeVLong; that needs to be fixed as well.

> FileSinkOperator: do not invoke updateProgress for every record
> ---
>
> Key: HIVE-17745
> URL: https://issues.apache.org/jira/browse/HIVE-17745
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: fileSinkOperator.png
>
>
> Depending on the query and the number of records, {{updateProgress}} shows up 
> in the profiler as a hotspot. It would be good to consider updating the 
> progress after every "x" records instead of invoking it for every record.
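
The idea of reporting progress once per "x" records can be sketched as a simple counter. This is a minimal Python illustration, not the FileSinkOperator code; the class name `ThrottledProgress`, its methods and the default interval of 1000 are all assumptions made for the example.

```python
class ThrottledProgress:
    """Invoke a progress callback once per `every_n` records instead of per record."""

    def __init__(self, update_progress, every_n=1000):
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        self._update_progress = update_progress
        self._every_n = every_n
        self._pending = 0  # records seen since the last callback

    def record_processed(self):
        self._pending += 1
        if self._pending >= self._every_n:
            self._pending = 0
            self._update_progress()

    def close(self):
        # Report the tail so the last partial batch is not lost.
        if self._pending:
            self._pending = 0
            self._update_progress()


calls = []
progress = ThrottledProgress(lambda: calls.append(1), every_n=1000)
for _ in range(2500):
    progress.record_processed()
progress.close()
print(len(calls))  # number of callback invocations for 2500 records
```

The per-record cost becomes one increment and one comparison. The interval would need tuning so that long-running tasks still report often enough to avoid being treated as hung.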




[jira] [Commented] (HIVE-17671) TableScanDesc.isAcidTable is restricted to FullAcid tables

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197923#comment-16197923
 ] 

Sergey Shelukhin commented on HIVE-17671:
-

Does it actually cause problems? I'm assuming from Wei's changes that it 
intends to guard full-acid-specific logic.

> TableScanDesc.isAcidTable is restricted to FullAcid tables
> --
>
> Key: HIVE-17671
> URL: https://issues.apache.org/jira/browse/HIVE-17671
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> _isAcidTable = AcidUtils.isAcidTable(this.tableMetadata);_
> is changed to 
> _isAcidTable = AcidUtils.isFullAcidTable(this.tableMetadata);_
> This property is then checked all over the place - why?
> This then affects TableScanDesc.isAcidTable() so FetchTask, HiveInputFormat 
> etc assume that they are handling Acid read only if it's full acid... this 
> doesn't look right




[jira] [Comment Edited] (HIVE-17743) Add InterfaceAudience and InterfaceStability annotations for Thrift generated APIs

2017-10-09 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197752#comment-16197752
 ] 

Alan Gates edited comment on HIVE-17743 at 10/10/17 12:20 AM:
--

Ok, I misunderstood.  I think the standalone-metastore part of this patch looks 
good then.

I don't have an opinion on the right thing to do for service-rpc, whether it 
should depend on a separate module or hive-common.


was (Author: alangates):
Ok, misunderstood.  I think the standalone-metastore part of this patch looks 
good then.

I don't have an opinion on the right thing to do for service-rpc, whether it 
should depend on a separate module or hive-common.

> Add InterfaceAudience and InterfaceStability annotations for Thrift generated 
> APIs
> --
>
> Key: HIVE-17743
> URL: https://issues.apache.org/jira/browse/HIVE-17743
> Project: Hive
>  Issue Type: Sub-task
>  Components: Thrift API
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17743.1.patch
>
>
> The Thrift generated files don't have {{InterfaceAudience}} or 
> {{InterfaceStability}} annotations on them, mainly because all the files are 
> auto-generated.
> We should add some code that auto-tags all the Java Thrift generated files 
> with these annotations. This way even when they are re-generated, they still 
> contain the annotations.
> We should be able to do this using the 
> {{com.google.code.maven-replacer-plugin}} similar to what we do in 
> {{standalone-metastore/pom.xml}}.
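
The auto-tagging step amounts to a text replacement over the generated sources. Below is a minimal Python sketch of such a replacement, assuming the annotations are prepended to top-level {{public class}}, {{public interface}} or {{public enum}} declarations. The function name `tag_generated_source` and the annotation string passed in are illustrative only; they are not taken from the patch or from the replacer plugin configuration.

```python
import re

# Top-level declarations only: generated nested types are indented and do not match.
_DECL = re.compile(r"^(public\s+(?:class|interface|enum)\s)", re.MULTILINE)


def tag_generated_source(source, annotations):
    """Prefix each top-level public type declaration with `annotations`.

    Running it again on already tagged text is a no-op, so re-generation
    followed by re-tagging yields the same result.
    """
    if annotations in source:
        return source
    return _DECL.sub(lambda m: annotations + " " + m.group(1), source)


generated = (
    "package example;\n"
    "\n"
    "public class Foo {\n"
    "  public static class Bar {}\n"
    "}\n"
)
# Hypothetical annotation text, chosen only for the example.
annotations = "@InterfaceAudience.Public @InterfaceStability.Stable"
tagged = tag_generated_source(generated, annotations)
print(tagged)
```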




[jira] [Commented] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-10-09 Thread Thai Bui (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197922#comment-16197922
 ] 

Thai Bui commented on HIVE-17502:
-


bq. Because currently each SessionState (Hive session) can only have one Tez 
session. With this patch it's possible to reuse the Hive session, utilizing 
several Tez sessions. So the code that gets/sets/closes/etc. the session by 
getting it from SessionState will become invalid I assume.

Yes, it's possible to reuse the Hive session, *if and only if that session has 
been returned to the pool*. So in effect, 1 session is used by 1 client at a 
time and so SessionState only handles 1 Tez session.

What this patch does is remove the exception thrown when a client makes a 
second request using the same session while that session is still in use. 
In that case, the session in the request is *skipped*, and a new session from 
the pool is returned. This allows the same client to use multiple sessions 
(handled by distinct threads). Since all session states are thread-local and 
each session request is handled by a different thread, this works correctly; 
there's no bug AFAIK.

[~thejas] I totally understand the concern about long term support. My company 
has a big investment in this and we do want to work with the community to find 
the best solution using Hive 2 w/ LLAP + Tez. This is one of the biggest 
requirements for my team since it's frustrating to use Hue 4 in 
one-query-at-a-time mode. Changing Hue 4 to use a different session per query 
is very invasive, whereas it makes more sense for HiveServer2 to be more 
permissive and not throw an exception.

Also, I think we need to clarify the intention of this patch. It is not to use 
the same session for multiple queries in parallel. A session will still run 
one query at a time. However, when the client tries to reuse a session while 
it's still being used, a new session from the pool will be returned.

The only downside to this patch is that there will be quite a few orphaned 
sessions (since users keep requesting new sessions and leave existing 
sessions behind). However, that is easily taken care of by setting HS2 session 
expiration times more aggressively.
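
The behaviour described in this comment can be sketched as a small pool. This is a minimal Python illustration of "skip the busy session and hand out another one", not the TezSessionPoolManager code; the class `SessionPool`, its method names and the session ids are assumptions made for the example, and the sketch raises when the pool is empty only to stay short.

```python
import threading
from collections import deque


class SessionPool:
    """Hand out sessions so that each session serves one caller at a time."""

    def __init__(self, session_ids):
        self._lock = threading.Lock()
        self._free = deque(session_ids)
        self._in_use = set()

    def get_session(self, preferred=None):
        with self._lock:
            if preferred is not None and preferred in self._free:
                # The requested session was returned to the pool: reuse it.
                self._free.remove(preferred)
                session = preferred
            elif self._free:
                # The requested session is busy (or none was requested):
                # skip it and hand out another one instead of failing.
                session = self._free.popleft()
            else:
                raise RuntimeError("no free session in the pool")
            self._in_use.add(session)
            return session

    def return_session(self, session):
        with self._lock:
            if session in self._in_use:
                self._in_use.remove(session)
                self._free.append(session)


pool = SessionPool(["s1", "s2"])
first = pool.get_session()                  # takes a free session
second = pool.get_session(preferred=first)  # first is busy, so another is returned
pool.return_session(first)
third = pool.get_session(preferred=first)   # first is free again and gets reused
print(first, second, third)
```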

> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
> Fix For: 3.0.0
>
> Attachments: HIVE-17502.patch
>
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently used snippet being executed in Hue, this causes HS2 to throw 
> an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool, a

[jira] [Assigned] (HIVE-16677) CTAS with no data fails in Druid

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16677:
--

Assignee: Jesus Camacho Rodriguez

> CTAS with no data fails in Druid
> 
>
> Key: HIVE-16677
> URL: https://issues.apache.org/jira/browse/HIVE-16677
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> If we create a table in Druid using a CTAS statement and the query executed 
> to create the table produces no data, we fail with the following exception:
> {noformat}
> druid.DruidStorageHandler: Exception while commit
> java.io.FileNotFoundException: File 
> /tmp/workingDirectory/.staging-jcamachorodriguez_20170515053123_835c394b-2157-4f6b-bfed-a2753acd568e/segmentsDescriptorDir
>  does not exist.
> ...
> {noformat}




[jira] [Updated] (HIVE-17691) Miscellaneous List

2017-10-09 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17691:
--
Description: 
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)
# Remove MoveWork().setNoop(boolean) and usages per todo in 
_GenMapRedUtils.createMRWorkForMergingFiles (FileSinkOperator fsInput, Path 
finalName, DependencyCollectionTask dependencyTask,   List 
mvTasks, HiveConf conf,   Task currTask)_




  was:
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)
# Remove MoveWork().setNoop(boolean) and usages




> Miscellaneous List
> --
>
> Key: HIVE-17691
> URL: https://issues.apache.org/jira/browse/HIVE-17691
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> # DDLSemanticAnalyzer.alterTableOutput is unused
> # DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
> TransactionManager
> # DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
> crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
> place..
> # FileSinkOperator has multiple places that look like _conf.getWriteType() == 
> AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
> MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() 
> != AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
> MoveTask.handleStaticParts() call to Hive.loadPartition()
> # HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the

[jira] [Updated] (HIVE-17691) Miscellaneous List

2017-10-09 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17691:
--
Description: 
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)
# Remove MoveWork().setNoop(boolean) and usages



  was:
# DDLSemanticAnalyzer.alterTableOutput is unused
# DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
TransactionManager
# DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
place..
# FileSinkOperator has multiple places that look like _conf.getWriteType() == 
AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() != 
AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
MoveTask.handleStaticParts() call to Hive.loadPartition()
# HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
obsolete
# Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
either because DbTxnManager.acquireLocks() does  
_compBuilder.setIsAcid(AcidUtils.isFullAcidTable(t));_ i.e. it treats MM as 
non-acid tables
# In general integration with full Acid seems confused wrt MM and seems to 
treat MM as special table type rather than subtype of Acid table.  (mostly, but 
not always).
# LoadSemanticAnalyzer.analyzeInternal(ASTNode) sets statementId to 0 rather 
than from TM
# ImportCommitTask - doesn't currently do anything.  It used to commit mmID.  
Need to verify we properly commit the txn in the Driver
# As far as I can tell all the mm_*.q tests run on TestCliDriver which means 
MR.  This doesn't exercise some code specifically for dealing with writes from 
Union All queries (CTAS, Insert into).  On MR this requires 
"hive.optimize.union.remove=true" (false by default)




> Miscellaneous List
> --
>
> Key: HIVE-17691
> URL: https://issues.apache.org/jira/browse/HIVE-17691
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> # DDLSemanticAnalyzer.alterTableOutput is unused
> # DDLTask.generateAddMmTasks(Table) - stmtId should probably come from 
> TransactionManager
> # DDLTask.createTable(Hive db, CreateTableDesc crtTbl) has _Long mmWriteId = 
> crtTbl.getInitialMmWriteId();_ logic is unclear..  this ID is only set in one 
> place..
> # FileSinkOperator has multiple places that look like _conf.getWriteType() == 
> AcidUtils.Operation.NOT_ACID || conf.isMmTable()_ - what is the writeType for 
> MM tables?  Seems that Wei opted for "work.getLoadTableWork().getWriteType() 
> != AcidUtils.Operation.NOT_ACID && !tbd.isMmTable()" to mean MM, e.g. 
> MoveTask.handleStaticParts() call to Hive.loadPartition()
> # HiveConf.HIVE_TXN_OPERATIONAL_PROPERTIES - the doc/explanation there is 
> obsolete
> # Compactor Initiator likely doesn't work for MM tables.  It's triggered by 
> info in TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS.  MM tables don't write to 
> either because DbTxnManager.acquireLocks() does  
>

[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: (was: HIVE-11548.6-branch-2.patch)

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, HIVE-11548.6.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.6-branch-2.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, HIVE-11548.6.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Commented] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Steve Yeom (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197897#comment-16197897
 ] 

Steve Yeom commented on HIVE-15267:
---

[~ekoifman] 1
[~eugene.koifman] 2,

Hey Eugene, 
please review patch 2 (HIVE-15267.02.patch) for this JIRA. 
Thanks, 
Steve. 

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch, HIVE-15267.02.patch
>
>
> HIVE-15181 has the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}
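
As a rough illustration of the review comment above (a sketch, not the TxnUtils code or the attached patch): the integer-division check lets a 2047-byte query pass a 1 KB limit, whereas comparing the full length, suffix included, before each IN-list item is appended keeps every query within the limit. All class and method names below are hypothetical, and lengths are counted in characters on the assumption that the SQL text is ASCII.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Hive.
class QueryLengthSketch {

    // The check criticized in the review: integer division truncates,
    // so 2047 / 1024 == 1 and the comparison against a 1 KB limit is false.
    public static boolean truncatingCheck(long sizeInBytes, int limitKb) {
        return sizeInBytes / 1024 > limitKb;
    }

    // Exact comparison in bytes; no precision is lost.
    public static boolean exactCheck(long sizeInBytes, int limitKb) {
        return sizeInBytes > limitKb * 1024L;
    }

    // Builds "<prefix>id,id,...<suffix>" queries, checking the projected length
    // (including the suffix) before each item is appended.
    public static List<String> buildQueries(String prefix, String suffix,
                                            List<Long> ids, int limitKb) {
        List<String> queries = new ArrayList<>();
        StringBuilder current = new StringBuilder(prefix);
        boolean empty = true;
        for (long id : ids) {
            String item = (empty ? "" : ",") + id;
            long projected = (long) current.length() + item.length() + suffix.length();
            if (!empty && exactCheck(projected, limitKb)) {
                // Close the current query and start a new one with this id.
                queries.add(current.append(suffix).toString());
                current = new StringBuilder(prefix);
                item = Long.toString(id);
            }
            current.append(item);
            empty = false;
        }
        if (!empty) {
            queries.add(current.append(suffix).toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println("truncating check, 2047 bytes vs 1 KB: " + truncatingCheck(2047, 1));
        System.out.println("exact check, 2047 bytes vs 1 KB: " + exactCheck(2047, 1));
    }
}
```

A single id that is longer than the limit on its own is still emitted as one query in this sketch, since it cannot be split further.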




[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.6.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, HIVE-11548.6.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Commented] (HIVE-17698) FileSinkDesk.getMergeInputDirName() uses stmtId=0

2017-10-09 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197888#comment-16197888
 ] 

Eugene Koifman commented on HIVE-17698:
---

It should get it from the transaction manager.
getMergeInputDirName() doesn't exist on master;
on master, file creation is handled by OrcRecordUpdater.

> FileSinkDesk.getMergeInputDirName() uses stmtId=0
> -
>
> Key: HIVE-17698
> URL: https://issues.apache.org/jira/browse/HIVE-17698
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> This is certainly wrong for multi-statement transactions, and may also affect writes 
> from Union All queries if these are made to follow the full ACID convention:
> _return new Path(root, AcidUtils.deltaSubdir(txnId, txnId, 0));_
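
To illustrate why a hard-coded statement id is a problem, here is a small sketch. It is not Hive's AcidUtils: the helper below is hypothetical and only mimics the delta_x_y_stmtId naming mentioned in these threads, with zero-padding widths chosen as an assumption for illustration. Two statements in the same transaction that both pass 0 resolve to the same directory name, while distinct statement ids keep them apart.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; the naming helper only imitates the delta_x_y_stmtId pattern.
class DeltaDirSketch {

    // Assumed layout: delta_<minTxn>_<maxTxn>_<stmtId>; padding widths are an assumption.
    public static String deltaSubdir(long minTxn, long maxTxn, int stmtId) {
        return String.format("delta_%07d_%07d_%04d", minTxn, maxTxn, stmtId);
    }

    // Directory names produced by `statements` statements of one transaction,
    // either always passing 0 or passing each statement's own id.
    public static Set<String> dirsFor(long txnId, int statements, boolean hardCodeZero) {
        Set<String> dirs = new LinkedHashSet<>();
        for (int stmt = 0; stmt < statements; stmt++) {
            dirs.add(deltaSubdir(txnId, txnId, hardCodeZero ? 0 : stmt));
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(dirsFor(5, 2, true));
        System.out.println(dirsFor(5, 2, false));
    }
}
```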




[jira] [Updated] (HIVE-17695) collapse union all produced directories into delta directory name suffix for MM

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17695:

Summary: collapse union all produced directories into delta directory name 
suffix for MM  (was: HiveInputFormat.processForWriteIds(Path dir, JobConf conf, 
  ValidTxnList validTxnList, List finalPaths))

> collapse union all produced directories into delta directory name suffix for 
> MM
> ---
>
> Key: HIVE-17695
> URL: https://issues.apache.org/jira/browse/HIVE-17695
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> This has special handling for writes resulting from a Union All query.
> In the full ACID case at least, these subdirs get collapsed in favor of 
> statementId-based dir names (delta_x_y_stmtId).  It would be cleaner/simpler 
> to make MM follow the same logic.  (Full ACID does it in Hive.moveFiles(), I 
> think.)




[jira] [Commented] (HIVE-17695) HiveInputFormat.processForWriteIds(Path dir, JobConf conf, ValidTxnList validTxnList, List finalPaths)

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197887#comment-16197887
 ] 

Sergey Shelukhin commented on HIVE-17695:
-

Cannot be done with list bucketing in the picture (it's 
delta_.../lb-stuff/union-suffix iirc). 

> HiveInputFormat.processForWriteIds(Path dir, JobConf conf,   ValidTxnList 
> validTxnList, List finalPaths)
> --
>
> Key: HIVE-17695
> URL: https://issues.apache.org/jira/browse/HIVE-17695
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> This has special handling for writes resulting from a Union All query.
> In the full ACID case at least, these subdirs get collapsed in favor of 
> statementId-based dir names (delta_x_y_stmtId).  It would be cleaner/simpler 
> to make MM follow the same logic.  (Full ACID does it in Hive.moveFiles(), I 
> think.)




[jira] [Commented] (HIVE-17698) FileSinkDesk.getMergeInputDirName() uses stmtId=0

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197886#comment-16197886
 ] 

Sergey Shelukhin commented on HIVE-17698:
-

What should it use? What does it use on master?

> FileSinkDesk.getMergeInputDirName() uses stmtId=0
> -
>
> Key: HIVE-17698
> URL: https://issues.apache.org/jira/browse/HIVE-17698
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> This is certainly wrong for multi-statement transactions, and may also affect writes 
> from Union All queries if these are made to follow the full ACID convention:
> _return new Path(root, AcidUtils.deltaSubdir(txnId, txnId, 0));_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-10-09 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197870#comment-16197870
 ] 

Thejas M Nair commented on HIVE-17502:
--

[~thai.bui]
Do you know why Hue uses the same HiveServer2 session for multiple queries?
I fear this is mostly kicking the can down the road, as we don't seem to have all 
cases covered here, or any ongoing investment to cover those cases that I am 
aware of.
In general, with JDBC, the usual recommendation seems to be not to run 
simultaneous queries within the same JDBC connection. (Also, JDBC statement calls 
are blocking, so it's not easy to do without multiple threads.)
Is it possible to change Hue to use a different session for each query?
We can still get this patch in, but I just want to get the conversation going 
on the long-term solution for this.


> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
> Fix For: 3.0.0
>
> Attachments: HIVE-17502.patch
>
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently used snippet being executed in Hue; this causes HS2 to throw 
> an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool, and a predetermined number of AMs is initialized at setup 
> time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
> of the pool, or an existing session to be skipped and an unused session from 
> the pool to be returned. The logic to throw an exception in the  
> `canWorkWithSameSession` doesn't make sense to me.
> I have a solution to fix this issue in my local branch at 
> https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
>  When applied, the log will become like so
> {noformat}
> 2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
> session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
> user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
> since it is being used.
> {noformat}
> A test case is provided in my branch to demonstrate how it works. If possible 
> I would like this patch to be applied to version 2.1, 2.2 and master. Since 
> we are using 2.1 LLAP in production with Hue 4, this patch is critical to our 
> success.
> Alternatively, if this patch is too broad in scope, I propose adding an 
> option to allow "skipping of currently used default sessions". With this new 
> option default to "false", existing behavior won't change unless the option 
> is turned on.
> I will

[jira] [Comment Edited] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197868#comment-16197868
 ] 

Sergey Shelukhin edited comment on HIVE-17502 at 10/9/17 11:31 PM:
---

Because currently each SessionState (Hive session) can only have one Tez 
session. With this patch it's possible to reuse the Hive session, utilizing 
several Tez sessions. So the code that gets/sets/closes/etc. the session by 
getting it from SessionState will become invalid, I assume.

Just having a config is not good enough to allow known buggy code... 

Btw, we were wondering in some discussion: why can't Hue simply avoid reusing 
the session this way?


was (Author: sershe):
Because currently each SessionState (Hive session) can only have one Tez 
session. With this patch it's possible to reuse the Hive session, utilizing 
several Tez sessions. So the code that gets/sets/closes/etc. the session by 
getting it from SessionState will become invalid I assume

> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
> Fix For: 3.0.0
>
> Attachments: HIVE-17502.patch
>
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently used snippet being executed in Hue; this causes HS2 to throw 
> an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool, and a predetermined number of AMs is initialized at setup 
> time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
> of the pool, or an existing session to be skipped and an unused session from 
> the pool to be returned. The logic to throw an exception in the  
> `canWorkWithSameSession` doesn't make sense to me.
> I have a solution to fix this issue in my local branch at 
> https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
>  When applied, the log will become like so
> {noformat}
> 2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
> session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
> user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
> since it is being used.
> {noformat}
> A test case is provided in my branch to demonstrate how it works. If possible 
> I would like this patch to be applied to version 2.1, 2.2 and master. Since 
> we are using 2.1 LLAP in production with Hue 4, this patch is critical to our 
> success.
> Alternatively, if this patch is too broad in scope, I propose adding an 
> option to allow

[jira] [Commented] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197868#comment-16197868
 ] 

Sergey Shelukhin commented on HIVE-17502:
-

Because currently each SessionState (Hive session) can only have one Tez 
session. With this patch it's possible to reuse the Hive session, utilizing 
several Tez sessions. So the code that gets/sets/closes/etc. the session by 
getting it from SessionState will become invalid I assume

> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
> Fix For: 3.0.0
>
> Attachments: HIVE-17502.patch
>
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently used snippet being executed in Hue; this causes HS2 to throw 
> an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool, and a predetermined number of AMs is initialized at setup 
> time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
> of the pool, or an existing session to be skipped and an unused session from 
> the pool to be returned. The logic to throw an exception in the  
> `canWorkWithSameSession` doesn't make sense to me.
> I have a solution to fix this issue in my local branch at 
> https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
>  When applied, the log will become like so
> {noformat}
> 2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
> session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
> user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
> since it is being used.
> {noformat}
> A test case is provided in my branch to demonstrate how it works. If possible 
> I would like this patch to be applied to version 2.1, 2.2 and master. Since 
> we are using 2.1 LLAP in production with Hue 4, this patch is critical to our 
> success.
> Alternatively, if this patch is too broad in scope, I propose adding an 
> option to allow "skipping of currently used default sessions". With this new 
> option default to "false", existing behavior won't change unless the option 
> is turned on.
> I will prepare an official patch if this change to master and/or the other 
> branches is acceptable. I'm not a contributor or committer; this will be my 
> first time contributing to Hive and the Apache foundation. Any early review 
> is greatly appreciated, thanks!
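
The behavior requested in the description can be sketched roughly as follows. This is a toy model under stated assumptions, not TezSessionPoolManager and not the proposed patch: the class, its methods, and the `skipBusySessions` flag are all hypothetical stand-ins for the "skip currently used default sessions" option discussed above. With the flag off, asking to reuse a session that is still in use raises an error, as in the quoted stack trace; with it on, the busy session is skipped and an idle one is handed out instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy pool; names and behavior are illustrative assumptions only.
class SessionPoolSketch {
    private final Deque<String> idle = new ArrayDeque<>();
    private final Set<String> inUse = new HashSet<>();
    private final boolean skipBusySessions;

    public SessionPoolSketch(boolean skipBusySessions, String... sessionIds) {
        this.skipBusySessions = skipBusySessions;
        for (String id : sessionIds) {
            idle.add(id);
        }
    }

    // Returns the requested session if it is idle; otherwise hands out another idle one.
    public synchronized String getSession(String requestedId) {
        if (requestedId != null && inUse.contains(requestedId)) {
            if (!skipBusySessions) {
                throw new IllegalStateException(
                    "The pool session " + requestedId + " should have been returned to the pool");
            }
            // Flag is on: skip the busy session and fall through to pick an idle one.
        } else if (requestedId != null && idle.remove(requestedId)) {
            inUse.add(requestedId);
            return requestedId;
        }
        String next = idle.poll();
        if (next == null) {
            throw new IllegalStateException("No idle session available");
        }
        inUse.add(next);
        return next;
    }

    public synchronized void returnSession(String id) {
        if (inUse.remove(id)) {
            idle.add(id);
        }
    }
}
```

Note that this sketch says nothing about the concern raised in the comments, namely that code reading the Tez session from SessionState assumes one Tez session per Hive session.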




[jira] [Commented] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-10-09 Thread Thai Bui (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197864#comment-16197864
 ] 

Thai Bui commented on HIVE-17502:
-

Sure, I'm happy to add additional checks, although nothing should be affected 
unless the new flag is turned on. I'm not sure what you mean by 
SessionState::getTezSession and all related methods becoming invalid; could 
you clarify?

> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
> Fix For: 3.0.0
>
> Attachments: HIVE-17502.patch
>
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently used snippet being executed in Hue; this causes HS2 to throw 
> an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool, and a predetermined number of AMs is initialized at setup 
> time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
> of the pool, or an existing session to be skipped and an unused session from 
> the pool to be returned. The logic to throw an exception in the  
> `canWorkWithSameSession` doesn't make sense to me.
> I have a solution to fix this issue in my local branch at 
> https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
>  When applied, the log will become like so
> {noformat}
> 2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
> session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
> user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
> since it is being used.
> {noformat}
> A test case is provided in my branch to demonstrate how it works. If possible 
> I would like this patch to be applied to version 2.1, 2.2 and master. Since 
> we are using 2.1 LLAP in production with Hue 4, this patch is critical to our 
> success.
> Alternatively, if this patch is too broad in scope, I propose adding an 
> option to allow "skipping of currently used default sessions". With this new 
> option default to "false", existing behavior won't change unless the option 
> is turned on.
> I will prepare an official patch if this change to master and/or the other 
> branches is acceptable. I'm not a contributor or committer; this will be my 
> first time contributing to Hive and the Apache foundation. Any early review 
> is greatly appreciated, thanks!




[jira] [Commented] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197849#comment-16197849
 ] 

Hive QA commented on HIVE-15267:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891141/HIVE-15267.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11191 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=239)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7198/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7198/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7198/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891141 - PreCommit-HIVE-Build

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch, HIVE-15267.02.patch
>
>
> HIVE-15181 has the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}




[jira] [Updated] (HIVE-17111) Add TestLocalSparkCliDriver

2017-10-09 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17111:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the reviews everyone! Pushed to master.

> Add TestLocalSparkCliDriver
> ---
>
> Key: HIVE-17111
> URL: https://issues.apache.org/jira/browse/HIVE-17111
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17111.1.patch
>
>
> The TestSparkCliDriver sets spark.master to local-cluster[2,2,1024], but 
> HoS still decides to use the RemoteHiveSparkClient rather than the 
> LocalHiveSparkClient.
> The issue is with the following check in HiveSparkClientFactory:
> {code}
> if (master.equals("local") || master.startsWith("local[")) {
>   // With local spark context, all user sessions share the same spark context.
>   return LocalHiveSparkClient.getInstance(generateSparkConf(sparkConf));
> } else {
>   return new RemoteHiveSparkClient(hiveconf, sparkConf);
> }
> {code}
> The {{master.startsWith("local[")}} check inspects the value of spark.master, 
> sees that {{local-cluster[2,2,1024]}} doesn't start with {{local[}}, and so 
> falls through to the RemoteHiveSparkClient.
> We should fix this so that the LocalHiveSparkClient is used. It should speed 
> up some of the tests, and also makes qtests easier to debug since everything 
> will now be run in the same process.
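
The condition quoted in the description can be tried in isolation. The sketch below copies only that boolean expression into a hypothetical helper (the class and method names are not from Hive) and evaluates it for a few master strings, showing that `local-cluster[2,2,1024]` is not treated as local.

```java
// Hypothetical helper wrapping the condition quoted from HiveSparkClientFactory.
class SparkMasterCheckSketch {

    // Same boolean expression as in the quoted snippet.
    public static boolean selectsLocalClient(String master) {
        return master.equals("local") || master.startsWith("local[");
    }

    public static void main(String[] args) {
        String[] masters = {"local", "local[2]", "local-cluster[2,2,1024]", "yarn"};
        for (String master : masters) {
            System.out.println(master + " -> "
                + (selectsLocalClient(master) ? "LocalHiveSparkClient" : "RemoteHiveSparkClient"));
        }
    }
}
```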




[jira] [Commented] (HIVE-17473) implement workload management pools

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197832#comment-16197832
 ] 

Sergey Shelukhin commented on HIVE-17473:
-

[~prasanth_j] [~aplusplus] ping?

> implement workload management pools
> ---
>
> Key: HIVE-17473
> URL: https://issues.apache.org/jira/browse/HIVE-17473
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17473.01.patch, HIVE-17473.patch
>
>





[jira] [Commented] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197830#comment-16197830
 ] 

Sergey Shelukhin commented on HIVE-17502:
-

Hmm.. SessionState::getTezSession and all related methods will become invalid 
now, right? I wonder if they need additional correctness checks for this case.
[~thejas] any other input?

> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
> Fix For: 3.0.0
>
> Attachments: HIVE-17502.patch
>
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients, such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently used snippet being executed in Hue, which causes HS2 to throw 
> an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool, and a pre-determined number of AMs is initialized at setup 
> time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
> of the pool, or an existing session to be skipped and an unused session from 
> the pool to be returned. The logic to throw an exception in 
> `canWorkWithSameSession` doesn't make sense to me.
> I have a solution to fix this issue in my local branch at 
> https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
> When applied, the log looks like this:
> {noformat}
> 2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
> session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
> user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
> since it is being used.
> {noformat}
> A test case is provided in my branch to demonstrate how it works. If possible 
> I would like this patch to be applied to version 2.1, 2.2 and master. Since 
> we are using 2.1 LLAP in production with Hue 4, this patch is critical to our 
> success.
> Alternatively, if this patch is too broad in scope, I propose adding an 
> option to allow "skipping of currently used default sessions". With this new 
> option defaulting to "false", existing behavior won't change unless the option 
> is turned on.
> I will prepare an official patch if this change to master and/or the other 
> branches is acceptable. I'm not a contributor or committer; this will be my 
> first time contributing to Hive and the Apache foundation. Any early review 
> is greatly appreciated, thanks!




[jira] [Updated] (HIVE-17629) CachedStore - wait for prewarm at use time, not init time

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17629:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> CachedStore - wait for prewarm at use time, not init time
> -
>
> Key: HIVE-17629
> URL: https://issues.apache.org/jira/browse/HIVE-17629
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-17629.patch
>
>
> Most of the changes are trivial, due to static changing to non-static. The 
> patch basically adds lifecycle management for shared cache, and uses it to 
> move prewarm to background thread, waiting for prewarm to happen at use time 
> not setConf, and to make init optional (if not called, CachedStore would 
> proxy the methods to rawStore, like it currently does in the ones not 
> implemented).
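
The lifecycle described in this summary (prewarm on a background thread, wait only at use time, fall back to the raw store when init was never called) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the patch: LazyPrewarmSketch, init, rawTableCount and describeTableCount are hypothetical names, and the "cache" is a single integer.

```java
import java.util.concurrent.CountDownLatch;

final class LazyPrewarmSketch {
    private final CountDownLatch prewarmDone = new CountDownLatch(1);
    private volatile boolean initCalled = false;
    private int cachedTableCount; // written before countDown(), read after await()

    // Stand-in for a call to the underlying raw store (hypothetical).
    private int rawTableCount() {
        return 3;
    }

    // Optional init: starts prewarm on a background thread and returns at once.
    void init() {
        initCalled = true;
        Thread t = new Thread(() -> {
            cachedTableCount = rawTableCount();
            prewarmDone.countDown();
        }, "prewarm-sketch");
        t.setDaemon(true);
        t.start();
    }

    // Use-time access: proxies to the raw store if init was never called,
    // otherwise blocks until the background prewarm has finished.
    String describeTableCount() {
        if (!initCalled) {
            return "raw:" + rawTableCount();
        }
        try {
            prewarmDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for prewarm", e);
        }
        return "cache:" + cachedTableCount;
    }

    public static void main(String[] args) {
        LazyPrewarmSketch store = new LazyPrewarmSketch();
        System.out.println(store.describeTableCount()); // raw:3
        store.init();
        System.out.println(store.describeTableCount()); // cache:3
    }
}
```

The latch gives the needed visibility guarantee: the write to the cache happens before countDown(), and await() returns only after it.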




[jira] [Updated] (HIVE-17659) get_token thrift call fails for DBTokenStore in remote HMS mode

2017-10-09 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17659:
---
Attachment: HIVE-17659.02.patch

Fixed the compile issue for the patch on master.

> get_token thrift call fails for DBTokenStore in remote HMS mode
> ---
>
> Key: HIVE-17659
> URL: https://issues.apache.org/jira/browse/HIVE-17659
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17659.01-branch-2.patch, HIVE-17659.01.patch, 
> HIVE-17659.02-branch-2.patch, HIVE-17659.02.patch
>
>
> The {{get_token(String tokenIdentifier)}} HMS thrift API fails when HMS is 
> deployed in remote mode and when there is no token found with that 
> tokenIdentifier. This could happen when an application calls 
> renewDelegationToken on an expired/cancelled delegation token. The issue is 
> that get_token tries to return a null result value, which cannot be done in 
> Thrift. The API call errors out with a 
> {{org.apache.thrift.TApplicationException unknown result}} exception, which is 
> uncaught, and the HS2 thrift server closes the client transport. So no further 
> calls from that connection can be accepted unless the client reconnects to HS2 
> again.




[jira] [Updated] (HIVE-17747) HMS DropTableMessage should include the full table object

2017-10-09 Thread Dan Burkert (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-17747:
---
Status: Patch Available  (was: In Progress)

> HMS DropTableMessage should include the full table object
> -
>
> Key: HIVE-17747
> URL: https://issues.apache.org/jira/browse/HIVE-17747
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 2.3.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17747.0.patch
>
>
> I have a notification log follower use-case which requires accessing the 
> parameters of dropped tables, so it would be useful if the {{DROP_TABLE}} 
> events in the notification log included the full table object, as the create 
> and alter events do.




[jira] [Updated] (HIVE-17747) HMS DropTableMessage should include the full table object

2017-10-09 Thread Dan Burkert (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-17747:
---
Attachment: HIVE-17747.0.patch

> HMS DropTableMessage should include the full table object
> -
>
> Key: HIVE-17747
> URL: https://issues.apache.org/jira/browse/HIVE-17747
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 2.3.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17747.0.patch
>
>
> I have a notification log follower use-case which requires accessing the 
> parameters of dropped tables, so it would be useful if the {{DROP_TABLE}} 
> events in the notification log included the full table object, as the create 
> and alter events do.




[jira] [Work started] (HIVE-17747) HMS DropTableMessage should include the full table object

2017-10-09 Thread Dan Burkert (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17747 started by Dan Burkert.
--
> HMS DropTableMessage should include the full table object
> -
>
> Key: HIVE-17747
> URL: https://issues.apache.org/jira/browse/HIVE-17747
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 2.3.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17747.0.patch
>
>
> I have a notification log follower use-case which requires accessing the 
> parameters of dropped tables, so it would be useful if the {{DROP_TABLE}} 
> events in the notification log included the full table object, as the create 
> and alter events do.




[jira] [Assigned] (HIVE-17747) HMS DropTableMessage should include the full table object

2017-10-09 Thread Dan Burkert (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert reassigned HIVE-17747:
--


> HMS DropTableMessage should include the full table object
> -
>
> Key: HIVE-17747
> URL: https://issues.apache.org/jira/browse/HIVE-17747
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 2.3.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
>
> I have a notification log follower use-case which requires accessing the 
> parameters of dropped tables, so it would be useful if the {{DROP_TABLE}} 
> events in the notification log included the full table object, as the create 
> and alter events do.




[jira] [Updated] (HIVE-17733) Move RawStore to standalone metastore

2017-10-09 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17733:
--
Status: Patch Available  (was: Open)

> Move RawStore to standalone metastore
> -
>
> Key: HIVE-17733
> URL: https://issues.apache.org/jira/browse/HIVE-17733
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>  Labels: pull-request-available
> Attachments: HIVE-17733.2.patch, HIVE-17733.patch
>
>
> This includes moving implementations of RawStore (like ObjectStore), 
> MetastoreDirectSql, and stats related classes like ColumnStatsAggregator and 
> the NDV classes.




[jira] [Updated] (HIVE-17733) Move RawStore to standalone metastore

2017-10-09 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17733:
--
Attachment: HIVE-17733.2.patch

New version of the patch that fixes failure in TestMetastoreExpr

> Move RawStore to standalone metastore
> -
>
> Key: HIVE-17733
> URL: https://issues.apache.org/jira/browse/HIVE-17733
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>  Labels: pull-request-available
> Attachments: HIVE-17733.2.patch, HIVE-17733.patch
>
>
> This includes moving implementations of RawStore (like ObjectStore), 
> MetastoreDirectSql, and stats related classes like ColumnStatsAggregator and 
> the NDV classes.




[jira] [Updated] (HIVE-17733) Move RawStore to standalone metastore

2017-10-09 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17733:
--
Status: Open  (was: Patch Available)

TestMetastoreExpr test failure is valid.  TestDbTxnManager2 passes for me 
locally.  Will take a look at why it failed in this run.

> Move RawStore to standalone metastore
> -
>
> Key: HIVE-17733
> URL: https://issues.apache.org/jira/browse/HIVE-17733
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>  Labels: pull-request-available
> Attachments: HIVE-17733.2.patch, HIVE-17733.patch
>
>
> This includes moving implementations of RawStore (like ObjectStore), 
> MetastoreDirectSql, and stats related classes like ColumnStatsAggregator and 
> the NDV classes.




[jira] [Commented] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197759#comment-16197759
 ] 

Mithun Radhakrishnan commented on HIVE-11548:
-

A quick update: After rebasing this patch, there seems to be a minor bug that 
breaks {{TestExtendedAcls}}. I'm trying to sort this out now.

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.




[jira] [Commented] (HIVE-17743) Add InterfaceAudience and InterfaceStability annotations for Thrift generated APIs

2017-10-09 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197752#comment-16197752
 ] 

Alan Gates commented on HIVE-17743:
---

Ok, misunderstood.  I think the standalone-metastore part of this patch looks 
good then.

I don't have an opinion on the right thing to do for service-rpc, whether it 
should depend on a separate module or hive-common.

> Add InterfaceAudience and InterfaceStability annotations for Thrift generated 
> APIs
> --
>
> Key: HIVE-17743
> URL: https://issues.apache.org/jira/browse/HIVE-17743
> Project: Hive
>  Issue Type: Sub-task
>  Components: Thrift API
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17743.1.patch
>
>
> The Thrift generated files don't have {{InterfaceAudience}} or 
> {{InterfaceStability}} annotations on them, mainly because all the files are 
> auto-generated.
> We should add some code that auto-tags all the Java Thrift generated files 
> with these annotations. This way even when they are re-generated, they still 
> contain the annotations.
> We should be able to do this using the 
> {{com.google.code.maven-replacer-plugin}} similar to what we do in 
> {{standalone-metastore/pom.xml}}.




[jira] [Assigned] (HIVE-17534) Add a config to turn off parquet vectorization

2017-10-09 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17534:
--

Assignee: Vihang Karajgaonkar

> Add a config to turn off parquet vectorization
> --
>
> Key: HIVE-17534
> URL: https://issues.apache.org/jira/browse/HIVE-17534
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> It would be a good addition to give users an option to turn off Parquet 
> vectorization without affecting vectorization of other file formats.




[jira] [Updated] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Steve Yeom (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-15267:
--
Attachment: HIVE-15267.02.patch

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch, HIVE-15267.02.patch
>
>
> HIVE-15181 received the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}
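
The arithmetic in this review comment can be checked with a small sketch. The first method reproduces the quoted expression; the second is a hypothetical stricter variant (comparing bytes to bytes and counting the suffix), included only to illustrate the comment's suggestion. The class and method names are assumptions and do not come from TxnUtils.

```java
final class QueryLengthSketch {
    // The expression quoted in the review comment: integer division truncates,
    // so any size below 2048 bytes yields 1 and is not "greater than" a 1K limit.
    static boolean quotedCheck(long sizeInBytes, long limitKb) {
        return sizeInBytes / 1024 > limitKb;
    }

    // Hypothetical stricter variant (an assumption, not the committed fix):
    // convert the limit to bytes and include the suffix length.
    static boolean strictCheck(long sizeInBytes, long suffixBytes, long limitKb) {
        return sizeInBytes + suffixBytes > limitKb * 1024;
    }

    public static void main(String[] args) {
        System.out.println(quotedCheck(2047, 1));    // false: 2047 / 1024 == 1
        System.out.println(strictCheck(2047, 0, 1)); // true: 2047 > 1024
        System.out.println(strictCheck(1020, 10, 1)); // true: suffix pushes it past 1024
    }
}
```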




[jira] [Updated] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Steve Yeom (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-15267:
--
Status: Patch Available  (was: Open)

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch, HIVE-15267.02.patch
>
>
> HIVE-15181 received the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}




[jira] [Updated] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Steve Yeom (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-15267:
--
Status: Open  (was: Patch Available)

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch
>
>
> HIVE-15181 received the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}




[jira] [Updated] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Steve Yeom (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-15267:
--
Attachment: (was: HIVE-15267.02.patch)

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch
>
>
> HIVE-15181 received the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}




[jira] [Updated] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-09 Thread Steve Yeom (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-15267:
--
Attachment: HIVE-15267.02.patch

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Steve Yeom
> Attachments: HIVE-15267.01.patch, HIVE-15267.02.patch
>
>
> HIVE-15181 received the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}




[jira] [Updated] (HIVE-17726) Using exists may lead to incorrect results

2017-10-09 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17726:
---
Attachment: HIVE-17726.1.patch

> Using exists may lead to incorrect results
> --
>
> Key: HIVE-17726
> URL: https://issues.apache.org/jira/browse/HIVE-17726
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Zoltan Haindrich
>Assignee: Vineet Garg
> Attachments: HIVE-17726.1.patch
>
>
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer);
> insert into tx1   values  (1, 1),
> (1, 2),
> (1, 3);
> select count(*) as result,3 as expected from tx1 u
> where exists (select * from tx1 v where u.a=v.a and u.b <> v.b);
> select count(*) as result,3 as expected from tx1 u
> where exists (select * from tx1 v where u.a=v.a and u.b <> v.b limit 1);
> {code}
> current results are 6 and 2.
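
The expected value of 3 can be checked outside Hive by evaluating the EXISTS predicate by hand. The sketch below is plain Java over the three inserted rows; the class and method names are hypothetical, and it only models the query semantics, not Hive's planner.

```java
import java.util.Arrays;
import java.util.List;

final class ExistsSketch {
    // Counts rows u for which at least one row v satisfies u.a = v.a and u.b <> v.b.
    // Each row is an int[]{a, b}.
    static long countWithExists(List<int[]> rows) {
        return rows.stream()
                .filter(u -> rows.stream().anyMatch(v -> u[0] == v[0] && u[1] != v[1]))
                .count();
    }

    public static void main(String[] args) {
        List<int[]> tx1 = Arrays.asList(
                new int[]{1, 1},
                new int[]{1, 2},
                new int[]{1, 3});
        // Every row has a partner with the same a and a different b.
        System.out.println(countWithExists(tx1)); // 3
    }
}
```

Because EXISTS only asks whether the subquery returns at least one row, a limit 1 inside the subquery should not change the count, which is why both queries list 3 as the expected result.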




[jira] [Updated] (HIVE-17726) Using exists may lead to incorrect results

2017-10-09 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17726:
---
Status: Patch Available  (was: Open)

> Using exists may lead to incorrect results
> --
>
> Key: HIVE-17726
> URL: https://issues.apache.org/jira/browse/HIVE-17726
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Zoltan Haindrich
>Assignee: Vineet Garg
> Attachments: HIVE-17726.1.patch
>
>
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer);
> insert into tx1   values  (1, 1),
> (1, 2),
> (1, 3);
> select count(*) as result,3 as expected from tx1 u
> where exists (select * from tx1 v where u.a=v.a and u.b <> v.b);
> select count(*) as result,3 as expected from tx1 u
> where exists (select * from tx1 v where u.a=v.a and u.b <> v.b limit 1);
> {code}
> current results are 6 and 2.




[jira] [Commented] (HIVE-17629) CachedStore - wait for prewarm at use time, not init time

2017-10-09 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197726#comment-16197726
 ] 

Alan Gates commented on HIVE-17629:
---

Ok, I missed the call to RawStore first.

+1

> CachedStore - wait for prewarm at use time, not init time
> -
>
> Key: HIVE-17629
> URL: https://issues.apache.org/jira/browse/HIVE-17629
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17629.patch
>
>
> Most of the changes are trivial, due to static changing to non-static. The 
> patch basically adds lifecycle management for shared cache, and uses it to 
> move prewarm to background thread, waiting for prewarm to happen at use time 
> not setConf, and to make init optional (if not called, CachedStore would 
> proxy the methods to rawStore, like it currently does in the ones not 
> implemented).




[jira] [Commented] (HIVE-17746) Regenerate spark_explainuser_1.q.out

2017-10-09 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197672#comment-16197672
 ] 

Vineet Garg commented on HIVE-17746:


Hi [~pvary]. Thanks for taking care of this! The changes are good and are 
expected.  +1.


> Regenerate spark_explainuser_1.q.out
> 
>
> Key: HIVE-17746
> URL: https://issues.apache.org/jira/browse/HIVE-17746
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17746.patch
>
>
> There are 2 changes in spark_explainuser_1.q.out:
> 1. After HIVE-17465, the row numbers are different in the explain plans. 
> [~vgarg], [~ashutoshc]: Could you please check whether it is an intended 
> change?
> 2. After HIVE-17535, CBO optimization is turned on and the output of the 
> following query changed:
> {code:title=Query}
> explain select explode(array('a', 'b'));
> {code}
> {code:title=Original}
>  POSTHOOK: query: explain select explode(array('a', 'b'))
>  POSTHOOK: type: QUERY
>  Plan not optimized by CBO.
>  
>  Stage-0
>Fetch Operator
>  limit:-1
> UDTF Operator [UDTF_2]
>   function name:explode
>   Select Operator [SEL_1]
> Output:["_col0"]
> TableScan [TS_0]
> {code}
> {code:title=New}
>  POSTHOOK: query: explain select explode(array('a', 'b'))
>  POSTHOOK: type: QUERY
>  Plan optimized by CBO.
>  
>  Stage-0
>Fetch Operator
>  limit:-1
> Select Operator [SEL_3]
>   Output:["_col0"]
>   UDTF Operator [UDTF_2]
> function name:explode
> Select Operator [SEL_1]
>   Output:["_col0"]
>   TableScan [TS_0]
> {code}
> This 2nd change does not look like a successful optimization to me. Is it 
> planned? :)
> If you think these are planned changes, then I think it would be good to 
> update the golden file.
> Thanks,
> Peter




[jira] [Updated] (HIVE-17609) Tool to manipulate delegation tokens

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17609:

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> Tool to manipulate delegation tokens
> 
>
> Key: HIVE-17609
> URL: https://issues.apache.org/jira/browse/HIVE-17609
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17609.1-branch-2.2.patch, 
> HIVE-17609.1-branch-2.patch, HIVE-17609.1.patch, HIVE-17609.2.patch
>
>
> This was precipitated by OOZIE-2797. We had a case in production where the 
> number of active metastore delegation tokens outstripped the ZooKeeper 
> {{jute.maxBuffer}} size. Delegation tokens could neither be fetched, nor be 
> cancelled. 
> The root-cause turned out to be a miscommunication, causing delegation tokens 
> fetched by Oozie *not* to be cancelled automatically from HCat. This was 
> sorted out as part of OOZIE-2797.
> The issue exposed how poor the log-messages were, in the code pertaining to 
> token fetch/cancellation. We also found need for a tool to query/list/purge 
> delegation tokens that might have expired already. This patch introduces such 
> a tool, and improves the log-messages.
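The list/purge idea described above can be sketched roughly as follows. This is not the tool from the patch: `StoredToken`, `TokenPurgeSketch`, and the in-memory list standing in for the token store are all hypothetical names, and the only assumption taken from the description is that a token counts as expired when its expiry time is in the past.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a stored delegation token; not a Hive or Hadoop class.
final class StoredToken {
    final String owner;
    final long expiryMillis;

    StoredToken(String owner, long expiryMillis) {
        this.owner = owner;
        this.expiryMillis = expiryMillis;
    }
}

class TokenPurgeSketch {
    // Lists tokens whose expiry time is before nowMillis, without modifying the store.
    static List<StoredToken> listExpired(List<StoredToken> store, long nowMillis) {
        List<StoredToken> expired = new ArrayList<>();
        for (StoredToken t : store) {
            if (t.expiryMillis < nowMillis) {
                expired.add(t);
            }
        }
        return expired;
    }

    // Removes expired tokens from the store and returns how many were purged.
    static int purgeExpired(List<StoredToken> store, long nowMillis) {
        int purged = 0;
        Iterator<StoredToken> it = store.iterator();
        while (it.hasNext()) {
            if (it.next().expiryMillis < nowMillis) {
                it.remove();
                purged++;
            }
        }
        return purged;
    }
}
```

Keeping the listing step separate from the purge step lets an operator inspect what would be removed before removing it.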



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17609) Tool to manipulate delegation tokens

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197661#comment-16197661
 ] 

Mithun Radhakrishnan commented on HIVE-17609:
-

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. 

> Tool to manipulate delegation tokens
> 
>
> Key: HIVE-17609
> URL: https://issues.apache.org/jira/browse/HIVE-17609
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17609.1-branch-2.2.patch, 
> HIVE-17609.1-branch-2.patch, HIVE-17609.1.patch, HIVE-17609.2.patch
>
>
> This was precipitated by OOZIE-2797. We had a case in production where the 
> number of active metastore delegation tokens outstripped the ZooKeeper 
> {{jute.maxBuffer}} size. Delegation tokens could neither be fetched, nor be 
> cancelled. 
> The root-cause turned out to be a miscommunication, causing delegation tokens 
> fetched by Oozie *not* to be cancelled automatically from HCat. This was 
> sorted out as part of OOZIE-2797.
> The issue exposed how poor the log-messages were, in the code pertaining to 
> token fetch/cancellation. We also found need for a tool to query/list/purge 
> delegation tokens that might have expired already. This patch introduces such 
> a tool, and improves the log-messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17609) Tool to manipulate delegation tokens

2017-10-09 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197657#comment-16197657
 ] 

Mithun Radhakrishnan commented on HIVE-17609:
-

Thank you for the review, [~owen.omalley]. I'll check this in shortly.

> Tool to manipulate delegation tokens
> 
>
> Key: HIVE-17609
> URL: https://issues.apache.org/jira/browse/HIVE-17609
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17609.1-branch-2.2.patch, 
> HIVE-17609.1-branch-2.patch, HIVE-17609.1.patch, HIVE-17609.2.patch
>
>
> This was precipitated by OOZIE-2797. We had a case in production where the 
> number of active metastore delegation tokens outstripped the ZooKeeper 
> {{jute.maxBuffer}} size. Delegation tokens could neither be fetched, nor be 
> cancelled. 
> The root-cause turned out to be a miscommunication, causing delegation tokens 
> fetched by Oozie *not* to be cancelled automatically from HCat. This was 
> sorted out as part of OOZIE-2797.
> The issue exposed how poor the log-messages were, in the code pertaining to 
> token fetch/cancellation. We also found need for a tool to query/list/purge 
> delegation tokens that might have expired already. This patch introduces such 
> a tool, and improves the log-messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17553) CBO wrongly type cast decimal literal to int

2017-10-09 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17553:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master.

> CBO wrongly type cast decimal literal to int
> 
>
> Key: HIVE-17553
> URL: https://issues.apache.org/jira/browse/HIVE-17553
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17553.1.patch, HIVE-17553.2.patch
>
>
> {code:sql}explain select 100.000BD from f{code}
> {noformat}
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: f
>           Select Operator
>             expressions: 100 (type: int)
>             outputColumnNames: _col0
>             ListSink
> Notice that the expression 100.000BD is of type int instead of decimal.
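To see why the reported plan is surprising, here is a small illustration using plain `java.math.BigDecimal` rather than any Hive class. It only shows that the literal's value carries a scale even though it is numerically equal to the integer 100. How Hive itself derives the precision and scale of `100.000BD` is not stated in this issue, so the class name `DecimalLiteralDemo` and the `describe` helper are purely illustrative.

```java
import java.math.BigDecimal;

class DecimalLiteralDemo {
    // Describes a decimal literal (written without the BD suffix) by the
    // precision and scale that java.math.BigDecimal assigns to it.
    static String describe(String literal) {
        BigDecimal value = new BigDecimal(literal);
        return "precision=" + value.precision() + ", scale=" + value.scale();
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("100.000");
        // The literal keeps three fractional digits as part of its type information.
        System.out.println(describe("100.000"));
        // Numerically it equals the int 100, which is why a cast to int loses
        // no value but does lose the decimal type.
        System.out.println(value.intValueExact());
    }
}
```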



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11266:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Regenerated q file for Spark test and pushed to master, thanks for reviewing 
[~ashutoshc]!

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns a wrong count result on an external table with table statistics if 
> I change the table's data files.
> This is the scenario in detail:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate an MR job and returns the result 
> based on table statistics. This result is wrong because it is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications made to the data files.
> Obviously, with "hive.compute.query.using.stats" set to FALSE this problem 
> doesn't occur, but the default value of this property is TRUE.
> I think that this post on Stack Overflow, which shows another type of bug 
> in the case of multiple inserts, is also related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table
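The four steps in the report can be mimicked with a toy model. This is a simulation only, not Hive code: `ExternalTableSketch`, its fields, and the `useStats` flag (standing in for `hive.compute.query.using.stats`) are hypothetical, and the sketch assumes statistics are only refreshed by an explicit analyze step.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an external table whose files can change behind the metastore's back.
class ExternalTableSketch {
    private final List<Integer> fileRowCounts = new ArrayList<>(); // rows per data file
    private long statsRowCount = -1;                               // -1 means no stats yet

    // Step 3: a file appears in the table location without Hive being told.
    void addFile(int rows) {
        fileRowCounts.add(rows);
    }

    // Step 2: "analyze table ... compute statistics" snapshots the current row count.
    void analyze() {
        statsRowCount = scan();
    }

    // A full scan always reflects the files that are actually present.
    long scan() {
        long total = 0;
        for (int rows : fileRowCounts) {
            total += rows;
        }
        return total;
    }

    // Step 4: count(*) answered from stats when allowed, otherwise by scanning.
    long count(boolean useStats) {
        if (useStats && statsRowCount >= 0) {
            return statsRowCount;
        }
        return scan();
    }
}
```

With one 10-row file analyzed and a 5-row file added afterwards, `count(true)` still returns the snapshot taken at analyze time, while `count(false)` sees both files.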



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17629) CachedStore - wait for prewarm at use time, not init time

2017-10-09 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197632#comment-16197632
 ] 

Sergey Shelukhin commented on HIVE-17629:
-

The cache is not usable before prewarm by design, currently. Ditto for locks... 
[~thejas] may have more context. Both indeed are out of the scope of the JIRA.

As for the return, this is a pattern used for the methods that edit stuff. Note 
that all of them call rawStore first (because in the normal case they'd call 
both rawStore AND cache). Then, if the cache is null (which means the 
initialization was never triggered) they'd return.
E.g.
{noformat}
 rawStore.createDatabase(db);
+SharedCache sharedCache = sharedCacheWrapper.get();
+if (sharedCache == null) return;
{noformat}
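A self-contained sketch of that pattern, for readers without the patch at hand. The types here (`BackingStore`, `SimpleCache`, `WriteThroughStore`) are hypothetical stand-ins, not the real `RawStore`, `SharedCache`, or `CachedStore`; the only behaviour borrowed from the snippet above is "call the raw store first, then return early if the cache wrapper yields null".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the backing store; not Hive's RawStore interface.
interface BackingStore {
    void createDatabase(String name);
}

// Hypothetical stand-in for the shared cache.
class SimpleCache {
    final Map<String, Boolean> databases = new HashMap<>();

    void addDatabase(String name) {
        databases.put(name, Boolean.TRUE);
    }
}

class WriteThroughStore {
    private final BackingStore rawStore;
    // Assumed contract: yields null when cache initialization was never triggered.
    private final Supplier<SimpleCache> cacheWrapper;

    WriteThroughStore(BackingStore rawStore, Supplier<SimpleCache> cacheWrapper) {
        this.rawStore = rawStore;
        this.cacheWrapper = cacheWrapper;
    }

    void createDatabase(String name) {
        rawStore.createDatabase(name);       // the backing store is always updated first
        SimpleCache cache = cacheWrapper.get();
        if (cache == null) {
            return;                          // no cache: behave as a plain proxy
        }
        cache.addDatabase(name);             // normal case: update the cache as well
    }
}
```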

> CachedStore - wait for prewarm at use time, not init time
> -
>
> Key: HIVE-17629
> URL: https://issues.apache.org/jira/browse/HIVE-17629
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17629.patch
>
>
> Most of the changes are trivial, due to static changing to non-static. The 
> patch basically adds lifecycle management for shared cache, and uses it to 
> move prewarm to background thread, waiting for prewarm to happen at use time 
> not setConf, and to make init optional (if not called, CachedStore would 
> proxy the methods to rawStore, like it currently does in the ones not 
> implemented).
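One way to picture "prewarm in a background thread, wait at use time, init optional" is the sketch below. It is an assumption-laden illustration built on `java.util.concurrent.CompletableFuture`, not the mechanism in the patch; `LazyPrewarm` and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical holder for a value that is loaded in the background.
class LazyPrewarm<T> {
    // Stays null until init() is called, so callers can tell "never initialized" apart.
    private volatile CompletableFuture<T> prewarm;

    // Starts the (possibly slow) load on a background thread and returns immediately.
    void init(Supplier<T> loader) {
        prewarm = CompletableFuture.supplyAsync(loader);
    }

    // Returns null if init() was never called; otherwise blocks until the load finishes.
    T get() {
        CompletableFuture<T> future = prewarm;
        return future == null ? null : future.join();
    }
}
```

A caller that gets null back can fall through to the underlying store, which matches the proxy behaviour the description mentions for the uninitialized case.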



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17702) incorrect isRepeating handling in decimal reader in ORC

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17702:

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to branches. Thanks for the review!

> incorrect isRepeating handling in decimal reader in ORC
> ---
>
> Key: HIVE-17702
> URL: https://issues.apache.org/jira/browse/HIVE-17702
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17702.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17731) add a backward compat option for external users to HIVE-11985

2017-10-09 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17731:

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master and branch-2; fixed the description text on commit.
Thanks for the reviews!

> add a backward compat option for external users to HIVE-11985
> -
>
> Key: HIVE-17731
> URL: https://issues.apache.org/jira/browse/HIVE-17731
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17731.patch
>
>
> See HIVE-11985.
> Some external callers (e.g. Presto) do not appear to process types from 
> deserializer correctly, relying on DB types. Ideally, it should be resolved 
> via HIVE-17714, hiding the custom SerDe logic from users.
> For now we can add a backward compatibility config for such cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17742) AccumuloIndexedOutputFormat Use SLF4J

2017-10-09 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17742:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Beluga!

> AccumuloIndexedOutputFormat Use SLF4J
> -
>
> Key: HIVE-17742
> URL: https://issues.apache.org/jira/browse/HIVE-17742
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17742.1.patch
>
>
> {{org.apache.hadoop.hive.accumulo.mr.AccumuloIndexedOutputFormat}}
> # Change to use SLF4J instead of core Log4J classes
> # Use SLF4J parameterization



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17740) HiveConf - Use SLF4J Parameterization

2017-10-09 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17740:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Beluga!

> HiveConf - Use SLF4J Parameterization  
> ---
>
> Key: HIVE-17740
> URL: https://issues.apache.org/jira/browse/HIVE-17740
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17740.patch
>
>
> {{org.apache.hadoop.hive.conf.HiveConf}}
> # Parameterize the SLF4J logging
> # Refactor log variable name to align with rest of code base
> # Couple of small nit-picks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17732) Minor Improvements - org.apache.hive.hcatalog.data.JsonSerDe.java

2017-10-09 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17732:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Beluga!

> Minor Improvements - org.apache.hive.hcatalog.data.JsonSerDe.java
> -
>
> Key: HIVE-17732
> URL: https://issues.apache.org/jira/browse/HIVE-17732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17732.patch
>
>
> Some simple improvements for org.apache.hive.hcatalog.data.JsonSerDe
> Remove superfluous logging, cut down on object instantiation 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2017-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197551#comment-16197551
 ] 

Hive QA commented on HIVE-11266:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891109/HIVE-11266.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11191 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[stats_noscan_2] (batchId=117)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7197/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7197/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7197/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891109 - PreCommit-HIVE-Build

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns a wrong count result on an external table with table statistics if 
> I change the table's data files.
> This is the scenario in detail:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate an MR job and returns the result 
> based on table statistics. This result is wrong because it is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications made to the data files.
> Obviously, with "hive.compute.query.using.stats" set to FALSE this problem 
> doesn't occur, but the default value of this property is TRUE.
> I think that this post on Stack Overflow, which shows another type of bug 
> in the case of multiple inserts, is also related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17732) Minor Improvements - org.apache.hive.hcatalog.data.JsonSerDe.java

2017-10-09 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197537#comment-16197537
 ] 

Ashutosh Chauhan commented on HIVE-17732:
-

+1

> Minor Improvements - org.apache.hive.hcatalog.data.JsonSerDe.java
> -
>
> Key: HIVE-17732
> URL: https://issues.apache.org/jira/browse/HIVE-17732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-17732.patch
>
>
> Some simple improvements for org.apache.hive.hcatalog.data.JsonSerDe
> Remove superfluous logging, cut down on object instantiation 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
