[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement

2017-09-08 Thread Hive QA (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159753#comment-16159753]

Hive QA commented on HIVE-17226:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880114/HIVE-17226.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11027 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_mask_hash] 
(batchId=28)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6742/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6742/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6742/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880114 - PreCommit-HIVE-Build

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17226.1.patch
>
>
> Two places have been identified where weak hashing needs to be replaced 
> by SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). In the usual JDK 
> providers "SHA" maps to SHA-1, which is not secure enough by today's standards. 
> We should use SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.
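To make the effect of the switch concrete, here is a small sketch that uses Python's hashlib rather than the Java MessageDigest / DigestUtils calls named above. The helper name mask_hash_hex is hypothetical, and encoding the input as UTF-8 is an assumption. The hex digest grows from 32 characters (MD5) and 40 (SHA-1) to 64 (SHA-256), so any expected output recorded with the old algorithm will no longer match.

```python
import hashlib


def mask_hash_hex(value: str, algorithm: str = "sha256") -> str:
    """Return the lowercase hex digest of `value`, encoded as UTF-8.

    Hypothetical helper, similar in spirit to DigestUtils.md5Hex / sha256Hex.
    """
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Compare digest lengths for the weak and the proposed algorithms.
    for algo in ("md5", "sha1", "sha256"):
        digest = mask_hash_hex("abc", algo)
        print(f"{algo:>6}: {len(digest)} hex chars  {digest}")
```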



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17475) Disable mapjoin using hint

2017-09-08 Thread Lefty Leverenz (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159734#comment-16159734]

Lefty Leverenz commented on HIVE-17475:
---

Doc note:  The hint "+ mapjoin(None)" should be documented in the wiki, with 
version information.

I suggest documenting it in three places:

* [Configuration Properties -- hive.auto.convert.join | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.auto.convert.join]
* [Joins -- MapJoin Restrictions | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions]
* [Join Optimization -- Optimize Auto Join Conversion | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion]
 (or add a new subsection after this one)

Added a TODOC3.0 label.

> Disable mapjoin using hint
> --
>
> Key: HIVE-17475
> URL: https://issues.apache.org/jira/browse/HIVE-17475
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17475.1.patch, HIVE-17475.2.patch
>
>
> Use a hint to disable mapjoin for a given query.





[jira] [Commented] (HIVE-16895) Multi-threaded execution of bootstrap dump of partitions

2017-09-08 Thread Lefty Leverenz (JIRA)

[https://issues.apache.org/jira/browse/HIVE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159718#comment-16159718]

Lefty Leverenz commented on HIVE-16895:
---

Thanks for the documentation, [~anishek].  I removed the TODOC3.0 label.

Here's a direct link to the doc:

* [hive.repl.partitions.dump.parallelism | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.repl.partitions.dump.parallelism]

>  Multi-threaded execution of bootstrap dump of partitions
> -
>
> Key: HIVE-16895
> URL: https://issues.apache.org/jira/browse/HIVE-16895
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16895.1.patch, HIVE-16895.2.patch
>
>
> To speed up the bootstrap dump phase, we dump multiple partitions 
> from the same table simultaneously. 
> Even though dumping functions is not going to be a performance blocker, moving to 
> similar execution modes for all metastore objects will make the code more 
> coherent. 
> Bootstrap dump at the db level does:
> * bootstrap of all tables
> ** bootstrap of all partitions in a table (scope of the current JIRA)
> * bootstrap of all functions
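The per-table step above (dumping several partitions of one table at once) can be sketched with a bounded thread pool. This is a minimal illustration, not the Hive implementation: dump_partitions and the dump_one callback are hypothetical, and treating `parallelism` as the counterpart of hive.repl.partitions.dump.parallelism is an assumption about how that setting is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def dump_partitions(partitions: Iterable[str],
                    dump_one: Callable[[str], str],
                    parallelism: int = 4) -> List[str]:
    """Dump all partitions of one table using at most `parallelism` threads.

    `dump_one` is a hypothetical per-partition callback (for example, writing
    the partition's metadata and file list to the dump directory). Results
    are returned in the same order as `partitions`.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves input order; an exception raised by a worker is
        # re-raised here when its result is reached.
        return list(pool.map(dump_one, partitions))


if __name__ == "__main__":
    parts = ["dt=2017-09-01", "dt=2017-09-02", "dt=2017-09-03"]
    print(dump_partitions(parts, lambda p: f"dumped {p}", parallelism=2))
```

Keeping results in input order makes the dump output deterministic even though partitions finish in arbitrary order.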





[jira] [Updated] (HIVE-17475) Disable mapjoin using hint

2017-09-08 Thread Lefty Leverenz (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Lefty Leverenz updated HIVE-17475:
--
Labels: TODOC3.0  (was: )



[jira] [Updated] (HIVE-16895) Multi-threaded execution of bootstrap dump of partitions

2017-09-08 Thread Lefty Leverenz (JIRA)

[https://issues.apache.org/jira/browse/HIVE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Lefty Leverenz updated HIVE-16895:
--
Labels:   (was: TODOC3.0)



[jira] [Commented] (HIVE-17403) Fail concatenation for unmanaged and transactional tables

2017-09-08 Thread Hive QA (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159711#comment-16159711]

Hive QA commented on HIVE-17403:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886159/HIVE-17403.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6741/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6741/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6741/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-09-09 03:48:08.441
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6741/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-09-09 03:48:08.443
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 5df1540 HIVE-17480: repl dump sub dir should use UUID instead of 
timestamp (Tao Li, reviewed by Daniel Dai)
+ git clean -f -d
Removing 
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java.orig
Removing 
metastore/src/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 5df1540 HIVE-17480: repl dump sub dir should use UUID instead of 
timestamp (Tao Li, reviewed by Daniel Dai)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-09-09 03:48:13.680
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java: No such file or 
directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java: No 
such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java: No such 
file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: No such file 
or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java: 
No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: No 
such file or directory
error: a/ql/src/test/results/clientnegative/merge_negative_3.q.out: No such 
file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886159 - PreCommit-HIVE-Build

> Fail concatenation for unmanaged and transactional tables
> -
>
> Key: HIVE-17403
> URL: https://issues.apache.org/jira/browse/HIVE-17403
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Blocker
> Attachments: HIVE-17403.1.patch, HIVE-17403.2.patch
>
>
> ALTER TABLE .. CONCATENATE should fail if the table is not managed by Hive. 
> For unmanaged tables, file names can be anything. Hive makes some assumptions 
> about file names, which can result in data loss for unmanaged tables. 
> An example is a table/partition with 2 different files 
> (part-m-0__1417075294718 and part-m-00018__1417075294718). Although these 
> are completely different files, Hive thinks they were generated by 
> separate instances of the same task (because of failure or speculative 
> execution). Hive will end up removing this file
> {code}
> 2017-08-28T18:19:29,516 WARN  [b27f10d5-d957-4695-ab2a-1453401793df 

[jira] [Commented] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-08 Thread Hive QA (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159710#comment-16159710]

Hive QA commented on HIVE-17472:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886137/HIVE-17472.2-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10590 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[drop_deleted_partitions] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_count_distinct]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6740/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6740/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6740/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886137 - PreCommit-HIVE-Build

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}
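The behavior the report implies is expected can be sketched as two steps: a partial spec such as (dt='1') should match every sub-partition (dt=1/region=A), and a partition directory that has already been deleted should not be treated as fatal. This is a hypothetical sketch, not the Metastore code; the function names are made up, and it ignores Hive's escaping of partition values in paths.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def matching_partitions(partitions: List[Dict[str, str]],
                        partial_spec: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every partition whose values agree with a (possibly partial) spec."""
    return [p for p in partitions
            if all(p.get(key) == value for key, value in partial_spec.items())]


def delete_partition_dir(table_location: Path, partition: Dict[str, str],
                         partition_keys: List[str]) -> bool:
    """Remove one partition directory; a missing directory is not an error.

    Returns True if something was deleted, False if the data was already gone.
    """
    path = table_location.joinpath(*(f"{k}={partition[k]}" for k in partition_keys))
    if not path.exists():
        return False
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "foobar"
        partitions = [{"dt": "1", "region": "A"}, {"dt": "2", "region": "A"}]
        (table / "dt=2" / "region=A").mkdir(parents=True)
        # dt=1 data was removed out of band, as with `dfs -rm -R` in the report.
        for part in matching_partitions(partitions, {"dt": "1"}):
            print(part, delete_partition_dir(table, part, ["dt", "region"]))
```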





[jira] [Commented] (HIVE-17487) Example fails on the Hive Getting started page

2017-09-08 Thread Lefty Leverenz (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159704#comment-16159704]

Lefty Leverenz commented on HIVE-17487:
---

[~jakab922], you can fix this yourself if you get wiki edit privileges:

* [About This Wiki -- How to get permission to edit | 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit]

> Example fails on the Hive Getting started page
> --
>
> Key: HIVE-17487
> URL: https://issues.apache.org/jira/browse/HIVE-17487
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Papp
>Priority: Trivial
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> There is an example on [Hive Getting 
> Started|https://cwiki.apache.org/confluence/display/Hive/GettingStarted] page 
> using the MovieLens100k dataset. The mapper is defined as a python script in 
> the following way:
> {code}
> import sys
> import datetime
> for line in sys.stdin:
>   line = line.strip()
>   userid, movieid, rating, unixtime = line.split('\t')
>   weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
>   print '\t'.join([userid, movieid, rating, str(weekday)])
> {code}
> which is correct assuming you're using the Python 2 series. The following 
> code works with both the 2 and 3 series:
> {code}
> from __future__ import print_function
> import sys
> import datetime
> for line in sys.stdin:
>   line = line.strip()
>   userid, movieid, rating, unixtime = line.split('\t')
>   weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
>   print('\t'.join([userid, movieid, rating, str(weekday)]))
> {code}
> I think this should be corrected.





[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Hive QA (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159686#comment-16159686]

Hive QA commented on HIVE-17466:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886133/HIVE-17466.2-branch-2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10603 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6739/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6739/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6739/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886133 - PreCommit-HIVE-Build

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of table/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
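For comparison, the client-side alternative described above (list partition names, then parse out the part-key values) might look like the following sketch. The function names are hypothetical. It assumes names use the k1=v1/k2=v2 layout with special characters percent-escaped. The proposed API would move this parsing and de-duplication into the RawStore layer instead.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import unquote


def parse_partition_name(name: str) -> Dict[str, str]:
    """Split a partition name such as 'dt=2017-09-08/hr=01' into key -> value.

    Assumes special characters in keys and values are percent-escaped, so
    '/' and '=' only ever appear as separators.
    """
    spec: Dict[str, str] = {}
    for component in name.split("/"):
        key, sep, value = component.partition("=")
        if not sep:
            raise ValueError(f"malformed partition component: {component!r}")
        spec[unquote(key)] = unquote(value)
    return spec


def distinct_key_values(names: Iterable[str], keys: List[str]) -> List[Tuple[str, ...]]:
    """Unique combinations of the requested part-keys, in first-seen order."""
    seen = set()
    result: List[Tuple[str, ...]] = []
    for name in names:
        spec = parse_partition_name(name)
        combo = tuple(spec[k] for k in keys)
        if combo not in seen:
            seen.add(combo)
            result.append(combo)
    return result


if __name__ == "__main__":
    names = ["dt=2017-09-08/hr=00", "dt=2017-09-08/hr=01", "dt=2017-09-09/hr=00"]
    print(distinct_key_values(names, ["dt"]))
```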





[jira] [Updated] (HIVE-15899) check CTAS over acid table

2017-09-08 Thread Eugene Koifman (JIRA)

[https://issues.apache.org/jira/browse/HIVE-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Eugene Koifman updated HIVE-15899:
--
Attachment: HIVE-15899.01.patch

> check CTAS over acid table 
> ---
>
> Key: HIVE-15899
> URL: https://issues.apache.org/jira/browse/HIVE-15899
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15899.01.patch
>
>
> Need to add a test to check whether CREATE TABLE AS SELECT (CTAS) works correctly 
> with ACID tables.





[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-08 Thread Sergey Shelukhin (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Sergey Shelukhin updated HIVE-17386:

Attachment: HIVE-17386.03.patch

Fixing TestTezTask

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies: a single YARN queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.
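As a rough illustration of the interim "no policies" behavior described above, an even split of one queue's capacity across the running queries could be computed as below. This is a hypothetical sketch, not the HS2 code. Capacity is modeled as a fraction of the queue, and the split would need to be recomputed whenever a query starts or finishes.

```python
from typing import Dict, Iterable


def equal_shares(total_capacity: float, query_ids: Iterable[str]) -> Dict[str, float]:
    """Split a queue's capacity evenly across the currently running queries."""
    ids = list(query_ids)
    if not ids:
        return {}
    share = total_capacity / len(ids)
    return {query_id: share for query_id in ids}


if __name__ == "__main__":
    running = ["q1", "q2", "q3", "q4"]
    print(equal_shares(1.0, running))
    # A query finishing changes everyone else's share.
    print(equal_shares(1.0, running[:2]))
```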





[jira] [Commented] (HIVE-17317) Make Dbcp configurable using hive properties in hive-site.xml

2017-09-08 Thread Hive QA (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159666#comment-16159666]

Hive QA commented on HIVE-17317:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886094/HIVE-17317.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11034 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.org.apache.hadoop.hive.cli.TestHBaseCliDriver
 (batchId=94)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6738/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6738/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6738/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886094 - PreCommit-HIVE-Build

> Make Dbcp configurable using hive properties in hive-site.xml
> -
>
> Key: HIVE-17317
> URL: https://issues.apache.org/jira/browse/HIVE-17317
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17317.01.patch, HIVE-17317.02.patch
>
>






[jira] [Commented] (HIVE-17371) Fix DBTokenStore and ZKTokenStore for the stand-alone metastore

2017-09-08 Thread Vihang Karajgaonkar (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159643#comment-16159643]

Vihang Karajgaonkar commented on HIVE-17371:


Hi [~alangates], I think it makes sense to move {{DBTokenStore}} to the metastore 
rather than copy it. Currently I see that we have two copies of 
{{MemoryTokenStore}}, presumably one for HS2 and one for HMS. Does it make sense 
to have two copies?

> Fix DBTokenStore and ZKTokenStore for the stand-alone metastore
> ---
>
> Key: HIVE-17371
> URL: https://issues.apache.org/jira/browse/HIVE-17371
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> The {{getTokenStore}} method will not work for {{DBTokenStore}} and 
> {{ZKTokenStore}}, since they implement 
> {{org.apache.hadoop.hive.thrift.DelegationTokenStore}} instead of 
> {{org.apache.hadoop.hive.metastore.security.DelegationTokenStore}}:
> {code}
>   private DelegationTokenStore getTokenStore(Configuration conf) throws IOException {
>     String tokenStoreClassName =
>         MetastoreConf.getVar(conf, MetastoreConf.ConfVars.DELEGATION_TOKEN_STORE_CLS, "");
>     // The second half of this if is to catch cases where users are passing in a HiveConf for
>     // configuration.  It will have set the default value of
>     // "hive.cluster.delegation.token.store.class" to
>     // "org.apache.hadoop.hive.thrift.MemoryTokenStore" as part of its construction.  But this is
>     // the hive-shims version of the memory store.  We want to convert this to our default value.
>     if (StringUtils.isBlank(tokenStoreClassName) ||
>         "org.apache.hadoop.hive.thrift.MemoryTokenStore".equals(tokenStoreClassName)) {
>       return new MemoryTokenStore();
>     }
>     try {
>       Class<? extends DelegationTokenStore> storeClass =
>           Class.forName(tokenStoreClassName).asSubclass(DelegationTokenStore.class);
>       return ReflectionUtils.newInstance(storeClass, conf);
>     } catch (ClassNotFoundException e) {
>       throw new IOException("Error initializing delegation token store: " + tokenStoreClassName, e);
>     }
>   }
> {code}
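The failure mode can be reasoned about without Hive. Class.asSubclass throws an unchecked ClassCastException when the loaded class does not implement the requested interface, and the catch block above only handles ClassNotFoundException. The Python sketch below mirrors that lookup with stand-in classes. Every class name here, and the REGISTRY dict standing in for Class.forName, is a hypothetical stand-in for the Java types named in the description.

```python
LEGACY_DEFAULT = "org.apache.hadoop.hive.thrift.MemoryTokenStore"


class ShimsDelegationTokenStore:
    """Stand-in for org.apache.hadoop.hive.thrift.DelegationTokenStore."""


class MetastoreDelegationTokenStore:
    """Stand-in for org.apache.hadoop.hive.metastore.security.DelegationTokenStore."""


class MemoryTokenStore(MetastoreDelegationTokenStore):
    """Stand-in for the stand-alone metastore's in-memory store."""


class DBTokenStore(ShimsDelegationTokenStore):
    """Stand-in for DBTokenStore, which still implements the shims interface."""


# Plays the role of Class.forName(): maps a configured name to a class.
REGISTRY = {cls.__name__: cls for cls in (MemoryTokenStore, DBTokenStore)}


def get_token_store(class_name: str) -> MetastoreDelegationTokenStore:
    """Mirror getTokenStore: blank or legacy name -> default, else load and type-check."""
    if not class_name.strip() or class_name == LEGACY_DEFAULT:
        return MemoryTokenStore()
    try:
        store_class = REGISTRY[class_name]
    except KeyError as exc:  # plays the role of ClassNotFoundException
        raise OSError(f"Error initializing delegation token store: {class_name}") from exc
    if not issubclass(store_class, MetastoreDelegationTokenStore):
        # Plays the role of the ClassCastException thrown by asSubclass().
        raise TypeError(f"{class_name} does not implement the metastore DelegationTokenStore")
    return store_class()


if __name__ == "__main__":
    try:
        get_token_store("DBTokenStore")
    except TypeError as err:
        print("rejected:", err)
```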





[jira] [Updated] (HIVE-17366) Constraint replication in bootstrap

2017-09-08 Thread Daniel Dai (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Daniel Dai updated HIVE-17366:
--
Attachment: HIVE-17366.3.patch

Uploading the patch that corresponds to the latest pull request changes.

> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17366.1.patch, HIVE-17366.2.patch, 
> HIVE-17366.3.patch
>
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.





[jira] [Comment Edited] (HIVE-16669) Fine tune Compaction to take advantage of Acid 2.0

2017-09-08 Thread Eugene Koifman (JIRA)

[https://issues.apache.org/jira/browse/HIVE-16669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135751#comment-16135751]

Eugene Koifman edited comment on HIVE-16669 at 9/9/17 12:47 AM:


See the TODO in TestAcidOnTez.testCtasTezUnion tagged with HIVE-16669.

Also note: in the same test, minor compaction creates 
delete_delta_018_021 even though the smallest delete_delta has txn 20. 
The bounds should be tightened to make file selection more efficient.
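One way to get tighter bounds is to derive the output range from the delete_delta directories actually being merged rather than from a wider transaction range. Here is a minimal sketch, assuming the delete_delta_<min>_<max> naming shown above. The optional trailing _<statementId> suffix is an assumption about the naming scheme, and the example input list is hypothetical, not taken from the test.

```python
import re
from typing import Iterable, Tuple

# delete_delta_<minTxn>_<maxTxn>, optionally followed by _<statementId>
# (the optional suffix is an assumption about the naming scheme).
_DELETE_DELTA = re.compile(r"^delete_delta_(\d+)_(\d+)(?:_\d+)?$")


def tight_txn_bounds(dir_names: Iterable[str]) -> Tuple[int, int]:
    """Smallest min-txn and largest max-txn over the delete_delta inputs only."""
    lows, highs = [], []
    for name in dir_names:
        match = _DELETE_DELTA.match(name)
        if match is None:
            continue  # ignore base_/delta_ and any other directories
        lows.append(int(match.group(1)))
        highs.append(int(match.group(2)))
    if not lows:
        raise ValueError("no delete_delta directories found")
    return min(lows), max(highs)


if __name__ == "__main__":
    # Hypothetical inputs: an insert delta at txn 18 should not widen the range.
    print(tight_txn_bounds(["delta_018_018", "delete_delta_020_020", "delete_delta_021_021"]))
```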


was (Author: ekoifman):
see todo: TestAcidOnTez.testCtasTezUnion - todo tagged with this HIVE-16669

> Fine tune Compaction to take advantage of Acid 2.0
> --
>
> Key: HIVE-16669
> URL: https://issues.apache.org/jira/browse/HIVE-16669
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> * There is little point in using the Acid 2.0 vectorized reader, since there is no 
> operator pipeline in compaction
> * If minor compaction just concatenates delete_delta files together, then the two-stage 
> compaction should always ensure that we have a limited number of ORC 
> readers to do the merging, and the current OrcRawRecordMerger should be fine
> * ...





[jira] [Commented] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-09-08 Thread ASF GitHub Bot (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159610#comment-16159610]

ASF GitHub Bot commented on HIVE-17432:
---

GitHub user jcamachor opened a pull request:

https://github.com/apache/hive/pull/245

HIVE-17432: Enable join and aggregate materialized view rewriting



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jcamachor/hive calcite-mvs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/245.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #245


commit 934920f604d5621e0d6a3761b3de3680fbea2b50
Author: Jesus Camacho Rodriguez 
Date:   2017-04-20T16:47:12Z

HIVE-17432: Enable join and aggregate materialized view rewriting




> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Updated] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-09-08 Thread Jesus Camacho Rodriguez (JIRA)

[https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Jesus Camacho Rodriguez updated HIVE-17432:
---
Status: Patch Available  (was: In Progress)

> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Updated] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-09-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17432:
---
Attachment: HIVE-17432.patch

> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Comment Edited] (HIVE-16669) Fine tune Compaction to take advantage of Acid 2.0

2017-09-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135751#comment-16135751
 ] 

Eugene Koifman edited comment on HIVE-16669 at 9/9/17 12:45 AM:


See the TODO in TestAcidOnTez.testCtasTezUnion, which is tagged with HIVE-16669.


was (Author: ekoifman):
see todo: TestAcidOnTez.testCtasTezUnion

> Fine tune Compaction to take advantage of Acid 2.0
> --
>
> Key: HIVE-16669
> URL: https://issues.apache.org/jira/browse/HIVE-16669
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> * There is little point in using the Acid 2.0 vectorized reader, since there is 
> no operator pipeline in compaction
> * If minor compaction just concatenates delete_delta files, then the 2-stage 
> compaction should always ensure that we have a limited number of ORC 
> readers to do the merging, and the current OrcRawRecordMerger should be fine
> * ...





[jira] [Commented] (HIVE-17030) HPL/SQL: Many cast operations are ignored without warning or notice.

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159590#comment-16159590
 ] 

Hive QA commented on HIVE-17030:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886087/HIVE-17030.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11035 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6737/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6737/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6737/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886087 - PreCommit-HIVE-Build

> HPL/SQL: Many cast operations are ignored without warning or notice.
> 
>
> Key: HIVE-17030
> URL: https://issues.apache.org/jira/browse/HIVE-17030
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Carter Shanklin
>Assignee: Dmitry Tolpeko
>Priority: Critical
> Attachments: HIVE-17030.1.patch
>
>
> This bug is part of a series of issues and surprising behavior I encountered 
> writing a reporting script that would aggregate values and give rows 
> different classifications based on the aggregate. Addressing some or all 
> of these issues would make HPL/SQL more accessible to newcomers.
> Consider this code:
> {code}
>   val1d := cast('10.0' as double);
>   val2d := cast('5.0' as double);
>   declare val1i int = 5;
>   declare val2i int = 5;
>   val1i = val1d;
>   diff := val1i - val2i;
> {code}
> What is the value of diff? You might think it is 5, but in fact it is 0. Why? 
> Because when you attempt to assign val1d to val1i, this code in Var.java is 
> executed:
> {code}
> else if (type == Type.BIGINT) {
>   if (val.type == Type.STRING) {
>     value = Long.parseLong((String) val.value);
>   }
> }
> else if (type == Type.DECIMAL) {
> {code}
> Since there is no case for assigning a double to a bigint, the expression is 
> essentially ignored and the value remains the same. This behavior leads to 
> many surprising results.
> It would be best if HPL/SQL could re-use the cast code from Hive since there 
> are a lot of cases to consider.
> Version = 3.0.0-SNAPSHOT r71f52d8ad512904b3f2c4f04fe39a33f2834f1f2
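A minimal Java sketch of the behavior the report asks for: a BIGINT target converts a DOUBLE source instead of silently keeping its old value, and any combination it does not handle raises an error. `VarAssignSketch`, its `Type` enum, and `assign` are hypothetical names, not HPL/SQL's actual `Var` API, and truncating toward zero is an assumed conversion rule for this example.

```java
class VarAssignSketch {
    // Hypothetical subset of HPL/SQL variable types, for illustration only.
    enum Type { BIGINT, DOUBLE, STRING, DATE }

    Type type;
    Object value;

    VarAssignSketch(Type type, Object value) {
        this.type = type;
        this.value = value;
    }

    // Assigns val to this BIGINT variable. Unlike the Var.java excerpt, a DOUBLE
    // source is converted, and an unhandled combination throws instead of
    // silently leaving the old value in place.
    void assign(VarAssignSketch val) {
        if (type != Type.BIGINT) {
            throw new UnsupportedOperationException("sketch only handles BIGINT targets");
        }
        switch (val.type) {
            case BIGINT:
                value = val.value;
                break;
            case STRING:
                value = Long.parseLong((String) val.value);
                break;
            case DOUBLE:
                // Assumed rule: truncate toward zero, as Double.longValue() does.
                value = ((Double) val.value).longValue();
                break;
            default:
                throw new IllegalArgumentException("Cannot assign " + val.type + " to " + type);
        }
    }
}
```

With this rule, the script in the report would yield `diff = 5` (10 - 5) rather than 0.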





[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Attachment: HIVE-17493.1.patch

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been ascertained, 
> could be improved if the parent of the join operator is a LEFT OUTER or RIGHT 
> OUTER join.
> Currently, estimation is done by estimating the reduction in rows that occurred 
> on the PK side, then applying that reduction to the FK-side row count. This 
> estimate does not currently distinguish between INNER and OUTER joins, and 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.
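A simplified Java sketch of the PK-FK estimate described above, together with one outer-join distinction of the kind the issue proposes. It is illustrative only: the class, method, and parameter names are hypothetical, and this is not Hive's actual planner code. Because the PK side is unique, each FK row matches at most one PK row, so when the join preserves the FK side (e.g. FK LEFT OUTER JOIN PK) every FK row survives and no reduction should be applied.

```java
class PkFkCardinalitySketch {
    // Scales the FK-side row count by the fraction of PK rows that survived
    // upstream reduction (filters, etc.). If the join preserves the FK side,
    // each FK row appears exactly once in the output, so no reduction applies.
    static long estimateJoinRows(long fkRows, long pkRowsBase, long pkRowsAfterReduction,
                                 boolean fkSidePreserved) {
        if (fkSidePreserved || pkRowsBase <= 0) {
            return fkRows;
        }
        // Integer arithmetic: fkRows * (pkRowsAfterReduction / pkRowsBase), rounded down.
        return fkRows * pkRowsAfterReduction / pkRowsBase;
    }
}
```

For example, 1,000,000 FK rows joined to a PK side filtered from 10,000 to 1,000 rows would be estimated at 100,000 rows for an INNER join, but 1,000,000 when the FK side is preserved.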





[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Status: Patch Available  (was: Open)

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been ascertained, 
> could be improved if the parent of the join operator is a LEFT OUTER or RIGHT 
> OUTER join.
> Currently, estimation is done by estimating the reduction in rows that occurred 
> on the PK side, then applying that reduction to the FK-side row count. This 
> estimate does not currently distinguish between INNER and OUTER joins, and 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.





[jira] [Assigned] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-17493:
--


> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> Cardinality estimation of a join, after a PK-FK relation has been ascertained, 
> could be improved if the parent of the join operator is a LEFT OUTER or RIGHT 
> OUTER join.
> Currently, estimation is done by estimating the reduction in rows that occurred 
> on the PK side, then applying that reduction to the FK-side row count. This 
> estimate does not currently distinguish between INNER and OUTER joins, and 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.





[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Patch Available  (was: Open)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
>     TableScan
>       alias: date_dim
>       filterExpr: (d_year = 2001) (type: boolean)
>       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
>       Filter Operator
>         predicate: (d_year = 2001) (type: boolean)
>         Statistics: Num rows: 363 Data size: 4356 Basic stats: COMPLETE Column stats: COMPLETE
>
> Map 1
> Map Operator Tree:
>     TableScan
>       alias: date_dim
>       filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
>       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
>       Filter Operator
>         predicate: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
>         Statistics: Num rows: 363 Data size: 5808 Basic stats: COMPLETE Column stats: COMPLETE
>
> Map 1
> Map Operator Tree:
>     TableScan
>       alias: date_dim
>       filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 21)) (type: boolean)
>       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
>       Filter Operator
>         predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 21)) (type: boolean)
>         Statistics: Num rows: 363 Data size: 7260 Basic stats: COMPLETE Column stats: COMPLETE
> {code}
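One way to get progressively smaller estimates is to multiply per-predicate selectivities under an independence assumption, with each `col = constant` predicate keeping 1/NDV of the rows. The Java sketch below assumes NDVs of 201 for d_year, 12 for d_moy and 31 for d_dom (plausible for TPC-DS date_dim, but not taken from the plans above); 73049 / 201 is about 363, which matches the first estimate. The class and method names are hypothetical, and this is not necessarily how the attached patch computes selectivity.

```java
class FilterSelectivitySketch {
    // Estimates rows after a conjunction of equality predicates, assuming the
    // columns are independent and uniformly distributed: each predicate
    // col = const keeps 1/NDV(col) of the remaining rows.
    static long estimateRows(long tableRows, long... ndvs) {
        double rows = tableRows;
        for (long ndv : ndvs) {
            rows /= ndv;
        }
        // Never estimate fewer than one row.
        return Math.max(1L, Math.round(rows));
    }
}
```

Under these assumptions the three queries would be estimated at 363, 30 and 1 rows respectively, instead of 363 for all three.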





[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Attachment: HIVE-17465.1.patch

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
>     TableScan
>       alias: date_dim
>       filterExpr: (d_year = 2001) (type: boolean)
>       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
>       Filter Operator
>         predicate: (d_year = 2001) (type: boolean)
>         Statistics: Num rows: 363 Data size: 4356 Basic stats: COMPLETE Column stats: COMPLETE
>
> Map 1
> Map Operator Tree:
>     TableScan
>       alias: date_dim
>       filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
>       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
>       Filter Operator
>         predicate: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
>         Statistics: Num rows: 363 Data size: 5808 Basic stats: COMPLETE Column stats: COMPLETE
>
> Map 1
> Map Operator Tree:
>     TableScan
>       alias: date_dim
>       filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 21)) (type: boolean)
>       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
>       Filter Operator
>         predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 21)) (type: boolean)
>         Statistics: Num rows: 363 Data size: 7260 Basic stats: COMPLETE Column stats: COMPLETE
> {code}





[jira] [Commented] (HIVE-17366) Constraint replication in bootstrap

2017-09-08 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159528#comment-16159528
 ] 

Thejas M Nair commented on HIVE-17366:
--

+1

> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17366.1.patch, HIVE-17366.2.patch
>
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.





[jira] [Commented] (HIVE-17338) Utilities.get*Tasks multiple methods duplicate code

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159505#comment-16159505
 ] 

Hive QA commented on HIVE-17338:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886077/HIVE-17338.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11033 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6736/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6736/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6736/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886077 - PreCommit-HIVE-Build

> Utilities.get*Tasks multiple methods duplicate code
> ---
>
> Key: HIVE-17338
> URL: https://issues.apache.org/jira/browse/HIVE-17338
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Gergely Hajós
> Attachments: HIVE-17338.1.patch
>
>
> As discussed in https://github.com/apache/hive/pull/212/files, the 3 
> functions can share a more general function.
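A minimal Java sketch of the kind of shared helper the comment suggests: one generic DAG walk, parameterized by the task class, that the three specific get*Tasks methods could delegate to. The `Task`, `MapRedTask` and `TezTask` classes here are hypothetical stand-ins, not Hive's real task hierarchy, and `getTasksOfType` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TaskCollectorSketch {
    // Hypothetical minimal task node; Hive's real Task class is much richer.
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
    }
    static class MapRedTask extends Task { MapRedTask(String n) { super(n); } }
    static class TezTask extends Task { TezTask(String n) { super(n); } }

    // One generic walk replaces several near-identical methods: the caller
    // passes the task class it wants collected.
    static <T extends Task> List<T> getTasksOfType(List<? extends Task> roots, Class<T> type) {
        List<T> result = new ArrayList<>();
        Set<Task> visited = new HashSet<>();
        for (Task root : roots) {
            collect(root, type, result, visited);
        }
        return result;
    }

    private static <T extends Task> void collect(Task task, Class<T> type,
                                                 List<T> result, Set<Task> visited) {
        if (!visited.add(task)) {
            return; // already reached through another parent in the DAG
        }
        if (type.isInstance(task)) {
            result.add(type.cast(task));
        }
        for (Task child : task.children) {
            collect(child, type, result, visited);
        }
    }
}
```

Each specific method could then become a one-liner such as `getTasksOfType(rootTasks, TezTask.class)`.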





[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: HIVE-17489.2-branch-2.patch

Patch for {{branch-2}}.

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.1.patch, HIVE-17489.2-branch-2.patch, 
> HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).
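A Java sketch of the proposed decoupling, assuming Hadoop's convention of writing principals with a `_HOST` placeholder that is replaced by a lower-cased hostname at runtime. The class, its fields, and the `GRID.MYTH.NET` realm used below are hypothetical; the sketch only shows that the two principals are resolved against different hostnames.

```java
import java.util.Locale;

class MetastorePrincipalsSketch {
    // Principal presented to the metastore's own clients (Hive client, BeeLine, ...).
    final String clientFacingPrincipal;
    // Principal used when the metastore is itself a client of HDFS and similar services.
    final String serverSidePrincipal;

    MetastorePrincipalsSketch(String clientPattern, String serverPattern,
                              String vipHost, String localHost) {
        // Client-facing principal resolves against the VIP hostname...
        this.clientFacingPrincipal = resolve(clientPattern, vipHost);
        // ...while the server-side principal resolves against this box's own
        // hostname, so host-based checks on HDFS see a matching name.
        this.serverSidePrincipal = resolve(serverPattern, localHost);
    }

    // Replaces the "_HOST" placeholder with the lower-cased hostname.
    static String resolve(String pattern, String host) {
        return pattern.replace("_HOST", host.toLowerCase(Locale.ROOT));
    }
}
```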





[jira] [Commented] (HIVE-17490) Utility method to get list of all HS2 direct URIs from ZK URI

2017-09-08 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159471#comment-16159471
 ] 

Thejas M Nair commented on HIVE-17490:
--

A new method along the following lines, in HiveConnection:
{code}
/**
 * @param zookeeperBasedHS2Url ZooKeeper-based HS2 JDBC URL
 * @return list of direct HS2 URLs found using the ZooKeeper service discovery URL;
 *         if the URL is not ZooKeeper-based, it returns that URL
 */
public static List getAllUrls(String zookeeperBasedHS2Url);
{code}
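A rough Java sketch of the URL-expansion part of such a method, leaving out the ZooKeeper client call itself (so, unlike the proposed signature, it also takes the znode names as a parameter). It assumes HS2 instances register znodes named like `serverUri=host:port;version=...;sequence=...` and that a ZooKeeper-based JDBC URL contains `serviceDiscoveryMode=zooKeeper`; the class and method names are hypothetical, and a real implementation would also need to read the znodes and carry over database and session parameters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Hs2DirectUrlsSketch {
    // Assumed marker of a ZooKeeper service-discovery JDBC URL.
    static boolean isZkBased(String url) {
        return url.contains("serviceDiscoveryMode=zooKeeper");
    }

    // Builds direct jdbc:hive2:// URLs from HS2 znode names that the caller has
    // already read from ZooKeeper. A non-ZooKeeper URL is returned unchanged.
    static List<String> getAllUrls(String url, List<String> znodeNames) {
        if (!isZkBased(url)) {
            return Collections.singletonList(url);
        }
        List<String> urls = new ArrayList<>();
        for (String znode : znodeNames) {
            for (String part : znode.split(";")) {
                if (part.startsWith("serverUri=")) {
                    urls.add("jdbc:hive2://" + part.substring("serverUri=".length()) + "/");
                }
            }
        }
        return urls;
    }
}
```

An administration tool could then connect to each returned URL in turn to issue the kill request.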

> Utility method to get list of all HS2 direct URIs from ZK URI
> -
>
> Key: HIVE-17490
> URL: https://issues.apache.org/jira/browse/HIVE-17490
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Thejas M Nair
>Assignee: Teddy Choi
>
> Administrators need to be able to kill queries based on query ID, if those 
> queries have been launched via hiveserver2. 
> There can be multiple HS2 instances and only one of them will be running this 
> query. Applications will need to connect to each instance and invoke the 
> command to kill the query.





[jira] [Updated] (HIVE-15212) merge branch into master

2017-09-08 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15212:
-
Attachment: HIVE-15212.10.patch

patch 10 for test

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch
>
>






[jira] [Updated] (HIVE-17490) Utility method to get list of all HS2 direct URIs from ZK URI

2017-09-08 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-17490:
-
Description: 
Administrators need to be able to kill queries based on query ID, if those 
queries have been launched via hiveserver2. 

There can be multiple HS2 instances and only one of them will be running this 
query. Applications will need to connect to each instance and invoke the 
command to kill the query.


  was:
Hive studio needs to be able to kill queries based on query ID, if those query 
have been launched via hiveserver2. 

There can be multiple HS2 instances and only one of them will be running this 
query. Applications will need to connect to each instance and invoke the 
command to kill the query.



> Utility method to get list of all HS2 direct URIs from ZK URI
> -
>
> Key: HIVE-17490
> URL: https://issues.apache.org/jira/browse/HIVE-17490
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Thejas M Nair
>Assignee: Teddy Choi
>
> Administrators need to be able to kill queries based on query ID, if those 
> queries have been launched via hiveserver2. 
> There can be multiple HS2 instances and only one of them will be running this 
> query. Applications will need to connect to each instance and invoke the 
> command to kill the query.





[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-09-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159467#comment-16159467
 ] 

Daniel Dai commented on HIVE-16886:
---

[~anishek], the patch is out of sync, especially in TestObjectStore.java. Can 
you rebase?

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, 
> HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, 
> HIVE-16886.5.patch, HIVE-16886.6.patch, HIVE-16886.7.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException,
>     InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS,
>       MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask[] tasks = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         // Each thread uses its own ObjectStore, simulating a separate HMS.
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent =
>             new NotificationEvent(0, 0,
>                 EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         // Wait until every thread is ready so the inserts race each other.
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   executorService.shutdown();
>   NotificationEventResponse eventResponse =
>       objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: it expects an event ID of 2 but gets 1. 
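The read-increment-write race described in the issue can be reproduced in-process. In this Java sketch, two threads stand in for two metastores, and a barrier forces the interleaving in which both read the stored ID before either writes it back; an atomic get-and-increment does not have the problem. This is only an analogy, and all names are hypothetical: separate HMS processes share a database rather than memory, so a real fix would need the increment to be atomic in the datastore itself (for example, via row locking), which the sketch does not show.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class EventIdRaceSketch {
    // Hypothetical stand-in for the "next event ID" value kept in the datastore.
    private long storedNextId = 1;
    private final AtomicLong nextIdCounter = new AtomicLong(1);

    // Read-increment-write, as the issue describes. The barrier makes both
    // callers read before either writes, so they obtain the same ID.
    long naiveNextId(CyclicBarrier bothHaveRead) throws Exception {
        long id;
        synchronized (this) {
            id = storedNextId;          // read
        }
        bothHaveRead.await();           // wait until the other caller has read too
        synchronized (this) {
            storedNextId = id + 1;      // increment locally, then write back
        }
        return id;
    }

    // Read and increment as one atomic step: no two callers get the same ID.
    long atomicNextId() {
        return nextIdCounter.getAndIncrement();
    }

    // Runs the task on two threads at once and returns both results.
    static long[] runTwiceConcurrently(Callable<Long> task) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Long> first = pool.submit(task);
            Future<Long> second = pool.submit(task);
            return new long[] { first.get(), second.get() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```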





[jira] [Assigned] (HIVE-17492) authorization of kill command

2017-09-08 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned HIVE-17492:



> authorization of kill command
> -
>
> Key: HIVE-17492
> URL: https://issues.apache.org/jira/browse/HIVE-17492
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>
> Killing the query should require admin privileges.





[jira] [Assigned] (HIVE-17491) kill command to kill queries using query id

2017-09-08 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned HIVE-17491:



> kill command to kill queries using query id
> ---
>
> Key: HIVE-17491
> URL: https://issues.apache.org/jira/browse/HIVE-17491
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Teddy Choi
>






[jira] [Assigned] (HIVE-17490) Utility method to get list of all HS2 direct URIs from ZK URI

2017-09-08 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned HIVE-17490:



> Utility method to get list of all HS2 direct URIs from ZK URI
> -
>
> Key: HIVE-17490
> URL: https://issues.apache.org/jira/browse/HIVE-17490
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Thejas M Nair
>Assignee: Teddy Choi
>
> Hive studio needs to be able to kill queries based on query ID, if those 
> query have been launched via hiveserver2. 
> There can be multiple HS2 instances and only one of them will be running this 
> query. Applications will need to connect to each instance and invoke the 
> command to kill the query.





[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Status: Patch Available  (was: Open)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.1.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).





[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: HIVE-17489.2.patch

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.1.patch, HIVE-17489.2.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).





[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Status: Open  (was: Patch Available)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.1.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).





[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Status: Patch Available  (was: Open)

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.1.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).





[jira] [Updated] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17489:

Attachment: HIVE-17489.1.patch

Patch for master.

> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17489.1.patch
>
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).





[jira] [Assigned] (HIVE-17489) Separate client-facing and server-side Kerberos principals, to support HA

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17489:
---


> Separate client-facing and server-side Kerberos principals, to support HA
> -
>
> Key: HIVE-17489
> URL: https://issues.apache.org/jira/browse/HIVE-17489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
>
> On deployments of the Hive metastore where a farm of servers is fronted by a 
> VIP, the hostname of the VIP (e.g. {{mycluster-hcat.blue.myth.net}}) will 
> differ from the actual boxen in the farm (e.g. 
> {{mycluster-hcat-\[0..3\].blue.myth.net}}).
> Such a deployment messes up Kerberos auth, with principals like 
> {{hcat/mycluster-hcat.blue.myth@grid.myth.net}}. Host-based checks will 
> disallow servers behind the VIP from using the VIP's hostname in their 
> principals when accessing, say, HDFS.
> The solution would be to decouple the server-side principal (used to access 
> other services like HDFS as a client) from the client-facing principal (used 
> from Hive-client, BeeLine, etc.).





[jira] [Commented] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159412#comment-16159412
 ] 

Hive QA commented on HIVE-17426:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886048/HIVE-17426.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11025 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) 
(batchId=218)
TestReplicationScenariosAcrossInstances - did not produce a TEST-*.xml file 
(likely timed out) (batchId=218)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely 
timed out) (batchId=218)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6735/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6735/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6735/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886048 - PreCommit-HIVE-Build

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch, HIVE-17426.1.patch, 
> HIVE-17426.2.patch, HIVE-17426.3.patch, HIVE-17426.4.patch
>
>
> The execution framework currently only runs MR Tasks in parallel when {{set 
> hive.exec.parallel=true}}.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.
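The following is a rough sketch of the scheduling rule. It assumes a hypothetical {{Task}} type tagged with a {{Kind}}. MR, TEZ and SPARK come from the description, and the other kinds are invented for illustration. This is not Hive's actual task-launching code. With {{hive.exec.parallel=true}}, ready tasks go to a thread pool unless they are Tez or Spark tasks. Those run afterwards, one at a time, so they never overlap other tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical launcher illustrating the HIVE-17426 rule; not Hive's code. */
final class ParallelLauncher {
    /** MR, TEZ and SPARK come from the JIRA; DDL and REPL_COPY are illustrative. */
    enum Kind { MR, DDL, REPL_COPY, TEZ, SPARK }

    static final class Task {
        final String name;
        final Kind kind;
        final Runnable work;

        Task(String name, Kind kind, Runnable work) {
            this.name = name;
            this.kind = kind;
            this.work = work;
        }
    }

    /** TezTask / SparkTask are still never run in parallel. */
    static boolean mayRunInParallel(Task t) {
        return t.kind != Kind.TEZ && t.kind != Kind.SPARK;
    }

    /**
     * Runs one set of independent, ready tasks and returns the names of those
     * submitted to the pool. With parallel == false (hive.exec.parallel=false)
     * every task runs serially on the calling thread.
     */
    static List<String> launch(List<Task> ready, boolean parallel, int threads) {
        List<String> submitted = new ArrayList<>();
        if (!parallel) {
            for (Task t : ready) {
                t.work.run();
            }
            return submitted;
        }
        List<Task> serial = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Task t : ready) {
                if (mayRunInParallel(t)) {
                    futures.add(pool.submit(t.work));
                    submitted.add(t.name);
                } else {
                    serial.add(t);
                }
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait for completion and surface failures
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("task failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        // Tez/Spark tasks run afterwards, one at a time, never overlapping others.
        for (Task t : serial) {
            t.work.run();
        }
        return submitted;
    }
}
```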





[jira] [Updated] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17488:
--
Status: Patch Available  (was: Open)

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.patch
>
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.





[jira] [Updated] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17488:
--
Attachment: HIVE-17488.patch

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.patch
>
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.





[jira] [Commented] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159406#comment-16159406
 ] 

ASF GitHub Bot commented on HIVE-17488:
---

GitHub user alanfgates opened a pull request:

https://github.com/apache/hive/pull/244

HIVE-17488 Move first set of metastore objects to standalone-metastore



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanfgates/hive hive17488

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #244


commit 6914a81f9b65f127cd9fc40f66ac8a7b663ed89b
Author: Alan Gates 
Date:   2017-07-25T21:18:11Z

Moved TableType

commit 100e59af9bb03311b1dde284f0e8a8f046374e43
Author: Alan Gates 
Date:   2017-07-25T21:40:57Z

Moved HiveMetaHook, HiveMetaHookLoader, and DefaultHiveMetaHook

commit 0c0a921a5f7ef20192d8d3e320cc0fa66c0e675f
Author: Alan Gates 
Date:   2017-07-25T21:50:55Z

Moved IExtrapolatePartStatus and LinearExtrapolatePartStatus

commit f1eca5fec96926d44d4355b144359f159e0f7057
Author: Alan Gates 
Date:   2017-07-25T21:59:44Z

Moved IHMSHandler

commit 803cec349fe3d487381fcc4c1af57fca22e3f7d7
Author: Alan Gates 
Date:   2017-07-27T21:44:05Z

Moved Annotations.

commit 2b56157f60990e9f94ce3b1ad8d020d75e776815
Author: Alan Gates 
Date:   2017-07-25T22:55:56Z

Moved MetaStoreFilterHook and DefaultMetaStoreFilterHookImpl.  Since they 
took HiveConf in the constructor this did force a constructor change.

commit d40a78e45eefb265c45ba25302a320073b247446
Author: Alan Gates 
Date:   2017-07-25T23:08:06Z

Moved PartitionSpecProxy and its subclasses.

commit 9612ba2a1ff2d5117df8d8ab8eeb970890638b19
Author: Alan Gates 
Date:   2017-07-27T22:10:24Z

Moved JDOConnectionURLHook

commit 18f80e23d53854b5ae327fd65dd1ace93fed8a72
Author: Alan Gates 
Date:   2017-07-27T22:18:55Z

Moved TServerSocketKeepAlive

commit c8991bab89fec5a3404238223913d18d86142d68
Author: Alan Gates 
Date:   2017-07-27T22:28:30Z

Moved PartitionDropOptions

commit c8c5389d14c10f1cbe6bf362a159c9efe99f2293
Author: Alan Gates 
Date:   2017-07-27T22:37:56Z

Moved MetaStoreEndFunctionContext and MetaStoreEndFunctionListener

commit b3d4d52f2d804b71ab8913a9ee7b810e8fa5d22b
Author: Alan Gates 
Date:   2017-07-27T23:10:00Z

Moved MetaStoreFS and HiveMetaStoreFsImpl.  Created FileUtils and 
MetaStoreUtils and copied a couple of files out of the corresponding classes in 
other packages.

commit 478c3d5afab83cd016f95a92387d10561b1329b6
Author: Alan Gates 
Date:   2017-07-27T23:19:56Z

Moved DatabaseProduct

commit 5fc0b72bb17cbe557ffaf11229c6e4ef18aa6a20
Author: Alan Gates 
Date:   2017-07-28T00:00:13Z

Moved HiveMetaException

commit a827a7ae3e890740b245ee383f1a1730b3988ef1
Author: Alan Gates 
Date:   2017-07-28T13:40:14Z

Moved AggregateStatsCache

commit cc371862e380f66b7cb1bdaf74d9470c47f8ff10
Author: Alan Gates 
Date:   2017-07-28T13:52:13Z

Moved LockComponentBuilder, LockRequestBuilder, and TestLockRequestBuilder

commit 2f4a537445d0aef4d5862d6f3e0c65bb522fa4e6
Author: Alan Gates 
Date:   2017-07-28T14:53:47Z

Moved MetaStoreThread.  This required switching it from HiveConf to 
Configuration, which trickled down to the compactor threads themselves.

commit 10fd1c7f06ea3775a381f97b52eccd442eea2d1a
Author: Alan Gates 
Date:   2017-07-28T18:34:26Z

Copied in HiveVersionAnnotation, HiveVersionInfo, and saveVersion.sh from 
common and switched MetaStoreSchemaInfo to use the new Metastore versions of 
these files.

commit 83b4f138ff233325e407f520ebc2a8bd10494e1e
Author: Alan Gates 
Date:   2017-07-28T20:31:36Z

Moved IMetaStoreSchemaInfo, MetaStoreSchemaInfo, 
MetaStoreSchemaInfoFactory, TestMetaStoreSchemaFactory, and 
TestMetaStoreSchemaInfo.

commit 34877b4c8a469b589f659209d67b8ba16d5a3030
Author: Alan Gates 
Date:   2017-07-28T21:00:35Z

Moved MetaStoreInit.  Created JavaUtils but only copied over the method 
used.

commit 2a52f6b6289011c6759a1b466368fc6db9e04bb3
Author: Alan Gates 
Date:   2017-08-17T23:59:34Z

Fixed compilation errors.

commit 58478570539c6484be1fd0f43c464356f713abd4
Author: Alan Gates 
Date:   2017-08-22T22:53:14Z

Moved PartitionExpressionProxy.


[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159396#comment-16159396
 ] 

Alan Gates commented on HIVE-17466:
---

I'm certainly ok with the intention.

At some point I'd like to rationalize the Thrift API, as it has many, many 
calls.  But we should do that holistically rather than start blocking 
individual changes.

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of tables/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
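For contrast, here is roughly what the client-side alternative looks like. This sketch parses partition names of the form {{k1=v1/k2=v2}} and collects the distinct value combinations. It assumes unescaped names: Hive escapes special characters in partition names, and the sketch ignores that. The class name is hypothetical. The proposed API would push this work down to the {{RawStore}} instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side helper; not a Hive API. */
final class PartitionNameParser {
    /**
     * Returns the distinct value combinations of the given partition keys, in
     * first-seen order. Assumes unescaped names such as "ds=2017-09-08/hr=00".
     */
    static Set<List<String>> distinctValues(List<String> partNames, List<String> keys) {
        Set<List<String>> result = new LinkedHashSet<>();
        for (String name : partNames) {
            Map<String, String> kv = new HashMap<>();
            for (String component : name.split("/")) {
                int eq = component.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("malformed partition name: " + name);
                }
                kv.put(component.substring(0, eq), component.substring(eq + 1));
            }
            List<String> row = new ArrayList<>(keys.size());
            for (String key : keys) {
                String value = kv.get(key);
                if (value == null) {
                    throw new IllegalArgumentException("key " + key + " missing from " + name);
                }
                row.add(value);
            }
            result.add(row);
        }
        return result;
    }
}
```

Asking for key {{ds}} over a day's hourly partitions yields one entry per day. This is the same answer {{SELECT DISTINCT ds}} would give, but only after the full list of names has been shipped to the client.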





[jira] [Updated] (HIVE-17480) repl dump sub dir should use UUID instead of timestamp

2017-09-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17480:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Patch pushed to master.

> repl dump sub dir should use UUID instead of timestamp
> --
>
> Key: HIVE-17480
> URL: https://issues.apache.org/jira/browse/HIVE-17480
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Fix For: 3.0.0
>
> Attachments: HIVE-17480.1.patch
>
>
> This is to fix a concurrency issue where multiple dump operations could end 
> up using the same timestamp for the dump dir name.
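A small sketch of why the switch helps. It assumes the old sub-directory name came from a millisecond clock, and the class and method names are hypothetical. Two dumps started in the same millisecond get identical timestamp names. Random (version 4) UUIDs are collision-free for practical purposes.

```java
import java.util.UUID;

/** Hypothetical helpers contrasting the two naming schemes for repl dump sub-dirs. */
final class DumpDirNames {
    /** Timestamp-based name: two dumps started in the same millisecond collide. */
    static String timestampName(long nowMillis) {
        return Long.toString(nowMillis);
    }

    /** UUID-based name: a random (version 4) UUID, unique for practical purposes. */
    static String uuidName() {
        return UUID.randomUUID().toString();
    }
}
```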





[jira] [Updated] (HIVE-17226) Use strong hashing as security improvement

2017-09-08 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17226:
--
Status: Open  (was: Patch Available)

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17226.1.patch
>
>
> Two places have been identified where weak hashing needs to be replaced 
> by SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" is usually 
> mapped to SHA-1, which is not secure enough by today's standards. 
> We should use SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.
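A JDK-only sketch of both fixes. It assumes nothing about CookieSigner's actual internals. The algorithm is named explicitly as {{SHA-256}} rather than the ambiguous {{SHA}}. {{sha256Hex}} produces the same lowercase hex format as commons-codec's {{DigestUtils.sha256Hex}}. The {{&s=}} separator and the class name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical signer; not Hive's CookieSigner. */
final class Sha256Signer {
    private static final String SIGNATURE_SEPARATOR = "&s="; // hypothetical
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final byte[] secret;

    Sha256Signer(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Lowercase hex SHA-256 of the UTF-8 bytes of s (the MD5 -> SHA-256 swap). */
    static String sha256Hex(String s) {
        return toHex(digest(s.getBytes(StandardCharsets.UTF_8)));
    }

    /** Appends a SHA-256 digest of (value bytes + secret bytes). */
    String sign(String value) {
        return value + SIGNATURE_SEPARATOR
                + toHex(digest(value.getBytes(StandardCharsets.UTF_8), secret));
    }

    /** Returns the original value if the signature matches, otherwise null. */
    String verify(String signed) {
        int idx = signed.lastIndexOf(SIGNATURE_SEPARATOR);
        if (idx < 0) {
            return null;
        }
        String value = signed.substring(0, idx);
        byte[] expected = sign(value).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signed.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids an early-exit byte comparison
        return MessageDigest.isEqual(expected, actual) ? value : null;
    }

    private static byte[] digest(byte[]... chunks) {
        try {
            // Name the algorithm explicitly instead of the ambiguous "SHA".
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] chunk : chunks) {
                md.update(chunk);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xf]).append(HEX[b & 0xf]);
        }
        return sb.toString();
    }
}
```

With this change, the mask-hash output grows from 32 to 64 hex characters. The sketch keys the digest by appending a secret, which mirrors the MessageDigest approach in the description. {{javax.crypto.Mac}} with {{HmacSHA256}} is the more conventional construction for signatures.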





[jira] [Updated] (HIVE-17226) Use strong hashing as security improvement

2017-09-08 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17226:
--
Status: Patch Available  (was: Open)

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17226.1.patch
>
>
> Two places have been identified where weak hashing needs to be replaced 
> by SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" is usually 
> mapped to SHA-1, which is not secure enough by today's standards. 
> We should use SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.





[jira] [Commented] (HIVE-17459) View deletion operation failed to replicate on target cluster

2017-09-08 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159357#comment-16159357
 ] 

Tao Li commented on HIVE-17459:
---

Test results look good (the failures are unrelated).

> View deletion operation failed to replicate on target cluster
> -
>
> Key: HIVE-17459
> URL: https://issues.apache.org/jira/browse/HIVE-17459
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17459.1.patch, HIVE-17459.2.patch
>
>
> View dropping is not replicated during incremental repl.





[jira] [Commented] (HIVE-17480) repl dump sub dir should use UUID instead of timestamp

2017-09-08 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159356#comment-16159356
 ] 

Tao Li commented on HIVE-17480:
---

Test result looks good.

> repl dump sub dir should use UUID instead of timestamp
> --
>
> Key: HIVE-17480
> URL: https://issues.apache.org/jira/browse/HIVE-17480
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17480.1.patch
>
>
> This is to fix a concurrency issue where multiple dump operations could end 
> up using the same timestamp for the dump dir name.





[jira] [Commented] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159353#comment-16159353
 ] 

Lefty Leverenz commented on HIVE-17426:
---

Config review:  Please use newlines in the description of 
*hive.print.task.graph* so that it won't all be on one line in the generated 
template file (hive-default.xml.template).

While you're at it, you could edit the description of 
*hive.repl.approx.max.load.tasks* which was added by HIVE-16896:

bq.  Provide and approximate of the max number of tasks that should be executed 
in before dynamically generating the next set of tasks. The number is an 
approximate as we will stop at slightly higher number than above, the reason 
being some events might lead to an task increment that would cross the above 
limit

Suggested edits:

bq.  Provide an approximation of the maximum number of tasks that should be 
executed before dynamically generating the next set of tasks. The number is 
approximate as Hive will stop at a slightly higher number, the reason being 
some events might lead to a task increment that would cross the specified limit.

Trivia:  Indentation should be standardized for both configs, same as for 
adjacent configs.
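The following sketch illustrates why such a limit can only be approximate. It assumes each replication event expands into a known number of tasks, and that those tasks are never split across batches. The names are hypothetical and this is not Hive's implementation. The batch closes after the first event that reaches the limit, so it can overshoot.

```java
import java.util.List;

/** Hypothetical illustration of an approximate task limit; not Hive's code. */
final class ApproxTaskBatcher {
    /**
     * Starting at event index 'from', consumes events until the running task
     * count reaches maxTasks. An event's tasks are never split, so the final
     * event may push the batch past the limit. Returns the index of the first
     * event not consumed.
     */
    static int nextBatchEnd(List<Integer> tasksPerEvent, int from, int maxTasks) {
        int count = 0;
        int i = from;
        while (i < tasksPerEvent.size() && count < maxTasks) {
            count += tasksPerEvent.get(i);
            i++;
        }
        return i;
    }

    /** Total tasks generated by events [from, to). */
    static int tasksIn(List<Integer> tasksPerEvent, int from, int to) {
        int total = 0;
        for (int i = from; i < to; i++) {
            total += tasksPerEvent.get(i);
        }
        return total;
    }
}
```

Take events producing 3, 3, 5 and 1 tasks with a limit of 5. The first batch takes the first two events, which is 6 tasks, slightly over the limit.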

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch, HIVE-17426.1.patch, 
> HIVE-17426.2.patch, HIVE-17426.3.patch, HIVE-17426.4.patch
>
>
> The execution framework currently only runs MR Tasks in parallel when {{set 
> hive.exec.parallel=true}}.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.





[jira] [Commented] (HIVE-17403) Fail concatenation for unmanaged and transactional tables

2017-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159338#comment-16159338
 ] 

Sergey Shelukhin commented on HIVE-17403:
-

+1

> Fail concatenation for unmanaged and transactional tables
> -
>
> Key: HIVE-17403
> URL: https://issues.apache.org/jira/browse/HIVE-17403
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Blocker
> Attachments: HIVE-17403.1.patch, HIVE-17403.2.patch
>
>
> ALTER TABLE .. CONCATENATE should fail if the table is not managed by Hive. 
> For unmanaged tables, file names can be anything. Hive makes some assumptions 
> about file names which can result in data loss for unmanaged tables. 
> An example of this is a table/partition having 2 different files 
> (part-m-0__1417075294718 and part-m-00018__1417075294718). Although these 
> are completely different files, Hive thinks they were generated by 
> separate instances of the same task (because of failure or speculative 
> execution). Hive will end up removing one of them:
> {code}
> 2017-08-28T18:19:29,516 WARN  [b27f10d5-d957-4695-ab2a-1453401793df main]: 
> exec.Utilities (:()) - Duplicate taskid file removed: 
> file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-00018__1417075294718
>  with length 958510. Existing file: 
> file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-0__1417075294718
>  with length 1123116
> {code}
> DDL should restrict concatenation for unmanaged tables. 
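A sketch of the kind of DDL-time guard being asked for. It assumes the table type and table properties are already at hand. The enum mirrors two of the metastore's {{TableType}} values, and the class name and messages are hypothetical. Transactional tables are detected through the {{transactional}} table property, because the issue title excludes them as well.

```java
import java.util.Map;

/** Hypothetical validation helper; not Hive's DDL code. */
final class ConcatenateCheck {
    /** Mirrors two of the metastore TableType names. */
    enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

    /**
     * Rejects ALTER TABLE .. CONCATENATE for unmanaged tables, whose arbitrary
     * file names can trip the duplicate-task-file cleanup, and for
     * transactional tables.
     */
    static void validate(String table, TableType type, Map<String, String> tblProps) {
        if (type != TableType.MANAGED_TABLE) {
            throw new UnsupportedOperationException(
                    "Concatenate is not supported for unmanaged table " + table);
        }
        if ("true".equalsIgnoreCase(tblProps.get("transactional"))) {
            throw new UnsupportedOperationException(
                    "Concatenate is not supported for transactional table " + table);
        }
    }
}
```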





[jira] [Updated] (HIVE-17475) Disable mapjoin using hint

2017-09-08 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17475:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Disable mapjoin using hint
> --
>
> Key: HIVE-17475
> URL: https://issues.apache.org/jira/browse/HIVE-17475
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-17475.1.patch, HIVE-17475.2.patch
>
>
> Use a hint to disable mapjoin for a given query.





[jira] [Commented] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-09-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159336#comment-16159336
 ] 

Lefty Leverenz commented on HIVE-16896:
---

Thanks for the doc, [~anishek].  I'm suggesting some edits with HIVE-17426.

Here's the direct link:

* [hive.repl.approx.max.load.tasks | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.repl.approx.max.load.tasks]

Removed the TODOC3.0 label.

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> We want to avoid creating too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over them to load objects; we should 
> instead move to 
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> Additionally, since we can't hand off partial tasks from the analysis phase to 
> the execution phase, we are going to move the whole repl load functionality to 
> the execution phase so we can better control creation/execution of tasks (not 
> related to Hive {{Task}}; we may get rid of ReplCopyTask).
> An additional consideration at the end of this JIRA is whether we want to 
> do a multi-threaded load of the bootstrap dump.
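A self-contained sketch of the lazy, batched listing. So that it runs without Hadoop on the classpath, it uses a stand-in interface with the same {{hasNext()}}/{{next()}} shape as {{org.apache.hadoop.fs.RemoteIterator}}. The class name, batch size and callback are hypothetical. Only one batch of entries is held in memory at a time, instead of the whole {{FileStatus[]}}.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical loader; not Hive's repl load code. */
final class LazyDumpLoader {
    /** Same shape as org.apache.hadoop.fs.RemoteIterator: both calls may throw IOException. */
    interface RemoteIter<T> {
        boolean hasNext() throws IOException;

        T next() throws IOException;
    }

    /** Adapts an in-memory iterator, for testing without a FileSystem. */
    static <T> RemoteIter<T> of(Iterator<T> it) {
        return new RemoteIter<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                return it.next();
            }
        };
    }

    /**
     * Pulls entries lazily and hands them to loadBatch in groups of at most
     * batchSize. Returns the number of batches delivered.
     */
    static <T> int loadInBatches(RemoteIter<T> files, int batchSize, Consumer<List<T>> loadBatch)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (files.hasNext()) {
            batch.add(files.next());
            if (batch.size() == batchSize) {
                loadBatch.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list; the consumer may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            loadBatch.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With Hadoop available, one would either wrap the result of {{fs.listFiles(dumpRoot, true)}} in a small adapter or change the parameter type to {{RemoteIterator<LocatedFileStatus>}} directly. Here {{dumpRoot}} is a hypothetical path.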





[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-09-08 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16896:
--
Labels:   (was: TODOC3.0)

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
>
>
> We want to avoid creating too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over them to load objects; we should 
> instead move to 
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path 
> f, boolean recursive)
> {code}
> which would internally batch and return values. 
> Additionally, since we can't hand off partial tasks from the analysis phase to 
> the execution phase, we are going to move the whole repl load functionality to 
> the execution phase so we can better control creation/execution of tasks (not 
> related to Hive {{Task}}; we may get rid of ReplCopyTask).
> An additional consideration at the end of this JIRA is whether we want to 
> do a multi-threaded load of the bootstrap dump.





[jira] [Commented] (HIVE-17403) Fail concatenation for unmanaged and transactional tables

2017-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159260#comment-16159260
 ] 

Prasanth Jayachandran commented on HIVE-17403:
--

Added explicit ordering in tests. [~sershe] could you please take another look?

> Fail concatenation for unmanaged and transactional tables
> -
>
> Key: HIVE-17403
> URL: https://issues.apache.org/jira/browse/HIVE-17403
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Blocker
> Attachments: HIVE-17403.1.patch, HIVE-17403.2.patch
>
>
> ALTER TABLE .. CONCATENATE should fail if the table is not managed by Hive. 
> For unmanaged tables, file names can be anything. Hive makes some assumptions 
> about file names which can result in data loss for unmanaged tables. 
> An example of this is a table/partition having 2 different files 
> (part-m-0__1417075294718 and part-m-00018__1417075294718). Although these 
> are completely different files, Hive thinks they were generated by 
> separate instances of the same task (because of failure or speculative 
> execution). Hive will end up removing one of them:
> {code}
> 2017-08-28T18:19:29,516 WARN  [b27f10d5-d957-4695-ab2a-1453401793df main]: 
> exec.Utilities (:()) - Duplicate taskid file removed: 
> file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-00018__1417075294718
>  with length 958510. Existing file: 
> file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-0__1417075294718
>  with length 1123116
> {code}
> DDL should restrict concatenation for unmanaged tables. 





[jira] [Updated] (HIVE-17403) Fail concatenation for unmanaged and transactional tables

2017-09-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17403:
-
Attachment: HIVE-17403.2.patch

> Fail concatenation for unmanaged and transactional tables
> -
>
> Key: HIVE-17403
> URL: https://issues.apache.org/jira/browse/HIVE-17403
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Blocker
> Attachments: HIVE-17403.1.patch, HIVE-17403.2.patch
>
>
> ALTER TABLE .. CONCATENATE should fail if the table is not managed by Hive. 
> For unmanaged tables, file names can be anything. Hive makes some assumptions 
> about file names which can result in data loss for unmanaged tables. 
> An example of this is a table/partition having 2 different files 
> (part-m-0__1417075294718 and part-m-00018__1417075294718). Although these 
> are completely different files, Hive thinks they were generated by 
> separate instances of the same task (because of failure or speculative 
> execution). Hive will end up removing one of them:
> {code}
> 2017-08-28T18:19:29,516 WARN  [b27f10d5-d957-4695-ab2a-1453401793df main]: 
> exec.Utilities (:()) - Duplicate taskid file removed: 
> file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-00018__1417075294718
>  with length 958510. Existing file: 
> file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-0__1417075294718
>  with length 1123116
> {code}
> DDL should restrict concatenation for unmanaged tables. 





[jira] [Commented] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159218#comment-16159218
 ] 

Hive QA commented on HIVE-17426:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886048/HIVE-17426.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11025 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) 
(batchId=218)
TestReplicationScenariosAcrossInstances - did not produce a TEST-*.xml file 
(likely timed out) (batchId=218)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely 
timed out) (batchId=218)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6734/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6734/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6734/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886048 - PreCommit-HIVE-Build

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch, HIVE-17426.1.patch, 
> HIVE-17426.2.patch, HIVE-17426.3.patch, HIVE-17426.4.patch
>
>
> The execution framework currently runs only MR tasks in parallel when {{set 
> hive.exec.parallel=true}}.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.
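A rough sketch of the scheduling rule described above, assuming a simplified model where each task is just a "kind". The `Kind` enum and the `launch` helper are hypothetical and do not mirror Hive's `Task` hierarchy or driver code; the sketch also ignores inter-task dependencies, which a real scheduler must respect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: task kinds and launcher are illustrative only.
final class ParallelTaskSketch {

  enum Kind { MAPRED, DDL, REPL_COPY, TEZ, SPARK }

  // TezTask / SparkTask stay sequential; any other kind may run in parallel
  // when parallel execution (hive.exec.parallel) is enabled.
  static boolean canRunInParallel(Kind kind, boolean parallelEnabled) {
    return parallelEnabled && kind != Kind.TEZ && kind != Kind.SPARK;
  }

  // Submits parallel-eligible tasks to a pool and runs the rest inline.
  // Returns how many tasks were submitted to the pool.
  static int launch(List<Kind> tasks, boolean parallelEnabled) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<?>> running = new ArrayList<>();
    try {
      for (Kind kind : tasks) {
        Runnable work = () -> System.out.println("running " + kind);
        if (canRunInParallel(kind, parallelEnabled)) {
          running.add(pool.submit(work));
        } else {
          work.run();
        }
      }
      for (Future<?> f : running) {
        f.get(); // wait for each parallel task and surface its failure, if any
      }
    } catch (ExecutionException e) {
      throw new IllegalStateException("parallel task failed", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tasks", e);
    } finally {
      pool.shutdown();
    }
    return running.size();
  }
}
```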





[jira] [Assigned] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned HIVE-17488:
-


> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.





[jira] [Updated] (HIVE-17359) Deal with TypeInfo dependencies in the metastore

2017-09-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17359:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Owen for the review.

Owen's review is on the linked GitHub PR.

> Deal with TypeInfo dependencies in the metastore
> 
>
> Key: HIVE-17359
> URL: https://issues.apache.org/jira/browse/HIVE-17359
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 3.0.0
>
> Attachments: HIVE-17359.patch
>
>
> The metastore uses TypeInfo, which resides in the serdes package.  In order 
> to move the metastore to be separately releasable we need to deal with this.





[jira] [Commented] (HIVE-17359) Deal with TypeInfo dependencies in the metastore

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159201#comment-16159201
 ] 

ASF GitHub Bot commented on HIVE-17359:
---

Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/239


> Deal with TypeInfo dependencies in the metastore
> 
>
> Key: HIVE-17359
> URL: https://issues.apache.org/jira/browse/HIVE-17359
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17359.patch
>
>
> The metastore uses TypeInfo, which resides in the serdes package.  In order 
> to move the metastore to be separately releasable we need to deal with this.





[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-09-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159113#comment-16159113
 ] 

Sergio Peña commented on HIVE-16886:


It looks good [~daijy] [~anishek]. I ran the MySQL tests in a cluster with HMS 
HA and executed several DDL statements. There are no duplicates or gaps, so this 
is a good approach. I haven't tested the performance, though, but that's 
something we could handle later if we see problems with it.

+1

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, 
> HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, 
> HIVE-16886.5.patch, HIVE-16886.6.patch, HIVE-16886.7.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node, due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> instances can conflict.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If two 
> servers read the same ID, both write a new notification with the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   // countIn lets the main thread wait until every worker is ready;
>   // countOut releases all workers at once so their writes race.
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   @SuppressWarnings("unchecked")
>   FutureTask<Void>[] tasks = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         // Each worker uses its own ObjectStore, simulating a separate HMS instance.
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent = new NotificationEvent(0, 0,
>             EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   executorService.shutdown();
>   NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: the second event gets ID 1 instead of the expected 2.
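A small, deterministic Java model of the read-increment-write race described above, and of one way to avoid it. The `Datastore` class stands in for the event-ID row in the backing database, and `AtomicLong.compareAndSet` stands in for an atomic database-side update (for example a row lock or a conditional UPDATE); across separate HMS processes the atomicity has to come from the database, not from a JVM object. This is an illustration only, not necessarily the approach taken by the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: names are illustrative, not ObjectStore internals.
final class EventIdRaceSketch {

  // Stand-in for the persisted "next event ID" value.
  static final class Datastore {
    final AtomicLong nextEventId = new AtomicLong(1);
  }

  // Unsafe pattern from the report: read, increment locally, write back.
  // Replays the interleaving "all servers read, then all servers write".
  static List<Long> readThenWrite(Datastore db, int servers) {
    List<Long> read = new ArrayList<>();
    for (int s = 0; s < servers; s++) {
      read.add(db.nextEventId.get()); // every server reads the same value
    }
    for (long id : read) {
      db.nextEventId.set(id + 1);     // last write wins
    }
    return read;                      // IDs assigned to the new events
  }

  // Safe pattern: compare-and-set retries until this server's increment is
  // the one that lands, so no two callers receive the same ID.
  static long allocate(Datastore db) {
    while (true) {
      long current = db.nextEventId.get();
      if (db.nextEventId.compareAndSet(current, current + 1)) {
        return current;
      }
    }
  }
}
```

With two servers, `readThenWrite` hands out ID 1 twice, which matches the failing assertion in the test above, while two calls to `allocate` return 1 and then 2.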





[jira] [Commented] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159097#comment-16159097
 ] 

Hive QA commented on HIVE-17344:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886027/HIVE-17344.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11032 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=170)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6733/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6733/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6733/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886027 - PreCommit-HIVE-Build

> LocalCache element memory usage is not calculated properly.
> ---
>
> Key: HIVE-17344
> URL: https://issues.apache.org/jira/browse/HIVE-17344
> Project: Hive
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Janos Gub
> Attachments: HIVE-17344.2.patch, HIVE-17344.patch
>
>
> The ORC footer cache calculates memory usage as follows:
> {code:java}
> public int getMemoryUsage() {
>   return bb.remaining() + 100; // 100 is for 2 longs, BB and java overheads (semi-arbitrary).
> }
> {code}
> ByteBuffer.remaining() returns the number of bytes between the buffer's current 
> position and its limit, not the size of the buffer. In the worst case 
> (remaining() == 0) every element is weighed at 100, allowing this cache to hold 
> MAXWEIGHT/100 elements of arbitrary size. I think the correct solution would 
> be bb.capacity().
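A short Java illustration of the difference, using only `java.nio.ByteBuffer`. The class and method names are hypothetical; the two weight functions mirror the current formula and the proposed `capacity()`-based one.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the two weighing strategies.
final class FooterWeightSketch {

  // Current formula: shrinks as the buffer's position advances.
  static int weightByRemaining(ByteBuffer bb) {
    return bb.remaining() + 100;
  }

  // Proposed formula: reflects the bytes the buffer actually holds.
  static int weightByCapacity(ByteBuffer bb) {
    return bb.capacity() + 100;
  }

  public static void main(String[] args) {
    ByteBuffer footer = ByteBuffer.allocate(1_000_000);
    footer.position(footer.limit()); // simulate a reader consuming the whole buffer
    System.out.println(weightByRemaining(footer)); // 100
    System.out.println(weightByCapacity(footer));  // 1000100
  }
}
```

One caveat: if cached buffers were `slice()`s of a larger array, `capacity()` would report only the slice's size while the whole backing array stays reachable; that case is outside this sketch.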





[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17472:

Attachment: HIVE-17472.2-branch-2.patch

Patch for branch-2.

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}





[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17466:

Status: Patch Available  (was: Open)

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of table/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which require 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have already been 
> published, it would be preferable to have an API that pushes down 
> part-key extraction into the {{RawStore}} layer and returns key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
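The description notes that the current alternative to listing full {{Partition}} objects is to fetch partition names and parse them on the client. A rough Java sketch of that client-side step, assuming names of the form key1=val1/key2=val2 with special characters escaped as %XX. The class and methods are hypothetical and are not part of Hive or of the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical client-side parser for partition names.
final class PartitionNameParser {

  // Parses "dt=2017-09-08/hr=03" into {dt=2017-09-08, hr=03}, keeping key order.
  static Map<String, String> parse(String partName) {
    Map<String, String> spec = new LinkedHashMap<>();
    for (String component : partName.split("/")) {
      int eq = component.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Malformed partition component: " + component);
      }
      spec.put(unescape(component.substring(0, eq)), unescape(component.substring(eq + 1)));
    }
    return spec;
  }

  // Decodes %XX escapes; a '%' not followed by two hex digits is kept as-is.
  private static String unescape(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '%' && i + 2 < s.length()) {
        int hi = Character.digit(s.charAt(i + 1), 16);
        int lo = Character.digit(s.charAt(i + 2), 16);
        if (hi >= 0 && lo >= 0) {
          out.append((char) (hi * 16 + lo));
          i += 2;
          continue;
        }
      }
      out.append(c);
    }
    return out.toString();
  }

  // Client-side analogue of SELECT DISTINCT key: distinct values of one key.
  static Set<String> distinctValues(List<String> partNames, String key) {
    Set<String> values = new TreeSet<>();
    for (String name : partNames) {
      String v = parse(name).get(key);
      if (v != null) {
        values.add(v);
      }
    }
    return values;
  }
}
```

Every partition name still has to travel to the client and be parsed there, which is the overhead the proposed {{RawStore}}-level API would avoid.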





[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17466:

Status: Open  (was: Patch Available)






[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159025#comment-16159025
 ] 

Mithun Radhakrishnan commented on HIVE-17466:
-

[~alangates], I was wondering if you're all right with the intention of this API. 
We're using HIVE-17467 (which depends on this patch) internally for Oozie 
workflow launch.






[jira] [Commented] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159020#comment-16159020
 ] 

Sergey Shelukhin commented on HIVE-17344:
-

+1 pending tests






[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17466:

Attachment: HIVE-17466.2-branch-2.patch

And here's the patch for branch-2.






[jira] [Commented] (HIVE-17475) Disable mapjoin using hint

2017-09-08 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159014#comment-16159014
 ] 

Jason Dere commented on HIVE-17475:
---

+1

> Disable mapjoin using hint
> --
>
> Key: HIVE-17475
> URL: https://issues.apache.org/jira/browse/HIVE-17475
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17475.1.patch, HIVE-17475.2.patch
>
>
> Use a hint to disable mapjoin for a given query.





[jira] [Commented] (HIVE-15212) merge branch into master

2017-09-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158955#comment-16158955
 ] 

Wei Zheng commented on HIVE-15212:
--

[~sershe] [~ekoifman] Patch 9 is the latest diff'ed patch between master and 
hive-14535 branch. Looks like there are some relevant test failures as well as 
many related to replication. I will try to fix those non-replication related 
failures today.

It's strange that the test run didn't generate a test report although it sent 
the result here to the JIRA.

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch
>
>






[jira] [Commented] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS

2017-09-08 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158893#comment-16158893
 ] 

Sahil Takiar commented on HIVE-17486:
-

Is the {{SharedWorkOptimizer}} different from HoS's 
{{CombineEquivalentWorkResolver}}?

> Enable SharedWorkOptimizer in tez on HOS
> 
>
> Key: HIVE-17486
> URL: https://issues.apache.org/jira/browse/HIVE-17486
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
> HIVE-16602 implemented shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be 
> merged so the data is read only once. Optimization will be carried out at the 
> physical level.
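A deliberately simplified Java sketch of the merging idea: scans of the same table are grouped so the table is read once, and the merged scan reads the union of the requested columns. The `Scan` type and `mergeByTable` method are hypothetical and are not SharedWorkOptimizer's implementation; the sketch assumes scans differ only in their columns, whereas a real optimizer must also compare filters, pruned partitions, and the downstream operator trees before merging.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of grouping equivalent table scans.
final class SharedScanSketch {

  // Simplified scan descriptor: the table read and the columns needed.
  static final class Scan {
    final String table;
    final Set<String> columns;

    Scan(String table, Set<String> columns) {
      this.table = table;
      this.columns = columns;
    }
  }

  // One merged scan per table, reading the union of the requested columns.
  static Map<String, Set<String>> mergeByTable(List<Scan> scans) {
    Map<String, Set<String>> merged = new LinkedHashMap<>();
    for (Scan scan : scans) {
      merged.computeIfAbsent(scan.table, t -> new TreeSet<>()).addAll(scan.columns);
    }
    return merged;
  }
}
```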





[jira] [Commented] (HIVE-15212) merge branch into master

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158886#comment-16158886
 ] 

Hive QA commented on HIVE-15212:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886001/HIVE-15212.09.patch

{color:green}SUCCESS:{color} +1 due to 20 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 11042 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_2_exim_basic] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_3_exim_metadata] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin] (batchId=22)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[skewjoin] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query64] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[skewjoin] 
(batchId=111)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 
(batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion3 
(batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02
 (batchId=279)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion3
 (batchId=279)
org.apache.hadoop.hive.ql.parse.TestExportImport.dataImportAfterMetadataOnlyImport
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConcatenatePartitionedTable
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConcatenateTable 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testDropsWithCM 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testDumpLimit 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testEventTypesForDynamicAddPartitionByInsert
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testExchangePartition 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalAdds 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInsertDropPartitionedTable
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInsertDropUnpartitionedTable
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInsertToPartition
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInserts 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoad 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadFailAndRetry
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadWithVariableLengthEventId
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalRepeatEventOnExistingObject
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalRepeatEventOnMissingObject
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testInsertOverwriteOnPartitionedTableWithCM
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testInsertOverwriteOnUnpartitionedTableWithCM
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testInsertToMultiKeyPartition
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testRemoveStats 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testRenamePartitionWithCM
 (batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testRenameTableWithCM 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testStatus 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testTruncateTable 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testTruncateWithCM 
(batchId=218)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testViewsReplication 
(batchId=218)
org.apache.hive.hcatalog.api.repl.commands.TestCommands.testBasicReplEximCommands
 (batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTinyint 
(batchId=183)
{noformat}

[jira] [Updated] (HIVE-17419) ANALYZE TABLE...COMPUTE STATISTICS FOR COLUMNS command shows computed stats for masked tables

2017-09-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17419:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> ANALYZE TABLE...COMPUTE STATISTICS FOR COLUMNS command shows computed stats 
> for masked tables
> -
>
> Key: HIVE-17419
> URL: https://issues.apache.org/jira/browse/HIVE-17419
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17419.patch
>
>
> As {{ANALYZE TABLE...COMPUTE STATISTICS FOR COLUMNS}} is rewritten internally 
> as a {{SELECT}} query, the rewriting that masks columns/rows interacts with 
> the ColumnStatsSemanticAnalyzer rewriting, which leads to the computed stats 
> being shown after running the command.





[jira] [Updated] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler

2017-09-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17468:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~bslim]

> Shade and package appropriate jackson version for druid storage handler
> ---
>
> Key: HIVE-17468
> URL: https://issues.apache.org/jira/browse/HIVE-17468
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17468.2.patch, HIVE-17468.3.patch, 
> HIVE-17468.4.patch, HIVE-17468.patch, hive-druid-deps.txt
>
>
> Currently we are excluding all the jackson core dependencies coming from 
> druid. In my opinion this is wrong, since it leads to packaging unwanted 
> jackson libraries from other projects.
> As the file hive-druid-deps.txt shows, jackson core currently comes from 
> Calcite, at version 2.6.3, which is very different from the 2.4.6 used by 
> Druid. This patch excludes the unwanted jars and makes sure to bring in 
> Druid's jackson dependency from Druid itself.





[jira] [Commented] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158765#comment-16158765
 ] 

Hive QA commented on HIVE-17410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885997/HIVE-17410.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11030 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6731/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6731/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6731/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885997 - PreCommit-HIVE-Build

> repl load task during subsequent DAG generation does not start from the last 
> partition processed
> 
>
> Key: HIVE-17410
> URL: https://issues.apache.org/jira/browse/HIVE-17410
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-17410.1.patch, HIVE-17410.2.patch, 
> HIVE-17410.3.patch
>
>
> The DAG for the repl load task is supposed to be generated dynamically, such 
> that if loading breaks off while a partition is being loaded, subsequent runs 
> resume after the last partition processed.
> We currently identify the point from which we have to process the event, but 
> we reinitialize the iterator to start from the beginning of all the partitions 
> to process.
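A minimal sketch of the intended resume behaviour, assuming partitions are visited in a stable order and the last fully processed partition is recorded by name. The class `PartitionResume` and its method are hypothetical illustrations, not Hive's replication code.

```java
import java.util.List;

final class PartitionResume {
    // Returns the partitions still to load, given the name of the last partition
    // fully processed in a previous run (null if nothing was processed yet).
    static List<String> remaining(List<String> orderedPartitions, String lastProcessed) {
        if (lastProcessed == null) {
            return orderedPartitions;
        }
        int idx = orderedPartitions.indexOf(lastProcessed);
        // Unknown checkpoint: fall back to processing everything rather than skipping data.
        if (idx < 0) {
            return orderedPartitions;
        }
        // Resume strictly after the checkpoint instead of from the beginning.
        return orderedPartitions.subList(idx + 1, orderedPartitions.size());
    }
}
```

Falling back to the full list on an unknown checkpoint trades repeated work for safety; it presumes that re-loading a partition is idempotent, which is an assumption here.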





[jira] [Updated] (HIVE-17317) Make Dbcp configurable using hive properties in hive-site.xml

2017-09-08 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17317:
---
Attachment: HIVE-17317.02.patch

Fixing unit tests.

> Make Dbcp configurable using hive properties in hive-site.xml
> -
>
> Key: HIVE-17317
> URL: https://issues.apache.org/jira/browse/HIVE-17317
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17317.01.patch, HIVE-17317.02.patch
>
>






[jira] [Commented] (HIVE-17485) Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments throws ArrayIndexOutOfBoundsException

2017-09-08 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158714#comment-16158714
 ] 

slim bouguerra commented on HIVE-17485:
---

[~dileep529] Thanks for reporting this issue; the shading patch should fix it. 

> Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments 
> throws ArrayIndexOutOfBoundsException
> ---
>
> Key: HIVE-17485
> URL: https://issues.apache.org/jira/browse/HIVE-17485
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: slim bouguerra
>
> When indexing a Hive-Druid table for a few segments, 
> DruidRecordWriter.pushSegments throws an ArrayIndexOutOfBoundsException.
> The error is:
> {code}
> ERROR : Vertex failed, vertexName=Reducer 2, 
> vertexId=vertex_1502725432788_0017_2_01, diagnostics=[Task failed, 
> taskId=task_1502725432788_0017_2_01_02, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502725432788_0017_2_01_02_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 
> 1) Column vector types: 1:TIMESTAMP, 2:LONG, 3:BYTES, 4:LONG, 5:LONG, 6:LONG, 
> 7:LONG, 8:LONG, 9:LONG, 10:LONG, 11:LONG, 12:LONG, 13:LONG, 14:LONG, 
> 15:BYTES, 16:BYTES, 17:BYTES, 18:BYTES, 19:BYTES, 20:LONG, 21:LONG, 22:LONG, 
> 23:LONG, 24:BYTES, 25:BYTES, 26:BYTES, 27:BYTES, 28:BYTES, 0:TIMESTAMP
> [1900-01-18 00:00:00.0, 2415038, "OLJNECAA", 0, 3, 1, 1900, 3, 1, 18, 
> 1, 1900, 1, 3, "Wednesday", "1900Q1", "N", "N", "N", 2415021, 2415020, 
> 2414673, 2414946, "N", "N", "N", "N", "N", 1900-01-18 00:00:00.0]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing vector batch (tag=0) (vectorizedVertexNum 1) Column vector types: 
> 1:TIMESTAMP, 2:LONG, 3:BYTES, 4:LONG, 5:LONG, 6:LONG, 7:LONG, 8:LONG, 9:LONG, 
> 10:LONG, 11:LONG, 12:LONG, 13:LONG, 14:LONG, 15:BYTES, 16:BYTES, 17:BYTES, 
> 18:BYTES, 19:BYTES, 20:LONG, 21:LONG, 22:LONG, 23:LONG, 24:BYTES, 25:BYTES, 
> 26:BYTES, 27:BYTES, 28:BYTES, 0:TIMESTAMP
> [1900-01-18 00:00:00.0, 2415038, "OLJNECAA", 0, 3, 1, 1900, 3, 1, 18, 
> 1, 1900, 1, 3, "Wednesday", "1900Q1", "N", "N", "N", 2415021, 2415020, 
> 2414673, 2414946, "N", "N", "N", "N", "N", 1900-01-18 00:00:00.0]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:406)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:248)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing vector batch (tag=0) (vectorizedVertexNum 1) Column 
> vector types: 1:TIMESTAMP, 2:LONG, 3:BYTES, 4:LONG, 5:LONG, 6:LONG, 7:LONG, 
> 8:LONG, 9:LONG, 10:LONG, 11:LONG, 12:LONG, 13:LONG, 14:LONG, 15:BYTES, 
> 16:BYTES, 17:BYTES, 

[jira] [Updated] (HIVE-17030) HPL/SQL: Many cast operations are ignored without warning or notice.

2017-09-08 Thread Dmitry Tolpeko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Tolpeko updated HIVE-17030:
--
Attachment: HIVE-17030.1.patch

> HPL/SQL: Many cast operations are ignored without warning or notice.
> 
>
> Key: HIVE-17030
> URL: https://issues.apache.org/jira/browse/HIVE-17030
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Carter Shanklin
>Assignee: Dmitry Tolpeko
>Priority: Critical
> Attachments: HIVE-17030.1.patch
>
>
> This bug is part of a series of issues and surprising behavior I encountered 
> while writing a reporting script that would aggregate values and give rows 
> different classifications based on the aggregate. Addressing some or all 
> of these issues would make HPL/SQL more accessible to newcomers.
> Consider this code:
> {code}
>   val1d := cast('10.0' as double);
>   val2d := cast('5.0' as double);
>   declare val1i int = 5;
>   declare val2i int = 5;
>   val1i = val1d;
>   diff := val1i - val2i;
> {code}
> What is the value of diff? You might think it is 5, but in fact it is 0. Why? 
> Because when you attempt to assign val1d to val1i, this code in Var.java is 
> executed:
> {code}
> else if (type == Type.BIGINT) {
>   if (val.type == Type.STRING) {
>     value = Long.parseLong((String)val.value);
>   }
> }
> else if (type == Type.DECIMAL) {
> {code}
> Since there is no case for assigning a double to a bigint, the expression is 
> essentially ignored and the value remains the same. This behavior leads to 
> many surprising results.
> It would be best if HPL/SQL could re-use the cast code from Hive since there 
> are a lot of cases to consider.
> Version = 3.0.0-SNAPSHOT r71f52d8ad512904b3f2c4f04fe39a33f2834f1f2
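To make the missing branch concrete, here is a minimal sketch of an assignment that converts DOUBLE to BIGINT and fails loudly on unhandled combinations instead of keeping the old value. `HplVar` is a hypothetical, heavily simplified stand-in, not HPL/SQL's real `Var` class, and the truncation-toward-zero conversion is Java's narrowing rule; whether it matches Hive's CAST semantics is an assumption.

```java
final class HplVar {
    enum Type { BIGINT, DOUBLE, STRING }

    Type type;
    Object value;

    HplVar(Type type, Object value) {
        this.type = type;
        this.value = value;
    }

    // Assigns val to this variable, converting it to this variable's type.
    // Unhandled combinations throw instead of silently keeping the old value.
    HplVar assign(HplVar val) {
        if (type == Type.BIGINT) {
            switch (val.type) {
                case BIGINT:
                    value = val.value;
                    break;
                case STRING:
                    value = Long.parseLong((String) val.value);
                    break;
                case DOUBLE:
                    // Java narrowing conversion: truncates toward zero.
                    value = ((Double) val.value).longValue();
                    break;
                default:
                    throw new IllegalArgumentException("Cannot assign " + val.type + " to " + type);
            }
            return this;
        }
        throw new UnsupportedOperationException("Assignment to " + type + " is not covered by this sketch");
    }
}
```

With this branch in place, the example above would yield diff = 5. The issue itself suggests reusing Hive's cast code rather than hand-writing each case, which this sketch does not attempt.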





[jira] [Updated] (HIVE-17030) HPL/SQL: Many cast operations are ignored without warning or notice.

2017-09-08 Thread Dmitry Tolpeko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Tolpeko updated HIVE-17030:
--
Status: Patch Available  (was: Open)

> HPL/SQL: Many cast operations are ignored without warning or notice.
> 
>
> Key: HIVE-17030
> URL: https://issues.apache.org/jira/browse/HIVE-17030
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Carter Shanklin
>Assignee: Dmitry Tolpeko
>Priority: Critical
> Attachments: HIVE-17030.1.patch
>
>





[jira] [Commented] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed

2017-09-08 Thread Sankar Hariappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158687#comment-16158687
 ] 

Sankar Hariappan commented on HIVE-17410:
-

+1

cc [~thejas], [~daijy]

> repl load task during subsequent DAG generation does not start from the last 
> partition processed
> 
>
> Key: HIVE-17410
> URL: https://issues.apache.org/jira/browse/HIVE-17410
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-17410.1.patch, HIVE-17410.2.patch, 
> HIVE-17410.3.patch
>
>





[jira] [Commented] (HIVE-17473) implement workload management pools

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158639#comment-16158639
 ] 

Hive QA commented on HIVE-17473:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885984/HIVE-17473.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11041 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit (batchId=276)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking9 
(batchId=282)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6730/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6730/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6730/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885984 - PreCommit-HIVE-Build

> implement workload management pools
> ---
>
> Key: HIVE-17473
> URL: https://issues.apache.org/jira/browse/HIVE-17473
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17473.only.patch, HIVE-17473.patch, 
> HIVE-17473.WIP.patch
>
>






[jira] [Updated] (HIVE-17338) Utilities.get*Tasks multiple methods duplicate code

2017-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Hajós updated HIVE-17338:
-
Status: Patch Available  (was: Open)

> Utilities.get*Tasks multiple methods duplicate code
> ---
>
> Key: HIVE-17338
> URL: https://issues.apache.org/jira/browse/HIVE-17338
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Gergely Hajós
> Attachments: HIVE-17338.1.patch
>
>
> As discussed in https://github.com/apache/hive/pull/212/files, the 3 
> functions can share a more general function.
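One way the three functions could share a more general one is a single generic traversal parameterized by the task class, in the spirit of the "use of generics" refactoring mentioned for the patch. The sketch below is illustrative only: `Task`, `MapRedTask`, `TezTask`, `SparkTask` and `TaskUtils.getTasks` are minimal hypothetical stand-ins, not Hive's actual classes or the patch's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for Hive's task classes, kept minimal for the sketch.
abstract class Task {
    final List<Task> children = new ArrayList<>();

    Task addChild(Task child) {
        children.add(child);
        return this;
    }
}

class MapRedTask extends Task {}
class TezTask extends Task {}
class SparkTask extends Task {}

final class TaskUtils {
    // Walks the task DAG once and collects every task of the requested class,
    // replacing one near-identical method per task type.
    static <T extends Task> List<T> getTasks(List<? extends Task> roots, Class<T> type) {
        List<T> found = new ArrayList<>();
        Set<Task> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Task> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Task task = stack.pop();
            if (!visited.add(task)) {
                continue; // already reached through another parent
            }
            if (type.isInstance(task)) {
                found.add(type.cast(task));
            }
            stack.addAll(task.children);
        }
        return found;
    }
}
```

Passing the `Class<T>` token lets the compiler type the result list without unchecked casts at call sites.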





[jira] [Updated] (HIVE-17338) Utilities.get*Tasks multiple methods duplicate code

2017-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Hajós updated HIVE-17338:
-
Attachment: HIVE-17338.1.patch

code refactoring: eliminate code duplicates with the use of generics

> Utilities.get*Tasks multiple methods duplicate code
> ---
>
> Key: HIVE-17338
> URL: https://issues.apache.org/jira/browse/HIVE-17338
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Gergely Hajós
> Attachments: HIVE-17338.1.patch
>
>





[jira] [Commented] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158517#comment-16158517
 ] 

Hive QA commented on HIVE-17482:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885966/HIVE-17482.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11030 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.createTable (batchId=282)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6729/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6729/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6729/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885966 - PreCommit-HIVE-Build

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch
>
>
> When using the LLAP external client with simple queries (filter/project of a 
> single table), the appropriate locks should be taken on the table being read, 
> as they are for normal Hive queries. This is important when transactional 
> tables are queried, since the compactor relies on the presence of table locks 
> to determine whether it can safely delete old versions of compacted files 
> without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results of the complex query.





[jira] [Updated] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-08 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-17426:
---
Attachment: HIVE-17426.4.patch

Added docs and a test case, and changed the default number of tasks in HiveConf 
for parallel upload, since memory pressure is not too high for 10k tasks.

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch, HIVE-17426.1.patch, 
> HIVE-17426.2.patch, HIVE-17426.3.patch, HIVE-17426.4.patch
>
>
> The execution framework currently only runs MR tasks in parallel when {{set 
> hive.exec.parallel=true}}.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.
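A minimal sketch of the idea: tasks flagged as parallelizable are submitted to a bounded pool, while the rest (here standing in for TezTask / SparkTask) run one at a time on the calling thread. `ParallelTaskRunner`, its nested `Task` and the `parallelizable` flag are hypothetical; the sketch also ignores dependencies between tasks, which Hive's real execution framework has to respect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelTaskRunner {
    // Hypothetical task: a body plus a flag saying whether it may run concurrently.
    static final class Task {
        final boolean parallelizable;
        final Runnable body;

        Task(boolean parallelizable, Runnable body) {
            this.parallelizable = parallelizable;
            this.body = body;
        }
    }

    // Parallelizable tasks go to a pool of maxParallel threads; others run inline.
    static void runAll(List<Task> tasks, int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Task t : tasks) {
                if (t.parallelizable) {
                    pending.add(pool.submit(t.body));
                } else {
                    t.body.run();
                }
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces a failure from a pooled task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Bounding the pool size is what keeps memory pressure in check when many small tasks (for example, thousands of partition loads) are queued.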





[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158443#comment-16158443
 ] 

Hive QA commented on HIVE-17466:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885946/HIVE-17466.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11016 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=101)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6728/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6728/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6728/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885946 - PreCommit-HIVE-Build

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of tables/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
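For comparison, the client-side alternative mentioned above (parsing partition names to extract part-key values) can be sketched as follows. This is not the code referred to in the issue; `PartitionNames` is a hypothetical helper, and it assumes names of the form `k1=v1/k2=v2` with special characters escaped as `%XX` hex, which is an assumption about the escaping scheme.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class PartitionNames {
    // Parses e.g. "ds=2017-09-08/hr=03" into ordered key/value pairs.
    static Map<String, String> parse(String partName) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String component : partName.split("/")) {
            int eq = component.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Malformed partition name: " + partName);
            }
            spec.put(unescape(component.substring(0, eq)), unescape(component.substring(eq + 1)));
        }
        return spec;
    }

    // Client-side "SELECT DISTINCT key": every name must be fetched and parsed,
    // which is the work the proposed API would push down to the metastore.
    static SortedSet<String> distinctValues(Collection<String> partNames, String key) {
        SortedSet<String> values = new TreeSet<>();
        for (String name : partNames) {
            String value = parse(name).get(key);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    // Decodes %XX hex escapes; other characters are copied unchanged.
    private static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```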





[jira] [Updated] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-08 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-17426:
---
Attachment: HIVE-17426.3.patch

Adds the ability to print graphs for tasks, along with a fix to allow parallel tasks to run.

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch, HIVE-17426.1.patch, 
> HIVE-17426.2.patch, HIVE-17426.3.patch
>
>





[jira] [Updated] (HIVE-17450) rename TestTxnCommandsBase

2017-09-08 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17450:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the review [~kgyrtkirk]!

> rename TestTxnCommandsBase 
> ---
>
> Key: HIVE-17450
> URL: https://issues.apache.org/jira/browse/HIVE-17450
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: HIVE-17450.02.patch, HIVE-17450.patch
>
>
> TestTxnCommandsBase is an abstract class, added in HIVE-17205; its name matches 
> the Maven test pattern, and because of that there is a failing test in every 
> test output.
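The mismatch can be shown with a name-based glob. Maven Surefire's default includes contain `**/Test*.java`, and a glob only sees the file path, so it cannot tell that a matching class is abstract. Whether Hive's precommit tooling uses exactly this pattern is an assumption; the paths and the renamed class `TxnCommandsBaseForTests` below are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class TestNamePattern {
    // Name-only test selection, as in Surefire's default "**/Test*.java" include.
    private static final PathMatcher TEST_GLOB =
            FileSystems.getDefault().getPathMatcher("glob:**/Test*.java");

    static boolean matches(String path) {
        return TEST_GLOB.matches(Paths.get(path));
    }
}
```

Renaming the base class so that it no longer starts with `Test` takes it out of the pattern without changing any test logic.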





[jira] [Updated] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-09-08 Thread Janos Gub (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Gub updated HIVE-17344:
-
Attachment: HIVE-17344.2.patch

Attaching a patch that addresses Sergey's comments.

> LocalCache element memory usage is not calculated properly.
> ---
>
> Key: HIVE-17344
> URL: https://issues.apache.org/jira/browse/HIVE-17344
> Project: Hive
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Janos Gub
> Attachments: HIVE-17344.2.patch, HIVE-17344.patch
>
>
> Orc footer cache has a calculation of memory usage:
> {code:java}
> public int getMemoryUsage() {
>   return bb.remaining() + 100; // 100 is for 2 longs, BB and java overheads (semi-arbitrary).
> }
> {code}
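> ByteBuffer.remaining() returns the number of bytes between the buffer's 
> position and its limit, not the buffer's size, and it can be 0 once the buffer 
> has been read. This allows this cache to hold MAXWEIGHT/100 elements of 
> arbitrary size. I think the correct solution would be bb.capacity().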
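A small demonstration of the difference between the two weights, assuming the cached buffer's position may have advanced by the time the weight is computed. `FooterWeight` is a hypothetical helper that mirrors the formula above; it is not the ORC cache code.

```java
import java.nio.ByteBuffer;

final class FooterWeight {
    // Weight as in the original getMemoryUsage(): depends on position and limit.
    static int byRemaining(ByteBuffer bb) {
        return bb.remaining() + 100;
    }

    // Weight as proposed: depends only on the buffer's allocated size.
    static int byCapacity(ByteBuffer bb) {
        return bb.capacity() + 100;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(1 << 20); // 1 MiB buffer
        System.out.println(byRemaining(bb));           // 1048676
        bb.position(bb.limit());                       // simulate fully reading the buffer
        System.out.println(byRemaining(bb));           // 100
        System.out.println(byCapacity(bb));            // 1048676
    }
}
```

After the buffer is consumed, a 1 MiB entry is weighed as 100 units, which is how the cache can end up holding far more memory than its configured maximum weight.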





[jira] [Updated] (HIVE-17485) Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments throws ArrayIndexOutOfBoundsException

2017-09-08 Thread Dileep Kumar Chiguruvada (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dileep Kumar Chiguruvada updated HIVE-17485:

Affects Version/s: (was: 2.1.0)
   3.0.0

> Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments 
> throws ArrayIndexOutOfBoundsException
> ---
>
> Key: HIVE-17485
> URL: https://issues.apache.org/jira/browse/HIVE-17485
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: slim bouguerra
>

[jira] [Commented] (HIVE-17480) repl dump sub dir should use UUID instead of timestamp

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158352#comment-16158352
 ] 

Hive QA commented on HIVE-17480:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885921/HIVE-17480.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11030 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_octet_length] 
(batchId=2)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6727/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6727/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6727/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885921 - PreCommit-HIVE-Build

> repl dump sub dir should use UUID instead of timestamp
>
> Key: HIVE-17480
> URL: https://issues.apache.org/jira/browse/HIVE-17480
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17480.1.patch
>
>
> This fixes a concurrency issue in which multiple dump operations could end up 
> using the same timestamp for the dump directory name.
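A minimal Java sketch of the collision and the proposed fix. `DumpDirNames` and its methods are hypothetical and do not reproduce Hive's REPL DUMP code; it also assumes the old sub directory name was derived from a millisecond timestamp, which is an illustration rather than a confirmed detail.

```java
import java.util.UUID;

// Hypothetical helper illustrating the naming change; not Hive's actual
// REPL DUMP implementation.
class DumpDirNames {
    // Timestamp-based name (assumed old scheme): two dumps started in the same
    // millisecond, e.g. concurrently, get the same directory name.
    static String timestampName(long nowMillis) {
        return String.valueOf(nowMillis);
    }

    // UUID-based name: a random (type 4) UUID, so concurrent dumps get
    // distinct names with overwhelming probability.
    static String uuidName() {
        return UUID.randomUUID().toString();
    }

    // Joins the dump root and the per-dump sub directory name.
    static String dumpDir(String dumpRoot, String subDir) {
        return dumpRoot.endsWith("/") ? dumpRoot + subDir : dumpRoot + "/" + subDir;
    }
}
```

A random UUID removes the dependency on clock resolution entirely, at the cost of dump directories no longer sorting chronologically by name.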



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS

2017-09-08 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned HIVE-17486:


> Enable SharedWorkOptimizer in tez on HOS
> 
>
> Key: HIVE-17486
> URL: https://issues.apache.org/jira/browse/HIVE-17486
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
> HIVE-16602 implemented shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be 
> merged so that the data is read only once. The optimization is carried out at 
> the physical level.
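A toy Java sketch of the first step described above: finding scans over the same input table that are candidates for merging. `SharedScans` and `TableScan` are hypothetical stand-ins, not Hive's operator classes. A real optimizer would also have to check that the grouped scans are compatible (projected columns, filters, and so on); that check is omitted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of shared-scan detection: scans over the
// same input table are grouped so the table could be read once and the rows
// fed to every consumer. Compatibility checks are deliberately left out.
class SharedScans {
    static final class TableScan {
        final String id;     // operator id, e.g. "TS_0"
        final String table;  // fully qualified table name
        TableScan(String id, String table) { this.id = id; this.table = table; }
    }

    // Returns table -> ids of the scans reading it, keeping only tables that
    // are scanned more than once (the merge candidates), in plan order.
    static Map<String, List<String>> mergeCandidates(List<TableScan> scans) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (TableScan ts : scans) {
            byTable.computeIfAbsent(ts.table, t -> new ArrayList<>()).add(ts.id);
        }
        byTable.values().removeIf(ids -> ids.size() < 2);
        return byTable;
    }
}
```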





[jira] [Assigned] (HIVE-17483) HS2 kill command to kill queries using query id

2017-09-08 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi reassigned HIVE-17483:

Assignee: Teddy Choi

> HS2 kill command to kill queries using query id
>
> Key: HIVE-17483
> URL: https://issues.apache.org/jira/browse/HIVE-17483
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Teddy Choi
>
> For administrators, it is important to be able to kill queries when required. 
> Currently, there is no clean way to do this.
> It would help to have a "kill query <queryid>" command that can be run over 
> ODBC/JDBC against a HiveServer2 instance to kill the query with that queryid 
> running in that instance.
> Authorization will have to be done to ensure that the user invoking the API 
> is allowed to perform this action.
> With SQL standard authorization, this would require the admin role.
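A minimal sketch of the proposed flow inside a single HiveServer2 instance, assuming an in-memory registry of running queries keyed by query id. `QueryKiller`, `RunningQuery`, and the literal role name `admin` are hypothetical and are not HiveServer2's API; the role check mirrors the SQL standard authorization requirement stated above.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "kill query <queryid>" handling in one HiveServer2
// instance; all names here are illustrative only.
class QueryKiller {
    // Minimal stand-in for a running query's handle.
    interface RunningQuery { void cancel(); }

    // Queries currently running in this instance, keyed by query id.
    private final Map<String, RunningQuery> running = new ConcurrentHashMap<>();

    void register(String queryId, RunningQuery q) { running.put(queryId, q); }

    // Kills the query with the given id if the caller holds the admin role.
    // Authorization is checked first, so unauthorized callers learn nothing
    // about which queries exist. Returns false if no such query runs here.
    boolean kill(String queryId, Set<String> callerRoles) {
        if (!callerRoles.contains("admin")) {
            throw new SecurityException("admin role required to kill query " + queryId);
        }
        RunningQuery q = running.remove(queryId);
        if (q == null) {
            return false;
        }
        q.cancel();
        return true;
    }
}
```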





[jira] [Assigned] (HIVE-17338) Utilities.get*Tasks multiple methods duplicate code

2017-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Hajós reassigned HIVE-17338:


Assignee: Gergely Hajós

> Utilities.get*Tasks multiple methods duplicate code
>
> Key: HIVE-17338
> URL: https://issues.apache.org/jira/browse/HIVE-17338
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Gergely Hajós
>
> As discussed in https://github.com/apache/hive/pull/212/files, the 3 
> functions can share a more general function.
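A minimal sketch of such a shared function over a simplified, hypothetical task model. The `Task` subclasses and the `TaskUtils` class are stand-ins rather than Hive's `Utilities` API: one generic walk over the task DAG, parameterized by the task class, with the three specific getters reduced to thin wrappers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; Hive's real Task classes and the existing
// Utilities.get*Tasks signatures are not reproduced here.
class TaskUtils {
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
    }
    static class MapRedTask extends Task { MapRedTask(String n) { super(n); } }
    static class TezTask extends Task { TezTask(String n) { super(n); } }
    static class SparkTask extends Task { SparkTask(String n) { super(n); } }

    // One general traversal replacing three near-identical ones: walks the task
    // DAG from the roots and collects every task of the requested type once.
    static <T extends Task> List<T> getTasks(List<? extends Task> roots, Class<T> type) {
        List<T> result = new ArrayList<>();
        collect(roots, type, result, new HashSet<>());
        return result;
    }

    private static <T extends Task> void collect(List<? extends Task> tasks, Class<T> type,
                                                 List<T> result, Set<Task> visited) {
        for (Task t : tasks) {
            if (!visited.add(t)) {
                continue; // already reached via another parent
            }
            if (type.isInstance(t)) {
                result.add(type.cast(t));
            }
            collect(t.children, type, result, visited);
        }
    }

    // The type-specific helpers become one-line wrappers.
    static List<MapRedTask> getMRTasks(List<? extends Task> roots) { return getTasks(roots, MapRedTask.class); }
    static List<TezTask> getTezTasks(List<? extends Task> roots) { return getTasks(roots, TezTask.class); }
    static List<SparkTask> getSparkTasks(List<? extends Task> roots) { return getTasks(roots, SparkTask.class); }
}
```

Passing a `Class<T>` token keeps the result list typed without unchecked casts, and the visited set ensures a task shared by two parents is reported only once.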





[jira] [Commented] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-09-08 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158275#comment-16158275
 ] 

Peter Vary commented on HIVE-17362:

Thanks [~leftylev] and [~xuefuz]!

> The MAX_PREWARM_TIME should be configurable on HoS
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: HIVE-17362.2.patch, HIVE-17362.patch
>
>
> When HIVE_PREWARM_ENABLED is set, we wait MAX_PREWARM_TIME for the 
> containers to warm up. This is currently fixed at 5s, which is often not 
> enough for a Spark session to initialize its executors. The wait should be 
> configurable, so that it can be set to a value that actually has an effect.
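A minimal sketch of a prewarm wait whose timeout is a parameter instead of a fixed 5s. `Prewarm`, `waitForExecutors`, and the `IntSupplier` used to read the current executor count are hypothetical, not Hive's code or configuration API.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch, not Hive's code: wait for executors until a
// caller-supplied (configurable) timeout instead of a fixed 5 s.
class Prewarm {
    // Polls executorCount until it reaches target or timeoutMs elapses.
    // Returns true if enough executors registered in time, false on timeout
    // or interruption.
    static boolean waitForExecutors(IntSupplier executorCount, int target,
                                    long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (executorCount.getAsInt() < target) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // timed out before enough executors appeared
            }
            try {
                // Sleep one poll interval, but never past the deadline.
                Thread.sleep(Math.min(pollMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

Polling keeps the sketch dependency-free; an implementation driven by executor-registration events would avoid the sleep loop.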





[jira] [Commented] (HIVE-17475) Disable mapjoin using hint

2017-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158254#comment-16158254
 ] 

Hive QA commented on HIVE-17475:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885927/HIVE-17475.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11031 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6726/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6726/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6726/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885927 - PreCommit-HIVE-Build

> Disable mapjoin using hint
>
> Key: HIVE-17475
> URL: https://issues.apache.org/jira/browse/HIVE-17475
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17475.1.patch, HIVE-17475.2.patch
>
>
> Use a hint to disable mapjoin for a given query.





[jira] [Commented] (HIVE-17485) Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments throws ArrayIndexOutOfBoundsException

2017-09-08 Thread Dileep Kumar Chiguruvada (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158216#comment-16158216
 ] 

Dileep Kumar Chiguruvada commented on HIVE-17485:

Even with ("druid.query.granularity" = "DAY", "druid.segment.granularity" = "YEAR") 
set, I am still seeing ArrayIndexOutOfBoundsException for a few segments (though 
more data is actually indexed than before).
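To make the two settings concrete, a small Java sketch that floors a timestamp to DAY (query granularity) and to YEAR (segment granularity). `Granularity` is a hypothetical helper, time zones are ignored, and it is an assumption, not something confirmed in this thread, that the trailing 1906-01-01 00:00:00.0 value in each output row is `__time` floored to the YEAR segment granularity.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Illustrative only: floors timestamps the way DAY and YEAR granularities
// bucket them. Time zones are ignored (LocalDateTime).
class Granularity {
    // DAY: drop the time-of-day part.
    static LocalDateTime floorToDay(LocalDateTime t) {
        return t.truncatedTo(ChronoUnit.DAYS);
    }

    // YEAR: truncatedTo does not accept YEARS for LocalDateTime, so truncate
    // to the day and then move to the first day of the year.
    static LocalDateTime floorToYear(LocalDateTime t) {
        return t.truncatedTo(ChronoUnit.DAYS).withDayOfYear(1);
    }
}
```

With YEAR segment granularity, every row from 1906 falls into a single segment bucket, which matches the repeated trailing value in the rows printed in the reproduction.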

{code}
0: jdbc:hive2://ctr-e134-1499953498516-98952-> CREATE TABLE date_dim_store_drd_subset
0: jdbc:hive2://ctr-e134-1499953498516-98952-> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
0: jdbc:hive2://ctr-e134-1499953498516-98952-> TBLPROPERTIES ("druid.datasource" = "date_dim_store_drd_subset", "druid.query.granularity" = "DAY", "druid.segment.granularity" = "YEAR") AS
0: jdbc:hive2://ctr-e134-1499953498516-98952-> SELECT CAST(d_date AS TIMESTAMP) AS `__time`,
0: jdbc:hive2://ctr-e134-1499953498516-98952-> s_store_id,s_rec_start_date,s_store_name
0: jdbc:hive2://ctr-e134-1499953498516-98952-> FROM date_dim,store;

[1906-06-21 00:00:00.0, "IAAA", "2000-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-21 00:00:00.0, "KAAA", "1999-03-14", "ought", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "CAAA", "1997-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "EAAA", "1997-03-13", "ese", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "EAAA", "2001-03-13", "cally", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "IAAA", "1997-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "KAAA", "1997-03-13", "bar", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "KAAA", "2001-03-13", "ought", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "BAAA", "1997-03-13", "ought", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "CAAA", "2000-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "EAAA", "1999-03-14", "anti", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "HAAA", "1997-03-13", "ation", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "IAAA", "2000-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-23 00:00:00.0, "KAAA", "1999-03-14", "ought", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "CAAA", "1997-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "EAAA", "1997-03-13", "ese", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "EAAA", "2001-03-13", "cally", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "IAAA", "1997-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "KAAA", "1997-03-13", "bar", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "KAAA", "2001-03-13", "ought", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "BAAA", "1997-03-13", "ought", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "CAAA", "2000-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "EAAA", "1999-03-14", "anti", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "HAAA", "1997-03-13", "ation", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "IAAA", "2000-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-25 00:00:00.0, "KAAA", "1999-03-14", "ought", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "CAAA", "1997-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "EAAA", "1997-03-13", "ese", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "EAAA", "2001-03-13", "cally", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "IAAA", "1997-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "KAAA", "1997-03-13", "bar", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "KAAA", "2001-03-13", "ought", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "BAAA", "1997-03-13", "ought", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "CAAA", "2000-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "EAAA", "1999-03-14", "anti", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "HAAA", "1997-03-13", "ation", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "IAAA", "2000-03-13", "eing", 1906-01-01 00:00:00.0]
[1906-06-27 00:00:00.0, "KAAA", "1999-03-14", "ought", 1906-01-01 00:00:00.0]
[1906-06-29 00:00:00.0, "CAAA", "1997-03-13", "able", 1906-01-01 00:00:00.0]
[1906-06-29 00:00:00.0, "EAAA", "1997-03-13", "ese", 1906-01-01 00:00:00.0]
[1906-06-29 00:00:00.0, "EAAA", "2001-03-13", "cally", 1906-01-01 00:00:00.0]
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:489)
at 

[jira] [Assigned] (HIVE-17485) Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments throws ArrayIndexOutOfBoundsException

2017-09-08 Thread Dileep Kumar Chiguruvada (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dileep Kumar Chiguruvada reassigned HIVE-17485:

Assignee: slim bouguerra

> Hive-Druid table on indexing for few segments- DruidRecordWriter.pushSegments 
> throws ArrayIndexOutOfBoundsException
>
> Key: HIVE-17485
> URL: https://issues.apache.org/jira/browse/HIVE-17485
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.1.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: slim bouguerra
>
> When indexing a Hive-Druid table, DruidRecordWriter.pushSegments throws 
> ArrayIndexOutOfBoundsException for a few segments.
> The error is:
> {code}
> ERROR : Vertex failed, vertexName=Reducer 2, 
> vertexId=vertex_1502725432788_0017_2_01, diagnostics=[Task failed, 
> taskId=task_1502725432788_0017_2_01_02, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502725432788_0017_2_01_02_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 
> 1) Column vector types: 1:TIMESTAMP, 2:LONG, 3:BYTES, 4:LONG, 5:LONG, 6:LONG, 
> 7:LONG, 8:LONG, 9:LONG, 10:LONG, 11:LONG, 12:LONG, 13:LONG, 14:LONG, 
> 15:BYTES, 16:BYTES, 17:BYTES, 18:BYTES, 19:BYTES, 20:LONG, 21:LONG, 22:LONG, 
> 23:LONG, 24:BYTES, 25:BYTES, 26:BYTES, 27:BYTES, 28:BYTES, 0:TIMESTAMP
> [1900-01-18 00:00:00.0, 2415038, "OLJNECAA", 0, 3, 1, 1900, 3, 1, 18, 
> 1, 1900, 1, 3, "Wednesday", "1900Q1", "N", "N", "N", 2415021, 2415020, 
> 2414673, 2414946, "N", "N", "N", "N", "N", 1900-01-18 00:00:00.0]
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing vector batch (tag=0) (vectorizedVertexNum 1) Column vector types: 
> 1:TIMESTAMP, 2:LONG, 3:BYTES, 4:LONG, 5:LONG, 6:LONG, 7:LONG, 8:LONG, 9:LONG, 
> 10:LONG, 11:LONG, 12:LONG, 13:LONG, 14:LONG, 15:BYTES, 16:BYTES, 17:BYTES, 
> 18:BYTES, 19:BYTES, 20:LONG, 21:LONG, 22:LONG, 23:LONG, 24:BYTES, 25:BYTES, 
> 26:BYTES, 27:BYTES, 28:BYTES, 0:TIMESTAMP
> [1900-01-18 00:00:00.0, 2415038, "OLJNECAA", 0, 3, 1, 1900, 3, 1, 18, 
> 1, 1900, 1, 3, "Wednesday", "1900Q1", "N", "N", "N", 2415021, 2415020, 
> 2414673, 2414946, "N", "N", "N", "N", "N", 1900-01-18 00:00:00.0]
>   at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:406)
>   at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:248)
>   at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing vector batch (tag=0) (vectorizedVertexNum 1) Column 
> vector types: 1:TIMESTAMP, 2:LONG, 3:BYTES, 4:LONG, 5:LONG, 6:LONG, 7:LONG, 
> 8:LONG, 9:LONG, 10:LONG, 11:LONG, 12:LONG, 13:LONG, 14:LONG, 15:BYTES, 
> 16:BYTES, 17:BYTES, 18:BYTES, 19:BYTES, 20:LONG, 21:LONG, 22:LONG, 23:LONG, 
> 
