[GitHub] hive pull request #425: Kafkahandler hive 20377

2018-08-28 Thread b-slim
GitHub user b-slim opened a pull request:

https://github.com/apache/hive/pull/425

Kafkahandler hive 20377



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/b-slim/hive kafkahandler_HIVE-20377

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #425


commit 3996b5b2e86f7b3cd4ca319227163852e268aec6
Author: Slim Bouguerra 
Date:   2018-07-19T16:30:48Z

Working version of kafka storage handler

Change-Id: Ief161074d151917c3a7ed443cf78374bcaf7bcfc

trying random things

fix typo

working version for demo only change might be getRecordreader call

adding kafka Trimmer first cut

Change-Id: If9bf7f561b867e80ab31f5c8c3c40730128986af

link the code and add some docs

Change-Id: I7e15d90de772fffef8ee0930352069742af12ac7

add static function to avoid dep on hive exec

Change-Id: Ib61901cd45027d1469d72a890e26f73997402974

working version of filter pushdown partition and offsets

Change-Id: I3e3d157438fcc965491380e3f1afa9c81a6cd75b

fix case when offset filter is out of range and add some code comments

Change-Id: Ic3321a14cc9a06b8eeb85cc20ea89f21e2765a93

add filter push down based on ts

Change-Id: I5a1da7634cfc80990036add3463a28810ce642c2

clean code

Change-Id: I753cfe4e9f0a69396635b7a9fe16c29f7bb0675e

case filter is null can imply no filter

Change-Id: I68fa9afe09009329ebe8ac9e5d3f7923a2003ebb

some refactor for how to build consumer

Change-Id: If007a89d93c215542027aec7ea0aff2fa6dc83fb

major refactor:
removed and/or methods
switch logic of ANDing filters

Change-Id: Id36f45842db7c4579edaa1e6062bddd291d53bc9

adding tests and refactor fetch by time

Change-Id: I6329ca3fa8c6f29e90d0001034064d2a603d2580

adding more tests

Change-Id: Ic9d1f1a86ffd0f99b298965db5c5d1f36670b15b

more tests

Change-Id: I4aada643b67e3e1207a504267a50dc75bc0176e3

small refactor to where column names are stored

Change-Id: I0635c9627520f786e4f45e501eb63158f915d2a0

Start working on GenericSerde

Change-Id: I1adddfb639da7d6a64dde06c46bc67c072806469

first cut that uses default_json

Change-Id: Icad9eb9eb36b76eea4ba1342193c649a9d419287

cleaning some tests stuff

Change-Id: I889c4788250590f6cc7f0d9d7a1756f71e9e5cc5

setting the default serde for storage handler

Change-Id: I00179ec97d43d0a955500a23ebdab770d26930a7

fix how reading the property

Change-Id: I1bba7e19defda3316b4ddb0e0721e45cf11be063

working version of generic serde

Change-Id: I370a87aaf55f599db1695775ec2737e54af81270

fixup squash with previous

Change-Id: I6adfc93efed84c38aa1ad7092660e4cca49bc29a

use nanos for logging

Change-Id: I9055e0813b3b4bcbf72db0900ab9d2cb480c8f8b

major refactor plus tests working q files tests

Change-Id: I5ffc1cfcb4708e7a89163c371027e92782f2e4fc

adding q files test

Change-Id: I887fff5e3fdcdb0322770e52f9a8ab732a8dbe86

commit d28f1c94956b65374d58f3cda94fbbee5ed3e6b4
Author: Slim Bouguerra 
Date:   2018-08-13T18:00:55Z

re activate the test

Change-Id: I7f8ef8a44271286abbd5a36a92ccde87d2ef8839

commit f71f68d8f7458409d5f341c1ab262d6a894ebcc0
Author: Slim Bouguerra 
Date:   2018-08-13T20:02:07Z

refactor names and added copy jars

Change-Id: I11ec3aa4f9e96efc81ca8e9994c7409625384764

commit 5c442137e3f4b02086fd2c42ce354e4d4dbe4cd3
Author: Slim Bouguerra 
Date:   2018-08-13T20:11:35Z

fix headers

Change-Id: I478c0709ba2ca77a1139011006170e4ad0683617

commit dce3a0f8b4eb77a9929ff2b2edf9fde44d364244
Author: Slim Bouguerra 
Date:   2018-08-13T22:25:32Z

clean code and add comments

Change-Id: I1f8b3b748b5ab4f8f7e594bf85433affca83b50e

commit 80d9d5bbd55ea6d1a5d2de8ddffa393e495008ed
Author: Slim Bouguerra 
Date:   2018-08-14T00:10:11Z

fix hive package

Change-Id: I2f255590aab7cd0897583ab11ea02961d1114bbd

commit 172019a6ab06d6026a7ba69f4b2c719f2afb2408
Author: Slim Bouguerra 
Date:   2018-08-14T00:35:57Z

added more tests

Change-Id: I59c8bee67877bb54a4dcf5803b9e15ab2c8f0c42

commit 77700b20b7e2aa604caaad89bcd030dfc8b8e925
Author: Slim Bouguerra 
Date:   2018-08-14T01:01:09Z

IntelliJ-friendly warning suppress

Change-Id: Ib6fb5a8a2fdc2e7cb13dde7f8386bf39b86a8926

commit 8cfbce6e3cd70a472ebb818f126d72596b39bd91
Author: Slim Bouguerra 
Date:   2018-08-14T20:29:04Z

fix style issue

Change-Id: I36be0353de253e46fbe16e35b52c258c7784a63a


[jira] [Created] (HIVE-20484) Disable Block Cache By Default With HBase SerDe

2018-08-28 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20484:
--

 Summary: Disable Block Cache By Default With HBase SerDe
 Key: HIVE-20484
 URL: https://issues.apache.org/jira/browse/HIVE-20484
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 1.2.3, 2.4.0, 4.0.0, 3.2.0
Reporter: BELUGA BEHR


{quote}
Scan instances can be set to use the block cache in the RegionServer via the 
setCacheBlocks method. For input Scans to MapReduce jobs, this should be false. 

https://hbase.apache.org/book.html#perf.hbase.client.blockcache
{quote}

However, from the Hive code, we can see that this is not the case.

{code}
public static final String HBASE_SCAN_CACHEBLOCKS = "hbase.scan.cacheblock";

...

String scanCacheBlocks = tableProperties.getProperty(HBaseSerDe.HBASE_SCAN_CACHEBLOCKS);
if (scanCacheBlocks != null) {
  jobProperties.put(HBaseSerDe.HBASE_SCAN_CACHEBLOCKS, scanCacheBlocks);
}

...

String scanCacheBlocks = jobConf.get(HBaseSerDe.HBASE_SCAN_CACHEBLOCKS);
if (scanCacheBlocks != null) {
  scan.setCacheBlocks(Boolean.parseBoolean(scanCacheBlocks));
}
{code}

In the Hive code, we can see that if {{hbase.scan.cacheblock}} is not specified 
in the {{SERDEPROPERTIES}} then {{setCacheBlocks}} is not called and the 
default value of the HBase {{Scan}} class is used.

{code:java|title=Scan.java}
  /**
   * Set whether blocks should be cached for this Scan.
   * <p>
   * This is true by default.  When true, default settings of the table and
   * family are used (this will never override caching blocks if the block
   * cache is disabled for that family or entirely).
   *
   * @param cacheBlocks if false, default settings are overridden and blocks
   * will not be cached
   */
  public Scan setCacheBlocks(boolean cacheBlocks) {
    this.cacheBlocks = cacheBlocks;
    return this;
  }
{code}

Hive performs full scans of the table with MapReduce/Spark, so, according to the 
HBase docs, the default behavior here should be that blocks are not cached. Hive 
should set this value to "false" by default unless the table 
{{SERDEPROPERTIES}} override it.
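A minimal sketch of the proposed resolution logic (hypothetical class and method names; the "false" default is this ticket's proposal, not current Hive behavior):

```java
import java.util.Properties;

public class CacheBlocksDefault {
    // Mirrors HBaseSerDe.HBASE_SCAN_CACHEBLOCKS from the quoted Hive code.
    static final String HBASE_SCAN_CACHEBLOCKS = "hbase.scan.cacheblock";

    // Resolve the effective cache-blocks flag: an explicit table property
    // wins; otherwise default to "false" as the ticket proposes, instead of
    // silently inheriting HBase Scan's default of true.
    static boolean effectiveCacheBlocks(Properties tableProperties) {
        return Boolean.parseBoolean(
            tableProperties.getProperty(HBASE_SCAN_CACHEBLOCKS, "false"));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        System.out.println(effectiveCacheBlocks(p));  // false: proposed default
        p.setProperty(HBASE_SCAN_CACHEBLOCKS, "true");
        System.out.println(effectiveCacheBlocks(p));  // true: explicit override
    }
}
```

The point of the sketch is that {{setCacheBlocks}} would then be called unconditionally with the resolved value, rather than only when the property is present.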

{code:sql}
-- Commands for HBase
-- create 'test', 't'

CREATE EXTERNAL TABLE test(value map<string,string>, row_key string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "t:,:key",
"hbase.scan.cacheblock" = "false"
);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 68552: HIVE-20483: Really move metastore common classes into metastore-common

2018-08-28 Thread Alexander Kolbasov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68552/
---

Review request for hive, Alan Gates, Peter Vary, and Vihang Karajgaonkar.


Bugs: HIVE-20483
https://issues.apache.org/jira/browse/HIVE-20483


Repository: hive-git


Description
---

HIVE-20483: Really move metastore common classes into metastore-common


Diffs
-

  beeline/pom.xml 4567d5e09b706ba9023cb901dc796389f7ccf586 
  metastore/pom.xml 7f751a49359b9661e0f54528688a25669973785b 
  ql/pom.xml a55cbe380db5123b6080eb938e3cc248cf5f5038 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ColumnType.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/FileMetadataHandler.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFS.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetadataStore.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreTaskThread.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionExpressionProxy.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/TableIterable.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/conf/TimeValidator.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/partition/spec/CompositePartitionSpecProxy.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/partition/spec/PartitionListComposingSpecProxy.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/partition/spec/PartitionSpecProxy.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/partition/spec/PartitionSpecWithSharedSDProxy.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/DelegationTokenIdentifier.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/DelegationTokenSecretManager.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/DelegationTokenSelector.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/HadoopThriftAuthBridge.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/security/HadoopThriftAuthBridge23.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/HdfsUtils.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
  
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/SecurityUtils.java
  


Diff: https://reviews.apache.org/r/68552/diff/1/


Testing
---


Thanks,

Alexander Kolbasov



[jira] [Created] (HIVE-20483) Really move metastore common classes into metastore-common

2018-08-28 Thread Alexander Kolbasov (JIRA)
Alexander Kolbasov created HIVE-20483:
-

 Summary: Really move metastore common classes into metastore-common
 Key: HIVE-20483
 URL: https://issues.apache.org/jira/browse/HIVE-20483
 Project: Hive
  Issue Type: Sub-task
  Components: Standalone Metastore
Affects Versions: 3.0.1, 4.0.0
Reporter: Alexander Kolbasov
Assignee: Alexander Kolbasov


The HIVE-20482 patch was supposed to move a bunch of files from metastore-server 
to metastore-common, but for some reason it didn't happen, so these files should 
now be moved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20482) Remove dependency on metastore-server

2018-08-28 Thread Alexander Kolbasov (JIRA)
Alexander Kolbasov created HIVE-20482:
-

 Summary: Remove dependency on metastore-server
 Key: HIVE-20482
 URL: https://issues.apache.org/jira/browse/HIVE-20482
 Project: Hive
  Issue Type: Sub-task
  Components: Standalone Metastore
Affects Versions: 3.0.1, 4.0.0
Reporter: Alexander Kolbasov
Assignee: Alexander Kolbasov


Now that we have separated the common and server classes, we should remove the 
dependency on the server module from the poms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68509: HIVE-20451: Metastore client and server tarball issues

2018-08-28 Thread Alexander Kolbasov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68509/
---

(Updated Aug. 29, 2018, 12:03 a.m.)


Review request for hive, Alan Gates, Peter Vary, and Vihang Karajgaonkar.


Changes
---

Suppress generation of source tarballs in submodules.


Bugs: HIVE-20451
https://issues.apache.org/jira/browse/HIVE-20451


Repository: hive-git


Description
---

HIVE-20451: Metastore client and server tarball issues


Diffs (updated)
-

  standalone-metastore/metastore-common/pom.xml 
b0d9eee927e77728c27da95196d25f6ba1dfcb0c 
  standalone-metastore/metastore-common/src/assembly/bin.xml 
5992a484cf649d8028eb91db61a8a44e8f71cb8e 
  standalone-metastore/metastore-common/src/assembly/src.xml 
a2405443ea3a0d0c332707bdd46497f8a6694590 
  standalone-metastore/metastore-server/pom.xml 
f2c10b08cf08d194947f8d7737d745c9582dd837 
  standalone-metastore/metastore-server/src/assembly/src.xml 
a2405443ea3a0d0c332707bdd46497f8a6694590 
  standalone-metastore/metastore-tools/pom.xml 
f6fb6dc95e62cce68de6761d0d15b84be6dcd6ce 
  standalone-metastore/pom.xml ee3daed92a8d2e8f6d43957cb2358b285c24065c 
  standalone-metastore/src/assembly/src.xml PRE-CREATION 


Diff: https://reviews.apache.org/r/68509/diff/3/

Changes: https://reviews.apache.org/r/68509/diff/2-3/


Testing
---


Thanks,

Alexander Kolbasov



Re: Review Request 68509: HIVE-20451: Metastore client and server tarball issues

2018-08-28 Thread Alexander Kolbasov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68509/
---

(Updated Aug. 28, 2018, 9:53 p.m.)


Review request for hive, Alan Gates, Peter Vary, and Vihang Karajgaonkar.


Changes
---

Cleaned up configurations for the assembly plugin. The result looks like this:

1. Only one binary tarball is generated, in the metastore-server target directory. 
It is called apache-hive-standalone-metastore-server-4.0.0-SNAPSHOT-bin.tar.gz. 
I don't think we need a binary tarball for metastore-common - there are 
no binaries to run from there.
2. standalone-metastore/target has the overall tarball, called 
apache-hive-standalone-metastore-4.0.0-SNAPSHOT-src.tar.gz, which includes the full 
source and can be used to build the standalone metastore.
3. Submodules have their own source tarballs that can be used to build those 
submodules. They all have different names.


Bugs: HIVE-20451
https://issues.apache.org/jira/browse/HIVE-20451


Repository: hive-git


Description
---

HIVE-20451: Metastore client and server tarball issues


Diffs (updated)
-

  standalone-metastore/metastore-common/pom.xml 
b0d9eee927e77728c27da95196d25f6ba1dfcb0c 
  standalone-metastore/metastore-common/src/assembly/bin.xml 
5992a484cf649d8028eb91db61a8a44e8f71cb8e 
  standalone-metastore/metastore-common/src/assembly/src.xml 
a2405443ea3a0d0c332707bdd46497f8a6694590 
  standalone-metastore/metastore-server/pom.xml 
f2c10b08cf08d194947f8d7737d745c9582dd837 
  standalone-metastore/metastore-server/src/assembly/src.xml 
a2405443ea3a0d0c332707bdd46497f8a6694590 
  standalone-metastore/pom.xml ee3daed92a8d2e8f6d43957cb2358b285c24065c 
  standalone-metastore/src/assembly/src.xml PRE-CREATION 


Diff: https://reviews.apache.org/r/68509/diff/2/

Changes: https://reviews.apache.org/r/68509/diff/1-2/


Testing
---


Thanks,

Alexander Kolbasov



[jira] [Created] (HIVE-20481) Add the Kafka Key record as part of the row.

2018-08-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20481:
-

 Summary: Add the Kafka Key record as part of the row.
 Key: HIVE-20481
 URL: https://issues.apache.org/jira/browse/HIVE-20481
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


Kafka records are keyed; in most cases this key is null or is used to route 
records to the same partition. This patch adds the key as a binary column 
{code} __record_key{code}.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20480) Implement column stats annotation rules for the UDTFOperator: Follow up for HIVE-20262

2018-08-28 Thread George Pachitariu (JIRA)
George Pachitariu created HIVE-20480:


 Summary: Implement column stats annotation rules for the 
UDTFOperator: Follow up for HIVE-20262
 Key: HIVE-20480
 URL: https://issues.apache.org/jira/browse/HIVE-20480
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: George Pachitariu
Assignee: George Pachitariu


 

Implementing the rule for column stats: follow-up task for 
[HIVE-20262|https://issues.apache.org/jira/browse/HIVE-20262]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20479) Update content/people.mdtext in cms

2018-08-28 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-20479:
-

 Summary: Update content/people.mdtext in cms 
 Key: HIVE-20479
 URL: https://issues.apache.org/jira/browse/HIVE-20479
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman


I added myself to the committers list. 

 
{code:java}
<tr>
  <td>asherman</td>
  <td>Andrew Sherman</td>
  <td><a href="http://cloudera.com/">Cloudera</a></td>
</tr>
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20478) Metastore: Null checks needed in DecimalColumnStatsAggregator

2018-08-28 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-20478:
---

 Summary: Metastore: Null checks needed in 
DecimalColumnStatsAggregator
 Key: HIVE-20478
 URL: https://issues.apache.org/jira/browse/HIVE-20478
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 3.1.0
Reporter: Vaibhav Gumashta






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #416: HIVE-20371: Queries failing with Internal error proc...

2018-08-28 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/416


---


[GitHub] hive pull request #410: HIVE-20264: Bootstrap repl dump with concurrent writ...

2018-08-28 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/410


---


[GitHub] hive pull request #424: HIVE-20476: CopyUtils used by REPL LOAD and EXPORT/I...

2018-08-28 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/424

HIVE-20476: CopyUtils used by REPL LOAD and EXPORT/IMPORT operations ignore 
distcp error.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-20476

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #424






---


[ANNOUNCE] New committer: Andrew Sherman

2018-08-28 Thread Ashutosh Chauhan
Apache Hive's Project Management Committee (PMC) has invited Andrew Sherman
to become a committer, and we are pleased to announce that he has accepted.

Andrew, welcome, thank you for your contributions, and we look forward to
your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)


Re: Review Request 68523: Improve org.apache.hadoop.hive.ql.exec.FunctionTask Experience

2018-08-28 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68523/#review208033
---



Thanks for the patch.
LGTM, just one question regarding the tests.

Thanks,
Peter


ql/src/test/queries/clientnegative/create_unknown_permanent_udf.q
Lines 1 (patched)


What is the difference between the create_function_nonexistent_class.q and 
this new test?
Do we need both?


- Peter Vary


On Aug. 28, 2018, 1:50 p.m., denys kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68523/
> ---
> 
> (Updated Aug. 28, 2018, 1:50 p.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Bugs: HIVE-20466
> https://issues.apache.org/jira/browse/HIVE-20466
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> When a create function statement is submitted, it may fail with the following 
> error:
> 
> Error while processing statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> 
> This is not a user-friendly error message.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 44591842bb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java 6f8a8f5504 
>   ql/src/test/queries/clientnegative/create_unknown_permanent_udf.q 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out 
> 77467f66e3 
>   ql/src/test/results/clientnegative/create_unknown_permanent_udf.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68523/diff/2/
> 
> 
> Testing
> ---
> 
> added negative qtest to cover this scenario
> 
> 
> Thanks,
> 
> denys kuzmenko
> 
>



Re: Review Request 68523: Improve org.apache.hadoop.hive.ql.exec.FunctionTask Experience

2018-08-28 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68523/
---

(Updated Aug. 28, 2018, 1:50 p.m.)


Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.


Bugs: HIVE-20466
https://issues.apache.org/jira/browse/HIVE-20466


Repository: hive-git


Description
---

When a create function statement is submitted, it may fail with the following 
error:

Error while processing statement: FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.FunctionTask

This is not a user-friendly error message.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 44591842bb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java 6f8a8f5504 
  ql/src/test/queries/clientnegative/create_unknown_permanent_udf.q 
PRE-CREATION 
  ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out 
77467f66e3 
  ql/src/test/results/clientnegative/create_unknown_permanent_udf.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/68523/diff/2/

Changes: https://reviews.apache.org/r/68523/diff/1-2/


Testing
---

added negative qtest to cover this scenario


Thanks,

denys kuzmenko



[jira] [Created] (HIVE-20477) OptimizedSql is not shown if the expression contains INs

2018-08-28 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20477:
---

 Summary: OptimizedSql is not shown if the expression contains INs
 Key: HIVE-20477
 URL: https://issues.apache.org/jira/browse/HIVE-20477
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


This ticket should fix HiveIn so that it can be unparsed; currently, unparsing 
an IN raises exceptions because HiveIn is a special operator but does not have 
unparse implemented.

CALCITE-2444 is also needed so that rel2sql can process INs which do not 
represent a subquery.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68518: ProxyFileSystem.listStatusIterator function override required once migrated to Hadoop 3.2.0+

2018-08-28 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68518/
---

(Updated Aug. 28, 2018, 11:34 a.m.)


Review request for hive, Marta Kuczora and Peter Vary.


Bugs: HIVE-20465
https://issues.apache.org/jira/browse/HIVE-20465


Repository: hive-git


Description
---

A few list_bucket_query_oneskew_*.q tests started to fail. This is a side effect 
of this Hadoop change:
http://github.mtv.cloudera.com/CDH/hadoop/commit/073a38ee09f40b25677cc49eff777241c8fb2eba#diff-4bf68d76a459a69bbb0affbf579ebcf3

Hive's ProxyFileSystem class has a 'swizzleParamPath' method, which replaces the 
'pfile' prefix with 'file' in the path. ProxyFileSystem is used as the file 
system during test execution and implements some methods of the Hadoop 
FileSystem class. Before the linked Hadoop patch, file statuses were fetched 
with the 'listStatus' method, which ProxyFileSystem implements. The patch 
changed this code path to call FileSystem.listStatusIterator, which 
ProxyFileSystem does not implement; it therefore falls back to the Hadoop 
implementation, which throws an error for the 'pfile' prefix.
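The essence of the fix is that every FileSystem entry point in ProxyFileSystem must rewrite the path before delegating, the same way listStatus already does. A self-contained toy analogue of that scheme-rewriting step, using plain strings instead of Hadoop's Path so it runs without Hadoop on the classpath (the method name mirrors the real swizzleParamPath; the rewrite rule is as described above):

```java
public class SwizzleSketch {
    // Toy analogue of ProxyFileSystem.swizzleParamPath: replace the
    // test-only 'pfile' scheme with 'file' before handing the path to the
    // real file system. The bug was that listStatusIterator bypassed this
    // rewrite by falling through to the superclass implementation.
    static String swizzleParamPath(String path) {
        if (path.startsWith("pfile:")) {
            return "file:" + path.substring("pfile:".length());
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(swizzleParamPath("pfile:/tmp/list_bucket/x"));
        // prints "file:/tmp/list_bucket/x"
    }
}
```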


Diffs (updated)
-

  shims/common/src/main/java/org/apache/hadoop/fs/ProxyFileSystem.java 
608c7e0578 
  shims/common/src/main/test/org/apache/hadoop/fs/TestProxyFileSystem.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/68518/diff/2/

Changes: https://reviews.apache.org/r/68518/diff/1-2/


Testing
---


Thanks,

denys kuzmenko



Re: Review Request 68496: Optimized & cleaned up HBaseQTest runner

2018-08-28 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68496/
---

(Updated Aug. 28, 2018, 11:33 a.m.)


Review request for hive, Zoltan Haindrich, Zoltan Haindrich, Marta Kuczora, and 
Peter Vary.


Bugs: HIVE-20394
https://issues.apache.org/jira/browse/HIVE-20394


Repository: hive-git


Description
---

1. Set proper cluster destroy order
2. Propagated proper HBaseTestContext
3. Ported downstream fixes (CDH-63695)
4. General clean up


Diffs (updated)
-

  hbase-handler/src/test/queries/negative/cascade_dbdrop.q 48be8cd070 
  hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q e4290717a8 
  hbase-handler/src/test/results/negative/cascade_dbdrop.q.out 803e35e406 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestLocationQueries.java
 31195c4523 
  
itests/util/src/main/java/org/apache/hadoop/hive/accumulo/AccumuloQTestUtil.java
 956478d778 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/AbstractCoreBlobstoreCliDriver.java
 3cf5ebb3df 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java 
1ead1448d1 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCompareCliDriver.java
 6b4c6c6a79 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreHBaseCliDriver.java
 70cbf04823 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreHBaseNegativeCliDriver.java
 c76a70e7dd 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreNegativeCliDriver.java
 07ae6ac206 
  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CorePerfCliDriver.java
 55e744e0f3 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 
07df0c9d1e 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
7b203a9281 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestArguments.java 
PRE-CREATION 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 5adbb63693 
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/parse/CoreParseNegative.java
 8f5744d2f1 


Diff: https://reviews.apache.org/r/68496/diff/2/

Changes: https://reviews.apache.org/r/68496/diff/1-2/


Testing
---

fixed existing tests


Thanks,

denys kuzmenko



[jira] [Created] (HIVE-20476) REPL LOAD and EXPORT/IMPORT operations ignores distcp failures.

2018-08-28 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-20476:
---

 Summary: REPL LOAD and EXPORT/IMPORT operations ignores distcp 
failures.
 Key: HIVE-20476
 URL: https://issues.apache.org/jira/browse/HIVE-20476
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 3.1.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan


CopyUtils uses FileUtils.distCp to copy files but doesn't check the return 
value, which is false if the copy fails.
REPL LOAD and EXPORT/IMPORT commands internally use CopyUtils to copy data 
files across clusters, so they may report success even when a file copy fails, 
which can cause data loss.

We need to throw an error and retry.
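A minimal sketch of the proposed behavior (hypothetical names; the Copier interface stands in for the FileUtils.distCp call, which signals failure by returning false rather than throwing):

```java
import java.io.IOException;

public class RetryingCopy {
    // Stand-in for the distcp invocation: returns false on failure.
    interface Copier {
        boolean copy() throws IOException;
    }

    // Proposed fix: check the boolean result, retry a bounded number of
    // times, and fail loudly instead of silently ignoring a failed copy.
    static void copyWithRetry(Copier copier, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (copier.copy()) {
                return; // success
            }
        }
        throw new IOException("distcp failed after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws IOException {
        copyWithRetry(() -> true, 3);
        System.out.println("copy succeeded");
    }
}
```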



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20475) Hive Thrift Server 2 stops frequently

2018-08-28 Thread Vinod Nerella (JIRA)
Vinod Nerella created HIVE-20475:


 Summary: Hive Thrift Server 2 stops frequently
 Key: HIVE-20475
 URL: https://issues.apache.org/jira/browse/HIVE-20475
 Project: Hive
  Issue Type: Bug
 Environment: HDP 2.6.5.0

Hive 1.2.1000

Spark 2.3.0
Reporter: Vinod Nerella


18/08/28 02:18:05 ERROR TThreadPoolServer: Error occurred during processing of 
message.

java.lang.RuntimeException: 
org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in 
the stream

        at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)

        at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no 
sasl data in the stream

        at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:328)

        at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)

        at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)

        ... 4 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #423: HIVE-20445: Add randomized tests to TestArrowColumna...

2018-08-28 Thread pudidic
GitHub user pudidic opened a pull request:

https://github.com/apache/hive/pull/423

HIVE-20445: Add randomized tests to TestArrowColumnarBatchSerDe



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pudidic/hive HIVE-20445

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/423.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #423


commit a4ef8df46a5b881dde9895aab7c60ff24185b10a
Author: Teddy Choi 
Date:   2018-08-24T04:21:35Z

Working

commit f5b7b4b7f972bae2e6d1cbb7e029b96018da973e
Author: Teddy Choi 
Date:   2018-08-27T03:58:58Z

Merge branch 'master' into HIVE-20445

commit 71d28a3d0d2ff827046c19eab2e8e5c37c08e107
Author: Teddy Choi 
Date:   2018-08-28T02:36:03Z

Working

commit c68fea097a6791819d7907708f6403943a5dd9d8
Author: Teddy Choi 
Date:   2018-08-28T02:36:45Z

Merge branch 'master' into HIVE-20445

commit afdc0fd4c0bb1fa33604ac2aecf882ff15e65609
Author: Teddy Choi 
Date:   2018-08-28T06:33:43Z

Working

commit 2e084cb455d225a8e728edce9a7b9b0212430e3b
Author: Teddy Choi 
Date:   2018-08-28T06:39:17Z

Removed unnecessary code




---


[jira] [Created] (HIVE-20474) SASL negotiation failure

2018-08-28 Thread kevin (JIRA)
kevin created HIVE-20474:


 Summary:  SASL negotiation failure
 Key: HIVE-20474
 URL: https://issues.apache.org/jira/browse/HIVE-20474
 Project: Hive
  Issue Type: Bug
  Components: Authentication
Affects Versions: 1.1.1
Reporter: kevin


Error when running HiveServer2 behind a load balancer: connections through Beeline work fine, but connections made through Oozie fail with the error below.

[HiveServer2-Handler-Pool: Thread-59]: SASL negotiation failure
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: HIVE_DELEGATION_TOKEN owner=hadoop, renewer=hive, realUser=hive/namenodestandby.lakala@lakala.com, issueDate=1535046422518, maxDate=1535651222518, sequenceNumber=7, masterKeyId=2]
 at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
 at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
 at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
 at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
 at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
 at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:793)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:790)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:360)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:790)
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: HIVE_DELEGATION_TOKEN owner=hadoop, renewer=hive, realUser=hive/namenodestandby.lakala@lakala.com, issueDate=1535046422518, maxDate=1535651222518, sequenceNumber=7, masterKeyId=2
 at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:114)
 at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:607)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:638)
 at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
 ... 15 more

[HiveServer2-Handler-Pool: Thread-59]: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: DIGEST-MD5: IO error acquiring password
 at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:793)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:790)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:360)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:790)
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: DIGEST-MD5: IO error acquiring password
 at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
 at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316)
 at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
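The root cause in the traces above is a validity check on the delegation token: a token is rejected once the current time passes its maxDate, or when the token is absent from the token store, which can happen behind a load balancer when the HiveServer2 instances do not share a token store. A minimal sketch of that check in plain Java, with a hypothetical class and method name rather than Hive's actual SecretManager API, using the timestamps from the log:

```java
public class TokenExpiryCheck {
    // A token is considered expired once "now" passes its maxDate.
    // (The real secret manager also rejects tokens it simply does not
    // know about, which produces the same "token expired or does not
    // exist" message.)
    public static boolean isExpired(long maxDateMillis, long nowMillis) {
        return nowMillis > maxDateMillis;
    }

    public static void main(String[] args) {
        long issueDate = 1535046422518L; // issueDate from the log
        long maxDate   = 1535651222518L; // maxDate from the log
        assert isExpired(maxDate, maxDate + 1);  // past maxDate: rejected
        assert !isExpired(maxDate, issueDate);   // within validity: accepted
        System.out.println("token check ok");
    }
}
```

In this scenario the usual fix is a token store shared by all HiveServer2 instances behind the load balancer, so any instance can validate a token issued by another.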
 

[jira] [Created] (HIVE-20473) Optimization for materialized views

2018-08-28 Thread Shyam Rai (JIRA)
Shyam Rai created HIVE-20473:


 Summary: Optimization for materialized views
 Key: HIVE-20473
 URL: https://issues.apache.org/jira/browse/HIVE-20473
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 3.0.0
 Environment: Can be reproduced on a single-node pseudo-distributed cluster. 
Reporter: Shyam Rai


The optimizer takes advantage of a materialized view only when the query syntax matches the way the view was created. Here is an example.

*Source table on which materialized views are created*

{code}
++
|   createtab_stmt   |
++
| CREATE TABLE `mysource`(   |
|   `id` int,|
|   `name` string,   |
|   `start_date` date)   |
| ROW FORMAT SERDE   |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| WITH SERDEPROPERTIES ( |
|   'field.delim'=',',   |
|   'serialization.format'=',')  |
| STORED AS INPUTFORMAT  |
|   'org.apache.hadoop.mapred.TextInputFormat'   |
| OUTPUTFORMAT   |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION   |
|   'hdfs://xlhive3.openstacklocal:8020/warehouse/tablespace/managed/hive/mysource' |
| TBLPROPERTIES (|
|   'bucketing_version'='2', |
|   'transactional'='true',  |
|   'transactional_properties'='insert_only',|
|   'transient_lastDdlTime'='1535392655')|
++
{code}

One of the materialized views, "view_1", is created to fetch the data with IDs between 1 and 2 using this statement:
{code}
select `mysource`.`id`, `mysource`.`name`, `mysource`.`start_date` from `default`.`mysource` where `mysource`.`id` between 1 and 2
{code}

*When a SELECT is executed against the source table using the following statement, the materialized view is used, which can be validated with the explain plan:*
{code}
0: jdbc:hive2://localhost:1/default> explain select * from mysource where id between 1 and 2;
INFO  : Compiling command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): explain select * from mysource where id between 1 and 2
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); Time taken: 0.224 seconds
INFO  : Executing command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): explain select * from mysource where id between 1 and 2
INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
INFO  : Completed executing command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); Time taken: 0.006 seconds
INFO  : OK
INFO  : OK
++
|  Explain   |
++
| STAGE DEPENDENCIES:|
|   Stage-0 is a root stage  |
||
| STAGE PLANS:   |
|   Stage: Stage-0   |
| Fetch Operator |
|   limit: -1|
|   Processor Tree:  |
| TableScan  |
|   alias: default.view_1|
|   Select Operator  |
| expressions: id (type: int), name (type: string), start_date (type: date) |
| outputColumnNames: _col0, _col1, _col2 |
| ListSink   |
||
++
{code}

If the same SELECT is rewritten using >= and <=, which should yield the same result, the optimizer does not take advantage of the materialized view, unless of course another view is created with the >= and <= syntax. (Note that the rewritten query below also fails with a ParseException, because the second comparison is missing its column reference; the equivalent form is `id >= 1 and id <= 2`.)
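The claim that the two predicate forms "should yield the same result" can be checked mechanically: SQL's BETWEEN is inclusive on both ends. A minimal check in plain Java, with hypothetical values and names, not Hive code:

```java
public class BetweenEquivalence {
    // SQL's "id BETWEEN lo AND hi" is inclusive on both ends.
    public static boolean between(int id, int lo, int hi) {
        return id >= lo && id <= hi;
    }

    public static void main(String[] args) {
        // Check every id around the boundary: the two predicate forms
        // agree everywhere, so a rewrite between them is sound.
        for (int id = 0; id <= 3; id++) {
            boolean viaBetween = between(id, 1, 2);   // id BETWEEN 1 AND 2
            boolean viaPair = (id >= 1) && (id <= 2); // id >= 1 AND id <= 2
            assert viaBetween == viaPair;
        }
        System.out.println("predicates agree");
    }
}
```

Since the two forms are equivalent, a rewriting step that recognizes only the exact syntax used at view-creation time is leaving valid rewrites on the table, which is the gist of this improvement request.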

{code}
0: jdbc:hive2://localhost:1/default> explain select * from mysource where id >= 1 and <=2;
Error: Error while compiling statement: FAILED: ParseException line 1:49 cannot recognize input near '<=' '2' '' in expression specification (state=42000,code=4)
0: jdbc:hive2://localhost:1/default> explain