Re: Review Request 60289: HIVE-15665 LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-12 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60289/
---

(Updated Sept. 13, 2017, 1:58 a.m.)


Review request for hive, Gopal V and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 24c5db0e47 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionDispatcher.java 
c5248ceb5f 
  llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java 
f42622b892 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcColumnVectorProducer.java
 6edd84b8b0 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
 b5db3029d1 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java
 dc053ee7cf 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java
 b9d7a77d5b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcMetadataCache.java
 601b622b49 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
 4565d11988 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ParquetMetadataCacheImpl.java
 b61a8ca022 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java
 13c7767a3b 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java
 03a955c6f7 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69a9f9f35e 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 
7540e72b53 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
690cce798e 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/Reader.java cdd58df370 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/ReaderImpl.java 
d47ba6b31a 
  ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 8af84dce19 
  ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out 4536cbbfb9 
  ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 77b7f5a2f7 
  ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 
b799527e30 
  storage-api/src/java/org/apache/hadoop/hive/common/io/FileMetadataCache.java 
403c3ada61 


Diff: https://reviews.apache.org/r/60289/diff/5/

Changes: https://reviews.apache.org/r/60289/diff/4-5/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-17525) get rid of types in EncodedReaderImpl in favor of TypeDescription

2017-09-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17525:
---

 Summary: get rid of types in EncodedReaderImpl in favor of 
TypeDescription
 Key: HIVE-17525
 URL: https://issues.apache.org/jira/browse/HIVE-17525
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


They are redundant and TypeDescription is better.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 60289: HIVE-15665 LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-12 Thread Sergey Shelukhin


> On Sept. 8, 2017, 6:24 p.m., Prasanth_J wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ParquetMetadataCacheImpl.java
> > Lines 52 (patched)
> > 
> >
> > This file looks like it was renamed from Parquet to a generic MetadataCache, 
> > but it contains ORC-specific objects. If it is generic, remove the ORC-related 
> > code; if it is ORC-specific, rename the class.

It's a generic class for buffers that contains some format-specific sub-parts.


> On Sept. 8, 2017, 6:24 p.m., Prasanth_J wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ParquetMetadataCacheImpl.java
> > Lines 106 (patched)
> > 
> >
> > Why are the lock and unlock notifications sent back to back?

To mark the item as "used" as far as the eviction policy is concerned.


> On Sept. 8, 2017, 6:24 p.m., Prasanth_J wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ParquetMetadataCacheImpl.java
> > Lines 134 (patched)
> > 
> >
> > Can you create a follow-up? It would be useful for debugging. Alternatively, 
> > this could be JMX info, something that can be checked easily instead of logs.

HIVE-17524


> On Sept. 8, 2017, 6:24 p.m., Prasanth_J wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ParquetMetadataCacheImpl.java
> > Lines 235 (patched)
> > 
> >
> > Is readFully fixed in Hadoop 2.8? If so, now that Hive has moved to 2.8.0, 
> > can it be used here and in other places?

We may be running against 2.7. It also does the same thing anyway and requires 
a cast, so there's no point.


> On Sept. 8, 2017, 6:24 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java
> > Lines 148 (patched)
> > 
> >
> > Maybe we should start using TypeDescription everywhere. OrcProto.Type 
> > can be a huge object compared to TypeDescription.

added HIVE-17525


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60289/#review184992
---


On Sept. 1, 2017, 12:41 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60289/
> ---
> 
> (Updated Sept. 1, 2017, 12:41 a.m.)
> 
> 
> Review request for hive, Gopal V and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e4b09a2cdd 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionDispatcher.java
>  c5248ceb5f 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java 
> f42622b892 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcColumnVectorProducer.java
>  6edd84b8b0 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
>  b5db3029d1 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java
>  dc053ee7cf 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java
>  b9d7a77d5b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcMetadataCache.java
>  601b622b49 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
>  4565d11988 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ParquetMetadataCacheImpl.java
>  b61a8ca022 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java
>  13c7767a3b 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java
>  03a955c6f7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69a9f9f35e 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 
> 7540e72b53 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 690cce798e 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/Reader.java cdd58df370 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/ReaderImpl.java 
> d47ba6b31a 
>   ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 8af84dce19 
>   ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out 4536cbbfb9 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 77b7f5a2f7 
>   ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 
> b799527e30 
>   
> 

[jira] [Created] (HIVE-17524) instrument LLAP metadata cache with separate counters by format

2017-09-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17524:
---

 Summary: instrument LLAP metadata cache with separate counters by 
format
 Key: HIVE-17524
 URL: https://issues.apache.org/jira/browse/HIVE-17524
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Follow-up from HIVE-15665. ORC file tails, Parquet file tails, and ORC stripe 
tails should be counted separately for JMX/iomem output. Technically, the cache 
should know nothing about which ByteBuffer is which, so perhaps the counting 
should be by key type, and the key types should be separated between the three 
more explicitly (including replacing the Long HDFS-inode-based keys with two 
separate wrappers for a primitive long).
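
A rough sketch of what counting "by key type" could look like is below. It is only an illustration under stated assumptions: the class name, the enum, and the onPut/onEvict hooks are hypothetical and are not existing LLAP APIs. The cache would only ever see the key kind, never the contents of the buffer.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not LLAP code: per-key-kind byte counters for a metadata cache.
class MetadataCounterSketch {
    // Assumed key kinds, mirroring the three categories named in this ticket.
    enum KeyKind { ORC_FILE_TAIL, PARQUET_FILE_TAIL, ORC_STRIPE_TAIL }

    // The map is fully populated in the constructor and never changes afterwards,
    // so concurrent callers only touch the AtomicLong values.
    private final Map<KeyKind, AtomicLong> bytesByKind = new EnumMap<>(KeyKind.class);

    MetadataCounterSketch() {
        for (KeyKind kind : KeyKind.values()) {
            bytesByKind.put(kind, new AtomicLong());
        }
    }

    // Called when a buffer of the given kind is added to the cache.
    void onPut(KeyKind kind, long bytes) {
        bytesByKind.get(kind).addAndGet(bytes);
    }

    // Called when a buffer of the given kind is evicted from the cache.
    void onEvict(KeyKind kind, long bytes) {
        bytesByKind.get(kind).addAndGet(-bytes);
    }

    // Current number of cached bytes for one kind, e.g. for a JMX gauge.
    long bytes(KeyKind kind) {
        return bytesByKind.get(kind).get();
    }
}
```

With distinct key wrapper types, the kind could be derived from the key class itself instead of being passed in, which is the separation the description asks for.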





[GitHub] hive pull request #249: [HIVE-17523] Fix insert into bug

2017-09-12 Thread b-slim
GitHub user b-slim opened a pull request:

https://github.com/apache/hive/pull/249

[HIVE-17523] Fix insert into bug

https://issues.apache.org/jira/browse/HIVE-17523

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/b-slim/hive fix_insert_into

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #249


commit 579483ee878a16491d61c0eada5891880f507302
Author: Slim Bouguerra 
Date:   2017-09-09T00:12:14Z

set max tries to zero to make test faster.

Change-Id: Ia1523f3b565f4b08c76067fa0bac32d5171fb46e

commit ca5c7ccb1cc41dc3c8c76ccefb2d05b7d096a4ca
Author: Slim Bouguerra 
Date:   2017-09-11T22:39:40Z

Make insert into use data segment pusher to avoid duplication of logic; some 
refactoring of exception logging and handling

Change-Id: I7bd8f29a83720f4cfba338acf27fb85b9774eafe

commit 3f2dd91c04839df7c068e1608b3e2af98babdc11
Author: Slim Bouguerra 
Date:   2017-09-13T01:04:05Z

cleaning and refactor

Change-Id: I9e2b14e6e32af095d2c4d2f9f1fbe8a9cced30ad




---


[jira] [Created] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinite loop

2017-09-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17523:
-

 Summary: Insert into druid table hangs Hive server2 in an infinite 
loop
 Key: HIVE-17523
 URL: https://issues.apache.org/jira/browse/HIVE-17523
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Inserting data with "insert into" into a table backed by Druid can cause the 
Hive server to hang.
This is due to a bug in the naming of Druid segment partitions.
To reproduce the issue:
{code}
drop table login_hive;
create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
double);
insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);

insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);

insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);

insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);


drop table login_druid;
CREATE TABLE login_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
"druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
AS
select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
select * FROM login_druid;

insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
{code}

This patch unifies the segment push and segment naming logic by using the Druid 
data segment pusher as much as possible.
This patch also includes some minor code refactoring and test enhancements.
 





Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/
---

(Updated Sept. 13, 2017, 1:04 a.m.)


Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.


Changes
---

Removing protobuf changes; please diff iterations 2 and 4 and ignore 3.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 24c5db0e47 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java b3677322ca 
  
llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java
 b6501842e8 
  llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmInstance.java 
a71904cf34 
  llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 
1c4f0e7a09 
  llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 
7726794fea 
  
llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java
 19e81e6fa5 
  llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java 
fa99536bea 
  llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 26747fc5ca 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
 4d5333f995 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/AmPluginNode.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 6e8122dc85 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
9f721553d6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
8ecdbbf999 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
170de2143d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java e6e236de6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 
9e2846ca6c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java
 7a02a563e9 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 
4e5d99134b 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestGuaranteedTaskAllocator.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
5e1e68cfa8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 9b9eead0af 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 
PRE-CREATION 
  service/src/java/org/apache/hive/service/server/HiveServer2.java e5f449122b 


Diff: https://reviews.apache.org/r/62091/diff/4/

Changes: https://reviews.apache.org/r/62091/diff/3-4/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/
---

(Updated Sept. 13, 2017, 12:59 a.m.)


Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 24c5db0e47 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java b3677322ca 
  
llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java
 b6501842e8 
  llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmInstance.java 
a71904cf34 
  llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 
1c4f0e7a09 
  
llap-common/src/gen/protobuf/gen-java/org/apache/hadoop/hive/llap/plugin/rpc/LlapPluginProtocolProtos.java
 61eb21afb0 
  llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 
7726794fea 
  
llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java
 19e81e6fa5 
  llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java 
fa99536bea 
  llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 26747fc5ca 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
 4d5333f995 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/AmPluginNode.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 6e8122dc85 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
9f721553d6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
8ecdbbf999 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
170de2143d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java e6e236de6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 
9e2846ca6c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java
 7a02a563e9 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 
4e5d99134b 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestGuaranteedTaskAllocator.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
5e1e68cfa8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 9b9eead0af 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 
PRE-CREATION 
  service/src/java/org/apache/hive/service/server/HiveServer2.java e5f449122b 


Diff: https://reviews.apache.org/r/62091/diff/3/

Changes: https://reviews.apache.org/r/62091/diff/2-3/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-17522) cleanup old 'repl dump' dirs

2017-09-12 Thread Tao Li (JIRA)
Tao Li created HIVE-17522:
-

 Summary: cleanup old 'repl dump' dirs
 Key: HIVE-17522
 URL: https://issues.apache.org/jira/browse/HIVE-17522
 Project: Hive
  Issue Type: Bug
  Components: repl
Reporter: Tao Li
Assignee: Tao Li


We want to clean up the old dump dirs to save space and reduce scan time when 
needed.





Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu via Review Board


> On Sept. 12, 2017, 5:02 p.m., Ratandeep Ratti wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
> > Line 305 (original), 305 (patched)
> > 
> >
> > This comment is misleading now and can be removed.

Carl fixed this before committing. Thanks, Carl!


- Anthony


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185212
---


On Sept. 12, 2017, 10:43 p.m., Anthony Hsu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62247/
> ---
> 
> (Updated Sept. 12, 2017, 10:43 p.m.)
> 
> 
> Review request for hive, Carl Steinbach and Ratandeep Ratti.
> 
> 
> Bugs: HIVE-17394
> https://issues.apache.org/jira/browse/HIVE-17394
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously, when Avro found a nullable union in the reader schema, it would 
> regenerate the TypeInfo for the field for every record. This patch reuses the 
> same TypeInfo that only needs to be calculated once.
> 
> In our testing, we found this improved count() queries by 2x.
> 
> 
> Diffs
> -
> 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
> ecfe15f59dac04bda3f8f1275babebf736608a6b 
> 
> 
> Diff: https://reviews.apache.org/r/62247/diff/2/
> 
> 
> Testing
> ---
> 
> `mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded.
> 
> 
> Thanks,
> 
> Anthony Hsu
> 
>



Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/
---

(Updated Sept. 12, 2017, 10:43 p.m.)


Review request for hive, Carl Steinbach and Ratandeep Ratti.


Changes
---

Addressed Ratandeep's comment.


Bugs: HIVE-17394
https://issues.apache.org/jira/browse/HIVE-17394


Repository: hive-git


Description
---

Previously, when Avro found a nullable union in the reader schema, it would 
regenerate the TypeInfo for the field for every record. This patch reuses the 
same TypeInfo that only needs to be calculated once.

In our testing, we found this improved count() queries by 2x.
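
The idea of computing the type information once and reusing it for every record can be sketched roughly as follows. This is not the code from the patch: the class name is made up, and plain strings stand in for the Avro schema and the Hive TypeInfo so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Hive code: memoize an expensive per-field conversion
// so that it runs once per distinct schema instead of once per row.
class TypeInfoCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> generator;
    private int generatorCalls = 0;

    // "generator" stands in for the expensive schema-to-TypeInfo conversion.
    TypeInfoCacheSketch(Function<String, String> generator) {
        this.generator = generator;
    }

    // Returns the cached result, invoking the generator only on the first request.
    String typeInfoFor(String fieldSchema) {
        return cache.computeIfAbsent(fieldSchema, s -> {
            generatorCalls++;
            return generator.apply(s);
        });
    }

    int generatorCalls() {
        return generatorCalls;
    }

    public static void main(String[] args) {
        TypeInfoCacheSketch cache = new TypeInfoCacheSketch(s -> "typeinfo(" + s + ")");
        // Simulate deserializing 1000 rows that all share one nullable field schema.
        for (int row = 0; row < 1000; row++) {
            cache.typeInfoFor("union<null,string>");
        }
        System.out.println(cache.generatorCalls()); // prints 1
    }
}
```

The sketch uses a plain HashMap and is therefore not thread-safe; whether that matters depends on how the deserializer instance is shared, which is not stated here.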


Diffs (updated)
-

  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
ecfe15f59dac04bda3f8f1275babebf736608a6b 


Diff: https://reviews.apache.org/r/62247/diff/2/

Changes: https://reviews.apache.org/r/62247/diff/1-2/


Testing
---

`mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded.


Thanks,

Anthony Hsu



Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Sergey Shelukhin


> On Sept. 12, 2017, 6:56 p.m., Zhiyuan Yang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
> > Lines 191 (patched)
> > 
> >
> > How would the AM registry help in AM recovery? If it doesn't, this 
> > piece means any update during AM failure and recovery will fail the session, 
> > which makes AM recovery pointless.

What do you mean by AM recovery? Reopening the session would produce a new 
session object in the pool.


> On Sept. 12, 2017, 6:56 p.m., Zhiyuan Yang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
> > Lines 201-215 (patched)
> > 
> >
> > You are really determined to knock out that field...

This is gone now.


> On Sept. 12, 2017, 6:56 p.m., Zhiyuan Yang wrote:
> > service/src/java/org/apache/hive/service/server/HiveServer2.java
> > Lines 169 (patched)
> > 
> >
> > Where is the code that actually puts this WM instance to use? An additional 
> > JIRA?

It's used through the global; see getInstance called from TezTask. I have a 
separate patch to get rid of some globals in HS2 that may make it clearer.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/#review185018
---


On Sept. 5, 2017, 6:52 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62091/
> ---
> 
> (Updated Sept. 5, 2017, 6:52 p.m.)
> 
> 
> Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6de07d2e76 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> b3677322ca 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java
>  b6501842e8 
>   llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 
> 1c4f0e7a09 
>   llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 
> 7726794fea 
>   
> llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java
>  19e81e6fa5 
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java 
> fa99536bea 
>   llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  cf8bd469dc 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
>  f3c0d5213f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 
> 4f58565a4c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
> 1f4705c083 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
> 005eeedc02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
> fe5c6a1e45 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java f1f10286a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 
> 9e2846ca6c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java
>  7a02a563e9 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 
> 973c0cc630 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestGuaranteedTaskAllocator.java
>  PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
> d2b98c46ca 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java e5f449122b 
> 
> 
> Diff: https://reviews.apache.org/r/62091/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-17521) Improve defaults for few runtime configs

2017-09-12 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-17521:
---

 Summary: Improve defaults for few runtime configs
 Key: HIVE-17521
 URL: https://issues.apache.org/jira/browse/HIVE-17521
 Project: Hive
  Issue Type: Task
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor








Re: Review Request 62228: HIVE-17495: CachedStore: prewarm improvements, refactoring and caching some aggregate stats

2017-09-12 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62228/#review185229
---




metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 258 (original), 264 (patched)


If we want to reduce the number of SQL statements, should we use the same single 
fetch for all table statistics, for symmetry?



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 295 (patched)


It would be better to get both the all and allbutdefault statistics from 
mergeColStatsForPartitions rather than invoking get_aggr_stats_for twice, which 
would be costly.
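
To illustrate the suggestion, here is a minimal sketch of producing both aggregates in one pass over per-partition values. It makes several assumptions: a single numeric statistic (a null count) stands in for full column statistics, the map keys stand in for partition names, the marker used for the default partition is a placeholder, and none of the names below come from CachedStore.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not metastore code: compute the "all partitions" and
// "all but default partition" aggregates together in a single loop.
class AggregateOnceSketch {
    // Placeholder marker for the default partition, used only in this example.
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Returns {sum over all partitions, sum over all partitions except the default one}.
    static long[] aggregate(Map<String, Long> nullCountByPartition) {
        long all = 0;
        long allButDefault = 0;
        for (Map.Entry<String, Long> e : nullCountByPartition.entrySet()) {
            all += e.getValue();
            if (!DEFAULT_PARTITION.equals(e.getKey())) {
                allButDefault += e.getValue();
            }
        }
        return new long[] { all, allButDefault };
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("2017-09-11", 3L);
        stats.put("2017-09-12", 4L);
        stats.put(DEFAULT_PARTITION, 10L);
        long[] result = aggregate(stats);
        System.out.println(result[0] + " " + result[1]); // prints 17 7
    }
}
```

Real column statistics (distinct-value estimates, min/max, and so on) do not all merge by simple addition, so this only shows the shape of the single-pass approach.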


- Daniel Dai


On Sept. 11, 2017, 9:25 p.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62228/
> ---
> 
> (Updated Sept. 11, 2017, 9:25 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Thejas Nair.
> 
> 
> Bugs: HIVE-17495
> https://issues.apache.org/jira/browse/HIVE-17495
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-17495
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  8d861e4 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> dc1245e 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> bbe13fd 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 3053dcb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 71982a0 
>   metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 
> 3ba81ce 
>   metastore/src/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 
> 80b17e0 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BinaryColumnStatsAggregator.java
>  e6c836b 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BooleanColumnStatsAggregator.java
>  a34bc9f 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregator.java
>  a52e5e5 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregatorFactory.java
>  dfae708 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregator.java
>  ee95396 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DecimalColumnStatsAggregator.java
>  284c12c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DoubleColumnStatsAggregator.java
>  bb4a725 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java
>  5b1145e 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java
>  1b29f92 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  4db203d 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  fb16cfc 
> 
> 
> Diff: https://reviews.apache.org/r/62228/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Zhiyuan Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/#review185018
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2385-2386 (patched)


Should mention that setting this conf enables workload management.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 101-106 (patched)


Why is this here, given that it's already a daemon thread?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 147 (patched)


An additional define statement would be better.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 191 (patched)


How would the AM registry help in AM recovery? If it doesn't, this 
piece means any update during AM failure and recovery will fail the session, 
which makes AM recovery pointless.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 201-215 (patched)


You are really determined to knock out that field...



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
Lines 61 (patched)


git apply complains

HIVE-17386.02.patch:1162: trailing whitespace.
  }
warning: 1 line adds whitespace errors.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 220 (patched)


Wrong log message



service/src/java/org/apache/hive/service/server/HiveServer2.java
Lines 169 (patched)


Where is the code that actually puts this wm instance to use? Additional jira?


- Zhiyuan Yang


On Sept. 5, 2017, 6:52 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62091/
> ---
> 
> (Updated Sept. 5, 2017, 6:52 p.m.)
> 
> 
> Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6de07d2e76 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> b3677322ca 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java
>  b6501842e8 
>   llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 
> 1c4f0e7a09 
>   llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 
> 7726794fea 
>   
> llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java
>  19e81e6fa5 
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java 
> fa99536bea 
>   llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  cf8bd469dc 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
>  f3c0d5213f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 
> 4f58565a4c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
> 1f4705c083 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
> 005eeedc02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
> fe5c6a1e45 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java f1f10286a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 
> 9e2846ca6c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java
>  7a02a563e9 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 
> 973c0cc630 
>   
> 

[jira] [Created] (HIVE-17520) Temp table with CTAS tries to create staging directory in the actual database directory

2017-09-12 Thread Jason Dere (JIRA)
Jason Dere created HIVE-17520:
-

 Summary: Temp table with CTAS tries to create staging directory in 
the actual database directory
 Key: HIVE-17520
 URL: https://issues.apache.org/jira/browse/HIVE-17520
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere


User does not have FS permissions on the database directory.
Note that normal CREATE TEMPORARY TABLE (no CTAS) on that database does work.
However trying temp table with CTAS fails with the following error:

{noformat}
hive> create temporary table jdere_temp as select * from tpch_text_1000.nation;
FAILED: SemanticException 0:0 Error creating temporary folder on: 
hdfs://cn105-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpch_text_1000.db.
 Error encountered near token 'TOK_TMP_FILE'
{noformat}

Simple fix would be to set the staging directory to the temp table location for 
the case of temp tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Ratandeep Ratti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185212
---




serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
Line 305 (original), 305 (patched)


This comment is misleading now and can be removed.


- Ratandeep Ratti


On Sept. 12, 2017, 3:04 p.m., Anthony Hsu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62247/
> ---
> 
> (Updated Sept. 12, 2017, 3:04 p.m.)
> 
> 
> Review request for hive, Carl Steinbach and Ratandeep Ratti.
> 
> 
> Bugs: HIVE-17394
> https://issues.apache.org/jira/browse/HIVE-17394
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously, when Avro found a nullable union in the reader schema, it would 
> regenerate the TypeInfo for the field for every record. This patch reuses the 
> same TypeInfo that only needs to be calculated once.
> 
> In our testing, we found this improved count() queries by 2x.
> 
> 
> Diffs
> -
> 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
> ecfe15f59dac04bda3f8f1275babebf736608a6b 
> 
> 
> Diff: https://reviews.apache.org/r/62247/diff/1/
> 
> 
> Testing
> ---
> 
> `mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded.
> 
> 
> Thanks,
> 
> Anthony Hsu
> 
>



Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Ratandeep Ratti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185210
---


Ship it!




LGTM

- Ratandeep Ratti





Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185195
---


Ship it!




+1

- Carl Steinbach





Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Anthony Hsu via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/
---

Review request for hive, Carl Steinbach and Ratandeep Ratti.


Bugs: HIVE-17394
https://issues.apache.org/jira/browse/HIVE-17394


Repository: hive-git


Description
---

Previously, when Avro found a nullable union in the reader schema, it would 
regenerate the TypeInfo for the field for every record. This patch reuses the 
same TypeInfo that only needs to be calculated once.

In our testing, we found this improved count() queries by 2x.
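
To illustrate the idea of computing the TypeInfo once and reusing it for every record, here is a minimal sketch. It is not the patch itself: `TypeInfoCacheSketch`, `Memo`, and the string stand-ins for the Avro schema and the Hive TypeInfo are hypothetical names chosen for illustration, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TypeInfoCacheSketch {

  // Hypothetical stand-ins: S plays the role of an Avro schema,
  // T the role of the Hive TypeInfo generated from it.
  static final class Memo<S, T> {
    private final Map<S, T> cache = new HashMap<>();
    private final Function<S, T> generator;
    private int generatorCalls = 0;

    Memo(Function<S, T> generator) {
      this.generator = generator;
    }

    // Returns the cached value, running the (expensive) generator
    // only the first time a given schema is seen.
    T get(S schema) {
      T cached = cache.get(schema);
      if (cached == null) {
        cached = generator.apply(schema);
        generatorCalls++;
        cache.put(schema, cached);
      }
      return cached;
    }

    int generatorCalls() {
      return generatorCalls;
    }
  }

  public static void main(String[] args) {
    Memo<String, String> memo = new Memo<>(s -> "typeinfo(" + s + ")");
    // Simulate deserializing 1000 rows that share one nullable-union schema.
    for (int row = 0; row < 1000; row++) {
      memo.get("[\"null\",\"string\"]");
    }
    System.out.println("generator calls for 1000 rows: " + memo.generatorCalls());
  }
}
```

The per-row cost drops from a full TypeInfo generation to a map lookup, which is the kind of saving the description attributes the speedup to.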


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
ecfe15f59dac04bda3f8f1275babebf736608a6b 


Diff: https://reviews.apache.org/r/62247/diff/1/


Testing
---

`mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded.


Thanks,

Anthony Hsu



[GitHub] hive pull request #248: HIVE-17494: Bootstrap REPL DUMP throws exception if ...

2017-09-12 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/248

HIVE-17494: Bootstrap REPL DUMP throws exception if a partitioned table is 
dropped while reading partitions.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-17494

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #248


commit 96c96dbf25b3dd6b8a3fcffc12c383a5b78152cf
Author: Sankar Hariappan 
Date:   2017-09-12T13:35:19Z

HIVE-17494: Bootstrap REPL DUMP throws exception if a partitioned table is 
dropped while reading partitions.




---


[jira] [Created] (HIVE-17519) Transpose column stats display

2017-09-12 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-17519:
---

 Summary: Transpose column stats display
 Key: HIVE-17519
 URL: https://issues.apache.org/jira/browse/HIVE-17519
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


Currently, {{describe formatted table1 insert_num}} shows the column 
information in a table-like format, which is very hard to read because 
there are too many columns:

{code}
# col_name  data_type   min max 
num_nulls   distinct_count  avg_col_len 
max_col_len num_trues   num_falses  
comment bitVector   

 
insert_num  int 


from deserializer   

{code}

I think it would be better to show the same information like this:

{code}
col_name    insert_num  
data_type   int 
min 
max 
num_nulls   
distinct_count  
avg_col_len 
max_col_len 
num_trues   
num_falses  
comment from deserializer   
bitVector   
{code}





[jira] [Created] (HIVE-17518) Empty stages in index_auto_update.q.out

2017-09-12 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-17518:
---

 Summary: Empty stages in index_auto_update.q.out
 Key: HIVE-17518
 URL: https://issues.apache.org/jira/browse/HIVE-17518
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


There are some seemingly odd "empty" stages in the explains of 
{{index_auto_update.q.out}}...

{code}

  Stage: Stage-4

  Stage: Stage-5

{code}

This might be nothing serious; however, it looks odd to me.





[jira] [Created] (HIVE-17517) Order explain stages in plans by stage idx

2017-09-12 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-17517:
---

 Summary: Order explain stages in plans by stage idx
 Key: HIVE-17517
 URL: https://issues.apache.org/jira/browse/HIVE-17517
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


Currently the order is seemingly undefined. Ordering the stages by index would 
possibly make it easier to compare q.outs, and at the same time could aid users 
who are looking for a specific "Stage" in their plans.





[jira] [Created] (HIVE-17516) There is something wrong that table data size don't obviously decrease when change rcfile to orc .

2017-09-12 Thread TODD (JIRA)
TODD created HIVE-17516:
---

 Summary: There is something wrong that table data size don't 
obviously decrease when change rcfile to orc .
 Key: HIVE-17516
 URL: https://issues.apache.org/jira/browse/HIVE-17516
 Project: Hive
  Issue Type: Bug
Reporter: TODD








Skip rows while reading data from the table

2017-09-12 Thread Pavan Kumar Prakash Savanur
Hello,

I am new to Hive. Is there a way to skip rows while reading a table? For
example, when I use select * from mytable, I want the output to skip
a few rows.

Thanks


CustomRecordReader in Hive query

2017-09-12 Thread Pavan Kumar Prakash Savanur
I have written a CustomRecordReader which skips records randomly. I want to
write a Hive query which uses my CustomRecordReader. How do I do that?


[jira] [Created] (HIVE-17515) Use SHA-256 for GenericUDFMaskHash to improve security

2017-09-12 Thread Tao Li (JIRA)
Tao Li created HIVE-17515:
-

 Summary: Use SHA-256 for GenericUDFMaskHash to improve security
 Key: HIVE-17515
 URL: https://issues.apache.org/jira/browse/HIVE-17515
 Project: Hive
  Issue Type: Sub-task
  Components: UDF
Reporter: Tao Li
Assignee: Tao Li








[jira] [Created] (HIVE-17514) Use SHA-256 for cookie signer to improve security

2017-09-12 Thread Tao Li (JIRA)
Tao Li created HIVE-17514:
-

 Summary: Use SHA-256 for cookie signer to improve security
 Key: HIVE-17514
 URL: https://issues.apache.org/jira/browse/HIVE-17514
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Tao Li
Assignee: Tao Li








[jira] [Created] (HIVE-17513) Refactor PathUtils to not contain state (instance fields)

2017-09-12 Thread Tao Li (JIRA)
Tao Li created HIVE-17513:
-

 Summary: Refactor PathUtils to not contain state (instance fields)
 Key: HIVE-17513
 URL: https://issues.apache.org/jira/browse/HIVE-17513
 Project: Hive
  Issue Type: Improvement
  Components: repl
Reporter: Tao Li
Assignee: Tao Li
Priority: Minor


This util class should just provide the static helper methods.
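
As a rough illustration of what a stateless utility class looks like, here is a minimal sketch. The class name `StatelessPathHelperSketch` and its `join` method are hypothetical and are not taken from the actual PathUtils code; the point is only the shape: a final class, a private constructor, no instance fields, and static helpers that depend solely on their arguments.

```java
public final class StatelessPathHelperSketch {

  // Private constructor: the class is never instantiated, so it cannot hold state.
  private StatelessPathHelperSketch() {
  }

  // Joins path segments with single '/' separators.
  // Everything the method needs is passed in as arguments.
  static String join(String base, String... parts) {
    StringBuilder sb = new StringBuilder(base.replaceAll("/+$", ""));
    for (String part : parts) {
      String trimmed = part.replaceAll("^/+|/+$", "");
      if (!trimmed.isEmpty()) {
        sb.append('/').append(trimmed);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(join("/warehouse/", "db1", "/tbl1/"));
  }
}
```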





Any hooks to invoke the custom database's statistics for aggregate hive queries

2017-09-12 Thread Amey Barve
Hi All,

We have developed a custom storage handler implementing *HiveStorageHandler*.
We also have APIs/statistics for totalCount, max, min, etc. for the data
stored in our database.

See below example queries:
1. select count(*) from my_table;
2. select max(id_column) from my_table;

So for the above queries, instead of doing a full table scan, the storage
handler should be able to invoke our totalCount, max, min, etc. methods.

So are there any hooks to invoke these statistics APIs for aggregate Hive
queries, so that they can be answered by a simple lookup of these statistics?

Thanks,
Amey


Re: Review Request 62228: HIVE-17495: CachedStore: prewarm improvements, refactoring and caching some aggregate stats

2017-09-12 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62228/#review185136
---




metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 1432-1433 (original)


It would be useful to retain an (improved) comment.



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Line 1977 (original), 1977 (patched)


Better comment: Group stats by colName for each partition



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Lines 1993 (patched)


LOG.debug



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Line 1999 (original), 2007 (patched)


For the number of threads, better logic could be
Math.min(colStatsMap.size(), Runtime.getRuntime().availableProcessors())
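
A minimal sketch of that sizing rule follows. The class name `PoolSizeSketch`, the `poolSize` helper, and the sample map contents are hypothetical; only the `Math.min(...)` expression comes from the comment above. The extra `Math.max(1, ...)` is an added assumption, there because `Executors.newFixedThreadPool` rejects a size of 0 when the map is empty.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizeSketch {

  // Never use more threads than there are columns to aggregate,
  // and never more than the number of available cores.
  static int poolSize(int numColumns, int availableProcessors) {
    return Math.max(1, Math.min(numColumns, availableProcessors));
  }

  public static void main(String[] args) {
    // Stand-in for the column-name -> per-partition stats map.
    Map<String, List<String>> colStatsMap = new HashMap<>();
    colStatsMap.put("col_a", Arrays.asList("part=1", "part=2"));
    colStatsMap.put("col_b", Arrays.asList("part=1", "part=2"));

    int threads = poolSize(colStatsMap.size(),
        Runtime.getRuntime().availableProcessors());
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    System.out.println("pool size: " + threads);
    pool.shutdown();
  }
}
```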



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Line 2008 (original), 2010 (patched)


LOG.debug



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Lines 2024 (patched)


Can remove this



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Lines 2025 (patched)


LOG.debug(e.getMessage())



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Line 2025 (original), 2035 (patched)


future will never be null



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Line 2027 (original), 2039 (patched)


Better to keep pool.shutdownNow()
and remove e.printStackTrace()
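
One way this could look is sketched below. It is an illustration under assumptions, not the patched code: `ShutdownSketch` and `runAll` are hypothetical names, `java.util.logging` stands in for the project's logger, and placing `shutdownNow()` in a `finally` block is a design choice of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ShutdownSketch {
  private static final Logger LOG = Logger.getLogger(ShutdownSketch.class.getName());

  // Runs all tasks and collects results in submission order.
  // Stops collecting at the first failure.
  static List<Integer> runAll(List<Callable<Integer>> tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    List<Integer> results = new ArrayList<>();
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (Callable<Integer> task : tasks) {
        futures.add(pool.submit(task));
      }
      for (Future<Integer> future : futures) {
        results.add(future.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      LOG.log(Level.FINE, "Interrupted while aggregating", e);
    } catch (ExecutionException e) {
      // Log through the logger instead of e.printStackTrace().
      LOG.log(Level.FINE, e.getMessage(), e);
    } finally {
      // Always release the threads, cancelling anything still queued or running.
      pool.shutdownNow();
    }
    return results;
  }

  public static void main(String[] args) {
    List<Callable<Integer>> tasks = new ArrayList<>();
    tasks.add(() -> 1);
    tasks.add(() -> 2);
    System.out.println(runAll(tasks));
  }
}
```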



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Lines 2045 (patched)


LOG.debug



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 275 (patched)


Here we are listing all partitions for the table and then immediately aggregating 
stats for all partitions. Another (better) way is to not retrieve partNames and 
do a SQL query to aggregate stats for partitions by partFilterExpr. Essentially 
get_aggr_stats_for(dbName, tblName, partFilterExpr).
Here, partFilterExpr = * 
That will avoid many roundtrips to the backend DB.



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 295 (patched)


And here it will be partFilterExpr = partNames not in (defaultPartition)



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 267 (original), 301 (patched)


Useful to log time taken to prewarm:
 LOG.info("Time taken to prewarm: " + 
(System.currentTimeMillis()-start)/1000);



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 1535 (original), 1569 (patched)


The caller in CachedStore already made this call before calling this method. 
Might as well pass the result from there.



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 1536 (original), 1570 (patched)


This if condition will always be true for the CachedStore prewarm invocation.
Can you please add a comment about that?



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 1588 (patched)


LOG.debug



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 1563 (original), 1623 (patched)


LOG.debug and
Previous construction of msg was better.



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 1630 (patched)


should this be < 1 instead of <=1 ?



metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 1631 (patched)


LOG.debug with {} instead of +



metastore/src/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
Lines 461 (patched)


{} instead of +



metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregatorFactory.java
Lines 44 (patched)


LOG.trace




[jira] [Created] (HIVE-17512) Not use doAs if distcp privileged user same as user running hive

2017-09-12 Thread anishek (JIRA)
anishek created HIVE-17512:
--

 Summary: Not use doAs if distcp privileged user same as user 
running hive
 Key: HIVE-17512
 URL: https://issues.apache.org/jira/browse/HIVE-17512
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: anishek
Priority: Minor
 Fix For: 3.0.0





