[jira] [Created] (HIVE-19283) Select count(distinct()) a couple of times stuck in last reducer

2018-04-23 Thread Goun Na (JIRA)
Goun Na created HIVE-19283:
--

 Summary: Select count(distinct()) a couple of times stuck in last 
reducer
 Key: HIVE-19283
 URL: https://issues.apache.org/jira/browse/HIVE-19283
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 2.1.1
Reporter: Goun Na
Assignee: Ashutosh Chauhan


 

Distinct count query performance is significantly improved due to HIVE-10568. 
{code:java}
select count(distinct elevenst_id)
from 11st.log_table
where part_dt between '20180101' and '20180131'{code}
 

However, some queries that contain several distinct counts are still slow. They 
start with multiple mappers, but get stuck in a single final reducer.

 
{code:java}
select 
  count(distinct elevenst_id)
, count(distinct member_id)
, count(distinct user_id)
, count(distinct action_id)
, count(distinct other_id)
 from 11st.log_table
where part_dt between '20180101' and '20180131'{code}
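A common workaround, until the multi-distinct case is optimized, is to compute 
each distinct count in its own subquery so that each gets its own shuffle and 
can use the HIVE-10568 rewrite independently. This is a sketch only, reusing 
the table and columns from the example above, and it costs extra table scans:
{code:java}
select d1.cnt, d2.cnt
from (select count(distinct elevenst_id) cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') d1
cross join
     (select count(distinct member_id) cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') d2{code}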
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19282) don't nest delta directories inside LB directories for ACID tables

2018-04-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19282:
---

 Summary: don't nest delta directories inside LB directories for 
ACID tables
 Key: HIVE-19282
 URL: https://issues.apache.org/jira/browse/HIVE-19282
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientFactory

2018-04-23 Thread Elliot West
Hello,

I'm looking for an abstraction to use for integrating with different
(non-Thrift) metadata catalog implementations. I know that AWS Glue manages
this and so have explored in EMR (Hive 2.3.2) a little. I see that it uses
the "org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientFactory"
interface to do this. However, I cannot find this class anywhere in vanilla
Apache Hive.

Is this an Amazon-specific construct (if so, why is it namespaced under
org.apache.hadoop.hive?), or are my code-searching abilities failing me?
Does this class exist in Apache Hive, and if so, where? (A link on GitHub
would be appreciated.)

Cheers,

Elliot.


Re: Review Request 63972: [HIVE-18037] Migrate Slider LLAP package to YARN Service framework for Hadoop 3.x

2018-04-23 Thread Gour Saha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63972/
---

(Updated April 24, 2018, 1:16 a.m.)


Review request for hive and Sergey Shelukhin.


Changes
---

Patch HIVE-18037.004.patch


Bugs: HIVE-18037
https://issues.apache.org/jira/browse/HIVE-18037


Repository: hive-git


Description
---

First phase of the migration of the Slider-based LLAP app-package to YARN 
Services in Hadoop 3.x. Follow-up changes will migrate status, log links, and 
diagnostics, and completely eliminate the Slider dependency.


Diffs (updated)
-

  bin/ext/llap.sh 0462d265c8 
  binary-package-licenses/README c801896663 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 73492ff99c 
  jdbc/pom.xml 3c23a75492 
  
llap-client/src/java/org/apache/hadoop/hive/llap/registry/LlapServiceInstance.java
 30b1810c4e 
  llap-server/bin/llapDaemon.sh 4945473a0e 
  llap-server/changes_for_non_slider_install.txt ec20fe1a25 
  llap-server/pom.xml 6928f7703b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapOptionsProcessor.java 
c906a5da79 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapServiceDriver.java 
3eaaed716e 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapSliderUtils.java 
8e5ae09359 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java
 65b4d81000 
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
d2e93963e0 
  llap-server/src/main/resources/llap.py 26756cefe5 
  llap-server/src/main/resources/package.py 21c34e9b97 
  llap-server/src/main/resources/params.py 8972ba10d2 
  llap-server/src/main/resources/templates.py 3d747a2c5b 
  packaging/src/main/assembly/bin.xml 5d934ac53a 
  pom.xml 21ce5cbff1 


Diff: https://reviews.apache.org/r/63972/diff/2/

Changes: https://reviews.apache.org/r/63972/diff/1-2/


Testing
---

Package created and successfully deployed in a Hadoop 3.0 cluster, using the 
command-line shell script and programmatically via the Java APIs.


File Attachments


HIVE-18037.001.patch
  
https://reviews.apache.org/media/uploaded/files/2017/11/21/e0844c04-be9b-4334-80b0-bae05e9ed885__HIVE-18037.001.patch


Thanks,

Gour Saha



[jira] [Created] (HIVE-19281) incorrect protocol name for LLAP AM plugin

2018-04-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19281:
---

 Summary: incorrect protocol name for LLAP AM plugin
 Key: HIVE-19281
 URL: https://issues.apache.org/jira/browse/HIVE-19281
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

2018-04-23 Thread Aihua Xu via Review Board


> On March 22, 2018, 11:03 a.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> > Lines 154-158 (original), 154-158 (patched)
> > 
> >
> > It might be a good idea, to use this around our batching as well:
> > - DatabaseProduct.needsInBatching(dbType)
> > 
> > What do you think @Aihua?

Thanks, Peter, for reviewing. This is a slightly different problem. In 
directSQL, some databases need batching and some do not, while in DN the 
limitation is in DN itself, so it applies to all databases.
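The batching idea itself, as described in the patch summary (split a long list 
into fixed-size chunks, run one query per chunk, and aggregate the results), 
can be sketched as follows. This is an illustration only, not the actual 
Batchable code:

```python
def run_in_batches(items, batch_size, run_query):
    # Run run_query on fixed-size chunks of items and concatenate the results,
    # so no single query carries an arbitrarily long predicate list.
    results = []
    for i in range(0, len(items), batch_size):
        results.extend(run_query(items[i:i + batch_size]))
    return results

# Stand-in "query" that just transforms the column names it is given.
cols = ["c%d" % i for i in range(10)]
out = run_in_batches(cols, 3, lambda batch: [c.upper() for c in batch])
print(out)
```

The chunk size would come from configuration (in Hive's case, a MetastoreConf 
setting), and results are order-preserving because chunks are processed in 
sequence.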


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review199751
---


On April 23, 2018, 10:51 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66188/
> ---
> 
> (Updated April 23, 2018, 10:51 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Yongzhi Chen.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> If the table contains a lot of columns e.g, 5k, simple table rename would 
> fail with the following stack trace. The issue is datanucleus can't handle 
> the query with lots of colName='c1' && colName='c2' && ... .
> 
> I'm breaking the query into multiple smaller queries and then we aggregate 
> the result together.
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientpositive/alter_rename_table.q 53fb230cf6 
>   ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java
>  PRE-CREATION 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  997f5fdb88 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  125d5a79f2 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  59749e4947 
> 
> 
> Diff: https://reviews.apache.org/r/66188/diff/3/
> 
> 
> Testing
> ---
> 
> Manual test has been done for large column of tables.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

2018-04-23 Thread Aihua Xu via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/
---

(Updated April 23, 2018, 10:51 p.m.)


Review request for hive, Alexander Kolbasov and Yongzhi Chen.


Changes
---

Address comments.


Repository: hive-git


Description
---

If the table contains a lot of columns, e.g. 5k, a simple table rename fails 
with the following stack trace. The issue is that DataNucleus can't handle a 
query with a long chain of colName='c1' && colName='c2' && ... predicates.

I'm breaking the query into multiple smaller queries and then aggregating the 
results together.


Diffs (updated)
-

  ql/src/test/queries/clientpositive/alter_rename_table.q 53fb230cf6 
  ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java
 PRE-CREATION 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
 997f5fdb88 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 125d5a79f2 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
 59749e4947 


Diff: https://reviews.apache.org/r/66188/diff/3/

Changes: https://reviews.apache.org/r/66188/diff/2-3/


Testing
---

Manual testing has been done for tables with a large number of columns.


Thanks,

Aihua Xu



[jira] [Created] (HIVE-19280) Invalid error messages for UPDATE/DELETE on insert-only transactional tables

2018-04-23 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19280:
-

 Summary: Invalid error messages for UPDATE/DELETE on insert-only 
transactional tables
 Key: HIVE-19280
 URL: https://issues.apache.org/jira/browse/HIVE-19280
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 3.0.0
Reporter: Steve Yeom
Assignee: Steve Yeom
 Fix For: 3.0.0


UPDATE/DELETE on MM tables fails with 
"FAILED: SemanticException Error 10297: Attempt to do update or delete on table 
tpch.tbl_default_mm that is not transactional". 
This error message is invalid, since MM tables are transactional. 





Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-23 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/
---

(Updated April 23, 2018, 9:56 p.m.)


Review request for hive, Sahil Takiar and Vihang Karajgaonkar.


Bugs: HIVE-14388
https://issues.apache.org/jira/browse/HIVE-14388


Repository: hive-git


Description
---

Currently, when you run an insert command in Beeline, it returns a message 
saying "No rows affected .."
A better, more intuitive message would be "xxx rows inserted (26.068 seconds)".

Added the numRows parameter as part of QueryState, and added numRows to the 
response as well so Beeline can display it.

The count is obtained in FileSinkOperator and set in statsMap, but only when 
the operator writes table-specific rows for the particular operation (so that 
we count only inserts into the table and avoid counting non-table-specific 
file-sink operations happening during query execution).
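As an illustration of the counting rule described above (hypothetical names, 
not Hive's actual classes): only sinks that write table rows contribute to the 
reported insert count:

```python
# Hypothetical sketch: each file sink reports (writes_to_table, rows_written).
# Only table-directed sinks count toward the "N rows inserted" message, so
# temp/staging file sinks created during query execution are excluded.
def total_inserted_rows(sinks):
    return sum(rows for writes_to_table, rows in sinks if writes_to_table)

sinks = [(True, 100), (False, 40), (True, 26)]
print(total_inserted_rows(sinks))  # 126
```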


Diffs (updated)
-

  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
06542cee02e5dc4696f2621bb45cc4f24c67dfda 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
9cb2ff101581d22965b447e82601970d909daefd 
  ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
cf9c2273159c0d779ea90ad029613678fb0967a6 
  ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
c084fa054cb771bfdb033d244935713e3c7eb874 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
fcdc9967f12a454a9d3f31031e2261f264479118 
  service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
b2b62c71492b844f4439367364c5c81aa62f3908 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
 15e8220eb3eb12b72c7b64029410dced33bc0d72 
  service-rpc/src/gen/thrift/gen-php/Types.php 
abb7c1ff3a2c8b72dc97689758266b675880e32b 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
60183dae9e9927bd09a9676e49eeb4aea2401737 
  service/src/java/org/apache/hive/service/cli/CLIService.java 
c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
  service/src/java/org/apache/hive/service/cli/OperationStatus.java 
52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 
3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
c64c99120ad21ee98af81ec6659a2722e3e1d1c7 


Diff: https://reviews.apache.org/r/66290/diff/3/

Changes: https://reviews.apache.org/r/66290/diff/2-3/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-23 Thread Bharathkrishna Guruvayoor Murali via Review Board


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review201637
---


On April 18, 2018, 11:53 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 18, 2018, 11:53 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a88453c97835db847d74b4b4c3ef318d4d6c0ce5 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-23 Thread Bharathkrishna Guruvayoor Murali via Review Board


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review200042
---


On April 18, 2018, 11:53 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 18, 2018, 11:53 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a88453c97835db847d74b4b4c3ef318d4d6c0ce5 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-23 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On March 27, 2018, 12:53 p.m., Peter Vary wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
> > Line 712 (original), 712 (patched)
> > 
> >
> > Is it possible to behave differently, when we have information about 
> > the number of rows, and when we do not know anything? The returned number 
> > will be 0 in this case, which might cause interesting behavior I guess :)
> 
> Bharathkrishna Guruvayoor Murali wrote:
> This is called only when resultSet is not present (like in insert 
> queries). If it returns zero, it will display No rows affected, which is like 
> the current behavior.
> 
> Peter Vary wrote:
> So the code handles -1 and 0 in the same way?
> Previously we returned -1, indicating we did not have info about the 
> affected rows. Now we will always return 0 if we do not have the exact 
> info, like when running HoS.

I have changed the default value of numModifiedRows in QueryState from 0 to -1.
Currently Beeline shows the same message for both -1 and 0, but it is good to 
have the default be -1 in case any future use case needs to distinguish the 
two.
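A minimal sketch of why the -1 sentinel helps (hypothetical helper, not the 
patch code): it lets a client distinguish "count unavailable" from "zero rows 
modified", even if Beeline currently renders both the same way:

```python
UNKNOWN = -1  # default when no row-count information is available (e.g. HoS)

def describe(num_modified_rows):
    # -1 means "we don't know"; 0 means the statement really touched no rows.
    if num_modified_rows == UNKNOWN:
        return "No rows affected (count unavailable)"
    return "%d rows affected" % num_modified_rows

print(describe(UNKNOWN))
print(describe(0))
print(describe(5))
```

This mirrors the JDBC convention where getUpdateCount() returns -1 when no 
update count is available.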


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review200042
---


On April 18, 2018, 11:53 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 18, 2018, 11:53 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a88453c97835db847d74b4b4c3ef318d4d6c0ce5 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-23 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On April 20, 2018, 4:34 p.m., Sahil Takiar wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
> > Line 711 (original), 711 (patched)
> > 
> >
> > why change the method call? don't both methods return the same thing?
> > 
> > also its more like `numModifiedRows`, right?

Changed numRows to numModifiedRows.

The return value of waitForOperationToComplete() was never used here. The 
method is called once from execute(), and the boolean isOperationComplete will 
already be set to true; hence, when we call it again, it returns a null 
TGetOperationStatusResp. So I think the alternate method satisfies the 
requirement here.


> On April 20, 2018, 4:34 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
> > Lines 431 (patched)
> > 
> >
> > When you look at the counters in the log, is there anything like 
> > `RECORDS_OUT_[table-name]` do we know how that gets populated?

I am populating the counter I defined the same way the above-mentioned counter 
is populated, and I agree that it duplicates that counter.
But to read the value of the RECORDS_OUT_{ID}_{table_name} counter, I would 
need the destTableId and tableName as present in the FileSinkOperator, and I 
could not find a straightforward way to get those values. Hence I created a 
counter with a fixed name so that it can be used from places outside the 
Operator context.


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review201637
---


On April 18, 2018, 11:53 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 18, 2018, 11:53 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a88453c97835db847d74b4b4c3ef318d4d6c0ce5 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: 

[jira] [Created] (HIVE-19279) remove magic directory skipping from CopyTask

2018-04-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19279:
---

 Summary: remove magic directory skipping from CopyTask
 Key: HIVE-19279
 URL: https://issues.apache.org/jira/browse/HIVE-19279
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Follow up from HIVE-17657.
CopyTask contains code that copies files (fancy that); however, when listing 
the files, if the source contains a single directory and no other files, it 
skips that directory and copies the files inside it instead.
In various tests this directory happens to be the "data" directory from export, 
or some random partition directory ("foo=bar") that, if not skipped, would make 
it into the real partition directory at the destination.
It does not do this if any other files or directories are present.

This seems brittle. The caller of CopyTask should specify exactly what it wants 
copied instead of relying on this behavior.
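The quirk being criticized can be sketched as follows (illustration only, not 
the actual CopyTask code):

```python
import os

def resolve_copy_sources(src_dir):
    # Mimic the described behavior: if the source holds exactly one entry and
    # that entry is a directory, descend into it and copy its children instead.
    entries = [os.path.join(src_dir, e) for e in sorted(os.listdir(src_dir))]
    if len(entries) == 1 and os.path.isdir(entries[0]):
        only = entries[0]
        return [os.path.join(only, e) for e in sorted(os.listdir(only))]
    return entries
```

With only a "data" directory present, the children of "data" are returned; 
adding any sibling file disables the skipping, which is exactly the 
brittleness described.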





Re: Review Request 66720: HIVE-17657 export/import for MM tables is broken

2018-04-23 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66720/
---

(Updated April 23, 2018, 9:18 p.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java ce683c8a8d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExportTask.java aba65918f8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ce0757cba2 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java 
d3c62a2775 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
b850ddc9d0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 
820046388a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/PartitionExport.java 
5844f3d97f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/TableExport.java 
abb2e8874b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java 
866d3513b1 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CopyWork.java c0e4a43d9c 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExportWork.java 72ce79836c 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 12d57c6feb 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java 0e53697be2 
  ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java a2adb966fe 
  ql/src/test/queries/clientpositive/mm_exim.q c47342bd23 
  ql/src/test/results/clientpositive/llap/mm_exim.q.out 1f40754373 


Diff: https://reviews.apache.org/r/66720/diff/3/

Changes: https://reviews.apache.org/r/66720/diff/2-3/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-19278) hive build warnings while compiling parser code

2018-04-23 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-19278:
--

 Summary: hive build warnings while compiling parser code
 Key: HIVE-19278
 URL: https://issues.apache.org/jira/browse/HIVE-19278
 Project: Hive
  Issue Type: Task
Reporter: Vineet Garg
Assignee: Vineet Garg


{noformat}
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK {KW_EXISTS, KW_TINYINT}" using 
multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_STRUCT LESSTHAN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_DATETIME" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_DATE {LPAREN, StringLiteral}" 
using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_UNIONTYPE LESSTHAN" using 
multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66485: HIVE-19124 implement a basic major compactor for MM tables

2018-04-23 Thread Sergey Shelukhin


> On April 23, 2018, 7:24 p.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 356 (patched)
> > 
> >
> > Is that supposed to be a "," or a +?

,


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66485/#review201756
---


On April 23, 2018, 7:03 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66485/
> ---
> 
> (Updated April 23, 2018, 7:03 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java a35a215bfc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 4e10649136 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> b1c2288d01 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 22765b8e63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java fe0aaa4ff5 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
> c053860b36 
>   
> standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
>  cb1d40a4a8 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  7b02865e18 
>   
> storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java
>  107ea9028a 
> 
> 
> Diff: https://reviews.apache.org/r/66485/diff/8/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-19277) Active/Passive HA web endpoints does not allow cross origin requests

2018-04-23 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-19277:


 Summary: Active/Passive HA web endpoints does not allow cross 
origin requests
 Key: HIVE-19277
 URL: https://issues.apache.org/jira/browse/HIVE-19277
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0, 3.1.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


CORS is not allowed with web endpoints added for active/passive HA. Enable CORS 
by default for all web endpoints. 





Re: Review Request 66485: HIVE-19124 implement a basic major compactor for MM tables

2018-04-23 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66485/#review201756
---




ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 356 (patched)


Is that supposed to be a "," or a +?


- Gopal V


On April 23, 2018, 7:03 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66485/
> ---
> 
> (Updated April 23, 2018, 7:03 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java a35a215bfc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 4e10649136 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> b1c2288d01 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 22765b8e63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java fe0aaa4ff5 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
> c053860b36 
>   
> standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
>  cb1d40a4a8 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  7b02865e18 
>   
> storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java
>  107ea9028a 
> 
> 
> Diff: https://reviews.apache.org/r/66485/diff/8/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 66485: HIVE-19124 implement a basic major compactor for MM tables

2018-04-23 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66485/
---

(Updated April 23, 2018, 7:03 p.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java a35a215bfc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 4e10649136 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
b1c2288d01 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 22765b8e63 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java fe0aaa4ff5 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
c053860b36 
  
standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
 cb1d40a4a8 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
 7b02865e18 
  
storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java 
107ea9028a 


Diff: https://reviews.apache.org/r/66485/diff/8/

Changes: https://reviews.apache.org/r/66485/diff/7-8/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 66485: HIVE-19124 implement a basic major compactor for MM tables

2018-04-23 Thread Sergey Shelukhin


> On April 23, 2018, 5:04 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 353 (patched)
> > 
> >
> > Add a timestamp to the tmp-table and fail-retry if it already exists.
> > 
> > Dropping it might make it harder to debug this.

This is a temporary table... it may be gone anyway. If not, this might 
necessitate a follow-up for cleaning up these failed tables.


> On April 23, 2018, 5:04 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 1236 (patched)
> > 
> >
> > Add comment about not needing locks because these are insert-only 
> > tables and the base writer doesn't need locks anyway.

Hmm? Can you elaborate? The tmp table is session scoped and not insert only, so 
we don't need any locks.
The original query's locks are taken care of by the driver. It's a read query anyway.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66485/#review201717
---


On April 20, 2018, 11:15 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66485/
> ---
> 
> (Updated April 20, 2018, 11:15 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 536c7b427f 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9cb2ff1015 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java c8cb8a40b4 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> b1c2288d01 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 22765b8e63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java fe0aaa4ff5 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsForMmTable.java 
> c053860b36 
>   
> standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
>  cb1d40a4a8 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  7b02865e18 
>   
> storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java
>  107ea9028a 
> 
> 
> Diff: https://reviews.apache.org/r/66485/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-19276) Metastore delegation tokens should return non-empty service field in delegation tokens

2018-04-23 Thread Yuxiang Chen (JIRA)
Yuxiang Chen created HIVE-19276:
---

 Summary: Metastore delegation tokens should return non-empty 
service field in delegation tokens
 Key: HIVE-19276
 URL: https://issues.apache.org/jira/browse/HIVE-19276
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: Yuxiang Chen
Assignee: Yuxiang Chen


The metastore does not set the token signature by default (it returns an empty 
string), so the service field, which is assigned the value of the token 
signature, is also empty. As a result, clients cannot retrieve the right token 
from the token file using this service field.

Metastore delegation tokens currently return an empty string in the delegation 
token for the "service" field. The service field should be set to something 
that meaningfully identifies the metastore/cluster, so clients can find it in 
the token file.

Meanwhile, we also want to change the metastore code to use the metastore URIs 
as the service field when the token signature is empty. However, to make the 
change effective, we also need to make similar modifications on the client 
side, as mentioned above.

Thus to do this, we need to add the following logic:

(1) On the client side, the client first tries to get the token using the 
current service field; if it finds nothing, it falls back to using 
"hive.metastore.uris" as the service field and retries token selection.

(2) On the metastore side, if the current token signature is not empty, we set 
the service field to be the value of token signature; otherwise, we use 
hive.metastore.uris as the service field.
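The client-side fallback in step (1) could be sketched as follows. This is a simplified illustration, not the actual patch: the map stands in for the token file, and the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenSelectorSketch {
    // Hypothetical sketch of step (1): look up a delegation token by the
    // signature-derived service field first, then fall back to the metastore
    // URIs when the signature-based lookup finds nothing.
    static String selectToken(Map<String, String> tokenFile,
                              String tokenSignature, String metastoreUris) {
        if (tokenSignature != null && !tokenSignature.isEmpty()) {
            String token = tokenFile.get(tokenSignature);
            if (token != null) {
                return token;
            }
        }
        // Fallback: the metastore set the service field from
        // hive.metastore.uris because the token signature was empty.
        return tokenFile.get(metastoreUris);
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("thrift://ms1:9083", "token-for-ms1");
        // Empty signature: lookup by metastore URIs succeeds.
        System.out.println(selectToken(tokens, "", "thrift://ms1:9083"));
        // Explicit signature wins when it is present in the token file.
        tokens.put("my-signature", "token-by-signature");
        System.out.println(selectToken(tokens, "my-signature", "thrift://ms1:9083"));
    }
}
```

The key design point is that the fallback only triggers when the signature-based lookup yields nothing, so existing clients with a configured token signature keep their current behavior.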

 

The patch is in the attachment.





Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-23 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 23, 2018, 5:26 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Fixed a minor bug introduced in the previous patch. Please disregard the 
version before it for comparison purposes.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses Java's default hash, which provides worse distribution and 
efficiency than Murmur hash when bucketing a table.
Migrate to Murmur hash while keeping backward compatibility for existing users 
so that they don't have to reload their existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in a high number of test result updates.
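To illustrate why the hash function matters for bucketing, here is a minimal sketch, not Hive's actual implementation, contrasting legacy Java-hash bucket assignment with a Murmur3-style hash. The 32-bit Murmur3 below is a self-contained reference-style implementation, and the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class BucketHashSketch {
    // Legacy path: Java's String.hashCode, sign bit masked, modulo buckets.
    static int legacyBucket(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Murmur3 32-bit (x86 variant), seed 0; shown for illustration only.
    static int murmur3(byte[] data) {
        int h = 0, i = 0, len = data.length;
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        while (len - i >= 4) {
            // Mix one little-endian 4-byte block into the hash state.
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            i += 4;
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        switch (len - i) {           // intentional fall-through for tail bytes
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        h ^= len;                    // finalization (avalanche)
        h ^= h >>> 16; h *= 0x85ebca6b; h ^= h >>> 13;
        h *= 0xc2b2ae35; h ^= h >>> 16;
        return h;
    }

    static int murmurBucket(String key, int numBuckets) {
        int h = murmur3(key.getBytes(StandardCharsets.UTF_8));
        return (h & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        for (String k : new String[] {"Hive", "bucket_version", "murmur"}) {
            System.out.println(k + " legacy=" + legacyBucket(k, 8)
                + " murmur=" + murmurBucket(k, 8));
        }
    }
}
```

A table property such as bucket_version then tells readers which of the two functions was used to write the existing data, which is how old tables stay readable without a reload.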


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
 a42c299537 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
 ddb26e529e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 49c355be01 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  

[jira] [Created] (HIVE-19275) Vectorization: Wrong Results / Execution Failures when Vectorization turned on in Spark

2018-04-23 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19275:
---

 Summary: Vectorization: Wrong Results / Execution Failures when 
Vectorization turned on in Spark
 Key: HIVE-19275
 URL: https://issues.apache.org/jira/browse/HIVE-19275
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0, 3.1.0


Quite a number of the bucket* tests had Wrong Results or Execution Failures.

And others like semijoin, skewjoin, avro_decimal_native, mapjoin_addjar, 
mapjoin_decimal, nullgroup, decimal_join, mapjoin1.

Some of the problems might be as simple as "-- SORT_QUERY_RESULTS" is missing.

The bucket* problems looked more serious.

This change sets "hive.vectorized.execution.enabled" to false at the top of 
those Q files.





Re: org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientFactory

2018-04-23 Thread Alan Gates
Your code searching skills are intact (or, at least this doesn't prove they
aren't).  That class doesn't exist in Hive and I assume was invented by
Amazon.  I assume they placed it in o.a.h.hive so they could access package
specific classes etc.  Apache doesn't prohibit using its name in package
names, only in maven artifacts.  If you're in contact with them you could
poke them to contribute it back.

Alan.

On Mon, Apr 23, 2018 at 2:17 AM, Elliot West  wrote:

> Hello,
>
> I'm looking for an abstraction to use for integrating with different
> (non-Thrift) metadata catalog implementations. I know that AWS Glue manages
> this and so have explored in EMR (Hive 2.3.2) a little. I see that it uses
> the "org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientFactory"
> interface to do this. However, I cannot find this class anywhere in vanilla
> Apache Hive.
>
> Is this an Amazon specific construct (if so then why is it namespaced to
> org.apache.hadoop.hive?) or are my code searching abilities failing me.
> Does this class exist in Apache Hive, and if so, where? (A link in GitHub
> would be appreciated).
>
> Cheers,
>
> Elliot.
>
>


Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-23 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 23, 2018, 4:36 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Implemented several minor fixes and result updates.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses Java's default hash, which provides worse distribution and 
efficiency than Murmur hash when bucketing a table.
Migrate to Murmur hash while keeping backward compatibility for existing users 
so that they don't have to reload their existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in a high number of test result updates.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 536c7b427f 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
 a42c299537 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
 ddb26e529e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java fe109d7b96 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 49c355be01 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7b1fd5f206 
  

[jira] [Created] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19274:
---

 Summary: Add an OpTreeSignature persistence checker hook
 Key: HIVE-19274
 URL: https://issues.apache.org/jira/browse/HIVE-19274
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Adding a hook that runs during testing and checks that OpTreeSignatures are 
working as expected would be really useful; it should run at least during the 
PerfCliDriver.





[jira] [Created] (HIVE-19273) Fix TestBeeLineWithArgs.testQueryProgressParallel

2018-04-23 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19273:
---

 Summary: Fix TestBeeLineWithArgs.testQueryProgressParallel
 Key: HIVE-19273
 URL: https://issues.apache.org/jira/browse/HIVE-19273
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


Seems to be failing from time to time:
https://builds.apache.org/job/PreCommit-HIVE-Build/10429/testReport/org.apache.hive.beeline/TestBeeLineWithArgs/testQueryProgressParallel/history/





Re: Review Request 66663: HIVE-19171 Persist runtime statistics in metastore

2018-04-23 Thread Zoltan Haindrich

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3/
---

(Updated April 23, 2018, 11:25 a.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

patch#03


Bugs: HIVE-19171
https://issues.apache.org/jira/browse/HIVE-19171


Repository: hive-git


Description
---

*


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 536c7b427f 
  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 801de7aca2 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 88022be9b9 
  metastore/scripts/upgrade/derby/056-HIVE-19171.derby.sql PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/signature/OpSignature.java 
e87bbceb7a 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/signature/OpTreeSignature.java 
c3dc848a32 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/signature/OpTreeSignatureFactory.java
 3df5ee946e 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/signature/RuntimeStatsMap.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/signature/RuntimeStatsPersister.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/signature/SignatureUtils.java 
4f3e3384a9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java e15a49f838 
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java a61a47e390 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java e7ca7f617c 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 54b705db6e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/CachingStatsSource.java 
c51527621f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 
19df13a843 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/MetastoreStatsConnector.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java a37280407d 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SimpleRuntimeStatsSource.java 
3d6c257026 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java 
a4e33c3804 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 409cc7312c 
  ql/src/java/org/apache/hadoop/hive/ql/stats/OperatorStats.java 52e18a8030 
  
ql/src/test/org/apache/hadoop/hive/ql/optimizer/signature/TestRuntimeStatsPersistence.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 
81269702de 
  ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestReOptimization.java 
b7263005ed 
  service/src/java/org/apache/hive/service/server/HiveServer2.java 16423578d5 
  standalone-metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 802d8e3fb2 
  standalone-metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 
dfa13a0614 
  
standalone-metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
 c0a39f80e0 
  standalone-metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 2c95007daa 
  standalone-metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 
99024279c5 
  
standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetRuntimeStatsRequest.java
 PRE-CREATION 
  
standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/RuntimeStat.java
 PRE-CREATION 
  
standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
 a354f27cad 
  standalone-metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php 
9c949429c5 
  standalone-metastore/src/gen/thrift/gen-php/metastore/Types.php c4969d567f 
  
standalone-metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
 079c7fc322 
  
standalone-metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
 d241414bc3 
  standalone-metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py 
9bf9843314 
  standalone-metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb 3dbe4d8068 
  standalone-metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 
58ebd29523 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 cd50e1b0c7 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 feae991bb3 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
 27f8775a10 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 125d5a79f2 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
 f6c46ee7bd 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RuntimeStatsCleanerTask.java
 PRE-CREATION 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 ebdcbc237e 
  

[jira] [Created] (HIVE-19272) There should be a feature to control the .hivehistory (Hive CLI) command history count limit on user's wish

2018-04-23 Thread vaibhav (JIRA)
vaibhav created HIVE-19272:
--

 Summary: There should be a feature to control the .hivehistory 
(Hive CLI) command history count limit on user's wish
 Key: HIVE-19272
 URL: https://issues.apache.org/jira/browse/HIVE-19272
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 2.4.0
Reporter: vaibhav


By default, Hive saves the last 100,00 lines of commands into the file 
$HOME/.hivehistory.

Reference link:
[https://stackoverflow.com/questions/32126506/hive-command-line-cli-history]

However, users may want to change this number themselves to manage their 
command history.

There should be a parameter setting through which this can be controlled.





Re: Review Request 64632: HIVE-18247: Use DB auto-increment for indexes

2018-04-23 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64632/#review201724
---



Hi Sasha,

I have a few questions:
- Is this changing the table definition? Do we need alter scripts to upgrade 
old tables to the new one?
- I have read this on the datanucleus site:
-- "This generation strategy should only be used if there is a single "root" 
table for the inheritance tree. If you have more than 1 root table (e.g using 
subclass-table inheritance) then you should choose a different generation 
strategy" - I do not think we are affected
-- "Please note that if using optimistic transactions, this strategy will mean 
that the value is only set when the object is actually persisted (i.e at 
flush() or commit())" - this might be more interesting - is there a way to 
check if we are affected, or we just hope the tests are covering all the 
scenarios?
- Does this have a measurable performance impact?
- Do you know a good way to test this kind of changes on multiple backend 
databases?

Thanks,
Peter

- Peter Vary


On Dec. 15, 2017, 8:33 a.m., Alexander Kolbasov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64632/
> ---
> 
> (Updated Dec. 15, 2017, 8:33 a.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Janaki Lahorani, Sergio 
> Pena, and Sahil Takiar.
> 
> 
> Bugs: HIVE-18247
> https://issues.apache.org/jira/browse/HIVE-18247
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-18247: Use DB auto-increment for indexes
> 
> 
> Diffs
> -
> 
>   standalone-metastore/src/main/resources/package.jdo 
> 57e75f890dbbd2d5105614aaeac04ef37131e8cd 
> 
> 
> Diff: https://reviews.apache.org/r/64632/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-23 Thread Peter Vary via Review Board


> On March 27, 2018, 12:53 p.m., Peter Vary wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
> > Line 712 (original), 712 (patched)
> > 
> >
> > Is it possible to behave differently, when we have information about 
> > the number of rows, and when we do not know anything? The returned number 
> > will be 0 in this case, which might cause interesting behavior I guess :)
> 
> Bharathkrishna Guruvayoor Murali wrote:
> This is called only when resultSet is not present (like in insert 
> queries). If it returns zero, it will display No rows affected, which is like 
> the current behavior.

So the code handles -1 and 0 in the same way?
Previously we returned -1, indicating that we had no information about the 
affected rows. Now we will always return 0 when we do not have the exact 
info, for example when running Hive on Spark (HoS).
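The distinction under discussion can be sketched as follows. This is a minimal illustration of JDBC-style update-count semantics, not Hive's actual implementation: -1 conventionally means "unknown", 0 means "no rows affected", and a client that collapses the two loses information.

```python
def describe_update_count(num_rows):
    """Interpret a JDBC-style update count.

    -1 conventionally means the count is unknown (e.g. the engine did not
    report it); 0 means the statement genuinely affected no rows."""
    if num_rows < 0:
        return "unknown number of rows affected"
    if num_rows == 0:
        return "No rows affected"
    return "{} rows affected".format(num_rows)

print(describe_update_count(-1))  # unknown number of rows affected
print(describe_update_count(0))   # No rows affected
print(describe_update_count(42))  # 42 rows affected
```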


> On March 27, 2018, 12:53 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Lines 2213 (patched)
> > 
> >
> > I think this information is only available when the execution engine is 
> > MR. Do we have information available when using Spark as an execution 
> > enginge?
> 
> Bharathkrishna Guruvayoor Murali wrote:
> Currently in this JIRA, I am focusing on updating the count for MR as the 
> execution engine. I will create a follow-up JIRA to have this working in 
> Spark as well, once I have this fix in place.

OK, let's do it in a follow-up JIRA.


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review200042
---


On April 18, 2018, 11:53 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 18, 2018, 11:53 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a88453c97835db847d74b4b4c3ef318d4d6c0ce5 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor 

Re: Review Request 66663: HIVE-19171 Persist runtime statistics in metastore

2018-04-23 Thread Zoltan Haindrich


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 4273 (patched)
> > 
> >
> > Better name : hive.runtime.stats.batch.size
> > Also lets use the default value of 100_000
> > This can be in MetastoreConf since it affects MS client and server.

Renamed... I've also noticed it :)
I wouldn't want to set a default, since it is only useful if loading the 
runtime stats becomes problematic.


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 4276 (original)
> > 
> >
> > I see that you have moved it to Metastoreconf. But I think it belongs 
> > to hiveconf. Since this cache is in HS2 process not in metastore process.

Okay; I may add it, but I don't see any case in which someone would want to 
set the two parameters to different values, so I wanted to make it easier for 
the user by having just one setting.


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/MetastoreStatsConnector.java
> > Lines 68 (patched)
> > 
> >
> > Better name: StatsLoader.
> > Also instead runnable inside thread, using Future will be easier to 
> > read as well as efficient.

I've renamed it to RuntimeStatsLoader. About using a Future, I'm not sure; in 
any case this update happens only once during the HS2 lifetime, so 
performance might not be that crucial.


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/MetastoreStatsConnector.java
> > Lines 75 (patched)
> > 
> >
> > Since our cache is of limited size. We should also pass in limit. No 
> > point in retrieving more than 100_000 items if we are not gonna load them.

This batch size is here in case there are problems with loading the data; the 
default of -1 means no limit: load everything with a single request.
I don't think that will cause any problems, but in case it does, the user may 
set it to a smaller value to initiate multi-request loading.
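The batched loading scheme described here can be sketched as follows. This is an illustration of the offset/limit idea with createTime acting as the cursor; the function and field names are hypothetical stand-ins for the metastore call, and distinct createTime values are assumed:

```python
def fetch_runtime_stats(store, batch_size=-1):
    """Load all entries from `store` in batches.

    `store` is a list of dicts with a numeric "createTime". The largest
    createTime seen so far acts as the offset of the next request, so no
    batch ever returns the same result set twice. batch_size == -1 means
    "no limit": load everything with a single request."""
    loaded, last_seen = [], -1
    while True:
        limit = len(store) if batch_size == -1 else batch_size
        # Stand-in for a metastore call like get_runtime_stats(offset, limit):
        batch = sorted(
            (e for e in store if e["createTime"] > last_seen),
            key=lambda e: e["createTime"],
        )[:limit]
        if not batch:
            break
        loaded.extend(batch)
        last_seen = max(e["createTime"] for e in batch)
    return loaded

stats = [{"createTime": t, "payload": "plan-{}".format(t)} for t in (3, 1, 2, 5, 4)]
print(len(fetch_runtime_stats(stats, batch_size=2)))  # 5
```

With batch_size=-1 the loop issues a single request, which matches the default behavior described above.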


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/MetastoreStatsConnector.java
> > Lines 78 (patched)
> > 
> >
> > This create time is not used after its loaded in cache. Whats the 
> > reason for doing this max() ?
> > This line can be removed.

createTime serves as the offset for the next batched request; otherwise it 
would get back the same result set.
I moved createTime to be a field of the loader. I had added it as a field of 
the main class because I thought that in the future, if more than one HS2 is 
running, it might be needed to load new stats info through the 
metastore... but that might not be needed, at least right now.


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/MetastoreStatsConnector.java
> > Lines 108 (patched)
> > 
> >
> > Better name: StatsPersister.

This is a non-static inner class, so its full name is
MetastoreStatsConnector$Submitter.
Because I've used the persist keyword in naming the String2Object converter 
class, I've renamed this to RuntimeStatsSubmitter.


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/MetastoreStatsConnector.java
> > Lines 130 (patched)
> > 
> >
> > Should size be part of thrift api? Server can do size() and use that on 
> > other side. Seems redundant info in api.

I'm not exposing the transferred object at the Thrift level; this information 
is opaque in Thrift and also in the metastore, so its weight should be 
communicated by the writer.


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
> > Lines 3253 (patched)
> > 
> >
> > Caller passes in batch size, but we use that as maxCount. We need both. 
> > We want to fetch only maxCount entries but in batches of batch_size.

I don't see why the server would need to know whether it serves a batched 
client or not; I think offset + limit should be enough...


> On April 21, 2018, 5:59 a.m., Ashutosh Chauhan wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 11615 (patched)
> > 

Re: how to extract metadata of hive tables in speed

2018-04-23 Thread 侯宗田
Hi,

This really helps me a lot, and I will try it soon.
Thank you very much!

Hou
> 在 2018年4月23日,下午3:57,Peter Vary  写道:
> 
> Hi,
> 
> Disclaimer: I am not too familiar with the webhcat yet.
> From the logs, I see, that:
> - the first 3 seconds spent on starting a new session, and maybe a driver - 
> this can be reduced, if the session is already there, and the HiveServer2 is 
> started (but do not know if webhcat could use HS2, or reuse sessions) - this 
> delay could be avoided if you use any of the 3 solutions suggested in my last 
> mail.
> - the next 3 seconds spent on initializing the metastore. This can be reduced 
> if a standalone metastore is started, and the webhcat is configured to access 
> this metastore.
> 
> Hope this helps,
> Peter
> 
>> On Apr 23, 2018, at 9:27 AM, 侯宗田  wrote:
>> 
>> Thank you very much for your reply, I am wondering whether I use the webhcat 
>> rightly, I don’t think it is normal to create all the directories and 
>> objects to get a table describ and take 8 seconds. The webhcat should not be 
>> so slow, Or it is because I forget to start some server which can respond 
>> immediately?   
>>> 在 2018年4月23日,下午3:06,Peter Vary  写道:
>>> 
>>> Hi,
>>> 
>>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>>> that this is not production ready - more like a proof of concept):
>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>> 
>>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>>> can/want write java code, you can use HiveMetastoreClient class to do it in
>>> java.
>>> 
>>> I am not sure about the performance gains compared to HCat, but currently
>>> there are no faster interfaces for HMS that I know of.
>>> 
>>> Regards,
>>> Peter
>>> 
>>> 
>>> 侯宗田  ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>>> 
 Can anyone give me some suggestions? I have been stuck in this problem for
 several days. Need help!!
> 在 2018年4月22日,下午9:38,侯宗田  写道:
> 
> 
> Hi,
> 
> I am writing a application which needs the metastore about hive tables.
 I have used webhcat to get the information about tables and process them.
 But a simple request takes over eight seconds to respond on localhost. Why
 is this so slow, and how can I fix it or is there other way I can extract
 the metadata in C?
> 
> $ time curl -s '
 http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
 <
 http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
> '
> {"columns":
> [{"name":"id","type":"int"}],
> "database":"default",
> "table":"haha"}
> 
> real0m8.400s
> user0m0.053s
> sys 0m0.019s
> it seems to run a hcat.py, and it create a bunch of things then clear
 them, it takes very long time, does anyone have some ideas about it?? Any
 suggestions will be very appreciated!
> 
> $hcat.py -e "use default; desc haha; "
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
 [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
 [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
 http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
 file:/usr/local/hive/conf/hive-site.xml
> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
 native-hadoop library for your platform... using builtin-java classes where
 applicable
> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
 /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
 /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
 /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
> 18/04/21 16:38:16 INFO ql.Driver: Compiling
 command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
 use default
> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
 with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
 called
> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
 hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
 

Re: how to extract metadata of hive tables in speed

2018-04-23 Thread Peter Vary
Hi,

Disclaimer: I am not too familiar with WebHCat yet.
From the logs, I see that:
- The first 3 seconds are spent on starting a new session, and maybe a 
driver. This can be reduced if the session is already there and HiveServer2 
is started (though I do not know whether WebHCat can use HS2 or reuse 
sessions); this delay could be avoided if you use any of the 3 solutions 
suggested in my last mail.
- The next 3 seconds are spent on initializing the metastore. This can be 
reduced if a standalone metastore is started and WebHCat is configured to 
access it.

Hope this helps,
Peter

> On Apr 23, 2018, at 9:27 AM, 侯宗田  wrote:
> 
> Thank you very much for your reply, I am wondering whether I use the webhcat 
> rightly, I don’t think it is normal to create all the directories and objects 
> to get a table describ and take 8 seconds. The webhcat should not be so slow, 
> Or it is because I forget to start some server which can respond immediately? 
>   
>> 在 2018年4月23日,下午3:06,Peter Vary  写道:
>> 
>> Hi,
>> 
>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>> that this is not production ready - more like a proof of concept):
>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>> 
>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>> can/want write java code, you can use HiveMetastoreClient class to do it in
>> java.
>> 
>> I am not sure about the performance gains compared to HCat, but currently
>> there are no faster interfaces for HMS that I know of.
>> 
>> Regards,
>> Peter
>> 
>> 
>> 侯宗田  ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>> 
>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>> several days. Need help!!
 在 2018年4月22日,下午9:38,侯宗田  写道:
 
 
 Hi,
 
 I am writing a application which needs the metastore about hive tables.
>>> I have used webhcat to get the information about tables and process them.
>>> But a simple request takes over eight seconds to respond on localhost. Why
>>> is this so slow, and how can I fix it or is there other way I can extract
>>> the metadata in C?
 
 $ time curl -s '
>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>> <
>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
 '
 {"columns":
 [{"name":"id","type":"int"}],
 "database":"default",
 "table":"haha"}
 
 real0m8.400s
 user0m0.053s
 sys 0m0.019s
 it seems to run a hcat.py, and it create a bunch of things then clear
>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>> suggestions will be very appreciated!
 
 $hcat.py -e "use default; desc haha; "
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in
>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in
>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>> file:/usr/local/hive/conf/hive-site.xml
 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>> use default
 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>> called
 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>> datanucleus.cache.level2 unknown - will be ignored
 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>> pin classes with
>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
 

Re: how to extract metadata of hive tables in speed

2018-04-23 Thread 侯宗田
Thank you very much for your reply. I am wondering whether I am using WebHCat 
correctly; I don't think it is normal to create all those directories and 
objects, and take 8 seconds, just to get a table description. WebHCat should 
not be so slow. Or is it because I forgot to start some server that could 
respond immediately?
> 在 2018年4月23日,下午3:06,Peter Vary  写道:
> 
> Hi,
> 
> Alexander Kolbasov has a project which might interest you (keeping in mind,
> that this is not production ready - more like a proof of concept):
> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
> 
> Also you can use HMS thrift API directly to access the MetaStore, or if you
> can/want write java code, you can use HiveMetastoreClient class to do it in
> java.
> 
> I am not sure about the performance gains compared to HCat, but currently
> there are no faster interfaces for HMS that I know of.
> 
> Regards,
> Peter
> 
> 
> 侯宗田  ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
> 
>> Can anyone give me some suggestions? I have been stuck in this problem for
>> several days. Need help!!
>>> 在 2018年4月22日,下午9:38,侯宗田  写道:
>>> 
>>> 
>>> Hi,
>>> 
>>> I am writing a application which needs the metastore about hive tables.
>> I have used webhcat to get the information about tables and process them.
>> But a simple request takes over eight seconds to respond on localhost. Why
>> is this so slow, and how can I fix it or is there other way I can extract
>> the metadata in C?
>>> 
>>> $ time curl -s '
>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>> <
>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>> '
>>> {"columns":
>>> [{"name":"id","type":"int"}],
>>> "database":"default",
>>> "table":"haha"}
>>> 
>>> real0m8.400s
>>> user0m0.053s
>>> sys 0m0.019s
>>> it seems to run a hcat.py, and it create a bunch of things then clear
>> them, it takes very long time, does anyone have some ideas about it?? Any
>> suggestions will be very appreciated!
>>> 
>>> $hcat.py -e "use default; desc haha; "
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in
>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>> file:/usr/local/hive/conf/hive-site.xml
>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>> use default
>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>> called
>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>> datanucleus.cache.level2 unknown - will be ignored
>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>> pin classes with
>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>> underlying DB is MYSQL
>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>> metastore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>> metastore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>> admin role, since config is empty
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr  cmd=get_all_functions
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> 

Re: how to extract metadata of hive tables in speed

2018-04-23 Thread Peter Vary
Hi,

Alexander Kolbasov has a project which might interest you (keeping in mind
that it is not production ready - more like a proof of concept):
https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md

You can also use the HMS Thrift API directly to access the metastore, or if
you can/want to write Java code, you can use the HiveMetaStoreClient class to
do it in Java.

I am not sure about the performance gains compared to HCat, but currently
there are no faster interfaces for HMS that I know of.

Regards,
Peter
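For reference, the templeton DDL response quoted in this thread can be consumed offline like this. This is a minimal sketch that parses the JSON body shown in the thread rather than calling a live server:

```python
import json

# The JSON body returned by the templeton DDL call quoted in this thread:
response = """
{"columns":
  [{"name":"id","type":"int"}],
  "database":"default",
  "table":"haha"}
"""

info = json.loads(response)
columns = {c["name"]: c["type"] for c in info["columns"]}
print(info["database"], info["table"], columns)
```

The slow part is producing this response on the server side, not parsing it; the session and metastore startup costs described above dominate the 8 seconds.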


侯宗田  ezt írta (időpont: 2018. ápr. 23., Hét 2:40):

> Can anyone give me some suggestions? I have been stuck in this problem for
> several days. Need help!!
> > 在 2018年4月22日,下午9:38,侯宗田  写道:
> >
> >
> > Hi,
> >
> > I am writing a application which needs the metastore about hive tables.
> I have used webhcat to get the information about tables and process them.
> But a simple request takes over eight seconds to respond on localhost. Why
> is this so slow, and how can I fix it or is there other way I can extract
> the metadata in C?
> >
> > $ time curl -s '
> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
> <
> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
> >'
> > {"columns":
> >  [{"name":"id","type":"int"}],
> >  "database":"default",
> >  "table":"haha"}
> >
> > real0m8.400s
> > user0m0.053s
> > sys 0m0.019s
> > it seems to run a hcat.py, and it create a bunch of things then clear
> them, it takes very long time, does anyone have some ideas about it?? Any
> suggestions will be very appreciated!
> >
> > $hcat.py -e "use default; desc haha; "
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
> file:/usr/local/hive/conf/hive-site.xml
> > 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> > 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> > 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> > 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
> > 18/04/21 16:38:16 INFO ql.Driver: Compiling
> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
> use default
> > 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> > 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
> called
> > 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> > 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
> datanucleus.cache.level2 unknown - will be ignored
> > 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
> pin classes with
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> > 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
> underlying DB is MYSQL
> > 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
> metastore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
> metastore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
> admin role, since config is empty
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr  cmd=get_all_functions
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr  cmd=get_database: default
> > 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
> > 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
> Schema(fieldSchemas:null, properties:null)
> > 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
> Time taken: 3.936 seconds
> > 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled,