[jira] [Updated] (HIVE-26838) Adding support for a new event "Reload event" in the HMS

2024-05-21 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated HIVE-26838:
-
Summary: Adding support for a new event "Reload event" in the HMS  (was: 
Add a new event to improve cache performance in external systems that 
communicates with HMS.)

> Adding support for a new event "Reload event" in the HMS
> 
>
> Key: HIVE-26838
> URL: https://issues.apache.org/jira/browse/HIVE-26838
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Adding support for a new event "Reload event" in the HMS (HiveMetaStore). 
> This event can be used by external services that depend on HMS for metadata 
> operations to improve their cache performance. In a distributed environment 
> where there are replicas of an external service (each with its own cache) 
> talking to HMS for metadata operations, the reload event can be used to 
> improve cache performance and ensure consistency among all the replicas for 
> a given table/partition.
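
The cache behavior described in the issue can be sketched from the side of an external service. This is only an illustration under stated assumptions: the class and method names below are hypothetical and are not part of the HMS API, and how a replica actually receives the reload event is not specified in the issue. The sketch assumes each replica evicts its cached entry when a reload event for that table/partition arrives, so the next read goes back to HMS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-replica metadata cache; none of these names come from Hive.
public class ReplicaMetadataCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    // Store metadata previously fetched from HMS under a table/partition key.
    public void put(String key, String metadata) {
        entries.put(key, metadata);
    }

    // Returns the cached metadata, or null when the entry must be re-fetched.
    public String get(String key) {
        return entries.get(key);
    }

    // Assumed handler for a reload event: evict the entry so that the next
    // read of this table/partition goes back to HMS.
    public void onReloadEvent(String key) {
        entries.remove(key);
    }
}
```

If every replica applies the same event, no replica keeps serving the old entry for that key, which is the consistency property the description refers to.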



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28243) Implement user/group deny policy with kerberos auth

2024-05-02 Thread Manish Maheshwari (Jira)
Manish Maheshwari created HIVE-28243:


 Summary: Implement user/group deny policy with kerberos auth
 Key: HIVE-28243
 URL: https://issues.apache.org/jira/browse/HIVE-28243
 Project: Hive
  Issue Type: Improvement
Reporter: Manish Maheshwari


Today, any user who has access to any table via Ranger can connect to HS2 and 
submit a query. Customers want to block specific users and groups from 
accessing HS2, but that is not possible today.





[jira] (HIVE-24893) Download data from Thriftserver through JDBC

2024-05-02 Thread Manish Maheshwari (Jira)


[ https://issues.apache.org/jira/browse/HIVE-24893 ]


Manish Maheshwari deleted comment on HIVE-24893:
--

was (Author: mylogi...@gmail.com):
[~yumwang] - Can you please raise a PR for this feature? I think the community 
will benefit from having this implemented.

> Download data from Thriftserver through JDBC
> 
>
> Key: HIVE-24893
> URL: https://issues.apache.org/jira/browse/HIVE-24893
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> It would be very useful to support downloading large amounts of data (such 
> as more than 50GB) through JDBC.
> Snowflake has similar support:
> https://docs.snowflake.com/en/user-guide/jdbc-using.html#label-jdbc-download-from-stage-to-stream
> https://github.com/snowflakedb/snowflake-jdbc/blob/95a7d8a03316093430dc3960df6635643208b6fd/src/main/java/net/snowflake/client/jdbc/SnowflakeConnectionV1.java#L886





[jira] [Commented] (HIVE-24893) Download data from Thriftserver through JDBC

2024-05-02 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842953#comment-17842953
 ] 

Manish Maheshwari commented on HIVE-24893:
--

[~yumwang] - Can you please raise a PR for this feature? I think the community 
will benefit from having this implemented.

> Download data from Thriftserver through JDBC
> 
>
> Key: HIVE-24893
> URL: https://issues.apache.org/jira/browse/HIVE-24893
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> It would be very useful to support downloading large amounts of data (such 
> as more than 50GB) through JDBC.
> Snowflake has similar support:
> https://docs.snowflake.com/en/user-guide/jdbc-using.html#label-jdbc-download-from-stage-to-stream
> https://github.com/snowflakedb/snowflake-jdbc/blob/95a7d8a03316093430dc3960df6635643208b6fd/src/main/java/net/snowflake/client/jdbc/SnowflakeConnectionV1.java#L886





[jira] [Updated] (HIVE-26838) Add a new event to improve cache performance in external systems that communicates with HMS.

2024-03-06 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated HIVE-26838:
-
Issue Type: Bug  (was: New Feature)

> Add a new event to improve cache performance in external systems that 
> communicates with HMS.
> 
>
> Key: HIVE-26838
> URL: https://issues.apache.org/jira/browse/HIVE-26838
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Adding support for a new event "Reload event" in the HMS (HiveMetaStore). 
> This event can be used by external services that depend on HMS for metadata 
> operations to improve their cache performance. In a distributed environment 
> where there are replicas of an external service (each with its own cache) 
> talking to HMS for metadata operations, the reload event can be used to 
> improve cache performance and ensure consistency among all the replicas for 
> a given table/partition.





[jira] [Updated] (HIVE-25695) Make spark views authorization in hive configurable.

2023-11-06 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated HIVE-25695:
-
Description: 
HIVE-24026 introduced an authorization model where views created from external 
sources like spark are not authorized at create time, but when a user does 
select on the view. We need to make this authorization configurable. 

This Jira introduces a new config to make this auth model configurable.
{code:java}
hive.security.authorization.enabled.on.spark.views=true {code}
This config is turned on by default. Users who wish to turn it off can set it 
to false, in which case the underlying tables of the view are not authorized 
during the select query.

The auth model is configurable because a user may be running a workload of 
create/alter/select views without HIVE-24026 (with Ranger/Sentry policies in 
place that grant users select permission only on the view, not on the 
underlying tables). When that user upgrades to a version with HIVE-24026, the 
admin has to configure Ranger/Sentry policies on all the underlying tables for 
the required users. Simply turning off this config lets the workload keep 
running, but at the cost of a security hole: the underlying tables are not 
authorized.

  was:
HIVE-24026 introduced an authorization model where views created from external 
sources like spark are not authorized at create time, but when a user does 
select on the view. We need to make this authorization configurable. 

This Jira introduces a new config to make this auth model configurable.

 
{code:java}
hive.security.authorization.enabled.on.spark.views=true {code}
This config is turned on by default. Users who wish to turn it off can set it 
to false, in which case the underlying tables of the view are not authorized 
during the select query.

 

The auth model is configurable because a user may be running a workload of 
create/alter/select views without HIVE-24026 (with Ranger/Sentry policies in 
place that grant users select permission only on the view, not on the 
underlying tables). When that user upgrades to a version with HIVE-24026, the 
admin has to configure Ranger/Sentry policies on all the underlying tables for 
the required users. Simply turning off this config lets the workload keep 
running, but at the cost of a security hole: the underlying tables are not 
authorized.


> Make spark views authorization in hive configurable.
> 
>
> Key: HIVE-25695
> URL: https://issues.apache.org/jira/browse/HIVE-25695
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-24026 introduced an authorization model where views created from 
> external sources like spark are not authorized at create time, but when a 
> user does select on the view. We need to make this authorization 
> configurable. 
> This Jira introduces a new config to make this auth model configurable.
> {code:java}
> hive.security.authorization.enabled.on.spark.views=true {code}
> This config is turned on by default. Users who wish to turn it off can set 
> it to false, in which case the underlying tables of the view are not 
> authorized during the select query.
> The auth model is configurable because a user may be running a workload of 
> create/alter/select views without HIVE-24026 (with Ranger/Sentry policies in 
> place that grant users select permission only on the view, not on the 
> underlying tables). When that user upgrades to a version with HIVE-24026, 
> the admin has to configure Ranger/Sentry policies on all the underlying 
> tables for the required users. Simply turning off this config lets the 
> workload keep running, but at the cost of a security hole: the underlying 
> tables are not authorized.
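
The effect of the flag described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Hive's authorizer code: the class, the method, and the grant set are hypothetical, and the check on the view itself is an assumption that the description does not spell out. The boolean parameter stands in for hive.security.authorization.enabled.on.spark.views.

```java
import java.util.List;
import java.util.Set;

// Toy model of the behavior described in the issue; not Hive's authorizer.
public class SparkViewAuthSketch {

    // authorizeUnderlyingTables mirrors the config value.
    // selectGrants is a hypothetical set of objects the user may select from.
    public static boolean canSelect(boolean authorizeUnderlyingTables,
                                    Set<String> selectGrants,
                                    String view,
                                    List<String> underlyingTables) {
        if (!selectGrants.contains(view)) {
            return false; // assumption: the view itself is always checked
        }
        if (!authorizeUnderlyingTables) {
            return true;  // config off: underlying tables are not authorized
        }
        return selectGrants.containsAll(underlyingTables);
    }

    public static void main(String[] args) {
        Set<String> grants = Set.of("db.v");            // select on the view only
        List<String> tables = List.of("db.t1", "db.t2");
        System.out.println(canSelect(true, grants, "db.v", tables));  // false
        System.out.println(canSelect(false, grants, "db.v", tables)); // true
    }
}
```

The two calls in main correspond to the upgrade scenario in the description: a user with select permission only on the view is rejected while the config is on and allowed once it is turned off.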





[jira] [Created] (HIVE-27787) Explain Plan in Hive should show partition and file pruning for Iceberg tables

2023-10-11 Thread Manish Maheshwari (Jira)
Manish Maheshwari created HIVE-27787:


 Summary: Explain Plan in Hive should show partition and file 
pruning for Iceberg tables
 Key: HIVE-27787
 URL: https://issues.apache.org/jira/browse/HIVE-27787
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Manish Maheshwari
 Attachments: image-2023-10-11-05-35-58-773.png

The Hive explain plan for Iceberg tables does not show partition and file 
pruning, even though the query actually uses the Iceberg API to prune the 
files to be scanned. 

!image-2023-10-11-05-35-58-773.png!





[jira] [Updated] (HIVE-27787) Explain Plan in Hive should show partition and file pruning for Iceberg tables

2023-10-11 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated HIVE-27787:
-
Labels: iceberg  (was: )

> Explain Plan in Hive should show partition and file pruning for Iceberg tables
> --
>
> Key: HIVE-27787
> URL: https://issues.apache.org/jira/browse/HIVE-27787
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Manish Maheshwari
>Priority: Major
>  Labels: iceberg
> Attachments: image-2023-10-11-05-35-58-773.png
>
>
> The Hive explain plan for Iceberg tables does not show partition and file 
> pruning, even though the query actually uses the Iceberg API to prune the 
> files to be scanned. 
> !image-2023-10-11-05-35-58-773.png!





[jira] [Updated] (HIVE-27568) Implement RegisterTableProcedure for Iceberg Tables

2023-08-07 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated HIVE-27568:
-
Labels: iceberg  (was: )

> Implement RegisterTableProcedure for Iceberg Tables
> ---
>
> Key: HIVE-27568
> URL: https://issues.apache.org/jira/browse/HIVE-27568
> Project: Hive
>  Issue Type: Improvement
>Reporter: Manish Maheshwari
>Priority: Major
>  Labels: iceberg
>
> Implement RegisterTableProcedure for registering existing Iceberg tables into 
> the catalog
> [https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RegisterTableProcedure.java]
>  
>  





[jira] [Created] (HIVE-27568) Implement RegisterTableProcedure for Iceberg Tables

2023-08-07 Thread Manish Maheshwari (Jira)
Manish Maheshwari created HIVE-27568:


 Summary: Implement RegisterTableProcedure for Iceberg Tables
 Key: HIVE-27568
 URL: https://issues.apache.org/jira/browse/HIVE-27568
 Project: Hive
  Issue Type: Improvement
Reporter: Manish Maheshwari


Implement RegisterTableProcedure for registering existing Iceberg tables into 
the catalog

[https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RegisterTableProcedure.java]
 

 





[jira] [Commented] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause

2021-01-06 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260197#comment-17260197
 ] 

Manish Maheshwari commented on HIVE-24339:
--

[~anishek] FYI 

> REPL LOAD command ignores config properties set by WITH clause
> --
>
> Key: HIVE-24339
> URL: https://issues.apache.org/jira/browse/HIVE-24339
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Major
>
> Debug messages confirmed that the REPL LOAD command ignored some config 
> properties when they were provided in the WITH clause, e.g.:
> {code}
> REPL LOAD bdpp01pub FROM 
> 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH 
> ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128',
> ...
> {code}
> We found that it ran with 16 threads, ignoring 
> 'hive.exec.parallel.thread.number'='128'. Setting this property at the 
> session level worked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

2020-10-27 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221306#comment-17221306
 ] 

Manish Maheshwari commented on HIVE-24313:
--

Also, it would be good to persist the collected stats in HMS so that they can 
be used by subsequent queries.

> Optimise stats collection for file sizes on cloud storage
> -
>
> Key: HIVE-24313
> URL: https://issues.apache.org/jira/browse/HIVE-24313
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>
> When stats information is not present (e.g. for an external table), 
> RelOptHiveTable computes basic stats at runtime.
> The code path is as follows.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
>     hiveTblMetadata, hiveNonPartitionCols,
>     nonPartColNamesThatRqrStats, colStatsCached,
>     nonPartColNamesThatRqrStats, true);
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
>   BasicStats basicStats =
>       basicStatsFactory.build(Partish.buildFor(table, p));
>   partStats.add(basicStats);
> }
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>  
> {code:java}
> try {
>   ds = getFileSizeForPath(path);
> } catch (IOException e) {
>   ds = 0L;
> }
>  {code}
>  
> For a table and query with a large number of partitions, computing these 
> statistics takes a long time and increases compilation time. It would be 
> good to fix this with a "ForkJoinPool" ( 
> partList.getNotDeniedPartns().parallelStream().forEach((p) )
>  
>  
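
The parallel-stream suggestion in the description can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Hive change itself: the class is hypothetical, partitions are reduced to plain path strings, and getFileSizeForPath here is a stand-in that reads from an in-memory map instead of calling the file system. It keeps the fallback from the quoted snippet, where a failed lookup counts as size 0.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model of per-partition size lookups.
public class ParallelFileSizeStats {

    // Stand-in for the real lookup: unknown paths throw IOException,
    // imitating a failed call to cloud storage.
    public static long getFileSizeForPath(Map<String, Long> sizes, String path)
            throws IOException {
        Long size = sizes.get(path);
        if (size == null) {
            throw new IOException("cannot stat " + path);
        }
        return size;
    }

    // Same fallback as the quoted snippet: a failed lookup counts as 0.
    public static long sizeOrZero(Map<String, Long> sizes, String path) {
        try {
            return getFileSizeForPath(sizes, path);
        } catch (IOException e) {
            return 0L;
        }
    }

    // Looks up all partition sizes with a parallel stream and sums them.
    // A parallel stream started this way runs on the common ForkJoinPool.
    public static long totalSize(Map<String, Long> sizes, List<String> partitionPaths) {
        return partitionPaths.parallelStream()
                .mapToLong(p -> sizeOrZero(sizes, p))
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = Map.of("/t/p=1", 10L, "/t/p=2", 20L, "/t/p=3", 30L);
        List<String> paths = List.of("/t/p=1", "/t/p=2", "/t/p=3", "/t/p=missing");
        System.out.println(totalSize(sizes, paths)); // prints 60
    }
}
```

The sketch sums with mapToLong(...).sum() rather than adding to a shared list inside forEach, because the reduction then needs no synchronization across worker threads.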


