[jira] [Comment Edited] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351720#comment-17351720
 ] 

Oleksiy Sayankin edited comment on HIVE-21075 at 5/26/21, 11:23 AM:


Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else {
// use select count(1) from 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == 
inCD)
}
{code}


was (Author: osayankin):
Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// use select count(1) from 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == 
inCD)
}
{code}

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to workaround oracle not supporting limit statement caused 
> performance issue, HIVE-9447 makes all the backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check if the specific CD_ID is referenced in 
> SDS table before drop a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for CD_ID column in SDS table.
> For a SDS table with with 1.5 million rows, select count(1) has average 700ms 
> without index, while in 10-20ms with index. But the statement before 
> HIVE-9447( SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses 
> less than 10ms .



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351720#comment-17351720
 ] 

Oleksiy Sayankin edited comment on HIVE-21075 at 5/26/21, 11:22 AM:


Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// use select count(1) from 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == 
inCD)
}
{code}


was (Author: osayankin):
Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// select count(1) from 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == 
inCD)
}
{code}

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to workaround oracle not supporting limit statement caused 
> performance issue, HIVE-9447 makes all the backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check if the specific CD_ID is referenced in 
> SDS table before drop a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for CD_ID column in SDS table.
> For a SDS table with with 1.5 million rows, select count(1) has average 700ms 
> without index, while in 10-20ms with index. But the statement before 
> HIVE-9447( SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses 
> less than 10ms .



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2019-01-01 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731601#comment-16731601
 ] 

Peter Vary edited comment on HIVE-21075 at 1/1/19 3:23 PM:
---

[~karthik.manamcheri], [~ychena] Sorry for my fast respons which I posted 
before reading the whole context. New years mailbox cleanup... maybe the only 
relevant part: Please keep this in mind when testing solutions. Seemingly small 
changes in CD related queries can multiply when having big number of existing 
partitions.


was (Author: pvary):
[~karthik.manamcheri], [~ychena] we had similar thoughts with [~szita] and 
found out that different queries provide optimal performance for different 
supported backend DBs. Please keep this in mind when testing solutions. 
Seemingly small changes in CD related queries can multiply when having big 
number of existing partitions.

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Priority: Major
>
> In order to workaround oracle not supporting limit statement caused 
> performance issue, HIVE-9447 makes all the backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check if the specific CD_ID is referenced in 
> SDS table before drop a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for CD_ID column in SDS table.
> For a SDS table with with 1.5 million rows, select count(1) has average 700ms 
> without index, while in 10-20ms with index. But the statement before 
> HIVE-9447( SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses 
> less than 10ms .



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)