[ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46747.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44948
[https://github.com/apache/spark/pull/44948]

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --------------------------------------------------------------------------
>
>                 Key: SPARK-46747
>                 URL: https://issues.apache.org/jira/browse/SPARK-46747
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>            Reporter: Bala Bellam
>            Assignee: Kent Yao
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table 
> existence in the database by overriding the default 
> JdbcDialect.getTableExistsQuery which has WHERE 1 = 0.
> +*Issue:*+
> Due to LIMIT 1 query pattern, we are seeing high number of shared locks in 
> the PostgreSQL installations where there are many partitions under a table 
> that's being written to. Hence resorting to the default JdbcDialect which 
> does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the 
> partitions and effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or involve more planner and 
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data due to the always false WHERE condition. This makes it a lighter 
> operation, as it typically only involves checking the table's metadata to 
> validate the table's existence without taking locks on the table's data or 
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 
> 1=0 would be a better choice if we're strictly looking to check for a table's 
> existence and want to avoid potentially heavier operations like taking shared 
> locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to