[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kent Yao reassigned SPARK-46747: -------------------------------- Assignee: Kent Yao > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -------------------------------------------------------------------------- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 > Reporter: Bala Bellam > Assignee: Kent Yao > Priority: Critical > Labels: pull-request-available > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org