[jira] [Resolved] (SPARK-46753) Fix `pypy3` python test
[ https://issues.apache.org/jira/browse/SPARK-46753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46753. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44778 [https://github.com/apache/spark/pull/44778] > Fix `pypy3` python test > --- > > Key: SPARK-46753 > URL: https://issues.apache.org/jira/browse/SPARK-46753 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46753) Fix `pypy3` python test
[ https://issues.apache.org/jira/browse/SPARK-46753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46753: Assignee: BingKun Pan > Fix `pypy3` python test > --- > > Key: SPARK-46753 > URL: https://issues.apache.org/jira/browse/SPARK-46753 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46926) Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
[ https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-46926: - Assignee: Ruifeng Zheng > Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list > - > > Key: SPARK-46926 > URL: https://issues.apache.org/jira/browse/SPARK-46926 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46926) Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
[ https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-46926. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44965 [https://github.com/apache/spark/pull/44965] > Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list > - > > Key: SPARK-46926 > URL: https://issues.apache.org/jira/browse/SPARK-46926 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46931) Implement {Frame, Series}.to_hdf
Ruifeng Zheng created SPARK-46931: - Summary: Implement {Frame, Series}.to_hdf Key: SPARK-46931 URL: https://issues.apache.org/jira/browse/SPARK-46931 Project: Spark Issue Type: Sub-task Components: PS Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46926) Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
[ https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-46926: -- Summary: Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list (was: Support `convert_dtypes`, `infer_objects` and `set_axis` in fallback mode) > Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list > - > > Key: SPARK-46926 > URL: https://issues.apache.org/jira/browse/SPARK-46926 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46926) Support `convert_dtypes`, `infer_objects` and `set_axis` in fallback mode
[ https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-46926: -- Summary: Support `convert_dtypes`, `infer_objects` and `set_axis` in fallback mode (was: Support `convert_dtypes` and `infer_objects` in fallback mode) > Support `convert_dtypes`, `infer_objects` and `set_axis` in fallback mode > - > > Key: SPARK-46926 > URL: https://issues.apache.org/jira/browse/SPARK-46926 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46930) Add support for a custom prefix for fields of Avro union type when enableStableIdentifiersForUnionType is enabled
[ https://issues.apache.org/jira/browse/SPARK-46930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-46930: - Description: {{enableStableIdentifiersForUnionType}} allows enabling stable identifiers in Avro. We need to add another config to allow configuring the custom prefix used in stable identifiers. Currently the value is {{member_}}, e.g. member_int, member_string, but we should be able to change it to any value or even leave it empty. > Add support for a custom prefix for fields of Avro union type when > enableStableIdentifiersForUnionType is enabled > -- > > Key: SPARK-46930 > URL: https://issues.apache.org/jira/browse/SPARK-46930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ivan Sadikov >Priority: Major > > {{enableStableIdentifiersForUnionType}} allows enabling stable identifiers > in Avro. We need to add another config to allow configuring the custom prefix > used in stable identifiers. Currently the value is {{member_}}, e.g. > member_int, member_string, but we should be able to change it to any value or > even leave it empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
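The naming scheme the description refers to can be sketched as follows. This is a minimal, hypothetical Python illustration of the idea, not Spark's actual Scala implementation; the function name `stable_union_field_names` and its `prefix` parameter are assumptions standing in for the proposed config:

```python
# Hypothetical sketch: derive stable field names for the members of an
# Avro union, with a configurable prefix (the current hard-coded value
# is "member_", giving member_int, member_string, etc.).
def stable_union_field_names(member_types, prefix="member_"):
    """Map each union member type name to a deterministic field name."""
    return [f"{prefix}{t}" for t in member_types]

# Current default prefix:
print(stable_union_field_names(["int", "string"]))
# ['member_int', 'member_string']

# A custom (here: empty) prefix, as the proposed config would allow:
print(stable_union_field_names(["int", "string"], prefix=""))
# ['int', 'string']
```

The point of the proposal is only the last call: the prefix becomes a knob instead of a constant, while the per-type suffix keeps the identifiers stable across schema evolutions.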
[jira] [Created] (SPARK-46930) Add support for a custom prefix for fields of Avro union type when enableStableIdentifiersForUnionType is enabled
Ivan Sadikov created SPARK-46930: Summary: Add support for a custom prefix for fields of Avro union type when enableStableIdentifiersForUnionType is enabled Key: SPARK-46930 URL: https://issues.apache.org/jira/browse/SPARK-46930 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Ivan Sadikov -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46929) Use ThreadUtils.shutdown to close thread pools
[ https://issues.apache.org/jira/browse/SPARK-46929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-46929: --- Component/s: Connect Spark Core SS (was: SQL) > Use ThreadUtils.shutdown to close thread pools > -- > > Key: SPARK-46929 > URL: https://issues.apache.org/jira/browse/SPARK-46929 > Project: Spark > Issue Type: Improvement > Components: Connect, Spark Core, SS >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46747: - Fix Version/s: 3.5.1 3.4.3 > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1, 3.4.3 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
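The trade-off described above can be demonstrated end to end. The sketch below uses Python's stdlib `sqlite3` purely for illustration (the issue concerns PostgreSQL via JDBC): `SELECT 1 FROM <table> WHERE 1=0` needs only the table's metadata to plan, so it fails fast when the table is missing and reads no rows (and, on PostgreSQL, takes no per-partition locks) when it exists:

```python
import sqlite3

def table_exists(conn, table):
    """Existence check via a query that can never return rows.

    The always-false predicate means the engine only has to resolve the
    table name, never scan data, which is exactly why the default
    JdbcDialect's WHERE 1=0 is cheaper than LIMIT 1 on partitioned tables.
    """
    try:
        conn.execute(f"SELECT 1 FROM {table} WHERE 1=0")
        return True
    except sqlite3.OperationalError:  # "no such table"
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
print(table_exists(conn, "t"))        # True
print(table_exists(conn, "missing"))  # False
```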
[jira] [Resolved] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` Log UI
[ https://issues.apache.org/jira/browse/SPARK-46924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46924. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44957 [https://github.com/apache/spark/pull/44957] > Fix `Load New` button in `Master/HistoryServer` Log UI > -- > > Key: SPARK-46924 > URL: https://issues.apache.org/jira/browse/SPARK-46924 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow
[ https://issues.apache.org/jira/browse/SPARK-46927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46927. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44960 [https://github.com/apache/spark/pull/44960] > Make `assertDataFrameEqual` work properly without PyArrow > - > > Key: SPARK-46927 > URL: https://issues.apache.org/jira/browse/SPARK-46927 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46929) Use ThreadUtils.shutdown to close thread pools
Jiaan Geng created SPARK-46929: -- Summary: Use ThreadUtils.shutdown to close thread pools Key: SPARK-46929 URL: https://issues.apache.org/jira/browse/SPARK-46929 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow
[ https://issues.apache.org/jira/browse/SPARK-46927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46927: Assignee: Haejoon Lee > Make `assertDataFrameEqual` work properly without PyArrow > - > > Key: SPARK-46927 > URL: https://issues.apache.org/jira/browse/SPARK-46927 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow
Haejoon Lee created SPARK-46927: --- Summary: Make `assertDataFrameEqual` work properly without PyArrow Key: SPARK-46927 URL: https://issues.apache.org/jira/browse/SPARK-46927 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46928) Support ListState in Arbitrary State API v2
Bhuwan Sahni created SPARK-46928: Summary: Support ListState in Arbitrary State API v2 Key: SPARK-46928 URL: https://issues.apache.org/jira/browse/SPARK-46928 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Bhuwan Sahni As part of Arbitrary State API v2 ([https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig]), we need to support ListState. This task covers adding support for ListState in Scala. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
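For readers unfamiliar with the proposal, a list-valued state variable could behave roughly like the mock below. This is purely illustrative Python; the real API is Scala, and every method name here is an assumption, not the finalized interface:

```python
# Illustrative mock of a ListState-style handle: a per-key state variable
# holding an ordered list of values, as used by arbitrary stateful
# operators. Method names are hypothetical.
class ListState:
    def __init__(self):
        self._values = []

    def get(self):
        """Return the current list of values."""
        return list(self._values)

    def append_value(self, value):
        """Append a single value to the state."""
        self._values.append(value)

    def append_list(self, values):
        """Append several values at once."""
        self._values.extend(values)

    def clear(self):
        """Remove all values for this state variable."""
        self._values.clear()

state = ListState()
state.append_value(1)
state.append_list([2, 3])
print(state.get())  # [1, 2, 3]
```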
[jira] [Created] (SPARK-46926) Support `convert_dtypes` and `infer_objects` in fallback mode
Ruifeng Zheng created SPARK-46926: - Summary: Support `convert_dtypes` and `infer_objects` in fallback mode Key: SPARK-46926 URL: https://issues.apache.org/jira/browse/SPARK-46926 Project: Spark Issue Type: Sub-task Components: PS Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling
[ https://issues.apache.org/jira/browse/SPARK-46925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46925: Assignee: Xinrong Meng > Add a warning that instructs to install memory_profiler for memory profiling > > > Key: SPARK-46925 > URL: https://issues.apache.org/jira/browse/SPARK-46925 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Add a warning that instructs to install memory_profiler for memory profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling
[ https://issues.apache.org/jira/browse/SPARK-46925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46925. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44958 [https://github.com/apache/spark/pull/44958] > Add a warning that instructs to install memory_profiler for memory profiling > > > Key: SPARK-46925 > URL: https://issues.apache.org/jira/browse/SPARK-46925 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 4.0.0 > > > Add a warning that instructs to install memory_profiler for memory profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46192) failed to insert the table using the default value of union
[ https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengxl resolved SPARK-46192. Resolution: Fixed > failed to insert the table using the default value of union > --- > > Key: SPARK-46192 > URL: https://issues.apache.org/jira/browse/SPARK-46192 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.4.1 >Reporter: zengxl >Priority: Major > > > Obtain the following tables and data > {code:java} > create table test_spark(k string default null,v int default null) stored as > orc; > create table test_spark_1(k string default null,v int default null) stored as > orc; > insert into table test_spark_1 values('k1',1),('k2',2),('k3',3); > create table test_spark_2(k string default null,v int default null) stored as > orc; > insert into table test_spark_2 values('k3',3),('k4',4),('k5',5); > {code} > Execute the following SQL > {code:java} > insert into table test_spark (k) > select k from test_spark_1 > union > select k from test_spark_2 > {code} > exception: > {code:java} > 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is > CatalogAndIdentifier > 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is > CatalogAndIdentifier > 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is > CatalogAndIdentifier > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: > i.userSpecifiedCols.size is 1 > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: > i.userSpecifiedCols.size is 1 > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 > ,resolved :1 , i.query 1 > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is > ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in > query: `default`.`test_spark` requires that the data to be inserted have the > same number of columns as the target table: target table has 2 column(s) but > the inserted data has 1 column(s), including 0 partition 
column(s) having > constant value(s). {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46192) failed to insert the table using the default value of union
[ https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812518#comment-17812518 ] zengxl commented on SPARK-46192: This patch solves all of these problems https://issues.apache.org/jira/browse/SPARK-43742 > failed to insert the table using the default value of union > --- > > Key: SPARK-46192 > URL: https://issues.apache.org/jira/browse/SPARK-46192 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.4.1 >Reporter: zengxl >Priority: Major > > > Obtain the following tables and data > {code:java} > create table test_spark(k string default null,v int default null) stored as > orc; > create table test_spark_1(k string default null,v int default null) stored as > orc; > insert into table test_spark_1 values('k1',1),('k2',2),('k3',3); > create table test_spark_2(k string default null,v int default null) stored as > orc; > insert into table test_spark_2 values('k3',3),('k4',4),('k5',5); > {code} > Execute the following SQL > {code:java} > insert into table test_spark (k) > select k from test_spark_1 > union > select k from test_spark_2 > {code} > exception: > {code:java} > 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is > CatalogAndIdentifier > 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is > CatalogAndIdentifier > 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is > CatalogAndIdentifier > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: > i.userSpecifiedCols.size is 1 > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: > i.userSpecifiedCols.size is 1 > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 > ,resolved :1 , i.query 1 > 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is > ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in > query: `default`.`test_spark` requires that the data to be inserted have the > same number of columns as the 
target table: target table has 2 column(s) but > the inserted data has 1 column(s), including 0 partition column(s) having > constant value(s). {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46923) Limit width of config tables in documentation and style them consistently
[ https://issues.apache.org/jira/browse/SPARK-46923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46923. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44955 [https://github.com/apache/spark/pull/44955] > Limit width of config tables in documentation and style them consistently > - > > Key: SPARK-46923 > URL: https://issues.apache.org/jira/browse/SPARK-46923 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46923) Limit width of config tables in documentation and style them consistently
[ https://issues.apache.org/jira/browse/SPARK-46923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46923: Assignee: Nicholas Chammas > Limit width of config tables in documentation and style them consistently > - > > Key: SPARK-46923 > URL: https://issues.apache.org/jira/browse/SPARK-46923 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling
Xinrong Meng created SPARK-46925: Summary: Add a warning that instructs to install memory_profiler for memory profiling Key: SPARK-46925 URL: https://issues.apache.org/jira/browse/SPARK-46925 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Add a warning that instructs to install memory_profiler for memory profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` Log UI
[ https://issues.apache.org/jira/browse/SPARK-46924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46924: - Assignee: Dongjoon Hyun > Fix `Load New` button in `Master/HistoryServer` Log UI > -- > > Key: SPARK-46924 > URL: https://issues.apache.org/jira/browse/SPARK-46924 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` Log UI
[ https://issues.apache.org/jira/browse/SPARK-46924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46924: -- Summary: Fix `Load New` button in `Master/HistoryServer` Log UI (was: Fix `Load New` button in `Master/HistoryServer` UI) > Fix `Load New` button in `Master/HistoryServer` Log UI > -- > > Key: SPARK-46924 > URL: https://issues.apache.org/jira/browse/SPARK-46924 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` UI
Dongjoon Hyun created SPARK-46924: - Summary: Fix `Load New` button in `Master/HistoryServer` UI Key: SPARK-46924 URL: https://issues.apache.org/jira/browse/SPARK-46924 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46923) Limit width of config tables in documentation and style them consistently
[ https://issues.apache.org/jira/browse/SPARK-46923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-46923: - Summary: Limit width of config tables in documentation and style them consistently (was: Style config tables in documentation consistently) > Limit width of config tables in documentation and style them consistently > - > > Key: SPARK-46923 > URL: https://issues.apache.org/jira/browse/SPARK-46923 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46830) Introducing collation concept into Spark
[ https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandar Tomic updated SPARK-46830: - Description: This feature will introduce collation support to the Spark engine. This means that: # Every StringType will have an associated collation. Default remains UTF8 Binary, which will behave under the same rules as current UTF8 String comparison. # Collation will be respected in all collation sensitive operations - comparisons, hashing, string operations (contains, startWith, endsWith etc.) # Collation can be set through following ways: ## COLLATE expression. e.g. strExpr COLLATE collation_name ## In CREATE TABLE column definition ## By setting session collation. # All the Spark operators need to respect collation settings (filters, joins, shuffles, aggs etc.) This is a high level description of the feature. You can find detailed design under [this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing] link (doc is in attachment as well). was: This feature will introduce collation support to the Spark engine. This means that: # Every StringType will have an associated collation. Default remains UTF8 Binary, which will behave under the same rules as current UTF8 String comparison. # Collation will be respected in all collation sensitive operations - comparisons, hashing, string operations (contains, startWith, endsWith etc.) # Collation can be set through following ways: ## COLLATE expression. e.g. strExpr COLLATE collation_name ## In CREATE TABLE column definition ## By setting session collation. # All the Spark operators need to respect collation settings (filters, joins, shuffles, aggs etc.) This is a high level description of the feature. You can find detailed design under [this|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing] link (doc is in attachment as well). 
> Introducing collation concept into Spark > > > Key: SPARK-46830 > URL: https://issues.apache.org/jira/browse/SPARK-46830 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major > Attachments: Collation Support in Spark.docx > > > This feature will introduce collation support to the Spark engine. This means > that: > > # Every StringType will have an associated collation. Default remains UTF8 > Binary, which will behave under the same rules as current UTF8 String > comparison. > # Collation will be respected in all collation sensitive operations - > comparisons, hashing, string operations (contains, startWith, endsWith etc.) > # Collation can be set through following ways: > ## COLLATE expression. e.g. strExpr COLLATE collation_name > ## In CREATE TABLE column definition > ## By setting session collation. > # All the Spark operators need to respect collation settings (filters, > joins, shuffles, aggs etc.) > > This is a high level description of the feature. You can find detailed design > under > [this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing] > link (doc is in attachment as well). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
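Why every comparison-sensitive operator must respect the collation can be seen with a two-word example. The sketch below uses Python's `str.casefold` as a stand-in for a case-insensitive collation; it is an analogy for the epic's point, not Spark code:

```python
# Under binary (code-point) comparison, the default UTF8 Binary behavior,
# "Zebra" sorts before "apple" because 'Z' (0x5A) precedes 'a' (0x61).
# Under a case-insensitive collation the order flips, so sorts, joins,
# and aggregations would all produce different results.
words = ["apple", "Zebra"]

binary_order = sorted(words)                # code-point order
ci_order = sorted(words, key=str.casefold)  # case-insensitive "collation"

print(binary_order)  # ['Zebra', 'apple']
print(ci_order)      # ['apple', 'Zebra']
```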
[jira] [Updated] (SPARK-46911) Add deleteIfExists operator to StatefulProcessorHandle
[ https://issues.apache.org/jira/browse/SPARK-46911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46911: --- Labels: pull-request-available (was: ) > Add deleteIfExists operator to StatefulProcessorHandle > -- > > Key: SPARK-46911 > URL: https://issues.apache.org/jira/browse/SPARK-46911 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Priority: Major > Labels: pull-request-available > > Adding the {{deleteIfExists}} method to the {{StatefulProcessorHandle}} in > order to remove state variables from the State Store -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
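The semantics being added can be sketched in miniature. This is a hypothetical Python mock of the behavior described above; the real `StatefulProcessorHandle` is a Scala API, and the names below are illustrative assumptions:

```python
# Mock of a handle owning named state variables, with a deleteIfExists-style
# operation that removes a variable from the (simulated) state store and
# is a safe no-op when the variable was never registered.
class StatefulProcessorHandle:
    def __init__(self):
        self._state_vars = {}

    def register(self, name, initial):
        self._state_vars[name] = initial

    def delete_if_exists(self, name):
        """Remove a state variable; return whether anything was removed."""
        return self._state_vars.pop(name, None) is not None

handle = StatefulProcessorHandle()
handle.register("counts", 0)
print(handle.delete_if_exists("counts"))   # True: removed
print(handle.delete_if_exists("counts"))   # False: already gone, no error
```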
[jira] [Updated] (SPARK-46798) Kafka custom partition location assignment in Spark Structured Streaming (rack awareness)
[ https://issues.apache.org/jira/browse/SPARK-46798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46798: --- Labels: pull-request-available (was: ) > Kafka custom partition location assignment in Spark Structured Streaming > (rack awareness) > - > > Key: SPARK-46798 > URL: https://issues.apache.org/jira/browse/SPARK-46798 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: Randall Schwager >Priority: Major > Labels: pull-request-available > > I'd like to propose, and implement if approved, support for custom partition > location assignment. [Please find the design doc for SPARK-46798 describing > the change > here.|https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c] > SPARK-15406 Added Kafka consumer support to Spark Structured Streaming, but > it did not add custom partition location assignment as a feature. The > Structured Streaming Kafka consumer as it exists today evenly allocates Kafka > topic partitions to executors without regard to Kafka broker rack information > or executor location. This behavior can drive large cross-AZ networking costs > in large deployments. > [The design doc for > SPARK-15406|https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw] > described the ability to assign Kafka partitions to particular executors (a > feature which would enable rack awareness), but it seems that feature was > never implemented. > For DStreams users, there does seem to be a way to assign Kafka partitions to > Spark executors in a custom fashion with > [LocationStrategies.PreferFixed|https://github.com/apache/spark/blob/master/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/LocationStrategy.scala#L69], > so this sort of functionality has a precedent. 
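The rack-aware placement the proposal requests can be sketched as follows. Everything here is a hypothetical illustration of the idea (prefer an executor in the partition leader's rack, otherwise fall back to even round-robin allocation as today); it is not the design doc's API:

```python
from itertools import cycle

def assign(partitions, executors):
    """partitions: {partition_id: leader_rack}; executors: {executor_id: rack}.
    Returns {partition_id: executor_id}, preferring same-rack placement."""
    by_rack = {}
    for ex, rack in executors.items():
        by_rack.setdefault(rack, []).append(ex)
    rack_rr = {rack: cycle(exs) for rack, exs in by_rack.items()}
    fallback = cycle(sorted(executors))      # today's rack-oblivious behavior
    assignment = {}
    for pid, rack in sorted(partitions.items()):
        if rack in rack_rr:
            assignment[pid] = next(rack_rr[rack])  # same rack: no cross-AZ read
        else:
            assignment[pid] = next(fallback)
    return assignment

print(assign({0: "us-east-1a", 1: "us-east-1b"},
             {"exec-1": "us-east-1a", "exec-2": "us-east-1b"}))
```

With rack information available, consuming each partition from a same-rack executor is what avoids the cross-AZ networking costs described above.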
[jira] [Created] (SPARK-46923) Style config tables in documentation consistently
Nicholas Chammas created SPARK-46923:
Summary: Style config tables in documentation consistently
Key: SPARK-46923
URL: https://issues.apache.org/jira/browse/SPARK-46923
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 4.0.0
Reporter: Nicholas Chammas
[jira] [Resolved] (SPARK-46921) Move `ProblemFilters` that do not belong to defaultExcludes to `v40excludes`.
[ https://issues.apache.org/jira/browse/SPARK-46921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46921. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44952 [https://github.com/apache/spark/pull/44952] > Move `ProblemFilters` that do not belong to defaultExcludes to `v40excludes`. > - > > Key: SPARK-46921 > URL: https://issues.apache.org/jira/browse/SPARK-46921 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46473) Reuse `getPartitionedFile` method
[ https://issues.apache.org/jira/browse/SPARK-46473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-46473: - Priority: Trivial (was: Minor) > Reuse `getPartitionedFile` method > - > > Key: SPARK-46473 > URL: https://issues.apache.org/jira/browse/SPARK-46473 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46922) better handling for runtime user errors
Wenchen Fan created SPARK-46922:
Summary: better handling for runtime user errors
Key: SPARK-46922
URL: https://issues.apache.org/jira/browse/SPARK-46922
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan
[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812345#comment-17812345 ] Bala Bellam commented on SPARK-46747: - Thanks [~yao] . So, does this get released to 3.3 & higher? > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
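The two probe queries discussed in the ticket can be compared concretely. The sketch below runs against in-memory SQLite purely so it is self-contained (the issue itself concerns PostgreSQL, where `WHERE 1=0` additionally avoids shared locks on partitions); `table_exists` is an illustrative helper, not Spark's `JdbcDialect` code:

```python
import sqlite3

# Both probes answer "does this table exist?" by whether the query parses
# and runs; only LIMIT 1 may actually have to read a row of data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1), (2)")

def table_exists(conn, table: str, probe: str) -> bool:
    try:
        conn.execute(probe.format(table=table))
        return True
    except sqlite3.OperationalError:   # e.g. "no such table"
        return False

# LIMIT 1 may scan data (and, on PostgreSQL, take shared locks on every
# partition); WHERE 1=0 is satisfiable from table metadata alone.
assert table_exists(conn, "t", "SELECT 1 FROM {table} LIMIT 1")
assert table_exists(conn, "t", "SELECT 1 FROM {table} WHERE 1=0")
assert not table_exists(conn, "missing", "SELECT 1 FROM {table} WHERE 1=0")
```

Both probes return the same answer, which is why swapping `LIMIT 1` for the default `WHERE 1 = 0` changes nothing functionally while avoiding the locking cost.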
[jira] [Updated] (SPARK-46919) Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0
[ https://issues.apache.org/jira/browse/SPARK-46919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46919: --- Labels: pull-request-available (was: ) > Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0 > - > > Key: SPARK-46919 > URL: https://issues.apache.org/jira/browse/SPARK-46919 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812341#comment-17812341 ] Kent Yao commented on SPARK-46747: -- According to the release policies of Spark, patches never get merged into EOL versions. FYI, Spark 3.2 and earlier are EOL. If you stay on 2.3 for some reason, consider backporting the patch to it. > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. 
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bala Bellam updated SPARK-46747: Fix Version/s: (was: 2.3.2) > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bala Bellam updated SPARK-46747: Fix Version/s: 2.3.2 > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 2.3.2, 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812302#comment-17812302 ] Bala Bellam commented on SPARK-46747: - Thank you very much [~yao] . Sure, I can provide those number of shared locks as soon as I can. Currently we are using older versions of Spark (2.3). Does this PR update the older versions as well? > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. 
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46747: - Target Version/s: (was: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4) > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. 
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46747: - Priority: Major (was: Critical) > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46747. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44948 [https://github.com/apache/spark/pull/44948] > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
[ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46747: Assignee: Kent Yao > Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 > -- > > Key: SPARK-46747 > URL: https://issues.apache.org/jira/browse/SPARK-46747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, > 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, > 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, > 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4 >Reporter: Bala Bellam >Assignee: Kent Yao >Priority: Critical > Labels: pull-request-available > > +*Background:*+ > PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table > existence in the database by overriding the default > JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. > +*Issue:*+ > Due to LIMIT 1 query pattern, we are seeing high number of shared locks in > the PostgreSQL installations where there are many partitions under a table > that's being written to. Hence resorting to the default JdbcDialect which > does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the > partitions and effectively checks for table existence. > The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain > scenarios, especially with partitioned tables or tables with a lot of data, > as it may take shared locks on all partitions or involve more planner and > execution time to determine the quickest way to get a single row. > On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read > any data due to the always false WHERE condition. 
This makes it a lighter > operation, as it typically only involves checking the table's metadata to > validate the table's existence without taking locks on the table's data or > partitions. > So, considering performance and minimizing locks, SELECT 1 FROM table WHERE > 1=0 would be a better choice if we're strictly looking to check for a table's > existence and want to avoid potentially heavier operations like taking shared > locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus
[ https://issues.apache.org/jira/browse/SPARK-46918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46918. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44950 [https://github.com/apache/spark/pull/44950] > Replace self-defined variables with Hadoop ContainerExitStatus > -- > > Key: SPARK-46918 > URL: https://issues.apache.org/jira/browse/SPARK-46918 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus
[ https://issues.apache.org/jira/browse/SPARK-46918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-46918: Assignee: Cheng Pan > Replace self-defined variables with Hadoop ContainerExitStatus > -- > > Key: SPARK-46918 > URL: https://issues.apache.org/jira/browse/SPARK-46918 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46919) Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0
Yang Jie created SPARK-46919:

Summary: Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0
Key: SPARK-46919
URL: https://issues.apache.org/jira/browse/SPARK-46919
Project: Spark
Issue Type: Improvement
Components: Build, Connect
Affects Versions: 4.0.0
Reporter: Yang Jie
[jira] [Updated] (SPARK-46917) [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test
[ https://issues.apache.org/jira/browse/SPARK-46917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan updated SPARK-46917:
Summary: [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test (was: Improve merge_spark_pr.py)

> [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test
> --
>
> Key: SPARK-46917
> URL: https://issues.apache.org/jira/browse/SPARK-46917
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Updated] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus
[ https://issues.apache.org/jira/browse/SPARK-46918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46918:
---
Labels: pull-request-available (was: )

> Replace self-defined variables with Hadoop ContainerExitStatus
> --
>
> Key: SPARK-46918
> URL: https://issues.apache.org/jira/browse/SPARK-46918
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 4.0.0
> Reporter: Cheng Pan
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus
Cheng Pan created SPARK-46918:

Summary: Replace self-defined variables with Hadoop ContainerExitStatus
Key: SPARK-46918
URL: https://issues.apache.org/jira/browse/SPARK-46918
Project: Spark
Issue Type: Improvement
Components: YARN
Affects Versions: 4.0.0
Reporter: Cheng Pan
[jira] [Assigned] (SPARK-46890) CSV fails on a column with default and without enforcing schema
[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-46890:
--
Assignee: (was: Apache Spark)

> CSV fails on a column with default and without enforcing schema
> --
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
> When we create a table using CSV on an existing file with a header, where:
> - a column has a default, and
> - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
> Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}
[jira] [Assigned] (SPARK-46890) CSV fails on a column with default and without enforcing schema
[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-46890:
--
Assignee: Apache Spark

> CSV fails on a column with default and without enforcing schema
> --
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Assignee: Apache Spark
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
> When we create a table using CSV on an existing file with a header, where:
> - a column has a default, and
> - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
> Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}
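The failure mode in SPARK-46890 can be illustrated without Spark. The sketch below is a simplified stand-in for Spark's actual header check, not its real code: a hypothetical `check_header` compares the file header's column count against the schema the reader was asked to produce. With `enforceSchema=false`, selecting a single column prunes the required schema down to one field while the header still has four, so the counts disagree and the check raises.

```python
def check_header(header_cols, schema_fields):
    """Simplified stand-in for the CSV header check.

    With enforceSchema=false the reader compares the file header against
    the schema it must produce. Under column pruning that schema may hold
    only the selected columns, so a 4-column header vs. a 1-field schema
    triggers the mismatch error reported in the ticket.
    """
    if len(header_cols) != len(schema_fields):
        raise ValueError(
            "Number of column in CSV header is not equal to number of "
            f"fields in the schema: Header length: {len(header_cols)}, "
            f"schema size: {len(schema_fields)}"
        )

header = ["product_id", "name", "price", "quantity"]

# Full schema: counts match, check passes.
check_header(header, ["product_id", "name", "price", "quantity"])

# Pruned schema (SELECT price): counts disagree, check raises.
try:
    check_header(header, ["price"])
except ValueError as e:
    print(e)
```

This suggests why the bug only surfaces in this combination of options: with `enforceSchema=true` the header is not validated, and without pruning (e.g. `SELECT *`) the counts happen to match.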
[jira] [Updated] (SPARK-46851) Remove buf version information from the doc contributing.rst
[ https://issues.apache.org/jira/browse/SPARK-46851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan updated SPARK-46851:
Summary: Remove buf version information from the doc contributing.rst (was: Upgrade `buf` to v1.29.0)

> Remove buf version information from the doc contributing.rst
> --
>
> Key: SPARK-46851
> URL: https://issues.apache.org/jira/browse/SPARK-46851
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0