[jira] [Resolved] (SPARK-46753) Fix `pypy3` python test

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46753.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44778
[https://github.com/apache/spark/pull/44778]

> Fix `pypy3` python test
> ---
>
> Key: SPARK-46753
> URL: https://issues.apache.org/jira/browse/SPARK-46753
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46753) Fix `pypy3` python test

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46753:


Assignee: BingKun Pan

> Fix `pypy3` python test
> ---
>
> Key: SPARK-46753
> URL: https://issues.apache.org/jira/browse/SPARK-46753
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46926) Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list

2024-01-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-46926:
-

Assignee: Ruifeng Zheng

> Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
> -
>
> Key: SPARK-46926
> URL: https://issues.apache.org/jira/browse/SPARK-46926
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Resolved] (SPARK-46926) Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list

2024-01-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46926.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44965
[https://github.com/apache/spark/pull/44965]

> Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
> -
>
> Key: SPARK-46926
> URL: https://issues.apache.org/jira/browse/SPARK-46926
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46931) Implement {Frame, Series}.to_hdf

2024-01-30 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46931:
-

 Summary: Implement {Frame, Series}.to_hdf
 Key: SPARK-46931
 URL: https://issues.apache.org/jira/browse/SPARK-46931
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46926) Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list

2024-01-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-46926:
--
Summary: Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback 
list  (was: Support `convert_dtypes`, `infer_objects` and `set_axis` in 
fallback mode)

> Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
> -
>
> Key: SPARK-46926
> URL: https://issues.apache.org/jira/browse/SPARK-46926
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Updated] (SPARK-46926) Support `convert_dtypes`, `infer_objects` and `set_axis` in fallback mode

2024-01-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-46926:
--
Summary: Support `convert_dtypes`, `infer_objects` and `set_axis` in 
fallback mode  (was: Support `convert_dtypes` and `infer_objects` in fallback 
mode)

> Support `convert_dtypes`, `infer_objects` and `set_axis` in fallback mode
> -
>
> Key: SPARK-46926
> URL: https://issues.apache.org/jira/browse/SPARK-46926
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Updated] (SPARK-46930) Add support for a custom prefix for fields of Avro union type when enableStableIdentifiersForUnionType is enabled

2024-01-30 Thread Ivan Sadikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Sadikov updated SPARK-46930:
-
Description: {{enableStableIdentifiersForUnionType}} enables stable 
identifiers for Avro union fields. We need another config to make the prefix 
used in those stable identifiers configurable. Currently the prefix is 
{{member_}} (e.g. member_int, member_string), but we should be able to change 
it to any value, or even leave it empty.

> Add support for a custom prefix for fields of Avro union type when 
> enableStableIdentifiersForUnionType is enabled 
> --
>
> Key: SPARK-46930
> URL: https://issues.apache.org/jira/browse/SPARK-46930
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ivan Sadikov
>Priority: Major
>
> {{enableStableIdentifiersForUnionType}} enables stable identifiers for Avro 
> union fields. We need another config to make the prefix used in those stable 
> identifiers configurable. Currently the prefix is {{member_}} (e.g. 
> member_int, member_string), but we should be able to change it to any value, 
> or even leave it empty.
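As a hedged illustration of the behavior being changed: with the existing 
reader option enabled, union members surface with the fixed member_ prefix. 
The file path and schema below are assumptions for the example (spark-avro 
must be on the classpath).

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming /tmp/events.avro holds a column whose Avro type
// is the union ["int", "string"].
val spark = SparkSession.builder().master("local[*]").getOrCreate()

val df = spark.read
  .format("avro")
  .option("enableStableIdentifiersForUnionType", "true") // existing option
  .load("/tmp/events.avro")                              // hypothetical path

df.printSchema()
// With stable identifiers enabled, the union surfaces today as:
//   struct<member_int: int, member_string: string>
// This ticket proposes a config so "member_" can be any prefix, or empty.
{code}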






[jira] [Created] (SPARK-46930) Add support for a custom prefix for fields of Avro union type when enableStableIdentifiersForUnionType is enabled

2024-01-30 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-46930:


 Summary: Add support for a custom prefix for fields of Avro union 
type when enableStableIdentifiersForUnionType is enabled 
 Key: SPARK-46930
 URL: https://issues.apache.org/jira/browse/SPARK-46930
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Ivan Sadikov









[jira] [Updated] (SPARK-46929) Use ThreadUtils.shutdown to close thread pools

2024-01-30 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46929:
---
Component/s: Connect
 Spark Core
 SS
 (was: SQL)

> Use ThreadUtils.shutdown to close thread pools
> --
>
> Key: SPARK-46929
> URL: https://issues.apache.org/jira/browse/SPARK-46929
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Spark Core, SS
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>







[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-46747:
-
Fix Version/s: 3.5.1
   3.4.3

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.4.3
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.
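For reference, a minimal sketch of the two probe queries being compared, 
paraphrasing the dialect methods rather than quoting the exact Spark source:

{code:scala}
// Default JdbcDialect behavior: an always-false predicate, so no rows are
// read and no partition locks are taken; only metadata is checked.
def defaultTableExistsQuery(table: String): String =
  s"SELECT 1 FROM $table WHERE 1=0"

// PostgresDialect's previous override: the planner may take shared locks on
// every partition while hunting for a single row.
def postgresTableExistsQuery(table: String): String =
  s"SELECT 1 FROM $table LIMIT 1"
{code}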






[jira] [Resolved] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` Log UI

2024-01-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46924.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44957
[https://github.com/apache/spark/pull/44957]

> Fix `Load New` button in `Master/HistoryServer` Log UI
> --
>
> Key: SPARK-46924
> URL: https://issues.apache.org/jira/browse/SPARK-46924
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46927.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44960
[https://github.com/apache/spark/pull/44960]

> Make `assertDataFrameEqual` work properly without PyArrow
> -
>
> Key: SPARK-46927
> URL: https://issues.apache.org/jira/browse/SPARK-46927
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46929) Use ThreadUtils.shutdown to close thread pools

2024-01-30 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-46929:
--

 Summary: Use ThreadUtils.shutdown to close thread pools
 Key: SPARK-46929
 URL: https://issues.apache.org/jira/browse/SPARK-46929
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
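A minimal sketch of the cleanup pattern this ticket applies (ThreadUtils is 
Spark-internal, so this models usage inside Spark's own modules; the pool and 
grace period are made-up examples):

{code:scala}
import java.util.concurrent.Executors

import scala.concurrent.duration._

import org.apache.spark.util.ThreadUtils

val pool = Executors.newFixedThreadPool(4) // example pool
try {
  // ... submit work to the pool ...
} finally {
  // Replaces ad-hoc shutdown()/awaitTermination() pairs with the shared helper.
  ThreadUtils.shutdown(pool, 10.seconds)
}
{code}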









[jira] [Assigned] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46927:


Assignee: Haejoon Lee

> Make `assertDataFrameEqual` work properly without PyArrow
> -
>
> Key: SPARK-46927
> URL: https://issues.apache.org/jira/browse/SPARK-46927
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Created] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow

2024-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46927:
---

 Summary: Make `assertDataFrameEqual` work properly without PyArrow
 Key: SPARK-46927
 URL: https://issues.apache.org/jira/browse/SPARK-46927
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee









[jira] [Created] (SPARK-46928) Support ListState in Arbitrary State API v2

2024-01-30 Thread Bhuwan Sahni (Jira)
Bhuwan Sahni created SPARK-46928:


 Summary: Support ListState in Arbitrary State API v2
 Key: SPARK-46928
 URL: https://issues.apache.org/jira/browse/SPARK-46928
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Bhuwan Sahni


As part of Arbitrary State API v2 
([https://issues.apache.org/jira/browse/SPARK-46928], design doc at 
[https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig]), 
we need to support ListState. This task covers adding support for ListState in 
Scala.
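A hypothetical sketch of what per-key list state could look like; the traits 
below are stand-ins for the unreleased API v2 surface, and every name in them 
is an assumption rather than the final API:

{code:scala}
// Hypothetical stand-ins for the (unreleased) Arbitrary State API v2 surface;
// the final Scala API may differ.
trait ListState[T] {
  def appendValue(value: T): Unit
  def get(): Iterator[T]
}
trait StatefulProcessorHandle {
  def getListState[T](stateName: String): ListState[T]
}

// What a user-defined processor might do with a list-typed state variable:
def handleInputRows(handle: StatefulProcessorHandle,
                    key: String,
                    rows: Iterator[String]): Iterator[String] = {
  val events = handle.getListState[String]("events") // per-key list state
  rows.foreach(events.appendValue)                   // accumulate new values
  events.get()                                       // emit all values seen so far
}
{code}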






[jira] [Created] (SPARK-46926) Support `convert_dtypes` and `infer_objects` in fallback mode

2024-01-30 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46926:
-

 Summary: Support `convert_dtypes` and `infer_objects` in fallback 
mode
 Key: SPARK-46926
 URL: https://issues.apache.org/jira/browse/SPARK-46926
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46925:


Assignee: Xinrong Meng

> Add a warning that instructs to install memory_profiler for memory profiling
> 
>
> Key: SPARK-46925
> URL: https://issues.apache.org/jira/browse/SPARK-46925
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Add a warning that instructs to install memory_profiler for memory profiling






[jira] [Resolved] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46925.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44958
[https://github.com/apache/spark/pull/44958]

> Add a warning that instructs to install memory_profiler for memory profiling
> 
>
> Key: SPARK-46925
> URL: https://issues.apache.org/jira/browse/SPARK-46925
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 4.0.0
>
>
> Add a warning that instructs to install memory_profiler for memory profiling






[jira] [Resolved] (SPARK-46192) failed to insert the table using the default value of union

2024-01-30 Thread zengxl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl resolved SPARK-46192.

Resolution: Fixed

> failed to insert the table using the default value of union
> ---
>
> Key: SPARK-46192
> URL: https://issues.apache.org/jira/browse/SPARK-46192
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: zengxl
>Priority: Major
>
>  
> Given the following tables and data:
> {code:java}
> create table test_spark(k string default null,v int default null) stored as 
> orc;
> create table test_spark_1(k string default null,v int default null) stored as 
> orc;
> insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
> create table test_spark_2(k string default null,v int default null) stored as 
> orc; 
> insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
> {code}
> Execute the following SQL
> {code:java}
> insert into table test_spark (k) 
> select k from test_spark_1
> union
> select k from test_spark_2 
> {code}
> exception:
> {code:java}
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 
> ,resolved :1 , i.query 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
> ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in 
> query: `default`.`test_spark` requires that the data to be inserted have the 
> same number of columns as the target table: target table has 2 column(s) but 
> the inserted data has 1 column(s), including 0 partition column(s) having 
> constant value(s). {code}
>  






[jira] [Commented] (SPARK-46192) failed to insert the table using the default value of union

2024-01-30 Thread zengxl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812518#comment-17812518
 ] 

zengxl commented on SPARK-46192:


This patch solves all of these problems

https://issues.apache.org/jira/browse/SPARK-43742

> failed to insert the table using the default value of union
> ---
>
> Key: SPARK-46192
> URL: https://issues.apache.org/jira/browse/SPARK-46192
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: zengxl
>Priority: Major
>
>  
> Given the following tables and data:
> {code:java}
> create table test_spark(k string default null,v int default null) stored as 
> orc;
> create table test_spark_1(k string default null,v int default null) stored as 
> orc;
> insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
> create table test_spark_2(k string default null,v int default null) stored as 
> orc; 
> insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
> {code}
> Execute the following SQL
> {code:java}
> insert into table test_spark (k) 
> select k from test_spark_1
> union
> select k from test_spark_2 
> {code}
> exception:
> {code:java}
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 
> ,resolved :1 , i.query 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
> ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in 
> query: `default`.`test_spark` requires that the data to be inserted have the 
> same number of columns as the target table: target table has 2 column(s) but 
> the inserted data has 1 column(s), including 0 partition column(s) having 
> constant value(s). {code}
>  






[jira] [Resolved] (SPARK-46923) Limit width of config tables in documentation and style them consistently

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46923.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44955
[https://github.com/apache/spark/pull/44955]

> Limit width of config tables in documentation and style them consistently
> -
>
> Key: SPARK-46923
> URL: https://issues.apache.org/jira/browse/SPARK-46923
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46923) Limit width of config tables in documentation and style them consistently

2024-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46923:


Assignee: Nicholas Chammas

> Limit width of config tables in documentation and style them consistently
> -
>
> Key: SPARK-46923
> URL: https://issues.apache.org/jira/browse/SPARK-46923
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>







[jira] [Created] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling

2024-01-30 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46925:


 Summary: Add a warning that instructs to install memory_profiler 
for memory profiling
 Key: SPARK-46925
 URL: https://issues.apache.org/jira/browse/SPARK-46925
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Add a warning that instructs to install memory_profiler for memory profiling






[jira] [Assigned] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` Log UI

2024-01-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46924:
-

Assignee: Dongjoon Hyun

> Fix `Load New` button in `Master/HistoryServer` Log UI
> --
>
> Key: SPARK-46924
> URL: https://issues.apache.org/jira/browse/SPARK-46924
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` Log UI

2024-01-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46924:
--
Summary: Fix `Load New` button in `Master/HistoryServer` Log UI  (was: Fix 
`Load New` button in `Master/HistoryServer` UI)

> Fix `Load New` button in `Master/HistoryServer` Log UI
> --
>
> Key: SPARK-46924
> URL: https://issues.apache.org/jira/browse/SPARK-46924
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-46924) Fix `Load New` button in `Master/HistoryServer` UI

2024-01-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46924:
-

 Summary: Fix `Load New` button in `Master/HistoryServer` UI
 Key: SPARK-46924
 URL: https://issues.apache.org/jira/browse/SPARK-46924
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-46923) Limit width of config tables in documentation and style them consistently

2024-01-30 Thread Nicholas Chammas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-46923:
-
Summary: Limit width of config tables in documentation and style them 
consistently  (was: Style config tables in documentation consistently)

> Limit width of config tables in documentation and style them consistently
> -
>
> Key: SPARK-46923
> URL: https://issues.apache.org/jira/browse/SPARK-46923
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>







[jira] [Updated] (SPARK-46830) Introducing collation concept into Spark

2024-01-30 Thread Aleksandar Tomic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandar Tomic updated SPARK-46830:
-
Description: 
This feature will introduce collation support to the Spark engine. This means 
that:

 # Every StringType will have an associated collation. The default remains 
UTF8 Binary, which behaves under the same rules as current UTF8 string 
comparison.
 # Collation will be respected in all collation-sensitive operations: 
comparisons, hashing, and string operations (contains, startsWith, endsWith, 
etc.)
 # Collation can be set in the following ways:
 ## With a COLLATE expression, e.g. strExpr COLLATE collation_name
 ## In a CREATE TABLE column definition
 ## By setting a session collation.
 # All Spark operators need to respect collation settings (filters, joins, 
shuffles, aggregations, etc.)

This is a high-level description of the feature. You can find the detailed 
design at 
[this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing]
 link (the doc is attached as well).

 

  was:
This feature will introduce collation support to the Spark engine. This means 
that:

 # Every StringType will have an associated collation. The default remains 
UTF8 Binary, which behaves under the same rules as current UTF8 string 
comparison.
 # Collation will be respected in all collation-sensitive operations: 
comparisons, hashing, and string operations (contains, startsWith, endsWith, 
etc.)
 # Collation can be set in the following ways:
 ## With a COLLATE expression, e.g. strExpr COLLATE collation_name
 ## In a CREATE TABLE column definition
 ## By setting a session collation.
 # All Spark operators need to respect collation settings (filters, joins, 
shuffles, aggregations, etc.)

This is a high-level description of the feature. You can find the detailed 
design at 
[this|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing]
 link (the doc is attached as well).

 


> Introducing collation concept into Spark
> 
>
> Key: SPARK-46830
> URL: https://issues.apache.org/jira/browse/SPARK-46830
> Project: Spark
>  Issue Type: Epic
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
> Attachments: Collation Support in Spark.docx
>
>
> This feature will introduce collation support to the Spark engine. This means 
> that:
>  # Every StringType will have an associated collation. The default remains 
> UTF8 Binary, which behaves under the same rules as current UTF8 string 
> comparison.
>  # Collation will be respected in all collation-sensitive operations: 
> comparisons, hashing, and string operations (contains, startsWith, endsWith, 
> etc.)
>  # Collation can be set in the following ways:
>  ## With a COLLATE expression, e.g. strExpr COLLATE collation_name (see the 
> sketch after this message)
>  ## In a CREATE TABLE column definition
>  ## By setting a session collation.
>  # All Spark operators need to respect collation settings (filters, joins, 
> shuffles, aggregations, etc.)
> This is a high-level description of the feature. You can find the detailed 
> design at 
> [this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing]
>  link (the doc is attached as well).
>
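To make the COLLATE expression above concrete, a hedged sketch; the collation 
name and the users table are placeholders, and the syntax follows the design 
doc rather than any shipped release:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Proposed syntax per the design doc; `some_collation` is a placeholder
// name and `users` an assumed table, so this will not run on current Spark.
spark.sql("SELECT name FROM users WHERE name COLLATE some_collation = 'alice'")
{code}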






[jira] [Updated] (SPARK-46911) Add deleteIfExists operator to StatefulProcessorHandle

2024-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46911:
---
Labels: pull-request-available  (was: )

> Add deleteIfExists operator to StatefulProcessorHandle
> --
>
> Key: SPARK-46911
> URL: https://issues.apache.org/jira/browse/SPARK-46911
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Eric Marnadi
>Priority: Major
>  Labels: pull-request-available
>
> Adding the {{deleteIfExists}} method to the {{StatefulProcessorHandle}} in 
> order to remove state variables from the State Store
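A hedged sketch of the intended call (only deleteIfExists comes from this 
ticket; the trait and the surrounding names are stand-ins):

{code:scala}
// Stand-in for the real StatefulProcessorHandle; deleteIfExists is the
// method this ticket adds, and its exact signature is an assumption.
trait StatefulProcessorHandle {
  def deleteIfExists(stateName: String): Unit
}

def expireSession(handle: StatefulProcessorHandle): Unit = {
  // Remove the state variable from the State Store if it was ever created.
  handle.deleteIfExists("sessionEvents")
}
{code}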






[jira] [Updated] (SPARK-46798) Kafka custom partition location assignment in Spark Structured Streaming (rack awareness)

2024-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46798:
---
Labels: pull-request-available  (was: )

> Kafka custom partition location assignment in Spark Structured Streaming 
> (rack awareness)
> -
>
> Key: SPARK-46798
> URL: https://issues.apache.org/jira/browse/SPARK-46798
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0
>Reporter: Randall Schwager
>Priority: Major
>  Labels: pull-request-available
>
> I'd like to propose, and implement if approved, support for custom partition 
> location assignment. [Please find the design doc for SPARK-46798 describing 
> the change 
> here.|https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c]
> SPARK-15406 Added Kafka consumer support to Spark Structured Streaming, but 
> it did not add custom partition location assignment as a feature. The 
> Structured Streaming Kafka consumer as it exists today evenly allocates Kafka 
> topic partitions to executors without regard to Kafka broker rack information 
> or executor location. This behavior can drive large cross-AZ networking costs 
> in large deployments.
> [The design doc for 
> SPARK-15406|https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw]
>  described the ability to assign Kafka partitions to particular executors (a 
> feature which would enable rack awareness), but it seems that feature was 
> never implemented.
> For DStreams users, there does seem to be a way to assign Kafka partitions to 
> Spark executors in a custom fashion with 
> [LocationStrategies.PreferFixed|https://github.com/apache/spark/blob/master/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/LocationStrategy.scala#L69],
>  so this sort of functionality has a precedent.
>  
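For comparison, a minimal sketch of that DStreams precedent; the broker 
address, topic, hosts, and group id are made-up, but the kafka-0-10 APIs shown 
do exist:

{code:scala}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf().setMaster("local[2]").setAppName("rack-demo")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "rack-demo"
)

// Pin each Kafka partition to a chosen executor host: the DStreams-only
// location control that Structured Streaming currently lacks.
val hostMap = Map(
  new TopicPartition("events", 0) -> "exec-host-a",
  new TopicPartition("events", 1) -> "exec-host-b"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferFixed(hostMap),
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
)
{code}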






[jira] [Created] (SPARK-46923) Style config tables in documentation consistently

2024-01-30 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46923:


 Summary: Style config tables in documentation consistently
 Key: SPARK-46923
 URL: https://issues.apache.org/jira/browse/SPARK-46923
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Nicholas Chammas









[jira] [Resolved] (SPARK-46921) Move `ProblemFilters` that do not belong to defaultExcludes to `v40excludes`.

2024-01-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46921.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44952
[https://github.com/apache/spark/pull/44952]

> Move `ProblemFilters` that do not belong to defaultExcludes to `v40excludes`.
> -
>
> Key: SPARK-46921
> URL: https://issues.apache.org/jira/browse/SPARK-46921
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46473) Reuse `getPartitionedFile` method

2024-01-30 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-46473:
-
Priority: Trivial  (was: Minor)

> Reuse `getPartitionedFile` method
> -
>
> Key: SPARK-46473
> URL: https://issues.apache.org/jira/browse/SPARK-46473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: xiaoping.huang
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46922) better handling for runtime user errors

2024-01-30 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46922:
---

 Summary: better handling for runtime user errors
 Key: SPARK-46922
 URL: https://issues.apache.org/jira/browse/SPARK-46922
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan









[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Bala Bellam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812345#comment-17812345
 ] 

Bala Bellam commented on SPARK-46747:
-

Thanks [~yao]. So, does this get released to 3.3 and higher?

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.






[jira] [Updated] (SPARK-46919) Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0

2024-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46919:
---
Labels: pull-request-available  (was: )

> Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0
> -
>
> Key: SPARK-46919
> URL: https://issues.apache.org/jira/browse/SPARK-46919
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Connect
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812341#comment-17812341
 ] 

Kent Yao commented on SPARK-46747:
--

According to Spark's release policies, patches never get merged into EOL 
versions. FYI, Spark 3.2 and earlier are EOL.

If you stay on 2.3 for some reason, consider backporting the patch to it 
yourself.


> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.






[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Bala Bellam (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bala Bellam updated SPARK-46747:

Fix Version/s: (was: 2.3.2)

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.






[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Bala Bellam (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bala Bellam updated SPARK-46747:

Fix Version/s: 2.3.2

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.2, 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.






[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Bala Bellam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812302#comment-17812302
 ] 

Bala Bellam commented on SPARK-46747:
-

Thank you very much [~yao]. Sure, I can provide those shared-lock counts as 
soon as I can.

Currently we are using an older version of Spark (2.3). Does this PR update 
the older versions as well?

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.






[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-46747:
-
Target Version/s:   (was: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 
2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 
2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 
3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 
3.3.3, 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4)

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence, overriding the default JdbcDialect.getTableExistsQuery, which uses 
> WHERE 1 = 0.
> +*Issue:*+
> Because of the LIMIT 1 query pattern, we are seeing a high number of shared 
> locks in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect, which uses WHERE 1 = 0, 
> is more optimal: it doesn't scan any of the partitions and still effectively 
> checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or spend more planner and 
> execution time determining the quickest way to fetch a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data because of the always-false WHERE condition. This makes it a lighter 
> operation: it typically only checks the table's metadata to validate the 
> table's existence, without taking locks on the table's data or partitions.
> So, considering performance and lock minimization, SELECT 1 FROM table WHERE 
> 1=0 is the better choice when we strictly want to check for a table's 
> existence and avoid potentially heavier operations like taking shared locks 
> on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-46747:
-
Priority: Major  (was: Critical)

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check table 
> existence in the database, overriding the default 
> JdbcDialect.getTableExistsQuery, which uses WHERE 1 = 0.
> +*Issue:*+
> Due to the LIMIT 1 query pattern, we are seeing a high number of shared locks 
> in PostgreSQL installations where there are many partitions under a table 
> that's being written to. Hence, falling back to the default JdbcDialect query, 
> which uses WHERE 1 = 0, is more optimal: it doesn't scan any of the 
> partitions and still effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or involve more planner and 
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data due to the always false WHERE condition. This makes it a lighter 
> operation, as it typically only involves checking the table's metadata to 
> validate the table's existence without taking locks on the table's data or 
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 
> 1=0 would be a better choice if we're strictly looking to check for a table's 
> existence and want to avoid potentially heavier operations like taking shared 
> locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46747.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44948
[https://github.com/apache/spark/pull/44948]

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check table 
> existence in the database, overriding the default 
> JdbcDialect.getTableExistsQuery, which uses WHERE 1 = 0.
> +*Issue:*+
> Due to the LIMIT 1 query pattern, we are seeing a high number of shared locks 
> in PostgreSQL installations where there are many partitions under a table 
> that's being written to. Hence, falling back to the default JdbcDialect query, 
> which uses WHERE 1 = 0, is more optimal: it doesn't scan any of the 
> partitions and still effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or involve more planner and 
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data due to the always false WHERE condition. This makes it a lighter 
> operation, as it typically only involves checking the table's metadata to 
> validate the table's existence without taking locks on the table's data or 
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 
> 1=0 would be a better choice if we're strictly looking to check for a table's 
> existence and want to avoid potentially heavier operations like taking shared 
> locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46747:


Assignee: Kent Yao

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Assignee: Kent Yao
>Priority: Critical
>  Labels: pull-request-available
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check table 
> existence in the database, overriding the default 
> JdbcDialect.getTableExistsQuery, which uses WHERE 1 = 0.
> +*Issue:*+
> Due to the LIMIT 1 query pattern, we are seeing a high number of shared locks 
> in PostgreSQL installations where there are many partitions under a table 
> that's being written to. Hence, falling back to the default JdbcDialect query, 
> which uses WHERE 1 = 0, is more optimal: it doesn't scan any of the 
> partitions and still effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or involve more planner and 
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data due to the always false WHERE condition. This makes it a lighter 
> operation, as it typically only involves checking the table's metadata to 
> validate the table's existence without taking locks on the table's data or 
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 
> 1=0 would be a better choice if we're strictly looking to check for a table's 
> existence and want to avoid potentially heavier operations like taking shared 
> locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus

2024-01-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46918.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44950
[https://github.com/apache/spark/pull/44950]

> Replace self-defined variables with Hadoop ContainerExitStatus
> --
>
> Key: SPARK-46918
> URL: https://issues.apache.org/jira/browse/SPARK-46918
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
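
The issue carries no description, but the summary implies a mechanical
substitution. A hedged sketch of the shape of the change (the Hadoop
constants are real; the replaced Spark-side names are assumptions for
illustration):

{code:scala}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus

// Before (hypothetical self-defined constants; names assumed here):
//   val VMEM_EXCEEDED_EXIT_CODE = -103
//   val PMEM_EXCEEDED_EXIT_CODE = -104

// After: reuse Hadoop's own constants rather than duplicating the values.
def killedForExceedingMemory(exitStatus: Int): Boolean =
  exitStatus == ContainerExitStatus.KILLED_EXCEEDED_VMEM ||
    exitStatus == ContainerExitStatus.KILLED_EXCEEDED_PMEM
{code}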




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus

2024-01-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-46918:


Assignee: Cheng Pan

> Replace self-defined variables with Hadoop ContainerExitStatus
> --
>
> Key: SPARK-46918
> URL: https://issues.apache.org/jira/browse/SPARK-46918
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46919) Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0

2024-01-30 Thread Yang Jie (Jira)
Yang Jie created SPARK-46919:


 Summary: Upgrade `grpcio*` to 1.60.0 and `grpc-java` to 1.61.0
 Key: SPARK-46919
 URL: https://issues.apache.org/jira/browse/SPARK-46919
 Project: Spark
  Issue Type: Improvement
  Components: Build, Connect
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46917) [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test

2024-01-30 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46917:

Summary: [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test  (was: Improve 
merge_spark_pr.py)

> [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test
> --
>
> Key: SPARK-46917
> URL: https://issues.apache.org/jira/browse/SPARK-46917
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus

2024-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46918:
---
Labels: pull-request-available  (was: )

> Replace self-defined variables with Hadoop ContainerExitStatus
> --
>
> Key: SPARK-46918
> URL: https://issues.apache.org/jira/browse/SPARK-46918
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46918) Replace self-defined variables with Hadoop ContainerExitStatus

2024-01-30 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-46918:
-

 Summary: Replace self-defined variables with Hadoop 
ContainerExitStatus
 Key: SPARK-46918
 URL: https://issues.apache.org/jira/browse/SPARK-46918
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 4.0.0
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46890:
--

Assignee: (was: Apache Spark)

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header, where:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> and then query a column with a default, the read fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}
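
One plausible reading of the error, offered as an assumption rather than a
confirmed diagnosis: column pruning narrows the read schema to the single
selected column, and the header check then compares the 4-name header against
that 1-field schema. Under that assumption, a minimal workaround sketch (the
table name products_v2 is introduced here for illustration):

{code:scala}
// Hedged workaround sketch: enforceSchema 'true' makes Spark apply the
// declared schema without validating it against the CSV header names,
// which sidesteps the failing header-vs-schema length comparison.
spark.sql("""
  CREATE TABLE IF NOT EXISTS products_v2 (
    product_id INT,
    name STRING,
    price FLOAT DEFAULT 0.0,
    quantity INT DEFAULT 0
  )
  USING CSV
  OPTIONS (
    header 'true',
    inferSchema 'false',
    enforceSchema 'true',
    path '/Users/maximgekk/tmp/products.csv'
  )
""")

spark.sql("SELECT price FROM products_v2").show()
{code}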



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46890:
--

Assignee: Apache Spark

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header, where:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> and then query a column with a default, the read fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46851) Remove buf version information from the doc contributing.rst

2024-01-30 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46851:

Summary: Remove buf version information from the doc contributing.rst  
(was: Upgrade `buf` to v1.29.0)

> Remove buf version information from the doc contributing.rst
> 
>
> Key: SPARK-46851
> URL: https://issues.apache.org/jira/browse/SPARK-46851
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org