[jira] [Updated] (SPARK-46418) Reorganize `ReshapeTests`

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46418:
---
Labels: pull-request-available  (was: )

> Reorganize `ReshapeTests`
> -
>
> Key: SPARK-46418
> URL: https://issues.apache.org/jira/browse/SPARK-46418
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46418) Reorganize `ReshapeTests`

2023-12-14 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46418:
-

 Summary: Reorganize `ReshapeTests`
 Key: SPARK-46418
 URL: https://issues.apache.org/jira/browse/SPARK-46418
 Project: Spark
  Issue Type: Test
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46415) Creating partitions through jdbc connection to beeline is slow

2023-12-14 Thread xichenglin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xichenglin updated SPARK-46415:
---
Description: Use jdbc to connect to spark beeline through the connection 
pool to perform partition creation operations. When the number of connections 
exceeds 4, creating partitions becomes very slow, and the 
execution time of each SQL is 4s-10s. Spark 2.x does not have this problem, and 
the execution time of each SQL is within 1 second.  (was: Use jdbc to connect 
to spark beeline through the connection pool to create partitions. When the 
number of connections exceeds 4, spark 3. There is no such problem. The 
execution time of each SQL statement is within 1 second.)

> Creating partitions through jdbc connection to beeline is slow
> --
>
> Key: SPARK-46415
> URL: https://issues.apache.org/jira/browse/SPARK-46415
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.3
>Reporter: xichenglin
>Priority: Major
>
> Use jdbc to connect to spark beeline through the connection pool to perform 
> partition creation operations. When the number of connections exceeds 4, 
> creating partitions becomes very slow, and the execution time of 
> each SQL is 4s-10s. Spark 2.x does not have this problem, and the execution 
> time of each SQL is within 1 second.






[jira] [Updated] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46417:
---
Labels: pull-request-available  (was: )

> do not fail when calling getTable and throwException is false
> -
>
> Key: SPARK-46417
> URL: https://issues.apache.org/jira/browse/SPARK-46417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-46417:

Issue Type: Bug  (was: Improvement)

> do not fail when calling getTable and throwException is false
> -
>
> Key: SPARK-46417
> URL: https://issues.apache.org/jira/browse/SPARK-46417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-14 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46417:
---

 Summary: do not fail when calling getTable and throwException is 
false
 Key: SPARK-46417
 URL: https://issues.apache.org/jira/browse/SPARK-46417
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan









[jira] [Resolved] (SPARK-46402) Add getMessageParameters and getQueryContext support

2023-12-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46402.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44349
[https://github.com/apache/spark/pull/44349]

> Add getMessageParameters and getQueryContext support
> 
>
> Key: SPARK-46402
> URL: https://issues.apache.org/jira/browse/SPARK-46402
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46402) Add getMessageParameters and getQueryContext support

2023-12-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46402:


Assignee: Hyukjin Kwon

> Add getMessageParameters and getQueryContext support
> 
>
> Key: SPARK-46402
> URL: https://issues.apache.org/jira/browse/SPARK-46402
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath

2023-12-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46416.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44345
[https://github.com/apache/spark/pull/44345]

> Add @tailrec to HadoopFSUtils#shouldFilterOutPath
> -
>
> Key: SPARK-46416
> URL: https://issues.apache.org/jira/browse/SPARK-46416
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46416:
---
Labels: pull-request-available  (was: )

> Add @tailrec to HadoopFSUtils#shouldFilterOutPath
> -
>
> Key: SPARK-46416
> URL: https://issues.apache.org/jira/browse/SPARK-46416
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath

2023-12-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-46416:


Assignee: Yang Jie

> Add @tailrec to HadoopFSUtils#shouldFilterOutPath
> -
>
> Key: SPARK-46416
> URL: https://issues.apache.org/jira/browse/SPARK-46416
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath

2023-12-14 Thread Yang Jie (Jira)
Yang Jie created SPARK-46416:


 Summary: Add @tailrec to HadoopFSUtils#shouldFilterOutPath
 Key: SPARK-46416
 URL: https://issues.apache.org/jira/browse/SPARK-46416
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie
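
For context, @tailrec is a compile-time-only annotation: the compiler must be 
able to turn the annotated method into a loop, or compilation fails. A minimal 
illustrative sketch of the pattern (not the actual HadoopFSUtils code):

{noformat}
import scala.annotation.tailrec

object PathFilterSketch {
  // Walk up a "/"-separated path, filtering out any hidden ("." or "_"
  // prefixed) component. The self-call is in tail position, so @tailrec
  // guarantees constant stack usage even for deeply nested paths.
  @tailrec
  final def shouldFilterOut(path: String): Boolean = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    if (name.startsWith(".") || name.startsWith("_")) true
    else {
      val parent = path.substring(0, math.max(path.lastIndexOf('/'), 0))
      if (parent.isEmpty) false else shouldFilterOut(parent)
    }
  }
}
{noformat}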









[jira] [Resolved] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-12-14 Thread Marc Le Bihan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Le Bihan resolved SPARK-45311.
---
Fix Version/s: 4.0.0
   3.5.1
   3.4.2
   Resolution: Fixed

Resolved through the resolution of linked issues

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
> Fix For: 4.0.0, 3.5.1, 3.4.2
>
> Attachments: JavaTypeInference_116.png, sparkIssue_02.png
>
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It works well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have changed deeply, it fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking over the Internet, I wasn't alone facing this problem. Reading it, 
> you'll see that I've attempted a debug but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> {color:#172b4d}by the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and keep it all along their process, so that when the encoder is at 
> any point where it has to throw an exception, it knows the field it is 
> handling in its specific call and can send a message like:{color}
> {color:#00875a}_java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_{color}
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another adds itself to the list:
> "{*}Only expression encoders are supported for now{*}" on what was accepted 
> and working before.
>  






[jira] [Created] (SPARK-46415) Creating partitions through jdbc connection to beeline is slow

2023-12-14 Thread xichenglin (Jira)
xichenglin created SPARK-46415:
--

 Summary: Creating partitions through jdbc connection to beeline is 
slow
 Key: SPARK-46415
 URL: https://issues.apache.org/jira/browse/SPARK-46415
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.3, 3.0.0
Reporter: xichenglin


Use jdbc to connect to spark beeline through the connection pool to create 
partitions. When the number of connections exceeds 4, creating partitions 
becomes very slow, with each SQL taking 4s-10s. Spark 2.x does not have this 
problem; the execution time of each SQL statement is within 1 second.






[jira] [Updated] (SPARK-46414) Use prependBaseUri to render javascript imports

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46414:
---
Labels: pull-request-available  (was: )

> Use prependBaseUri to render javascript imports
> ---
>
> Key: SPARK-46414
> URL: https://issues.apache.org/jira/browse/SPARK-46414
> Project: Spark
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46414) Use prependBaseUri to render javascript imports

2023-12-14 Thread Kent Yao (Jira)
Kent Yao created SPARK-46414:


 Summary: Use prependBaseUri to render javascript imports
 Key: SPARK-46414
 URL: https://issues.apache.org/jira/browse/SPARK-46414
 Project: Spark
  Issue Type: Sub-task
  Components: UI
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Commented] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value

2023-12-14 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796975#comment-17796975
 ] 

melin commented on SPARK-43338:
---

[~yao]   Databricks supports changing it:    spark.databricks.sql.initial.catalog.name

https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html

> Support modifying the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages metadata from several 
> HMS instances and distinguishes them by catalogName, so a different catalog 
> name is required.
> [~fanjia] 
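> For comparison, additional catalogs can already be registered under arbitrary 
> names via configuration (a sketch; the plugin class and its options below are 
> hypothetical). Only the name of the built-in session catalog itself is fixed 
> to "spark_catalog":
> {noformat}
> spark.sql.catalog.hms_b=com.example.MultiHmsCatalog
> spark.sql.catalog.hms_b.uri=thrift://hms-b:9083
> {noformat}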






[jira] [Resolved] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46404.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44346
[https://github.com/apache/spark/pull/44346]

> Add structured-streaming-page.test.js to test structured-streaming-page.js
> --
>
> Key: SPARK-46404
> URL: https://issues.apache.org/jira/browse/SPARK-46404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly

2023-12-14 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46384:


Assignee: Kent Yao

> Structured Streaming UI doesn't display graph correctly
> ---
>
> Key: SPARK-46384
> URL: https://issues.apache.org/jira/browse/SPARK-46384
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming, Web UI
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
> The Streaming UI is currently broken at Spark master. Running a simple query:
> ```
> q = 
> spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start()
> ```
> makes the Spark UI show an empty graph for "operation duration":
> (screenshot: empty "Operation Duration" graph on master)
> Here is the error:
> (screenshot: the error shown in the UI)
>  
> I verified that the same query runs fine on Spark 3.5, as in the following 
> graph:
> (screenshot: the "Operation Duration" graph rendering correctly on Spark 3.5)
>  
> This is likely caused by the library updates; a potential 
> source of the error: [https://github.com/apache/spark/pull/42879]
>  
>  






[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46404:


Assignee: Kent Yao

> Add structured-streaming-page.test.js to test structured-streaming-page.js
> --
>
> Key: SPARK-46404
> URL: https://issues.apache.org/jira/browse/SPARK-46404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>







[jira] [Resolved] (SPARK-46384) Structured Streaming UI doesn't display graph correctly

2023-12-14 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46384.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44346
[https://github.com/apache/spark/pull/44346]

> Structured Streaming UI doesn't display graph correctly
> ---
>
> Key: SPARK-46384
> URL: https://issues.apache.org/jira/browse/SPARK-46384
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming, Web UI
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The Streaming UI is currently broken at Spark master. Running a simple query:
> ```
> q = 
> spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start()
> ```
> makes the Spark UI show an empty graph for "operation duration":
> (screenshot: empty "Operation Duration" graph on master)
> Here is the error:
> (screenshot: the error shown in the UI)
>  
> I verified that the same query runs fine on Spark 3.5, as in the following 
> graph:
> (screenshot: the "Operation Duration" graph rendering correctly on Spark 3.5)
>  
> This is likely caused by the library updates; a potential 
> source of the error: [https://github.com/apache/spark/pull/42879]
>  
>  






[jira] [Assigned] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`

2023-12-14 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-46407:
-

Assignee: Ruifeng Zheng

> Reorganize `OpsOnDiffFramesDisabledTests`
> -
>
> Key: SPARK-46407
> URL: https://issues.apache.org/jira/browse/SPARK-46407
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`

2023-12-14 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46407.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44354
[https://github.com/apache/spark/pull/44354]

> Reorganize `OpsOnDiffFramesDisabledTests`
> -
>
> Key: SPARK-46407
> URL: https://issues.apache.org/jira/browse/SPARK-46407
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-43149) When CTAS with USING fails to store metadata in metastore, data gets left around

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43149:
---
Labels: pull-request-available  (was: )

> When CTAS with USING fails to store metadata in metastore, data gets left 
> around
> 
>
> Key: SPARK-43149
> URL: https://issues.apache.org/jira/browse/SPARK-43149
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> For example:
> {noformat}
> drop table if exists parquet_ds1;
> -- try creating table with invalid column name
> -- use 'using parquet' to designate the data source
> create table parquet_ds1 using parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id)
> from range(0, 10);
> Cannot create a table having a column whose name contains commas in Hive 
> metastore. Table: `spark_catalog`.`default`.`parquet_ds1`; Column: DATE 
> '2018-01-01' + make_dt_interval(0, id, 0, 0.00)
> -- show that table did not get created
> show tables;
> -- try again with valid column name
> -- spark will complain that directory already exists
> create table parquet_ds1 using parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id) as ts
> from range(0, 10);
> [LOCATION_ALREADY_EXISTS] Cannot name the managed table as 
> `spark_catalog`.`default`.`parquet_ds1`, as its associated location 
> 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already 
> exists. Please pick a different table name, or remove the existing location 
> first.
> org.apache.spark.SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name 
> the managed table as `spark_catalog`.`default`.`parquet_ds1`, as its 
> associated location 
> 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already 
> exists. Please pick a different table name, or remove the existing location 
> first.
>   at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.locationAlreadyExists(QueryExecutionErrors.scala:2804)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:414)
>   at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
> ...
> {noformat}
> One must manually remove the directory {{spark-warehouse/parquet_ds1}} before 
> the {{create table}} command will succeed.
> It seems that datasource table creation runs the data-creation job first, 
> then stores the metadata into the metastore.
> When using Spark to create Hive tables, the issue does not happen:
> {noformat}
> drop table if exists parquet_hive1;
> -- try creating table with invalid column name,
> -- but use 'stored as parquet' instead of 'using'
> create table parquet_hive1 stored as parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id)
> from range(0, 10);
> Cannot create a table having a column whose name contains commas in Hive 
> metastore. Table: `spark_catalog`.`default`.`parquet_hive1`; Column: DATE 
> '2018-01-01' + make_dt_interval(0, id, 0, 0.00)
> -- try again with valid column name. This will succeed;
> create table parquet_hive1 stored as parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id) as ts
> from range(0, 10);
> {noformat}
> It seems that Hive table creation stores metadata into the metastore first, 
> then runs the data-creation job.






[jira] [Updated] (SPARK-36680) Supports Dynamic Table Options for Spark SQL

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-36680:
---
Labels: pull-request-available  (was: )

> Supports Dynamic Table Options for Spark SQL
> 
>
> Key: SPARK-36680
> URL: https://issues.apache.org/jira/browse/SPARK-36680
> Project: Spark
>  Issue Type: Wish
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: wang-zhun
>Priority: Major
>  Labels: pull-request-available
>
> Now a DataFrame API user can implement dynamic options through the 
> _DataFrameReader$option_ method, but Spark SQL users cannot.
> {code:java}
> DataFrameReader/AstBuilder -> UnresolvedRelation$options -> 
> DataSourceV2Relation$options -> SupportsRead$newScanBuilder(options)
> {code}
>  
>  Table options are persisted to the catalog, and modifying them requires a 
> separate DDL statement like "_ALTER TABLE ..._". But there are cases where 
> users want to modify the table options dynamically, just for the 
> query:
>  * JDBCTable: set _fetchsize_ according to the actual situation of the table
>  * IcebergTable: support time travel
> {code:java}
> spark.read
> .option("snapshot-id", 10963874102873L)
> .format("iceberg")
> .load("path/to/table"){code}
> Setting such parameters is common and ad hoc; supporting them flexibly would 
> improve the Spark SQL user experience, especially now that catalog expansion 
> is supported.
>   
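> For reference, a minimal sketch of what the DataFrame API already allows 
> (assumes an existing SparkSession named spark; the JDBC URL and table are 
> hypothetical):
> {noformat}
> // Per-query, non-persisted options via the DataFrameReader API.
> // Spark SQL users currently have no equivalent at query time.
> val ordersDf = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql://db-host:5432/shop")  // hypothetical
>   .option("dbtable", "orders")                           // hypothetical
>   .option("fetchsize", "1000")  // tuned for this query only
>   .load()
> {noformat}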






[jira] [Resolved] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics

2023-12-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46294.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44222
[https://github.com/apache/spark/pull/44222]

> Clean up initValue vs zeroValue semantics in SQLMetrics
> ---
>
> Key: SPARK-46294
> URL: https://issues.apache.org/jira/browse/SPARK-46294
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Davin Tjong
>Assignee: Davin Tjong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The semantics of initValue and _zeroValue in SQLMetrics is a little bit 
> confusing, since they effectively mean the same thing. Changing it to the 
> following would be clearer, especially in terms of defining what an "invalid" 
> metric is.
>  
> proposed definitions:
>  
> initValue is the starting value for a SQLMetric. If a metric has value equal 
> to its initValue, then it should be filtered out before aggregating with 
> SQLMetrics.stringValue().
>  
> zeroValue defines the lowest value considered valid. If a SQLMetric is 
> invalid, it is set to zeroValue upon receiving any updates, and it also 
> reports zeroValue as its value to avoid exposing it to the user 
> programmatically (a concern previously addressed in SPARK-41442).
> For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
> the metric is by default invalid. At the end of a task, we will update the 
> metric making it valid, and the invalid metrics will be filtered out when 
> calculating min, max, etc. as a workaround for SPARK-11013.
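> A minimal Scala sketch of these proposed semantics (illustrative only, not 
> the actual SQLMetric implementation):
> {noformat}
> // Toy model: initValue marks "never updated" (invalid); zeroValue is the
> // lowest valid value and what an invalid metric reports.
> class ToyMetric(initValue: Long = -1L, zeroValue: Long = 0L) {
>   private var value: Long = initValue
>   def isValid: Boolean = value != initValue
>   def add(v: Long): Unit = {
>     if (!isValid) value = zeroValue  // first update makes the metric valid
>     value += v
>   }
>   // Never expose initValue (-1) to users; report zeroValue instead.
>   def reportedValue: Long = if (isValid) value else zeroValue
> }
> // Aggregation first drops invalid metrics, mirroring the proposed
> // filtering in SQLMetrics.stringValue():
> // metrics.filter(_.isValid).map(_.reportedValue)
> {noformat}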






[jira] [Assigned] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics

2023-12-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46294:
---

Assignee: Davin Tjong

> Clean up initValue vs zeroValue semantics in SQLMetrics
> ---
>
> Key: SPARK-46294
> URL: https://issues.apache.org/jira/browse/SPARK-46294
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Davin Tjong
>Assignee: Davin Tjong
>Priority: Minor
>  Labels: pull-request-available
>
> The semantics of initValue and _zeroValue in SQLMetrics is a little bit 
> confusing, since they effectively mean the same thing. Changing it to the 
> following would be clearer, especially in terms of defining what an "invalid" 
> metric is.
>  
> proposed definitions:
>  
> initValue is the starting value for a SQLMetric. If a metric has value equal 
> to its initValue, then it should be filtered out before aggregating with 
> SQLMetrics.stringValue().
>  
> zeroValue defines the lowest value considered valid. If a SQLMetric is 
> invalid, it is set to zeroValue upon receiving any updates, and it also 
> reports zeroValue as its value to avoid exposing it to the user 
> programmatically (a concern previously addressed in SPARK-41442).
> For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
> the metric is by default invalid. At the end of a task, we will update the 
> metric making it valid, and the invalid metrics will be filtered out when 
> calculating min, max, etc. as a workaround for SPARK-11013.






[jira] [Updated] (SPARK-46386) Improve assertions of observation (pyspark.sql.observation)

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46386:
-
Summary: Improve assertions of observation (pyspark.sql.observation)  (was: 
Improve and test assertions of observation (pyspark.sql.observation))

> Improve assertions of observation (pyspark.sql.observation)
> ---
>
> Key: SPARK-46386
> URL: https://issues.apache.org/jira/browse/SPARK-46386
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46386:
-
Parent: (was: SPARK-46041)
Issue Type: Improvement  (was: Sub-task)

> Improve and test assertions of observation (pyspark.sql.observation)
> 
>
> Key: SPARK-46386
> URL: https://issues.apache.org/jira/browse/SPARK-46386
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46413:
---
Labels: pull-request-available  (was: )

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Validate returnType of Arrow Python UDF






[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46413:
-
Description: Validate returnType of Arrow Python UDF  (was: Check 
returnType of Arrow Python UDF)

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Validate returnType of Arrow Python UDF






[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46413:
-
Summary: Validate returnType of Arrow Python UDF  (was: Check returnType of 
Arrow Python UDF)

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Check returnType of Arrow Python UDF






[jira] [Created] (SPARK-46413) Check returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46413:


 Summary: Check returnType of Arrow Python UDF
 Key: SPARK-46413
 URL: https://issues.apache.org/jira/browse/SPARK-46413
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Check returnType of Arrow Python UDF






[jira] [Updated] (SPARK-46289) Exception when ordering by UDT in interpreted mode

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46289:
---
Labels: pull-request-available  (was: )

> Exception when ordering by UDT in interpreted mode
> --
>
> Key: SPARK-46289
> URL: https://issues.apache.org/jira/browse/SPARK-46289
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.5.0
>Reporter: Bruce Robbins
>Priority: Minor
>  Labels: pull-request-available
>
> In interpreted mode, ordering by a UDT will result in an exception. For 
> example:
> {noformat}
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> val df = Seq.tabulate(30) { x =>
>   (x, x + 1, x + 2, new DenseVector(Array((x/100.0).toDouble, ((x + 
> 1)/100.0).toDouble, ((x + 3)/100.0).toDouble)))
> }.toDF("id", "c1", "c2", "c3")
> df.createOrReplaceTempView("df")
> // this works
> sql("select * from df order by c3").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this gets an error
> sql("select * from df order by c3").collect
> {noformat}
> The second {{collect}} action results in the following exception:
> {noformat}
> org.apache.spark.SparkIllegalArgumentException: Type 
> UninitializedPhysicalType does not support ordered operations.
>   at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348)
>   at 
> org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332)
>   at 
> org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254)
> {noformat}
> Note: You don't get an error if you use {{show}} rather than {{collect}}. 
> This is because {{show}} will implicitly add a {{limit}}, in which case the 
> ordering is performed by {{TakeOrderedAndProject}} rather than 
> {{UnsafeExternalRowSorter}}.






[jira] [Created] (SPARK-46412) Update resource-managers/kubernetes/integration-tests/README.md to java 17 and 21

2023-12-14 Thread Bjørn Jørgensen (Jira)
Bjørn Jørgensen created SPARK-46412:
---

 Summary: Update 
resource-managers/kubernetes/integration-tests/README.md to java 17 and 21
 Key: SPARK-46412
 URL: https://issues.apache.org/jira/browse/SPARK-46412
 Project: Spark
  Issue Type: Documentation
  Components: k8s
Affects Versions: 4.0.0
Reporter: Bjørn Jørgensen


In the file resource-managers/kubernetes/integration-tests/README.md:
- change Java 8 to 17 and Java 11 to 21
- change -Dspark.kubernetes.test.sparkTgz=spark-3.0.0-SNAPSHOT-bin-example.tgz \
  to the 4.0.0 snapshot
- change OpenJDK to azul/zulu-openjdk






[jira] [Updated] (SPARK-46409) Spark Connect Repl does not work with ClosureCleaner

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46409:
---
Labels: pull-request-available  (was: )

> Spark Connect Repl does not work with ClosureCleaner
> 
>
> Key: SPARK-46409
> URL: https://issues.apache.org/jira/browse/SPARK-46409
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Vsevolod Stepanov
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-45136 added ClosureCleaner support to SparkConnect client. 
> Unfortunately, this change breaks ConnectRepl launched by 
> `./connector/connect/bin/spark-connect-scala-client`. To reproduce the issue:
>  # Run `./connector/connect/bin/spark-connect-shell`
>  # Run  `./connector/connect/bin/spark-connect-scala-client`
>  # In the REPL, execute this code:
> ```
> @ def plus1(x: Int): Int = x + 1
> @ val plus1_udf = udf(plus1 _)
> ```
> This will fail with the following error:
> ```
> java.lang.reflect.InaccessibleObjectException: Unable to make private native 
> java.lang.reflect.Field[] java.lang.Class.getDeclaredFields0(boolean) 
> accessible: module java.base does not "opens java.lang" to unnamed module 
> @45099dd3
>   
> java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
>   
> java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
>   java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
>   java.lang.reflect.Method.setAccessible(Method.java:193)
>   
> org.apache.spark.util.ClosureCleaner$.getFinalModifiersFieldForJava17(ClosureCleaner.scala:577)
>   
> org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:560)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18(ClosureCleaner.scala:533)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18$adapted(ClosureCleaner.scala:525)
>   scala.collection.ArrayOps$WithFilter.foreach(ArrayOps.scala:73)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16(ClosureCleaner.scala:525)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16$adapted(ClosureCleaner.scala:522)
>   scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   scala.collection.AbstractIterable.foreach(Iterable.scala:933)
>   scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903)
>   
> org.apache.spark.util.ClosureCleaner$.cleanupAmmoniteReplClosure(ClosureCleaner.scala:522)
>   org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:251)
>   
> org.apache.spark.sql.expressions.SparkConnectClosureCleaner$.clean(UserDefinedFunction.scala:210)
>   
> org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:187)
>   
> org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:180)
>   org.apache.spark.sql.functions$.udf(functions.scala:7956)
>   ammonite.$sess.cmd1$Helper.(cmd1.sc:1)
>   ammonite.$sess.cmd1$.(cmd1.sc:7)
> ```
>  
> This is because ClosureCleaner relies heavily on the reflection API and 
> is not compatible with Java 17's stronger encapsulation. The rest of Spark 
> bypasses this by adding `--add-opens` JVM flags, see 
> https://issues.apache.org/jira/browse/SPARK-36796. We need to add these 
> options to the Spark Connect client launch script as well.
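> As an illustration, the launch script would pass the same kind of flags the 
> rest of Spark uses (the full list lives in SPARK-36796; this single flag is 
> the one matching the error above, and the script name comes from the repro 
> steps):
> ```
> # added to the JVM options of connector/connect/bin/spark-connect-scala-client
> --add-opens=java.base/java.lang=ALL-UNNAMED
> ```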






[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

 Shepherd: Max Gekk
Affects Version/s: 3.0.0
  Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

-We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.-

 

In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
Spark 3 datasource v2 would support this.

However, it is clear that it does not.  There is an [explicit 
check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
 and 
[test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
 that prevents this from happening.

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 


> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a 

[jira] [Reopened] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto reopened SPARK-23890:
-

This was supposed to have been fixed in Spark 3 datasource v2, but the issue 
persists.

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing, it is just a new field being 
> added to the struct, but to SQL, this looks like a type change.
> -We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be sent to Hive will block us.  I believe this is fixable by adding an 
> exception in 
> [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
>  to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
> destination type are both struct types, and the destination type only adds 
> new fields.-
>  
> In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
> Spark 3 datasource v2 would support this.
> However, it is clear that it does not.  There is an [explicit 
> check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
>  and 
> [test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
>  that prevents this from happening.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-23890:
---
Labels: bulk-closed pull-request-available  (was: bulk-closed)

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing; it is just a new field being 
> added to the struct, but to SQL, this looks like a type change.
> We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be sent to Hive will block us.  I believe this is fixable by adding an 
> exception in 
> [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
>  to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
> destination type are both struct types, and the destination type only adds 
> new fields.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46411) Change to use bcprov/bcpkix-jdk18on for test

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46411:
---
Labels: pull-request-available  (was: )

> Change to use bcprov/bcpkix-jdk18on for test
> 
>
> Key: SPARK-46411
> URL: https://issues.apache.org/jira/browse/SPARK-46411
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46411) Change to use bcprov/bcpkix-jdk18on for test

2023-12-14 Thread Yang Jie (Jira)
Yang Jie created SPARK-46411:


 Summary: Change to use bcprov/bcpkix-jdk18on for test
 Key: SPARK-46411
 URL: https://issues.apache.org/jira/browse/SPARK-46411
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46410:
---
Labels: pull-request-available  (was: )

> Assign error classes/subclasses to JdbcUtils.classifyException
> --
>
> Key: SPARK-46410
> URL: https://issues.apache.org/jira/browse/SPARK-46410
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> This is a follow-up to SPARK-46393.
> We should raise distinct error classes for the different kinds of invokers of 
> JdbcUtils.classifyException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException

2023-12-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-46410:


Assignee: Max Gekk

> Assign error classes/subclasses to JdbcUtils.classifyException
> --
>
> Key: SPARK-46410
> URL: https://issues.apache.org/jira/browse/SPARK-46410
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Assignee: Max Gekk
>Priority: Major
>
> This is a follow-up to SPARK-46393.
> We should raise distinct error classes for the different kinds of invokers of 
> JdbcUtils.classifyException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException

2023-12-14 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46410:


 Summary: Assign error classes/subclasses to 
JdbcUtils.classifyException
 Key: SPARK-46410
 URL: https://issues.apache.org/jira/browse/SPARK-46410
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


This is a follow-up to SPARK-46393.
We should raise distinct error classes for the different kinds of invokers of 
JdbcUtils.classifyException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46409) Spark Connect Repl does not work with ClosureCleaner

2023-12-14 Thread Vsevolod Stepanov (Jira)
Vsevolod Stepanov created SPARK-46409:
-

 Summary: Spark Connect Repl does not work with ClosureCleaner
 Key: SPARK-46409
 URL: https://issues.apache.org/jira/browse/SPARK-46409
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Vsevolod Stepanov


SPARK-45136 added ClosureCleaner support to SparkConnect client. Unfortunately, 
this change breaks ConnectRepl launched by 
`./connector/connect/bin/spark-connect-scala-client`. To reproduce the issue:
 # Run `./connector/connect/bin/spark-connect-shell`
 # Run `./connector/connect/bin/spark-connect-scala-client`
 # In the REPL, execute this code:
```
@ def plus1(x: Int): Int = x + 1
@ val plus1_udf = udf(plus1 _)
```

This will fail with the following error:

```

java.lang.reflect.InaccessibleObjectException: Unable to make private native 
java.lang.reflect.Field[] java.lang.Class.getDeclaredFields0(boolean) 
accessible: module java.base does not "opens java.lang" to unnamed module 
@45099dd3
  
java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
  
java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
  java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
  java.lang.reflect.Method.setAccessible(Method.java:193)
  
org.apache.spark.util.ClosureCleaner$.getFinalModifiersFieldForJava17(ClosureCleaner.scala:577)
  
org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:560)
  
org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18(ClosureCleaner.scala:533)
  
org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18$adapted(ClosureCleaner.scala:525)
  scala.collection.ArrayOps$WithFilter.foreach(ArrayOps.scala:73)
  
org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16(ClosureCleaner.scala:525)
  
org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16$adapted(ClosureCleaner.scala:522)
  scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
  scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
  scala.collection.AbstractIterable.foreach(Iterable.scala:933)
  scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903)
  
org.apache.spark.util.ClosureCleaner$.cleanupAmmoniteReplClosure(ClosureCleaner.scala:522)
  org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:251)
  
org.apache.spark.sql.expressions.SparkConnectClosureCleaner$.clean(UserDefinedFunction.scala:210)
  
org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:187)
  
org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:180)
  org.apache.spark.sql.functions$.udf(functions.scala:7956)
  ammonite.$sess.cmd1$Helper.(cmd1.sc:1)
  ammonite.$sess.cmd1$.(cmd1.sc:7)

```

 

This is because ClosureCleaner relies heavily on the reflection API and is not 
compatible with Java 17 out of the box. The rest of Spark works around this by 
adding `--add-opens` JVM flags; see 
https://issues.apache.org/jira/browse/SPARK-36796. We need to add these options 
to the Spark Connect client launch script as well.
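
As a rough sketch of the failure mode (not the actual launch-script fix), the 
following standalone Scala program reproduces the module-system error on Java 
17+ unless the JVM is started with 
`--add-opens=java.base/java.lang=ALL-UNNAMED`:

```scala
// Minimal sketch of the reflective access that ClosureCleaner performs;
// assumes Java 17+ with no --add-opens flags.
object AddOpensDemo {
  def main(args: Array[String]): Unit = {
    // The same private method the stack trace above ends at.
    val m = classOf[String].getClass
      .getDeclaredMethod("getDeclaredFields0", classOf[Boolean])
    try {
      m.setAccessible(true) // throws unless java.lang is opened to this module
      println("java.lang is open; reflective access works")
    } catch {
      case e: java.lang.reflect.InaccessibleObjectException =>
        println(s"Blocked by the module system: ${e.getMessage}")
    }
  }
}
```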



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46408) support date_sub on V2ExpressionBuilder

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46408:
---
Labels: pull-request-available  (was: )

> support date_sub on V2ExpressionBuilder
> ---
>
> Key: SPARK-46408
> URL: https://issues.apache.org/jira/browse/SPARK-46408
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> V2ExpressionBuilder currently does not support date_sub, which prevents 
> date_sub filters from being pushed down during logical plan optimization.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46408) support date_sub on V2ExpressionBuilder

2023-12-14 Thread Caican Cai (Jira)
Caican Cai created SPARK-46408:
--

 Summary: support date_sub on V2ExpressionBuilder
 Key: SPARK-46408
 URL: https://issues.apache.org/jira/browse/SPARK-46408
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Caican Cai
 Fix For: 4.0.0


V2ExpressionBuilder currently does not support date_sub, which prevents 
date_sub filters from being pushed down during logical plan optimization.
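
For context, a hedged sketch of an affected query shape (the `events` table is 
hypothetical and assumed to be backed by a DataSource V2 source; `spark` is an 
active SparkSession):

```scala
import org.apache.spark.sql.functions.{col, current_date, date_sub}

// While V2ExpressionBuilder cannot translate date_sub, this predicate is
// evaluated in Spark instead of being pushed down to the source.
spark.table("events")
  .filter(col("event_date") >= date_sub(current_date(), 7))
  .show()
```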



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46406) Assign a name to the error class _LEGACY_ERROR_TEMP_1023

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46406:
---
Labels: pull-request-available  (was: )

> Assign a name to the error class _LEGACY_ERROR_TEMP_1023
> 
>
> Key: SPARK-46406
> URL: https://issues.apache.org/jira/browse/SPARK-46406
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46407:
---
Labels: pull-request-available  (was: )

> Reorganize `OpsOnDiffFramesDisabledTests`
> -
>
> Key: SPARK-46407
> URL: https://issues.apache.org/jira/browse/SPARK-46407
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`

2023-12-14 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46407:
-

 Summary: Reorganize `OpsOnDiffFramesDisabledTests`
 Key: SPARK-46407
 URL: https://issues.apache.org/jira/browse/SPARK-46407
 Project: Spark
  Issue Type: Test
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46406) Assign a name to the error class _LEGACY_ERROR_TEMP_1023

2023-12-14 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-46406:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_1023
 Key: SPARK-46406
 URL: https://issues.apache.org/jira/browse/SPARK-46406
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)

2023-12-14 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-45796.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44184
[https://github.com/apache/spark/pull/44184]

> Support MODE() WITHIN GROUP (ORDER BY col) 
> ---
>
> Key: SPARK-45796
> URL: https://issues.apache.org/jira/browse/SPARK-45796
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Many mainstream databases support the syntax shown below.
> { MODE() WITHIN GROUP (ORDER BY sortSpecification) }
> [FILTER (WHERE expression)] [OVER windowNameOrSpecification]
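
A usage sketch of the syntax, assuming a hypothetical table `sales(region 
STRING, amount INT)` and an active SparkSession `spark`:

```scala
// Most frequent amount per region, via the inverse distribution syntax.
spark.sql("""
  SELECT region,
         MODE() WITHIN GROUP (ORDER BY amount) AS most_common_amount
  FROM sales
  GROUP BY region
""").show()
```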



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46391) Reorganize `ExpandingParityTests`

2023-12-14 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-46391:
-

Assignee: Ruifeng Zheng

> Reorganize `ExpandingParityTests`
> -
>
> Key: SPARK-46391
> URL: https://issues.apache.org/jira/browse/SPARK-46391
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46391) Reorganize `ExpandingParityTests`

2023-12-14 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46391.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44332
[https://github.com/apache/spark/pull/44332]

> Reorganize `ExpandingParityTests`
> -
>
> Key: SPARK-46391
> URL: https://issues.apache.org/jira/browse/SPARK-46391
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28386) Cannot resolve ORDER BY columns with GROUP BY and HAVING

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-28386:
---
Labels: pull-request-available  (was: )

> Cannot resolve ORDER BY columns with GROUP BY and HAVING
> 
>
> Key: SPARK-28386
> URL: https://issues.apache.org/jira/browse/SPARK-28386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>
> How to reproduce:
> {code:sql}
> CREATE TABLE test_having (a int, b int, c string, d string) USING parquet;
> INSERT INTO test_having VALUES (0, 1, '', 'A');
> INSERT INTO test_having VALUES (1, 2, '', 'b');
> INSERT INTO test_having VALUES (2, 2, '', 'c');
> INSERT INTO test_having VALUES (3, 3, '', 'D');
> INSERT INTO test_having VALUES (4, 3, '', 'e');
> INSERT INTO test_having VALUES (5, 3, '', 'F');
> INSERT INTO test_having VALUES (6, 4, '', 'g');
> INSERT INTO test_having VALUES (7, 4, '', 'h');
> INSERT INTO test_having VALUES (8, 4, '', 'I');
> INSERT INTO test_having VALUES (9, 4, '', 'j');
> SELECT lower(c), count(c) FROM test_having
>   GROUP BY lower(c) HAVING count(*) > 2
>   ORDER BY lower(c);
> {code}
> {noformat}
> spark-sql> SELECT lower(c), count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY lower(c);
> Error in query: cannot resolve '`c`' given input columns: [lower(c), 
> count(c)]; line 3 pos 19;
> 'Sort ['lower('c) ASC NULLS FIRST], true
> +- Project [lower(c)#158, count(c)#159L]
>+- Filter (count(1)#161L > cast(2 as bigint))
>   +- Aggregate [lower(c#7)], [lower(c#7) AS lower(c)#158, count(c#7) AS 
> count(c)#159L, count(1) AS count(1)#161L]
>  +- SubqueryAlias test_having
> +- Relation[a#5,b#6,c#7,d#8] parquet
> {noformat}
> But it works when setting an alias:
> {noformat}
> spark-sql> SELECT lower(c) withAias, count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY withAias;
> 3
>   4
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46405) Issue with CSV schema inference and malformed records

2023-12-14 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-46405:
---

 Summary:  Issue with CSV schema inference and malformed records
 Key: SPARK-46405
 URL: https://issues.apache.org/jira/browse/SPARK-46405
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Yaohua Zhao


There appears to be a discrepancy in the behavior of schema inference in the 
CSV reader compared to JSON. When processing CSV files without a predefined 
schema, the mechanism to handle malformed records seems to be inconsistent. 
Unlike the JSON format, where a `_corrupt_record` column is automatically added 
in the presence of malformed records, the CSV format does not exhibit this 
behavior. This inconsistency can lead to unexpected results and data loss 
during processing.

*Steps to Reproduce:*
 # Create a CSV file with malformed records without providing a schema.
 # Observe that the `_corrupt_record` column is not automatically added to the 
final dataframe.

*Expected Result:* The `_corrupt_record` column should be automatically added 
to the final dataframe when processing a CSV file with malformed records, 
similar to the behavior observed with JSON files.
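
A minimal sketch of the discrepancy (file paths and data are hypothetical; 
`spark` is an active SparkSession):

```scala
// JSON: schema inference surfaces malformed rows in a _corrupt_record column
// (PERMISSIVE mode, the default).
val jsonDf = spark.read.json("/tmp/events.json")
jsonDf.printSchema() // includes _corrupt_record when malformed records exist

// CSV: with an inferred schema, no _corrupt_record column appears, so
// malformed rows can be silently lost.
val csvDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/events.csv")
csvDf.printSchema() // no _corrupt_record column
```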



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45796:
--

Assignee: Jiaan Geng  (was: Apache Spark)

> Support MODE() WITHIN GROUP (ORDER BY col) 
> ---
>
> Key: SPARK-45796
> URL: https://issues.apache.org/jira/browse/SPARK-45796
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Many mainstream databases support the syntax shown below.
> { MODE() WITHIN GROUP (ORDER BY sortSpecification) }
> [FILTER (WHERE expression)] [OVER windowNameOrSpecification]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45796:
--

Assignee: Apache Spark  (was: Jiaan Geng)

> Support MODE() WITHIN GROUP (ORDER BY col) 
> ---
>
> Key: SPARK-45796
> URL: https://issues.apache.org/jira/browse/SPARK-45796
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Many mainstream databases support the syntax shown below.
> { MODE() WITHIN GROUP (ORDER BY sortSpecification) }
> [FILTER (WHERE expression)] [OVER windowNameOrSpecification]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`

2023-12-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-46401:


Assignee: Yang Jie

> Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
> --
>
> Key: SPARK-46401
> URL: https://issues.apache.org/jira/browse/SPARK-46401
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
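
For illustration, a small sketch of the two checks against RoaringBitmap's 
standard API:

```scala
import org.roaringbitmap.RoaringBitmap

val bitmap = RoaringBitmap.bitmapOf(1, 2, 3)

// Preferred: an emptiness check that involves no counting.
val hasEntries = !bitmap.isEmpty

// Avoided: getCardinality() tallies every container just to compare with zero.
val hasEntriesSlow = bitmap.getCardinality > 0
```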




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`

2023-12-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46401.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44347
[https://github.com/apache/spark/pull/44347]

> Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
> --
>
> Key: SPARK-46401
> URL: https://issues.apache.org/jira/browse/SPARK-46401
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45796:
--

Assignee: Apache Spark  (was: Jiaan Geng)

> Support MODE() WITHIN GROUP (ORDER BY col) 
> ---
>
> Key: SPARK-45796
> URL: https://issues.apache.org/jira/browse/SPARK-45796
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Many mainstream databases support the syntax shown below.
> { MODE() WITHIN GROUP (ORDER BY sortSpecification) }
> [FILTER (WHERE expression)] [OVER windowNameOrSpecification]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796633#comment-17796633
 ] 

ASF GitHub Bot commented on SPARK-46404:


User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/44346

> Add structured-streaming-page.test.js to test structured-streaming-page.js
> --
>
> Key: SPARK-46404
> URL: https://issues.apache.org/jira/browse/SPARK-46404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46404:
--

Assignee: Apache Spark

> Add structured-streaming-page.test.js to test structured-streaming-page.js
> --
>
> Key: SPARK-46404
> URL: https://issues.apache.org/jira/browse/SPARK-46404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45796:
--

Assignee: Jiaan Geng  (was: Apache Spark)

> Support MODE() WITHIN GROUP (ORDER BY col) 
> ---
>
> Key: SPARK-45796
> URL: https://issues.apache.org/jira/browse/SPARK-45796
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Many mainstream databases support the syntax shown below.
> { MODE() WITHIN GROUP (ORDER BY sortSpecification) }
> [FILTER (WHERE expression)] [OVER windowNameOrSpecification]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46384:
--

Assignee: Apache Spark

> Structured Streaming UI doesn't display graph correctly
> ---
>
> Key: SPARK-46384
> URL: https://issues.apache.org/jira/browse/SPARK-46384
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming, Web UI
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> The Streaming UI is currently broken at Spark master. Running a simple query:
> ```
> q = 
> spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start()
> ```
> This makes the Spark UI show an empty graph for "operation duration":
> !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8!
> Here is the error:
> !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20!
>  
> I verified that the same query runs fine on Spark 3.5, as shown in the following graph.
> !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI!
>  
> This is likely a problem introduced by the library updates; a potential 
> source of the error is [https://github.com/apache/spark/pull/42879]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46401:
--

Assignee: Apache Spark

> Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
> --
>
> Key: SPARK-46401
> URL: https://issues.apache.org/jira/browse/SPARK-46401
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46384:
--

Assignee: (was: Apache Spark)

> Structured Streaming UI doesn't display graph correctly
> ---
>
> Key: SPARK-46384
> URL: https://issues.apache.org/jira/browse/SPARK-46384
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming, Web UI
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> The Streaming UI is currently broken at Spark master. Running a simple query:
> ```
> q = 
> spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start()
> ```
> This makes the Spark UI show an empty graph for "operation duration":
> !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8!
> Here is the error:
> !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20!
>  
> I verified that the same query runs fine on Spark 3.5, as shown in the following graph.
> !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI!
>  
> This is likely a problem introduced by the library updates; a potential 
> source of the error is [https://github.com/apache/spark/pull/42879]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46404:
--

Assignee: (was: Apache Spark)

> Add structured-streaming-page.test.js to test structured-streaming-page.js
> --
>
> Key: SPARK-46404
> URL: https://issues.apache.org/jira/browse/SPARK-46404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46401:
--

Assignee: (was: Apache Spark)

> Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
> --
>
> Key: SPARK-46401
> URL: https://issues.apache.org/jira/browse/SPARK-46401
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46404:
--

Assignee: Apache Spark

> Add structured-streaming-page.test.js to test structured-streaming-page.js
> --
>
> Key: SPARK-46404
> URL: https://issues.apache.org/jira/browse/SPARK-46404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46384:
--

Assignee: Apache Spark

> Structured Streaming UI doesn't display graph correctly
> ---
>
> Key: SPARK-46384
> URL: https://issues.apache.org/jira/browse/SPARK-46384
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming, Web UI
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> The Streaming UI is currently broken at Spark master. Running a simple query:
> ```
> q = 
> spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start()
> ```
> This makes the Spark UI show an empty graph for "operation duration":
> !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8!
> Here is the error:
> !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20!
>  
> I verified that the same query runs fine on Spark 3.5, as shown in the following graph.
> !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI!
>  
> This is likely a problem introduced by the library updates; a potential 
> source of the error is [https://github.com/apache/spark/pull/42879]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46384:
--

Assignee: (was: Apache Spark)

> Structured Streaming UI doesn't display graph correctly
> ---
>
> Key: SPARK-46384
> URL: https://issues.apache.org/jira/browse/SPARK-46384
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming, Web UI
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> The Streaming UI is currently broken at Spark master. Running a simple query:
> ```
> q = 
> spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start()
> ```
> This makes the Spark UI show an empty graph for "operation duration":
> !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8!
> Here is the error:
> !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20!
>  
> I verified that the same query runs fine on Spark 3.5, as shown in the following graph.
> !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI!
>  
> This is likely a problem introduced by the library updates; a potential 
> source of the error is [https://github.com/apache/spark/pull/42879]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46400) When there are corrupted files in the local maven repo, retry to skip this cache

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46400:
--

Assignee: Apache Spark

> When there are corrupted files in the local maven repo, retry to skip this 
> cache
> 
>
> Key: SPARK-46400
> URL: https://issues.apache.org/jira/browse/SPARK-46400
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46400) When there are corrupted files in the local maven repo, retry to skip this cache

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46400:
--

Assignee: (was: Apache Spark)

> When there are corrupted files in the local maven repo, retry to skip this 
> cache
> 
>
> Key: SPARK-46400
> URL: https://issues.apache.org/jira/browse/SPARK-46400
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45959:
--

Assignee: (was: Apache Spark)

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Minor
>  Labels: pull-request-available
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in reality it is difficult to expect customers to modify their code: in 
> Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the 
> plans are cloned before being handed to the analyzer, optimizer, etc., which 
> was not the case in Spark 2.
> All of this has increased query times from 5 minutes to 2-3 hours.
> Often the columns are added to the plan via some for-loop logic that just 
> keeps adding new computation based on some rule.
> So my suggestion is to do an initial check in the withColumn API before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> For a start, maybe we can just handle the Project node.
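
To make the pattern concrete, a hedged sketch of the loop-driven usage and the 
single-projection alternative (`spark` is an active SparkSession):

```scala
import org.apache.spark.sql.functions.col

// Anti-pattern: each withColumn stacks another Project node on the plan,
// so N iterations produce an N-deep tree for the analyzer to re-traverse.
var grown = spark.range(1000).toDF("id")
for (i <- 1 to 200) {
  grown = grown.withColumn(s"c$i", col("id") + i)
}

// Single-shot alternative: one select, one Project node.
val allCols = col("id") +: (1 to 200).map(i => (col("id") + i).as(s"c$i"))
val flat = spark.range(1000).toDF("id").select(allCols: _*)
```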



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45959:
--

Assignee: Apache Spark

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in reality it is difficult to expect customers to modify their code: in 
> Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the 
> plans are cloned before being handed to the analyzer, optimizer, etc., which 
> was not the case in Spark 2.
> All of this has increased query times from 5 minutes to 2-3 hours.
> Often the columns are added to the plan via some for-loop logic that just 
> keeps adding new computation based on some rule.
> So my suggestion is to do an initial check in the withColumn API before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> For a start, maybe we can just handle the Project node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45959:
--

Assignee: Apache Spark

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in reality it is difficult to expect customers to modify their code: in 
> Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the 
> plans are cloned before being handed to the analyzer, optimizer, etc., which 
> was not the case in Spark 2.
> All of this has increased query times from 5 minutes to 2-3 hours.
> Often the columns are added to the plan via some for-loop logic that just 
> keeps adding new computation based on some rule.
> So my suggestion is to do an initial check in the withColumn API before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> For a start, maybe we can just handle the Project node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45959:
--

Assignee: (was: Apache Spark)

> Abusing DataSet.withColumn can cause huge tree with severe perf degradation
> ---
>
> Key: SPARK-45959
> URL: https://issues.apache.org/jira/browse/SPARK-45959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Minor
>  Labels: pull-request-available
>
> Though the documentation clearly recommends adding all columns in a single 
> shot, in practice it is difficult to expect customers to modify their code: 
> in Spark 2 the analyzer rules did not perform deep tree traversals, and in 
> Spark 3 the plans are cloned before being handed to the analyzer, optimizer, 
> etc., which was not the case in Spark 2.
> Together these changes have increased query time from 5 minutes to 2-3 hours.
> Many times the columns are added to the plan via some for-loop logic that 
> keeps appending new computations based on some rule.
> So my suggestion is to do an initial check in the withColumn API before 
> creating a new projection: if all the existing columns are still being 
> projected, and the new column's expression depends not on the output of the 
> top node but on its child, then instead of adding a new Project the column 
> can be added to the existing node.
> To start, we could handle just the Project node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js

2023-12-14 Thread Kent Yao (Jira)
Kent Yao created SPARK-46404:


 Summary: Add structured-streaming-page.test.js to test 
structured-streaming-page.js
 Key: SPARK-46404
 URL: https://issues.apache.org/jira/browse/SPARK-46404
 Project: Spark
  Issue Type: Sub-task
  Components: Structured Streaming, UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46403:
---
Labels: pull-request-available  (was: )

> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-14-16-30-39-104.png
>
>
> Currently Spark reads a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal 
> bytes.
> We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
> cached bytes if getBytes() has already been called and the bytes are already 
> cached.
> !image-2023-12-14-16-30-39-104.png!
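
A rough sketch of the intended call-site change (UTF8String decoding is shown
as one example consumer; the helper is illustrative, and safety assumes the
returned array is neither mutated nor retained past the Binary's lifetime):

```
import org.apache.parquet.io.api.Binary
import org.apache.spark.unsafe.types.UTF8String

// getBytes() always copies the backing array; getBytesUnsafe() returns
// the internal or previously cached array without making another copy.
def decodeToUTF8String(binary: Binary): UTF8String =
  UTF8String.fromBytes(binary.getBytesUnsafe)  // was: binary.getBytes
```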



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-14 Thread Wan Kun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Kun updated SPARK-46403:

Description: 
Currently Spark reads a Parquet binary object with the getBytes() method.

The *Binary.getBytes()* method always makes a new copy of the internal 
bytes.

We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
cached bytes if getBytes() has already been called and the bytes are already 
cached.

!image-2023-12-14-16-30-39-104.png!

  was:
Currently Spark reads a Parquet binary object with the getBytes() method.

The *Binary.getBytes()* method always makes a new copy of the internal 
bytes.

We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
cached bytes if getBytes() has already been called and the bytes are already 
cached.

!image-2023-12-14-16-28-04-797.png!


> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Priority: Major
> Attachments: image-2023-12-14-16-30-39-104.png
>
>
> Currently Spark reads a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal 
> bytes.
> We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
> cached bytes if getBytes() has already been called and the bytes are already 
> cached.
> !image-2023-12-14-16-30-39-104.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-14 Thread Wan Kun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Kun updated SPARK-46403:

Description: 
Currently Spark reads a Parquet binary object with the getBytes() method.

The *Binary.getBytes()* method always makes a new copy of the internal 
bytes.

We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
cached bytes if getBytes() has already been called and the bytes are already 
cached.

!image-2023-12-14-16-28-04-797.png!

  was:
Currently Spark reads a Parquet binary dictionary object with the getBytes() 
method.

The *Binary.getBytes()* method always makes a new copy of the internal 
bytes.

We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
cached bytes if getBytes() has already been called and the bytes are already 
cached.

!image-2023-12-14-16-28-04-797.png!


> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Priority: Major
> Attachments: image-2023-12-14-16-30-39-104.png
>
>
> Currently Spark reads a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal 
> bytes.
> We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
> cached bytes if getBytes() has already been called and the bytes are already 
> cached.
> !image-2023-12-14-16-28-04-797.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-14 Thread Wan Kun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Kun updated SPARK-46403:

Attachment: image-2023-12-14-16-30-39-104.png

> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Priority: Major
> Attachments: image-2023-12-14-16-30-39-104.png
>
>
> Currently Spark reads a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal 
> bytes.
> We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
> cached bytes if getBytes() has already been called and the bytes are already 
> cached.
> !image-2023-12-14-16-28-04-797.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-14 Thread Wan Kun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Kun updated SPARK-46403:

Summary: Decode parquet binary with getBytesUnsafe method  (was: Decode 
parquet binary dictionary with getBytesUnsafe method)

> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Priority: Major
>
> Currently Spark reads a Parquet binary dictionary object with the getBytes() 
> method.
> The *Binary.getBytes()* method always makes a new copy of the internal 
> bytes.
> We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
> cached bytes if getBytes() has already been called and the bytes are already 
> cached.
> !image-2023-12-14-16-28-04-797.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46403) Decode parquet binary dictionary with getBytesUnsafe method

2023-12-14 Thread Wan Kun (Jira)
Wan Kun created SPARK-46403:
---

 Summary: Decode parquet binary dictionary with getBytesUnsafe 
method
 Key: SPARK-46403
 URL: https://issues.apache.org/jira/browse/SPARK-46403
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wan Kun


Currently Spark reads a Parquet binary dictionary object with the getBytes() 
method.

The *Binary.getBytes()* method always makes a new copy of the internal 
bytes.

We can use the *Binary.getBytesUnsafe()* method instead, which returns the 
cached bytes if getBytes() has already been called and the bytes are already 
cached.

!image-2023-12-14-16-28-04-797.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46396) LegacyFastTimestampFormatter.parseOptional should not throw exception

2023-12-14 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-46396.

Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44338
[https://github.com/apache/spark/pull/44338]

> LegacyFastTimestampFormatter.parseOptional should not throw exception
> -
>
> Key: SPARK-46396
> URL: https://issues.apache.org/jira/browse/SPARK-46396
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0
>
>
> When spark.sql.legacy.timeParserPolicy=LEGACY is set, Spark uses the 
> LegacyFastTimestampFormatter to infer potential timestamp columns. The 
> inference should not throw an exception.
> However, when the input is 23012150952, an exception is thrown:
> ```
> For input string: "23012150952"
> java.lang.NumberFormatException: For input string: "23012150952"
> at 
> java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
> at java.base/java.lang.Integer.parseInt(Integer.java:668)
> at java.base/java.lang.Integer.parseInt(Integer.java:786)
> at 
> org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304)
> at 
> org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045)
> at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651)
> at 
> org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418)
> ```
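
A minimal sketch of the expected parseOptional contract, assuming the fix
simply converts parse failures into None (the standalone helper below is
illustrative, not the merged patch):

```
// Schema inference probes many candidate formats, so a malformed input
// must yield None instead of propagating NumberFormatException.
def parseOptional(s: String, parse: String => Long): Option[Long] =
  try Some(parse(s))
  catch {
    case _: java.text.ParseException | _: NumberFormatException => None
  }
```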



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46393) Classify exceptions in the JDBC table catalog

2023-12-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46393.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44335
[https://github.com/apache/spark/pull/44335]

> Classify exceptions in the JDBC table catalog
> -
>
> Key: SPARK-46393
> URL: https://issues.apache.org/jira/browse/SPARK-46393
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Handle exceptions from JDBC drivers and convert them to AnalysisException 
> with error classes.
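
A rough illustration of the classification pattern (the error-class name,
message key, and wrapper are assumptions rather than the merged
implementation, and AnalysisException constructor shapes vary across Spark
versions):

```
import java.sql.SQLException
import org.apache.spark.sql.AnalysisException

// Convert a raw driver SQLException into an AnalysisException that
// carries a Spark error class instead of leaking the driver's message.
def classifyException[T](errorClass: String, message: String)(f: => T): T =
  try f
  catch {
    case e: SQLException =>
      throw new AnalysisException(
        errorClass = errorClass,
        messageParameters = Map("message" -> message),
        cause = Some(e))
  }
```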



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org