[jira] [Updated] (SPARK-46418) Reorganize `ReshapeTests`
[ https://issues.apache.org/jira/browse/SPARK-46418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46418: --- Labels: pull-request-available (was: ) > Reorganize `ReshapeTests` > - > > Key: SPARK-46418 > URL: https://issues.apache.org/jira/browse/SPARK-46418 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46418) Reorganize `ReshapeTests`
Ruifeng Zheng created SPARK-46418: - Summary: Reorganize `ReshapeTests` Key: SPARK-46418 URL: https://issues.apache.org/jira/browse/SPARK-46418 Project: Spark Issue Type: Test Components: PS, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-46415) Creating partitions through jdbc connection to beeline is slow
[ https://issues.apache.org/jira/browse/SPARK-46415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xichenglin updated SPARK-46415: --- Description: Use jdbc to connect to spark beeline through the connection pool to perform partition creation operations. When the number of connections exceeds 4, the speed of creating partitions will be very slow, and the execution time of each SQL is 4s-10s. Spark 2.x does not have this problem, and the execution time of each SQL is within 1 second. (was: Use jdbc to connect to spark beeline through the connection pool to create partitions. When the number of connections exceeds 4, spark 3. There is no such problem. The execution time of each SQL statement is within 1 second.) > Creating partitions through jdbc connection to beeline is slow > -- > > Key: SPARK-46415 > URL: https://issues.apache.org/jira/browse/SPARK-46415 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.3 >Reporter: xichenglin >Priority: Major > > Use jdbc to connect to spark beeline through the connection pool to perform > partition creation operations. When the number of connections exceeds 4, the > speed of creating partitions will be very slow, and the execution time of > each SQL is 4s-10s. Spark 2.x does not have this problem, and the execution > time of each SQL is within 1 second.
[jira] [Updated] (SPARK-46417) do not fail when calling getTable and throwException is false
[ https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46417: --- Labels: pull-request-available (was: ) > do not fail when calling getTable and throwException is false > - > > Key: SPARK-46417 > URL: https://issues.apache.org/jira/browse/SPARK-46417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-46417) do not fail when calling getTable and throwException is false
[ https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-46417: Issue Type: Bug (was: Improvement) > do not fail when calling getTable and throwException is false > - > > Key: SPARK-46417 > URL: https://issues.apache.org/jira/browse/SPARK-46417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major
[jira] [Created] (SPARK-46417) do not fail when calling getTable and throwException is false
Wenchen Fan created SPARK-46417: --- Summary: do not fail when calling getTable and throwException is false Key: SPARK-46417 URL: https://issues.apache.org/jira/browse/SPARK-46417 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Resolved] (SPARK-46402) Add getMessageParameters and getQueryContext support
[ https://issues.apache.org/jira/browse/SPARK-46402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46402. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44349 [https://github.com/apache/spark/pull/44349] > Add getMessageParameters and getQueryContext support > > > Key: SPARK-46402 > URL: https://issues.apache.org/jira/browse/SPARK-46402 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-46402) Add getMessageParameters and getQueryContext support
[ https://issues.apache.org/jira/browse/SPARK-46402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46402: Assignee: Hyukjin Kwon > Add getMessageParameters and getQueryContext support > > > Key: SPARK-46402 > URL: https://issues.apache.org/jira/browse/SPARK-46402 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath
[ https://issues.apache.org/jira/browse/SPARK-46416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46416. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44345 [https://github.com/apache/spark/pull/44345] > Add @tailrec to HadoopFSUtils#shouldFilterOutPath > - > > Key: SPARK-46416 > URL: https://issues.apache.org/jira/browse/SPARK-46416 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath
[ https://issues.apache.org/jira/browse/SPARK-46416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46416: --- Labels: pull-request-available (was: ) > Add @tailrec to HadoopFSUtils#shouldFilterOutPath > - > > Key: SPARK-46416 > URL: https://issues.apache.org/jira/browse/SPARK-46416 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available
[jira] [Assigned] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath
[ https://issues.apache.org/jira/browse/SPARK-46416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-46416: Assignee: Yang Jie > Add @tailrec to HadoopFSUtils#shouldFilterOutPath > - > > Key: SPARK-46416 > URL: https://issues.apache.org/jira/browse/SPARK-46416 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available
[jira] [Created] (SPARK-46416) Add @tailrec to HadoopFSUtils#shouldFilterOutPath
Yang Jie created SPARK-46416: Summary: Add @tailrec to HadoopFSUtils#shouldFilterOutPath Key: SPARK-46416 URL: https://issues.apache.org/jira/browse/SPARK-46416 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie
[jira] [Resolved] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"
[ https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Le Bihan resolved SPARK-45311. --- Fix Version/s: 4.0.0 3.5.1 3.4.2 Resolution: Fixed Resolved through the resolution of linked issues > Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search > for an encoder for a generic type, and since 3.5.x isn't "an expression > encoder" > - > > Key: SPARK-45311 > URL: https://issues.apache.org/jira/browse/SPARK-45311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0, 3.4.1, 3.5.0 > Environment: Debian 12 > Java 17 > Underlying Spring-Boot 2.7.14 >Reporter: Marc Le Bihan >Priority: Major > Fix For: 4.0.0, 3.5.1, 3.4.2 > > Attachments: JavaTypeInference_116.png, sparkIssue_02.png > > > If you find it convenient, you might clone the > [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that > does many operations around cities, local authorities and accounting with > open data) where I've extracted from my work what's necessary to make a set > of 35 tests that run correctly with Spark 3.3.x, and show the troubles > encountered with 3.4.x and 3.5.x. > > It is working well with Spark 3.2.x, 3.3.x. But as soon as I select *Spark > 3.4.x*, where the encoder seems to have deeply changed, the encoder fails > with two problems: > > *1)* It throws *java.util.NoSuchElementException: None.get* messages > everywhere. > Asking over the Internet, I wasn't alone facing this problem. Reading it, > you'll see that I've attempted a debug but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0] > {color:#172b4d}by the way, if possible, the encoder and decoder functions > should forward a parameter as soon as the name of the field being handled is > known, and then all along their process, so that when the encoder is at > any point where it has to throw an exception, it knows the field it is > handling in its specific call and can send a message like:{color} > {color:#00875a}_java.util.NoSuchElementException: None.get when encoding [the > method or field it was targeting]_{color} > > *2)* *Not found an encoder of the type RS to Spark SQL internal > representation.* Consider to change the input type to one of supported at > (...) > Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal > representation (...) > > where *RS* and *OMI_ID* are generic types. > This is strange. > [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge] > > *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, > but another adds itself to the list: > "{*}Only expression encoders are supported for now{*}" on what was accepted > and working before.
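The reporter's suggestion above — forward the name of the field being handled through the whole encoding process so that failures can name the offending field — can be sketched generically. This is an illustrative model only, not Spark's encoder; the names `encode_value` and `FieldEncodingError` are hypothetical.

```python
# Illustrative sketch of threading a field path through an encoding walk,
# so an exception names the field instead of surfacing a bare "None.get".
# Not Spark code; all names here are hypothetical.

class FieldEncodingError(Exception):
    def __init__(self, path, cause):
        super().__init__(f"{cause} when encoding '{path}'")
        self.path = path

def encode_value(value, path="<root>"):
    """Recursively 'encode' a nested dict, carrying the current field path."""
    if isinstance(value, dict):
        return {k: encode_value(v, f"{path}.{k}") for k, v in value.items()}
    if value is None:
        # Without the path parameter this would be an unhelpful generic
        # error; with it, the message pinpoints the field being handled.
        raise FieldEncodingError(path, "missing value")
    return value

# encode_value({"city": {"name": None}}) raises:
#   missing value when encoding '<root>.city.name'
```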
[jira] [Created] (SPARK-46415) Creating partitions through jdbc connection to beeline is slow
xichenglin created SPARK-46415: -- Summary: Creating partitions through jdbc connection to beeline is slow Key: SPARK-46415 URL: https://issues.apache.org/jira/browse/SPARK-46415 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.3, 3.0.0 Reporter: xichenglin Use jdbc to connect to spark beeline through the connection pool to create partitions. When the number of connections exceeds 4, creating partitions becomes very slow and each SQL statement takes 4s-10s. Spark 2.x does not have this problem; the execution time of each SQL statement is within 1 second.
[jira] [Updated] (SPARK-46414) Use prependBaseUri to render javascript imports
[ https://issues.apache.org/jira/browse/SPARK-46414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46414: --- Labels: pull-request-available (was: ) > Use prependBaseUri to render javascript imports > --- > > Key: SPARK-46414 > URL: https://issues.apache.org/jira/browse/SPARK-46414 > Project: Spark > Issue Type: Sub-task > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-46414) Use prependBaseUri to render javascript imports
Kent Yao created SPARK-46414: Summary: Use prependBaseUri to render javascript imports Key: SPARK-46414 URL: https://issues.apache.org/jira/browse/SPARK-46414 Project: Spark Issue Type: Sub-task Components: UI Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796975#comment-17796975 ] melin commented on SPARK-43338: --- [~yao] databricks supports changing it: spark.databricks.sql.initial.catalog.name https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html > Support modify the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata from several HMS instances > and classifies them by catalog name, so a configurable catalog name is required. > [~fanjia]
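The request above boils down to making the hardcoded `SESSION_CATALOG_NAME` constant resolvable from configuration. A minimal sketch of that resolution logic, assuming a hypothetical config key modeled on the Databricks option mentioned in the comment (this is NOT a real Spark setting):

```python
# Hypothetical sketch: resolve the session catalog name from configuration,
# falling back to Spark's hardcoded default. The config key is illustrative
# only, inspired by spark.databricks.sql.initial.catalog.name.

DEFAULT_SESSION_CATALOG = "spark_catalog"

def session_catalog_name(conf: dict) -> str:
    """Return the configured session catalog name, or the default."""
    return conf.get("spark.sql.initial.catalog.name", DEFAULT_SESSION_CATALOG)

# With no override, behavior matches today's Spark:
assert session_catalog_name({}) == "spark_catalog"
# A platform fronting several Hive Metastores could pick a name per HMS:
assert session_catalog_name(
    {"spark.sql.initial.catalog.name": "hms_prod"}) == "hms_prod"
```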
[jira] [Resolved] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js
[ https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46404. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44346 [https://github.com/apache/spark/pull/44346] > Add structured-streaming-page.test.js to test structured-streaming-page.js > -- > > Key: SPARK-46404 > URL: https://issues.apache.org/jira/browse/SPARK-46404 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming, UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0
[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46384: Assignee: Kent Yao > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > > The Streaming UI is currently broken at Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "operation duration": > (screenshot: empty "operation duration" graph)
> Here is the error: > (screenshot of the error) > > I verified the same query runs fine on Spark 3.5, as in the following graph: > (screenshot: graph rendering correctly on Spark 3.5)
> > This is likely a problem introduced by the library updates; a potential > source: [https://github.com/apache/spark/pull/42879]
[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js
[ https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46404: Assignee: Kent Yao > Add structured-streaming-page.test.js to test structured-streaming-page.js > -- > > Key: SPARK-46404 > URL: https://issues.apache.org/jira/browse/SPARK-46404 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming, UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major
[jira] [Resolved] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46384. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44346 [https://github.com/apache/spark/pull/44346] > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The Streaming UI is currently broken at Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "operation duration": > (screenshot: empty "operation duration" graph)
> Here is the error: > (screenshot of the error) > > I verified the same query runs fine on Spark 3.5, as in the following graph: > (screenshot: graph rendering correctly on Spark 3.5)
> > This is likely a problem introduced by the library updates; a potential > source: [https://github.com/apache/spark/pull/42879]
[jira] [Assigned] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`
[ https://issues.apache.org/jira/browse/SPARK-46407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-46407: - Assignee: Ruifeng Zheng > Reorganize `OpsOnDiffFramesDisabledTests` > - > > Key: SPARK-46407 > URL: https://issues.apache.org/jira/browse/SPARK-46407 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`
[ https://issues.apache.org/jira/browse/SPARK-46407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-46407. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44354 [https://github.com/apache/spark/pull/44354] > Reorganize `OpsOnDiffFramesDisabledTests` > - > > Key: SPARK-46407 > URL: https://issues.apache.org/jira/browse/SPARK-46407 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-43149) When CTAS with USING fails to store metadata in metastore, data gets left around
[ https://issues.apache.org/jira/browse/SPARK-43149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43149: --- Labels: pull-request-available (was: ) > When CTAS with USING fails to store metadata in metastore, data gets left > around > > > Key: SPARK-43149 > URL: https://issues.apache.org/jira/browse/SPARK-43149 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: pull-request-available > > For example: > {noformat} > drop table if exists parquet_ds1; > -- try creating table with invalid column name > -- use 'using parquet' to designate the data source > create table parquet_ds1 using parquet as > select id, date'2018-01-01' + make_dt_interval(0, id) > from range(0, 10); > Cannot create a table having a column whose name contains commas in Hive > metastore. Table: `spark_catalog`.`default`.`parquet_ds1`; Column: DATE > '2018-01-01' + make_dt_interval(0, id, 0, 0.00) > -- show that table did not get created > show tables; > -- try again with valid column name > -- spark will complain that directory already exists > create table parquet_ds1 using parquet as > select id, date'2018-01-01' + make_dt_interval(0, id) as ts > from range(0, 10); > [LOCATION_ALREADY_EXISTS] Cannot name the managed table as > `spark_catalog`.`default`.`parquet_ds1`, as its associated location > 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already > exists. Please pick a different table name, or remove the existing location > first. > org.apache.spark.SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name > the managed table as `spark_catalog`.`default`.`parquet_ds1`, as its > associated location > 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already > exists. Please pick a different table name, or remove the existing location > first. 
> at > org.apache.spark.sql.errors.QueryExecutionErrors$.locationAlreadyExists(QueryExecutionErrors.scala:2804) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:414) > at > org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176) > ... > {noformat} > One must manually remove the directory {{spark-warehouse/parquet_ds1}} before > the {{create table}} command will succeed. > It seems that datasource table creation runs the data-creation job first, > then stores the metadata into the metastore. > When using Spark to create Hive tables, the issue does not happen: > {noformat} > drop table if exists parquet_hive1; > -- try creating table with invalid column name, > -- but use 'stored as parquet' instead of 'using' > create table parquet_hive1 stored as parquet as > select id, date'2018-01-01' + make_dt_interval(0, id) > from range(0, 10); > Cannot create a table having a column whose name contains commas in Hive > metastore. Table: `spark_catalog`.`default`.`parquet_hive1`; Column: DATE > '2018-01-01' + make_dt_interval(0, id, 0, 0.00) > -- try again with valid column name. This will succeed; > create table parquet_hive1 stored as parquet as > select id, date'2018-01-01' + make_dt_interval(0, id) as ts > from range(0, 10); > {noformat} > It seems that Hive table creation stores metadata into the metastore first, > then runs the data-creation job.
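The ordering difference the report describes can be modeled in a few lines. This is a toy sketch, not Spark code: writing data before validating the metadata leaks the table directory on failure, so a retry hits `LOCATION_ALREADY_EXISTS`; validating metadata first leaves nothing behind.

```python
# Toy model of the two CTAS orderings described above (not Spark's code).
# `fs` stands in for the warehouse directory, `metastore` for the HMS.

def ctas_data_first(fs: set, metastore: dict, table: str, schema_ok: bool):
    if table in fs:
        raise RuntimeError("LOCATION_ALREADY_EXISTS")
    fs.add(table)                                # data-creation job runs first
    if not schema_ok:
        raise ValueError("invalid column name")  # directory is now leaked
    metastore[table] = "metadata"

def ctas_metadata_first(fs: set, metastore: dict, table: str, schema_ok: bool):
    if not schema_ok:
        raise ValueError("invalid column name")  # nothing written yet
    metastore[table] = "metadata"                # metadata stored first
    fs.add(table)

fs, ms = set(), {}
try:
    ctas_data_first(fs, ms, "parquet_ds1", schema_ok=False)
except ValueError:
    pass
# The failed CTAS left its directory behind, blocking the retry:
assert "parquet_ds1" in fs and "parquet_ds1" not in ms
```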
[jira] [Updated] (SPARK-36680) Supports Dynamic Table Options for Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-36680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-36680: --- Labels: pull-request-available (was: ) > Supports Dynamic Table Options for Spark SQL > > > Key: SPARK-36680 > URL: https://issues.apache.org/jira/browse/SPARK-36680 > Project: Spark > Issue Type: Wish > Components: SQL >Affects Versions: 3.1.2 >Reporter: wang-zhun >Priority: Major > Labels: pull-request-available > > Today a DataFrame API user can set dynamic options through the > _DataFrameReader$option_ method, but Spark SQL users cannot. > {code:java} > DataFrameReader/AstBuilder -> UnresolvedRelation$options -> > DataSourceV2Relation$options -> SupportsRead$newScanBuilder(options) > {code} > > The table options are persisted to the catalog, and modifying them requires > a DDL statement like "_ALTER TABLE ..._". But there are cases where users want > to modify the table options dynamically, just for the query: > * JDBCTable: set _fetchsize_ according to the actual situation of the table > * IcebergTable: support time travel > {code:java} > spark.read > .option("snapshot-id", 10963874102873L) > .format("iceberg") > .load("path/to/table"){code} > Setting these parameters is common and ad hoc; supporting them flexibly > would improve the Spark SQL user experience, especially now that catalog > expansion is supported.
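At its core, the wish above is a merge: ad-hoc per-query options layered over the options persisted with the table, with the query-level options winning. A minimal sketch of that precedence rule (illustrative only; `effective_options` is not a Spark API):

```python
# Sketch of the option-merging semantics the wish implies: options stored in
# the catalog form the base, and dynamic per-query options override them.
# The function name is hypothetical, not part of Spark.

def effective_options(catalog_options: dict, query_options: dict) -> dict:
    merged = dict(catalog_options)
    merged.update(query_options)   # ad-hoc query options take precedence
    return merged

opts = effective_options(
    {"fetchsize": "100"},                 # persisted with the JDBC table
    {"fetchsize": "10000",                # tuned for this one query
     "snapshot-id": "10963874102873"})    # Iceberg time travel, per query
assert opts["fetchsize"] == "10000"
assert opts["snapshot-id"] == "10963874102873"
```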
[jira] [Resolved] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics
[ https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46294. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44222 [https://github.com/apache/spark/pull/44222] > Clean up initValue vs zeroValue semantics in SQLMetrics > --- > > Key: SPARK-46294 > URL: https://issues.apache.org/jira/browse/SPARK-46294 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Davin Tjong >Assignee: Davin Tjong >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > The semantics of initValue and _zeroValue in SQLMetrics are a little > confusing, since they effectively mean the same thing. Changing them to the > following would be clearer, especially in terms of defining what an "invalid" > metric is. > > proposed definitions: > > initValue is the starting value for a SQLMetric. If a metric has value equal > to its initValue, then it should be filtered out before aggregating with > SQLMetrics.stringValue(). > > zeroValue defines the lowest value considered valid. If a SQLMetric is > invalid, it is set to zeroValue upon receiving any updates, and it also > reports zeroValue as its value to avoid exposing it to the user > programmatically (a concern previously addressed in SPARK-41442). > For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that > the metric is by default invalid. At the end of a task, we update the > metric, making it valid, and the invalid metrics are filtered out when > calculating min, max, etc. as a workaround for SPARK-11013. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
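The proposed initValue/zeroValue definitions can be sketched in plain Python. This is an illustrative model only, not Spark's actual SQLMetric API: the class and helper names are hypothetical, and the defaults mirror the common initValue = -1, zeroValue = 0 case described in the ticket.

```python
# Hypothetical sketch of the proposed SQLMetric semantics (not Spark's API).
class SQLMetric:
    def __init__(self, init_value=-1, zero_value=0):
        # init_value: sentinel starting value; a metric still at init_value
        # is "invalid" and is filtered out before aggregation.
        # zero_value: the lowest value considered valid.
        self.init_value = init_value
        self.zero_value = zero_value
        self._value = init_value

    def is_valid(self):
        return self._value != self.init_value

    def add(self, v):
        # An invalid metric is reset to zero_value on its first update,
        # so the sentinel never leaks into accumulated totals.
        if not self.is_valid():
            self._value = self.zero_value
        self._value += v

    @property
    def value(self):
        # Report zero_value instead of the sentinel so the invalid value
        # is never exposed to users programmatically.
        return self._value if self.is_valid() else self.zero_value


def aggregate_min(metrics):
    # Invalid metrics are excluded, mirroring the filtering proposed for
    # SQLMetrics.stringValue() when computing min, max, etc.
    valid = [m.value for m in metrics if m.is_valid()]
    return min(valid) if valid else None
```

For example, a metric that never received an update reports 0 (its zeroValue) rather than -1, and is skipped entirely when aggregating minima across tasks.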
[jira] [Assigned] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics
[ https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46294: --- Assignee: Davin Tjong > Clean up initValue vs zeroValue semantics in SQLMetrics > --- > > Key: SPARK-46294 > URL: https://issues.apache.org/jira/browse/SPARK-46294 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Davin Tjong >Assignee: Davin Tjong >Priority: Minor > Labels: pull-request-available > > The semantics of initValue and _zeroValue in SQLMetrics are a little > confusing, since they effectively mean the same thing. Changing them to the > following would be clearer, especially in terms of defining what an "invalid" > metric is. > > proposed definitions: > > initValue is the starting value for a SQLMetric. If a metric has value equal > to its initValue, then it should be filtered out before aggregating with > SQLMetrics.stringValue(). > > zeroValue defines the lowest value considered valid. If a SQLMetric is > invalid, it is set to zeroValue upon receiving any updates, and it also > reports zeroValue as its value to avoid exposing it to the user > programmatically (a concern previously addressed in SPARK-41442). > For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that > the metric is by default invalid. At the end of a task, we update the > metric, making it valid, and the invalid metrics are filtered out when > calculating min, max, etc. as a workaround for SPARK-11013. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46386) Improve assertions of observation (pyspark.sql.observation)
[ https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46386: - Summary: Improve assertions of observation (pyspark.sql.observation) (was: Improve and test assertions of observation (pyspark.sql.observation)) > Improve assertions of observation (pyspark.sql.observation) > --- > > Key: SPARK-46386 > URL: https://issues.apache.org/jira/browse/SPARK-46386 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)
[ https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46386: - Parent: (was: SPARK-46041) Issue Type: Improvement (was: Sub-task) > Improve and test assertions of observation (pyspark.sql.observation) > > > Key: SPARK-46386 > URL: https://issues.apache.org/jira/browse/SPARK-46386 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46413: --- Labels: pull-request-available (was: ) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Validate returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46413: - Description: Validate returnType of Arrow Python UDF (was: Check returnType of Arrow Python UDF) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Validate returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46413: - Summary: Validate returnType of Arrow Python UDF (was: Check returnType of Arrow Python UDF) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Check returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46413) Check returnType of Arrow Python UDF
Xinrong Meng created SPARK-46413: Summary: Check returnType of Arrow Python UDF Key: SPARK-46413 URL: https://issues.apache.org/jira/browse/SPARK-46413 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Check returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46289) Exception when ordering by UDT in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46289: --- Labels: pull-request-available (was: ) > Exception when ordering by UDT in interpreted mode > -- > > Key: SPARK-46289 > URL: https://issues.apache.org/jira/browse/SPARK-46289 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.5.0 >Reporter: Bruce Robbins >Priority: Minor > Labels: pull-request-available > > In interpreted mode, ordering by a UDT will result in an exception. For > example: > {noformat} > import org.apache.spark.ml.linalg.{DenseVector, Vector} > val df = Seq.tabulate(30) { x => > (x, x + 1, x + 2, new DenseVector(Array((x/100.0).toDouble, ((x + > 1)/100.0).toDouble, ((x + 3)/100.0).toDouble))) > }.toDF("id", "c1", "c2", "c3") > df.createOrReplaceTempView("df") > // this works > sql("select * from df order by c3").collect > sql("set spark.sql.codegen.wholeStage=false") > sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > // this gets an error > sql("select * from df order by c3").collect > {noformat} > The second {{collect}} action results in the following exception: > {noformat} > org.apache.spark.SparkIllegalArgumentException: Type > UninitializedPhysicalType does not support ordered operations. 
> at > org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348) > at > org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332) > at > org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254) > {noformat} > Note: You don't get an error if you use {{show}} rather than {{collect}}. > This is because {{show}} will implicitly add a {{limit}}, in which case the > ordering is performed by {{TakeOrderedAndProject}} rather than > {{UnsafeExternalRowSorter}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46412) Update resource-managers/kubernetes/integration-tests/README.md to java 17 and 21
Bjørn Jørgensen created SPARK-46412: --- Summary: Update resource-managers/kubernetes/integration-tests/README.md to java 17 and 21 Key: SPARK-46412 URL: https://issues.apache.org/jira/browse/SPARK-46412 Project: Spark Issue Type: Documentation Components: k8s Affects Versions: 4.0.0 Reporter: Bjørn Jørgensen In the file resource-managers/kubernetes/integration-tests/README.md: change Java 8 to 17 and Java 11 to 21; change -Dspark.kubernetes.test.sparkTgz=spark-3.0.0-SNAPSHOT-bin-example.tgz \ to 4; change OpenJDK to azul/zulu-openjdk -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46409) Spark Connect Repl does not work with ClosureCleaner
[ https://issues.apache.org/jira/browse/SPARK-46409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46409: --- Labels: pull-request-available (was: ) > Spark Connect Repl does not work with ClosureCleaner > > > Key: SPARK-46409 > URL: https://issues.apache.org/jira/browse/SPARK-46409 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Vsevolod Stepanov >Priority: Major > Labels: pull-request-available > > SPARK-45136 added ClosureCleaner support to SparkConnect client. > Unfortunately, this change breaks ConnectRepl launched by > `./connector/connect/bin/spark-connect-scala-client`. To reproduce the issue: > # Run `./connector/connect/bin/spark-connect-shell` > # Run `./connector/connect/bin/spark-connect-scala-client` > # In the REPL, execute this code: > ``` > @ def plus1(x: Int): Int = x + 1 > @ val plus1_udf = udf(plus1 _) > ``` > This will fail with the following error: > ``` > java.lang.reflect.InaccessibleObjectException: Unable to make private native > java.lang.reflect.Field[] java.lang.Class.getDeclaredFields0(boolean) > accessible: module java.base does not "opens java.lang" to unnamed module > @45099dd3 > > java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) > > java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) > java.lang.reflect.Method.checkCanSetAccessible(Method.java:199) > java.lang.reflect.Method.setAccessible(Method.java:193) > > org.apache.spark.util.ClosureCleaner$.getFinalModifiersFieldForJava17(ClosureCleaner.scala:577) > > org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:560) > > org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18(ClosureCleaner.scala:533) > > org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18$adapted(ClosureCleaner.scala:525) > 
scala.collection.ArrayOps$WithFilter.foreach(ArrayOps.scala:73) > > org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16(ClosureCleaner.scala:525) > > org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16$adapted(ClosureCleaner.scala:522) > scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576) > scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574) > scala.collection.AbstractIterable.foreach(Iterable.scala:933) > scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903) > > org.apache.spark.util.ClosureCleaner$.cleanupAmmoniteReplClosure(ClosureCleaner.scala:522) > org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:251) > > org.apache.spark.sql.expressions.SparkConnectClosureCleaner$.clean(UserDefinedFunction.scala:210) > > org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:187) > > org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:180) > org.apache.spark.sql.functions$.udf(functions.scala:7956) > ammonite.$sess.cmd1$Helper.(cmd1.sc:1) > ammonite.$sess.cmd1$.(cmd1.sc:7) > ``` > > This is because ClosureCleaner relies heavily on the reflection API and > is not compatible with Java 17. The rest of Spark bypasses this by adding > `--add-opens` JVM flags, see > https://issues.apache.org/jira/browse/SPARK-36796. We need to add these > options to the Spark Connect client launch script as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
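As a sketch of the fix the ticket points at, the client launch script could export the same kind of `--add-opens` flags that SPARK-36796 added to Spark's own launch scripts. The variable name and the exact flag list below are assumptions for illustration, not the script's actual contents:

```shell
# Hypothetical fragment for connector/connect/bin/spark-connect-scala-client:
# open the java.base packages that ClosureCleaner touches reflectively,
# so setAccessible() calls succeed on Java 17+ (per SPARK-36796's approach).
SPARK_CONNECT_CLIENT_JAVA_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED \
  --add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
  --add-opens=java.base/java.util=ALL-UNNAMED"
```

These options would then be appended to the `java` invocation inside the script; without them, any reflective write to `java.lang.Class` internals throws `InaccessibleObjectException` as shown in the stack trace above.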
[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
[ https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Otto updated SPARK-23890: Shepherd: Max Gekk Affects Version/s: 3.0.0 Description: As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE CHANGE COLUMN commands to Hive. This restriction was loosened in [https://github.com/apache/spark/pull/12714] to allow for those commands if they only change the column comment. Wikimedia has been evolving Parquet backed Hive tables with data originally from JSON events by adding newly found columns to the Hive table schema, via a Spark job we call 'Refine'. We do this by recursively merging an input DataFrame schema with a Hive table DataFrame schema, finding new fields, and then issuing an ALTER TABLE statement to add the columns. However, because we allow for nested data types in the incoming JSON data, we make extensive use of struct type fields. In order to add newly detected fields in a nested data type, we must alter the struct column and append the nested struct field. This requires CHANGE COLUMN that alters the column type. In reality, the 'type' of the column is not changing, it is just a new field being added to the struct, but to SQL, this looks like a type change. -We were about to upgrade to Spark 2 but this new restriction in SQL DDL that can be sent to Hive will block us. I believe this is fixable by adding an exception in [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325] to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and destination type are both struct types, and the destination type only adds new fields.- In this [PR|https://github.com/apache/spark/pull/21012], I was told that the Spark 3 datasource v2 would support this. However, it is clear that it does not. 
There is an [explicit check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441] and [test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583] that prevents this from happening. was: As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE CHANGE COLUMN commands to Hive. This restriction was loosened in [https://github.com/apache/spark/pull/12714] to allow for those commands if they only change the column comment. Wikimedia has been evolving Parquet backed Hive tables with data originally from JSON events by adding newly found columns to the Hive table schema, via a Spark job we call 'Refine'. We do this by recursively merging an input DataFrame schema with a Hive table DataFrame schema, finding new fields, and then issuing an ALTER TABLE statement to add the columns. However, because we allow for nested data types in the incoming JSON data, we make extensive use of struct type fields. In order to add newly detected fields in a nested data type, we must alter the struct column and append the nested struct field. This requires CHANGE COLUMN that alters the column type. In reality, the 'type' of the column is not changing, it just just a new field being added to the struct, but to SQL, this looks like a type change. We were about to upgrade to Spark 2 but this new restriction in SQL DDL that can be sent to Hive will block us. I believe this is fixable by adding an exception in [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325] to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and destination type are both struct types, and the destination type only adds new fields. 
> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works > -- > > Key: SPARK-23890 > URL: https://issues.apache.org/jira/browse/SPARK-23890 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 3.0.0 >Reporter: Andrew Otto >Priority: Major > Labels: bulk-closed, pull-request-available > > As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE > CHANGE COLUMN commands to Hive. This restriction was loosened in > [https://github.com/apache/spark/pull/12714] to allow for those commands if > they only change the column comment. > Wikimedia has been evolving Parquet backed Hive tables with data originally > from JSON events by adding newly found columns to the Hive table schema, via > a Spark job we call 'Refine'. We do this by recursively merging an input > DataFrame schema with a
[jira] [Reopened] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
[ https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Otto reopened SPARK-23890: - This was supposed to have been fixed in Spark 3 datasource v2, but the issue persists. > Hive ALTER TABLE CHANGE COLUMN for struct type no longer works > -- > > Key: SPARK-23890 > URL: https://issues.apache.org/jira/browse/SPARK-23890 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 3.0.0 >Reporter: Andrew Otto >Priority: Major > Labels: bulk-closed, pull-request-available > > As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE > CHANGE COLUMN commands to Hive. This restriction was loosened in > [https://github.com/apache/spark/pull/12714] to allow for those commands if > they only change the column comment. > Wikimedia has been evolving Parquet backed Hive tables with data originally > from JSON events by adding newly found columns to the Hive table schema, via > a Spark job we call 'Refine'. We do this by recursively merging an input > DataFrame schema with a Hive table DataFrame schema, finding new fields, and > then issuing an ALTER TABLE statement to add the columns. However, because > we allow for nested data types in the incoming JSON data, we make extensive > use of struct type fields. In order to add newly detected fields in a nested > data type, we must alter the struct column and append the nested struct > field. This requires CHANGE COLUMN that alters the column type. In reality, > the 'type' of the column is not changing, it just just a new field being > added to the struct, but to SQL, this looks like a type change. > -We were about to upgrade to Spark 2 but this new restriction in SQL DDL that > can be sent to Hive will block us. 
I believe this is fixable by adding an > exception in > [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325] > to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and > destination type are both struct types, and the destination type only adds > new fields.- > > In this [PR|https://github.com/apache/spark/pull/21012], I was told that the > Spark 3 datasource v2 would support this. > However, it is clear that it does not. There is an [explicit > check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441] > and > [test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583] > that prevents this from happening. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
[ https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-23890: --- Labels: bulk-closed pull-request-available (was: bulk-closed) > Hive ALTER TABLE CHANGE COLUMN for struct type no longer works > -- > > Key: SPARK-23890 > URL: https://issues.apache.org/jira/browse/SPARK-23890 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Otto >Priority: Major > Labels: bulk-closed, pull-request-available > > As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE > CHANGE COLUMN commands to Hive. This restriction was loosened in > [https://github.com/apache/spark/pull/12714] to allow for those commands if > they only change the column comment. > Wikimedia has been evolving Parquet backed Hive tables with data originally > from JSON events by adding newly found columns to the Hive table schema, via > a Spark job we call 'Refine'. We do this by recursively merging an input > DataFrame schema with a Hive table DataFrame schema, finding new fields, and > then issuing an ALTER TABLE statement to add the columns. However, because > we allow for nested data types in the incoming JSON data, we make extensive > use of struct type fields. In order to add newly detected fields in a nested > data type, we must alter the struct column and append the nested struct > field. This requires CHANGE COLUMN that alters the column type. In reality, > the 'type' of the column is not changing, it just just a new field being > added to the struct, but to SQL, this looks like a type change. > We were about to upgrade to Spark 2 but this new restriction in SQL DDL that > can be sent to Hive will block us. 
I believe this is fixable by adding an > exception in > [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325] > to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and > destination type are both struct types, and the destination type only adds > new fields. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46411) Change to use bcprov/bcpkix-jdk18on for test
[ https://issues.apache.org/jira/browse/SPARK-46411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46411: --- Labels: pull-request-available (was: ) > Change to use bcprov/bcpkix-jdk18on for test > > > Key: SPARK-46411 > URL: https://issues.apache.org/jira/browse/SPARK-46411 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46411) Change to use bcprov/bcpkix-jdk18on for test
Yang Jie created SPARK-46411: Summary: Change to use bcprov/bcpkix-jdk18on for test Key: SPARK-46411 URL: https://issues.apache.org/jira/browse/SPARK-46411 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException
[ https://issues.apache.org/jira/browse/SPARK-46410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46410: --- Labels: pull-request-available (was: ) > Assign error classes/subclasses to JdbcUtils.classifyException > -- > > Key: SPARK-46410 > URL: https://issues.apache.org/jira/browse/SPARK-46410 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Serge Rielau >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > This is a follow up to SPARK-46393. > We should raise distinct error classes for the different kinds of invokers of > JdbcUtils.classifyException -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException
[ https://issues.apache.org/jira/browse/SPARK-46410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-46410: Assignee: Max Gekk > Assign error classes/subclasses to JdbcUtils.classifyException > -- > > Key: SPARK-46410 > URL: https://issues.apache.org/jira/browse/SPARK-46410 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Serge Rielau >Assignee: Max Gekk >Priority: Major > > This is a follow up to SPARK-46393. > We should raise distinct error classes for the different kinds of invokers of > JdbcUtils.classifyException -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException
Serge Rielau created SPARK-46410: Summary: Assign error classes/subclasses to JdbcUtils.classifyException Key: SPARK-46410 URL: https://issues.apache.org/jira/browse/SPARK-46410 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Serge Rielau This is a follow up to SPARK-46393. We should raise distinct error classes for the different kinds of invokers of JdbcUtils.classifyException -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46409) Spark Connect Repl does not work with ClosureCleaner
Vsevolod Stepanov created SPARK-46409: - Summary: Spark Connect Repl does not work with ClosureCleaner Key: SPARK-46409 URL: https://issues.apache.org/jira/browse/SPARK-46409 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Vsevolod Stepanov SPARK-45136 added ClosureCleaner support to SparkConnect client. Unfortunately, this change breaks ConnectRepl launched by `./connector/connect/bin/spark-connect-scala-client`. To reproduce the issue: # Run `./connector/connect/bin/spark-connect-shell` # Run `./connector/connect/bin/spark-connect-scala-client` # In the REPL, execute this code: ``` @ def plus1(x: Int): Int = x + 1 @ val plus1_udf = udf(plus1 _) ``` This will fail with the following error: ``` java.lang.reflect.InaccessibleObjectException: Unable to make private native java.lang.reflect.Field[] java.lang.Class.getDeclaredFields0(boolean) accessible: module java.base does not "opens java.lang" to unnamed module @45099dd3 java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) java.lang.reflect.Method.checkCanSetAccessible(Method.java:199) java.lang.reflect.Method.setAccessible(Method.java:193) org.apache.spark.util.ClosureCleaner$.getFinalModifiersFieldForJava17(ClosureCleaner.scala:577) org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:560) org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18(ClosureCleaner.scala:533) org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18$adapted(ClosureCleaner.scala:525) scala.collection.ArrayOps$WithFilter.foreach(ArrayOps.scala:73) org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16(ClosureCleaner.scala:525) org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16$adapted(ClosureCleaner.scala:522) 
scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576) scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574) scala.collection.AbstractIterable.foreach(Iterable.scala:933) scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903) org.apache.spark.util.ClosureCleaner$.cleanupAmmoniteReplClosure(ClosureCleaner.scala:522) org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:251) org.apache.spark.sql.expressions.SparkConnectClosureCleaner$.clean(UserDefinedFunction.scala:210) org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:187) org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:180) org.apache.spark.sql.functions$.udf(functions.scala:7956) ammonite.$sess.cmd1$Helper.(cmd1.sc:1) ammonite.$sess.cmd1$.(cmd1.sc:7) ``` This is because ClosureCleaner is heavily reliant on using reflection API and is not compatible with Java 17. The rest of Spark bypasses this by adding `--add-opens` JVM flags, see https://issues.apache.org/jira/browse/SPARK-36796. We need to add these options to Spark Connect Client launch script as well -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46408) support date_sub on V2ExpressionBuilder
[ https://issues.apache.org/jira/browse/SPARK-46408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46408: --- Labels: pull-request-available (was: ) > support date_sub on V2ExpressionBuilder > --- > > Key: SPARK-46408 > URL: https://issues.apache.org/jira/browse/SPARK-46408 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Caican Cai >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > V2ExpressionBuilder currently does not support date_sub, which will affect > the filter pushdown of date_sub in logical plan optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46408) support date_sub on V2ExpressionBuilder
Caican Cai created SPARK-46408: -- Summary: support date_sub on V2ExpressionBuilder Key: SPARK-46408 URL: https://issues.apache.org/jira/browse/SPARK-46408 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Caican Cai Fix For: 4.0.0 V2ExpressionBuilder currently does not support date_sub, which prevents filters containing date_sub from being pushed down to V2 data sources during logical plan optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
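To make the pushdown connection concrete: V2ExpressionBuilder translates Catalyst expressions into V2 expressions, and a filter can only be pushed to the source if every expression in it translates. A hypothetical Python mini-model of that gating logic (the names `ScalarExpr`, `SUPPORTED`, and `translate` are illustrative, not Spark's actual API):

```python
from dataclasses import dataclass

@dataclass
class ScalarExpr:
    """Stand-in for a V2 scalar expression node (hypothetical)."""
    name: str
    children: tuple

# Functions the translator knows how to convert (date_sub newly added).
SUPPORTED = {"date_add", "date_sub"}

def translate(name, *children):
    """Return a V2-style node, or None if the function is unsupported.

    A None here means the enclosing filter cannot be pushed down and
    must be evaluated by Spark after the scan.
    """
    if name not in SUPPORTED:
        return None
    return ScalarExpr(name.upper(), children)

print(translate("date_sub", "col_d", 7))       # translates -> pushdown possible
print(translate("date_trunc", "col_d", "MM"))  # None -> filter stays in Spark
```

The design point is that pushdown support is all-or-nothing per filter: one untranslatable function in a predicate keeps the whole predicate on the Spark side.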
[jira] [Updated] (SPARK-46406) Assign a name to the error class _LEGACY_ERROR_TEMP_1023
[ https://issues.apache.org/jira/browse/SPARK-46406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46406: --- Labels: pull-request-available (was: ) > Assign a name to the error class _LEGACY_ERROR_TEMP_1023 > > > Key: SPARK-46406 > URL: https://issues.apache.org/jira/browse/SPARK-46406 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`
[ https://issues.apache.org/jira/browse/SPARK-46407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46407: --- Labels: pull-request-available (was: ) > Reorganize `OpsOnDiffFramesDisabledTests` > - > > Key: SPARK-46407 > URL: https://issues.apache.org/jira/browse/SPARK-46407 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46407) Reorganize `OpsOnDiffFramesDisabledTests`
Ruifeng Zheng created SPARK-46407: - Summary: Reorganize `OpsOnDiffFramesDisabledTests` Key: SPARK-46407 URL: https://issues.apache.org/jira/browse/SPARK-46407 Project: Spark Issue Type: Test Components: PS, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46406) Assign a name to the error class _LEGACY_ERROR_TEMP_1023
Jiaan Geng created SPARK-46406: -- Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_1023 Key: SPARK-46406 URL: https://issues.apache.org/jira/browse/SPARK-46406 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)
[ https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45796. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44184 [https://github.com/apache/spark/pull/44184] > Support MODE() WITHIN GROUP (ORDER BY col) > --- > > Key: SPARK-45796 > URL: https://issues.apache.org/jira/browse/SPARK-45796 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Many mainstream databases support the syntax shown below. > { MODE() WITHIN GROUP (ORDER BY sortSpecification) } > [FILTER (WHERE expression)] [OVER windowNameOrSpecification] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
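For readers unfamiliar with ordered-set aggregates, MODE() WITHIN GROUP (ORDER BY col) returns the most frequent value of the ORDER BY column. A rough Python sketch of those semantics follows; the tie-break used here (smallest value wins) is an assumption for determinism, not Spark's documented behavior:

```python
from collections import Counter

def mode_within_group(values):
    """Return the most frequent value; ties broken by taking the minimum
    (an illustrative choice -- engines may resolve ties differently)."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)

print(mode_within_group([3, 1, 3, 2, 3, 2]))  # 3 occurs most often
```

The optional FILTER and OVER clauses in the quoted grammar would restrict the input rows and turn the aggregate into a window function, respectively.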
[jira] [Assigned] (SPARK-46391) Reorganize `ExpandingParityTests`
[ https://issues.apache.org/jira/browse/SPARK-46391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-46391: - Assignee: Ruifeng Zheng > Reorganize `ExpandingParityTests` > - > > Key: SPARK-46391 > URL: https://issues.apache.org/jira/browse/SPARK-46391 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46391) Reorganize `ExpandingParityTests`
[ https://issues.apache.org/jira/browse/SPARK-46391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-46391. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44332 [https://github.com/apache/spark/pull/44332] > Reorganize `ExpandingParityTests` > - > > Key: SPARK-46391 > URL: https://issues.apache.org/jira/browse/SPARK-46391 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28386) Cannot resolve ORDER BY columns with GROUP BY and HAVING
[ https://issues.apache.org/jira/browse/SPARK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-28386: --- Labels: pull-request-available (was: ) > Cannot resolve ORDER BY columns with GROUP BY and HAVING > > > Key: SPARK-28386 > URL: https://issues.apache.org/jira/browse/SPARK-28386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > > How to reproduce: > {code:sql} > CREATE TABLE test_having (a int, b int, c string, d string) USING parquet; > INSERT INTO test_having VALUES (0, 1, '', 'A'); > INSERT INTO test_having VALUES (1, 2, '', 'b'); > INSERT INTO test_having VALUES (2, 2, '', 'c'); > INSERT INTO test_having VALUES (3, 3, '', 'D'); > INSERT INTO test_having VALUES (4, 3, '', 'e'); > INSERT INTO test_having VALUES (5, 3, '', 'F'); > INSERT INTO test_having VALUES (6, 4, '', 'g'); > INSERT INTO test_having VALUES (7, 4, '', 'h'); > INSERT INTO test_having VALUES (8, 4, '', 'I'); > INSERT INTO test_having VALUES (9, 4, '', 'j'); > SELECT lower(c), count(c) FROM test_having > GROUP BY lower(c) HAVING count(*) > 2 > ORDER BY lower(c); > {code} > {noformat} > spark-sql> SELECT lower(c), count(c) FROM test_having > > GROUP BY lower(c) HAVING count(*) > 2 > > ORDER BY lower(c); > Error in query: cannot resolve '`c`' given input columns: [lower(c), > count(c)]; line 3 pos 19; > 'Sort ['lower('c) ASC NULLS FIRST], true > +- Project [lower(c)#158, count(c)#159L] >+- Filter (count(1)#161L > cast(2 as bigint)) > +- Aggregate [lower(c#7)], [lower(c#7) AS lower(c)#158, count(c#7) AS > count(c)#159L, count(1) AS count(1)#161L] > +- SubqueryAlias test_having > +- Relation[a#5,b#6,c#7,d#8] parquet > {noformat} > But it works when setting an alias: > {noformat} > spark-sql> SELECT lower(c) withAias, count(c) FROM test_having > > GROUP BY lower(c) HAVING count(*) > 2 > > ORDER BY withAias; > 3 > 4 > {noformat} -- This message 
was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46405) Issue with CSV schema inference and malformed records
Yaohua Zhao created SPARK-46405: --- Summary: Issue with CSV schema inference and malformed records Key: SPARK-46405 URL: https://issues.apache.org/jira/browse/SPARK-46405 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Yaohua Zhao There appears to be a discrepancy in the behavior of schema inference in the CSV reader compared to JSON. When processing CSV files without a predefined schema, the mechanism to handle malformed records seems to be inconsistent. Unlike the JSON format, where a `_corrupt_record` column is automatically added in the presence of malformed records, the CSV format does not exhibit this behavior. This inconsistency can lead to unexpected results and data loss during processing. *Steps to Reproduce:* # Create a CSV file with malformed records without providing a schema. # Observe that the `_corrupt_record` column is not automatically added to the final dataframe. *Expected Result:* The `_corrupt_record` column should be automatically added to the final dataframe when processing a CSV file with malformed records, similar to the behavior observed with JSON files. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
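To make the requested behavior concrete, here is a minimal pure-Python sketch of the PERMISSIVE-style handling the reporter expects. This models the desired semantics only; it is not PySpark's CSV reader, and `_corrupt_record` is simply Spark's default name for the corrupt-record column (`spark.sql.columnNameOfCorruptRecord`):

```python
import csv
import io

def parse_with_corrupt_record(text, num_cols):
    """Parse CSV lines against a fixed 2-column schema; rows whose field
    count doesn't match are kept, with the raw line preserved in a
    _corrupt_record column instead of being silently dropped."""
    rows = []
    for line in text.strip().splitlines():
        fields = next(csv.reader(io.StringIO(line)))
        if len(fields) == num_cols:
            rows.append({"c0": fields[0], "c1": fields[1], "_corrupt_record": None})
        else:
            rows.append({"c0": None, "c1": None, "_corrupt_record": line})
    return rows

data = "a,1\nb,2\nmalformed_row_with,too,many,fields\n"
rows = parse_with_corrupt_record(data, num_cols=2)
print([r["_corrupt_record"] for r in rows])
# [None, None, 'malformed_row_with,too,many,fields']
```

This mirrors what the JSON reader already does during schema inference: malformed input survives as data in a dedicated column rather than disappearing.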
[jira] [Assigned] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)
[ https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45796: -- Assignee: Jiaan Geng (was: Apache Spark) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)
[ https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45796: -- Assignee: Apache Spark (was: Jiaan Geng) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
[ https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-46401: Assignee: Yang Jie > Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0` > -- > > Key: SPARK-46401 > URL: https://issues.apache.org/jira/browse/SPARK-46401 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
[ https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46401. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44347 [https://github.com/apache/spark/pull/44347] > Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0` > -- > > Key: SPARK-46401 > URL: https://issues.apache.org/jira/browse/SPARK-46401 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
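The rationale behind SPARK-46401: computing a full cardinality does work proportional to the bitmap's contents, while an emptiness check can bail out at the first non-empty container. RoaringBitmap is a Java library; the following Python sketch only models the idea, with a bitmap represented as a list of per-container cardinalities:

```python
def cardinality(containers):
    # getCardinality()-style: sums every container, even when the very
    # first one already proves the bitmap is non-empty
    return sum(containers)

def is_empty(containers):
    # isEmpty()-style: all() short-circuits at the first non-empty container
    return all(c == 0 for c in containers)

bm = [5] + [0] * 1_000_000
# Both formulations agree on the answer...
assert (cardinality(bm) > 0) == (not is_empty(bm))
# ...but is_empty() can stop immediately, while cardinality() always
# scans the whole structure.
```

Hence `!bitmap.isEmpty()` expresses the intent directly and avoids the unnecessary full scan that `getCardinality() > 0` implies.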
[jira] [Commented] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js
[ https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796633#comment-17796633 ] ASF GitHub Bot commented on SPARK-46404: User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/44346 > Add structured-streaming-page.test.js to test structured-streaming-page.js > -- > > Key: SPARK-46404 > URL: https://issues.apache.org/jira/browse/SPARK-46404 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming, UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js
[ https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46404: -- Assignee: Apache Spark -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46384: -- Assignee: Apache Spark > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The Streaming UI is currently broken at Spark master. Running a simple query: > ``` > q = spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "Operation Duration": > !screenshot: empty "Operation Duration" graph (expired signed image URL omitted)! > Here is the error: > !screenshot: browser console error on the Streaming UI page (expired signed image URL omitted)! > I verified that the same query runs fine on Spark 3.5, as in the following graph: > !screenshot: "Operation Duration" graph rendering correctly on Spark 3.5 (expired signed image URL omitted)! > This is likely a problem from the JavaScript library updates; a potential source of the error is [https://github.com/apache/spark/pull/42879] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
[ https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46401: -- Assignee: Apache Spark -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46384: -- Assignee: (was: Apache Spark) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js
[ https://issues.apache.org/jira/browse/SPARK-46404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46404: -- Assignee: (was: Apache Spark) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
[ https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46401: -- Assignee: (was: Apache Spark) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46384: -- Assignee: (was: Apache Spark) > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > The Streaming UI is currently broken on Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "Operation Duration": > !screenshot: empty "Operation Duration" graph!
> Here is the error: > !screenshot: the error! > > I verified that the same query runs fine on Spark 3.5, as shown in the following graph: > !screenshot: working "Operation Duration" graph on Spark 3.5!
> > This is likely a problem introduced by the library updates; a potential source of the error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46400) When there are corrupted files in the local maven repo, retry to skip this cache
[ https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46400: -- Assignee: Apache Spark > When there are corrupted files in the local maven repo, retry to skip this > cache > > > Key: SPARK-46400 > URL: https://issues.apache.org/jira/browse/SPARK-46400 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46400) When there are corrupted files in the local maven repo, retry to skip this cache
[ https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46400: -- Assignee: (was: Apache Spark) > When there are corrupted files in the local maven repo, retry to skip this > cache > > > Key: SPARK-46400 > URL: https://issues.apache.org/jira/browse/SPARK-46400 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45959: -- Assignee: (was: Apache Spark) > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Minor > Labels: pull-request-available > > Though the documentation clearly recommends adding all columns in a single shot, in reality it is difficult to expect customers to modify their code, since in Spark 2 the analyzer rules did not do deep tree traversals. Moreover, in Spark 3 the plans are cloned before being handed to the analyzer, optimizer, etc., which was not the case in Spark 2. > All of this has resulted in query time increasing from 5 min to 2-3 hrs. > Many times the columns are added to the plan via some for-loop logic which just keeps adding new computation based on some rule. > So my suggestion is to do an initial check in the withColumn API before creating a new projection: if all the existing columns are still being projected, and the new column's expression depends not on the output of the top node but on its child, then instead of adding a new Project, the column can be added to the existing node. > For a start, maybe we can just handle the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45959: -- Assignee: Apache Spark > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Though the documentation clearly recommends adding all columns in a single shot, in reality it is difficult to expect customers to modify their code, since in Spark 2 the analyzer rules did not do deep tree traversals. Moreover, in Spark 3 the plans are cloned before being handed to the analyzer, optimizer, etc., which was not the case in Spark 2. > All of this has resulted in query time increasing from 5 min to 2-3 hrs. > Many times the columns are added to the plan via some for-loop logic which just keeps adding new computation based on some rule. > So my suggestion is to do an initial check in the withColumn API before creating a new projection: if all the existing columns are still being projected, and the new column's expression depends not on the output of the top node but on its child, then instead of adding a new Project, the column can be added to the existing node. > For a start, maybe we can just handle the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
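The merging idea proposed in the SPARK-45959 description — fold a new column into the existing Project when its expression depends only on columns the child already provides — can be sketched with a toy plan model. Everything below (the `Project` class, `with_column`, the passthrough check) is hypothetical illustration code, not Spark's Catalyst API:

```python
# Toy sketch: when a new column's expression references only columns that the
# current Project passes through unchanged, fold it into that Project instead
# of stacking a new node, so the plan tree stays flat. Names are hypothetical.

class Project:
    def __init__(self, columns, child=None):
        # columns: name -> set of column names the expression references
        self.columns = dict(columns)
        self.child = child

    def depth(self):
        return 1 + (self.child.depth() if self.child is not None else 0)

def with_column(plan, name, refs):
    """Add a column computed from `refs` (referenced column names)."""
    # Columns projected through unchanged, i.e. expression == the column itself.
    passthrough = {c for c, e in plan.columns.items() if e == {c}}
    if set(refs) <= passthrough and name not in plan.columns:
        # Safe to merge: the new expression only sees the child's output.
        merged = dict(plan.columns)
        merged[name] = set(refs)
        return Project(merged, plan.child)
    # Otherwise fall back to the current behaviour: stack a fresh Project.
    return Project({**{c: {c} for c in plan.columns}, name: set(refs)}, plan)

base = Project({"a": {"a"}, "b": {"b"}})
plan = base
for i in range(100):
    plan = with_column(plan, f"c{i}", ["a", "b"])
print(plan.depth())  # 1 — the tree stays flat instead of growing to depth 101
```

A column that references a computed column (not a passthrough) still stacks a new Project, preserving semantics; that is the case the real proposal leaves to the existing code path.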
[jira] [Created] (SPARK-46404) Add structured-streaming-page.test.js to test structured-streaming-page.js
Kent Yao created SPARK-46404: Summary: Add structured-streaming-page.test.js to test structured-streaming-page.js Key: SPARK-46404 URL: https://issues.apache.org/jira/browse/SPARK-46404 Project: Spark Issue Type: Sub-task Components: Structured Streaming, UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46403: --- Labels: pull-request-available (was: ) > Decode parquet binary with getBytesUnsafe method > > > Key: SPARK-46403 > URL: https://issues.apache.org/jira/browse/SPARK-46403 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-14-16-30-39-104.png > > > Now spark will get a parquet binary object with getBytes() method. > The *Binary.getBytes()* method will always make a new copy of the internal > bytes. > We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has > already been called getBytes() and has the cached bytes. > !image-2023-12-14-16-30-39-104.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-46403: Description: Now spark will get a parquet binary object with getBytes() method. The *Binary.getBytes()* method will always make a new copy of the internal bytes. We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has already been called getBytes() and has the cached bytes. !image-2023-12-14-16-30-39-104.png! was: Now spark will get a parquet binary object with getBytes() method. The *Binary.getBytes()* method will always make a new copy of the internal bytes. We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has already been called getBytes() and has the cached bytes. !image-2023-12-14-16-28-04-797.png! > Decode parquet binary with getBytesUnsafe method > > > Key: SPARK-46403 > URL: https://issues.apache.org/jira/browse/SPARK-46403 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Attachments: image-2023-12-14-16-30-39-104.png > > > Now spark will get a parquet binary object with getBytes() method. > The *Binary.getBytes()* method will always make a new copy of the internal > bytes. > We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has > already been called getBytes() and has the cached bytes. > !image-2023-12-14-16-30-39-104.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-46403: Description: Now spark will get a parquet binary object with getBytes() method. The *Binary.getBytes()* method will always make a new copy of the internal bytes. We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has already been called getBytes() and has the cached bytes. !image-2023-12-14-16-28-04-797.png! was: Now spark will get a parquet binary dictionary object with getBytes() method. The *Binary.getBytes()* method will always make a new copy of the internal bytes. We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has already been called getBytes() and has the cached bytes. !image-2023-12-14-16-28-04-797.png! > Decode parquet binary with getBytesUnsafe method > > > Key: SPARK-46403 > URL: https://issues.apache.org/jira/browse/SPARK-46403 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Attachments: image-2023-12-14-16-30-39-104.png > > > Now spark will get a parquet binary object with getBytes() method. > The *Binary.getBytes()* method will always make a new copy of the internal > bytes. > We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has > already been called getBytes() and has the cached bytes. > !image-2023-12-14-16-28-04-797.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-46403: Attachment: image-2023-12-14-16-30-39-104.png > Decode parquet binary with getBytesUnsafe method > > > Key: SPARK-46403 > URL: https://issues.apache.org/jira/browse/SPARK-46403 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Attachments: image-2023-12-14-16-30-39-104.png > > > Now spark will get a parquet binary object with getBytes() method. > The *Binary.getBytes()* method will always make a new copy of the internal > bytes. > We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has > already been called getBytes() and has the cached bytes. > !image-2023-12-14-16-28-04-797.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-46403: Summary: Decode parquet binary with getBytesUnsafe method (was: Decode parquet binary dictionary with getBytesUnsafe method) > Decode parquet binary with getBytesUnsafe method > > > Key: SPARK-46403 > URL: https://issues.apache.org/jira/browse/SPARK-46403 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > > Now spark will get a parquet binary dictionary object with getBytes() method. > The *Binary.getBytes()* method will always make a new copy of the internal > bytes. > We can use *Binary.getBytesUnsafe()* method to get the cached bytes if it has > already been called getBytes() and has the cached bytes. > !image-2023-12-14-16-28-04-797.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46403) Decode parquet binary dictionary with getBytesUnsafe method
Wan Kun created SPARK-46403: --- Summary: Decode parquet binary dictionary with getBytesUnsafe method Key: SPARK-46403 URL: https://issues.apache.org/jira/browse/SPARK-46403 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wan Kun Currently, Spark reads a parquet binary dictionary object with the getBytes() method. The *Binary.getBytes()* method always makes a new copy of the internal bytes. We can use the *Binary.getBytesUnsafe()* method to get the cached bytes if getBytes() has already been called and the bytes are cached. !image-2023-12-14-16-28-04-797.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
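The copy-versus-cache behaviour SPARK-46403 describes can be illustrated with a small Python analogue. The `Binary` class below is a stand-in written from the ticket's description of parquet-java's `Binary`, not the real `org.apache.parquet.io.api.Binary` implementation:

```python
# Python analogue of the described behaviour: get_bytes() always materializes
# a fresh copy (and caches it), while get_bytes_unsafe() reuses the cached
# copy when one exists. Illustration only, not the parquet-java class.

class Binary:
    def __init__(self, buf: bytearray, offset: int, length: int):
        self._buf = buf
        self._offset = offset
        self._length = length
        self._cached = None  # immutable copy made on first materialization

    def get_bytes(self) -> bytes:
        # Always makes (and caches) a new immutable copy of the backing bytes.
        self._cached = bytes(self._buf[self._offset:self._offset + self._length])
        return self._cached

    def get_bytes_unsafe(self) -> bytes:
        # Reuse the cached copy if available; copy only on first access.
        if self._cached is None:
            self._cached = bytes(self._buf[self._offset:self._offset + self._length])
        return self._cached

buf = bytearray(b"hello parquet")
b = Binary(buf, 0, 5)
first = b.get_bytes_unsafe()
second = b.get_bytes_unsafe()
print(first, first is second)  # b'hello' True — the second call avoids a copy
```

For dictionary decoding, where the same `Binary` values are materialized repeatedly, skipping the redundant copies is where the proposed change saves allocations.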
[jira] [Resolved] (SPARK-46396) LegacyFastTimestampFormatter.parseOptional should not throw exception
[ https://issues.apache.org/jira/browse/SPARK-46396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-46396. Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 44338 [https://github.com/apache/spark/pull/44338] > LegacyFastTimestampFormatter.parseOptional should not throw exception > - > > Key: SPARK-46396 > URL: https://issues.apache.org/jira/browse/SPARK-46396 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > When setting spark.sql.legacy.timeParserPolicy=LEGACY, Spark uses the > LegacyFastTimestampFormatter to infer potential timestamp columns. The > inference shouldn't throw an exception. > However, when the input is 23012150952, an exception is thrown: > ``` > For input string: "23012150952" > java.lang.NumberFormatException: For input string: "23012150952" > at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) > at java.base/java.lang.Integer.parseInt(Integer.java:668) > at java.base/java.lang.Integer.parseInt(Integer.java:786) > at org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304) > at org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045) > at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651) > at org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
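The contract being fixed in SPARK-46396 — `parseOptional` should report absence rather than raise on malformed input like `23012150952` — can be sketched in Python. The function name mirrors Spark's `parseOptional`, but the format string and implementation below are illustrative, not the Scala code:

```python
# Sketch of the parseOptional contract: during schema inference, a parse
# failure means "this value is not a timestamp", so the function reports
# absence (None) instead of letting the parser's exception escape.
# Illustrative Python, not Spark's LegacyFastTimestampFormatter.

from datetime import datetime
from typing import Optional

def parse_optional(value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> Optional[datetime]:
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        # e.g. "23012150952" cannot match the pattern: absence, not an error.
        return None

print(parse_optional("2023-12-14 16:30:39"))  # 2023-12-14 16:30:39
print(parse_optional("23012150952"))          # None
```

The bug was precisely that the legacy formatter's internal `NumberFormatException` escaped this boundary instead of being converted to the "no value" result.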
[jira] [Resolved] (SPARK-46393) Classify exceptions in the JDBC table catalog
[ https://issues.apache.org/jira/browse/SPARK-46393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46393. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44335 [https://github.com/apache/spark/pull/44335] > Classify exceptions in the JDBC table catalog > - > > Key: SPARK-46393 > URL: https://issues.apache.org/jira/browse/SPARK-46393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Handle exceptions from JDBC drivers and convert them to AnalysisException > with error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
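The classification SPARK-46393 describes — catching a raw JDBC driver error and re-raising it as a structured exception carrying an error class — might look like the sketch below. The SQLSTATE-to-class table and exception type are assumptions for illustration, not Spark's actual error-class registry:

```python
# Sketch of JDBC exception classification: map a driver error's SQLSTATE to a
# named error class and wrap it in a structured exception, so callers match on
# the class instead of parsing driver-specific messages. Names are illustrative.

class ClassifiedError(Exception):
    def __init__(self, error_class: str, message: str):
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class

# Hypothetical SQLSTATE -> error-class table (values follow the PostgreSQL
# SQLSTATE convention: 42P07 = duplicate_table, 42P01 = undefined_table).
SQLSTATE_TO_CLASS = {
    "42P07": "TABLE_OR_VIEW_ALREADY_EXISTS",
    "42P01": "TABLE_OR_VIEW_NOT_FOUND",
}

def classify(sqlstate: str, message: str) -> ClassifiedError:
    # Unrecognized states fall back to a generic class rather than leaking
    # the raw driver exception.
    error_class = SQLSTATE_TO_CLASS.get(sqlstate, "FAILED_JDBC_OPERATION")
    return ClassifiedError(error_class, message)

err = classify("42P01", 'relation "t" does not exist')
print(err.error_class)  # TABLE_OR_VIEW_NOT_FOUND
```

The payoff is uniform error handling across JDBC dialects: the same error class surfaces whether the underlying driver is Postgres, MySQL, or H2.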