[jira] [Resolved] (SPARK-45752) Unreferenced CTE should all be checked by CheckAnalysis0
[ https://issues.apache.org/jira/browse/SPARK-45752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45752. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43614 [https://github.com/apache/spark/pull/43614] > Unreferenced CTE should all be checked by CheckAnalysis0 > > > Key: SPARK-45752 > URL: https://issues.apache.org/jira/browse/SPARK-45752 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
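A minimal sketch of the scenario this ticket covers, assuming the usual spark-shell `spark` session; the CTE name and the unresolved column are made up for illustration.

{code:scala}
// "Unreferenced CTE": `unused` is never referenced by the main query, yet its
// body contains an analysis error (an unresolvable column). With this fix,
// CheckAnalysis0 is expected to visit such CTEs and report the error instead of
// letting the query slip through analysis.
spark.sql(
  """WITH unused AS (SELECT missing_col FROM range(1))
    |SELECT 1
    |""".stripMargin)
{code}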
[jira] [Updated] (SPARK-45848) spark-build-info.ps1 missing the docroot property
[ https://issues.apache.org/jira/browse/SPARK-45848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45848: --- Labels: pull-request-available (was: ) > spark-build-info.ps1 missing the docroot property > - > > Key: SPARK-45848 > URL: https://issues.apache.org/jira/browse/SPARK-45848 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44 > https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45848) spark-build-info.ps1 missing the docroot property
[ https://issues.apache.org/jira/browse/SPARK-45848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-45848: Description: https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44 https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36 was:https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44 > spark-build-info.ps1 missing the docroot property > - > > Key: SPARK-45848 > URL: https://issues.apache.org/jira/browse/SPARK-45848 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > > https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44 > https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45848) spark-build-info.ps1 missing the docroot property
Yuming Wang created SPARK-45848: --- Summary: spark-build-info.ps1 missing the docroot property Key: SPARK-45848 URL: https://issues.apache.org/jira/browse/SPARK-45848 Project: Spark Issue Type: Bug Components: Build Affects Versions: 4.0.0 Reporter: Yuming Wang https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44662) SPIP: Improving performance of BroadcastHashJoin queries with stream side join key on non partition columns
[ https://issues.apache.org/jira/browse/SPARK-44662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784282#comment-17784282 ] Asif commented on SPARK-44662: -- The changes for iceberg which support broadcast-var-pushdown are present in the git repo: [iceberg-repo|https://github.com/ahshahid/iceberg.git] branch : broadcastvar-push. The changes done in the iceberg branch are compatible with latest apache/spark master ( identified as 3.5 to iceberg) and tested and compiled using scala 2.13. To get the iceberg-spark-run-time jar for use: First locally install the spark jars using the PR of spark mentioned below. (./build/mvn clean install -Phive -Phive-thriftserver -DskipTests) Then use the iceberg branch broadcastvar-push to create the iceberg spark runtime jar such that it uses the locally installed spark as dependency. In case you are interested in evaluating performance, pls let me know. > SPIP: Improving performance of BroadcastHashJoin queries with stream side > join key on non partition columns > --- > > Key: SPARK-44662 > URL: https://issues.apache.org/jira/browse/SPARK-44662 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Major > Labels: pull-request-available > Attachments: perf results broadcast var pushdown - Partitioned > TPCDS.pdf > > > h2. *Q1. What are you trying to do? Articulate your objectives using > absolutely no jargon.* > On the lines of DPP which helps DataSourceV2 relations when the joining key > is a partition column, the same concept can be extended over to the case > where joining key is not a partition column. > The Keys of BroadcastHashJoin are already available before actual evaluation > of the stream iterator. These keys can be pushed down to the DataSource as a > SortedSet. > For non partition columns, the DataSources like iceberg have max/min stats on > column available at manifest level, and for formats like parquet , they have > max/min stats at various storage level. The passed SortedSet can be used to > prune using ranges at both driver level ( manifests files) as well as > executor level ( while actually going through chunks , row groups etc at > parquet level) > If the data is stored as Columnar Batch format , then it would not be > possible to filter out individual row at DataSource level, even though we > have keys. > But at the scan level, ( ColumnToRowExec) it is still possible to filter out > as many rows as possible , if the query involves nested joins. Thus reducing > the number of rows to join at the higher join levels. > Will be adding more details.. > h2. *Q2. What problem is this proposal NOT designed to solve?* > This can only help in BroadcastHashJoin's performance if the join is Inner or > Left Semi. > This will also not work if there are nodes like Expand, Generator , Aggregate > (without group by on keys not part of joining column etc) below the > BroadcastHashJoin node being targeted. > h2. *Q3. How is it done today, and what are the limits of current practice?* > Currently this sort of pruning at DataSource level is being done using DPP > (Dynamic Partition Pruning ) and IFF one of the join key column is a > Partitioning column ( so that cost of DPP query is justified and way less > than amount of data it will be filtering by skipping partitions). 
> The limitation is that DPP type approach is not implemented ( intentionally I > believe), if the join column is a non partition column ( because of cost of > "DPP type" query would most likely be way high as compared to any possible > pruning ( especially if the column is not stored in a sorted manner). > h2. *Q4. What is new in your approach and why do you think it will be > successful?* > 1) This allows pruning on non partition column based joins. > 2) Because it piggy backs on Broadcasted Keys, there is no extra cost of "DPP > type" query. > 3) The Data can be used by DataSource to prune at driver (possibly) and also > at executor level ( as in case of parquet which has max/min at various > structure levels) > 4) The big benefit should be seen in multilevel nested join queries. In the > current code base, if I am correct, only one join's pruning filter would get > pushed at scan level. Since it is on partition key may be that is sufficient. > But if it is a nested Join query , and may be involving different columns on > streaming side for join, each such filter push could do significant pruning. > This requires some handling in case of AQE, as the stream side iterator ( & > hence stage evaluation needs to be delayed, till all the available join > filters in the nested tree are pushed at their respective targe
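A rough sketch of the range-pruning idea described in the SPIP, assuming the data source receives the broadcast join keys as a sorted set and keeps min/max statistics per file or row group; `ColumnStats` and `KeyRangePruner` are illustrative names, not Spark or Iceberg APIs.

{code:scala}
import scala.collection.immutable.SortedSet

// Per-chunk statistics, as exposed by Parquet row groups or Iceberg manifests.
final case class ColumnStats(min: Long, max: Long)

object KeyRangePruner {
  // Scan a chunk only if at least one broadcast key falls inside its [min, max].
  def shouldScan(stats: ColumnStats, keys: SortedSet[Long]): Boolean =
    keys.rangeFrom(stats.min).headOption.exists(_ <= stats.max)
}

// Example: keys {5, 42} against a row group with min=10, max=40 -> skipped;
// keys {5, 25, 42} against the same row group -> scanned.
{code}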
[jira] [Updated] (SPARK-45830) Refactor StorageUtils#bufferCleaner
[ https://issues.apache.org/jira/browse/SPARK-45830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45830: --- Labels: pull-request-available (was: ) > Refactor StorageUtils#bufferCleaner > --- > > Key: SPARK-45830 > URL: https://issues.apache.org/jira/browse/SPARK-45830 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45831) Change to using the collection factory to create an immutable Java collection
[ https://issues.apache.org/jira/browse/SPARK-45831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45831: - Assignee: Yang Jie > Change to using the collection factory to create an immutable Java collection > - > > Key: SPARK-45831 > URL: https://issues.apache.org/jira/browse/SPARK-45831 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45831) Change to using the collection factory to create an immutable Java collection
[ https://issues.apache.org/jira/browse/SPARK-45831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45831. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43709 [https://github.com/apache/spark/pull/43709] > Change to using the collection factory to create an immutable Java collection > - > > Key: SPARK-45831 > URL: https://issues.apache.org/jira/browse/SPARK-45831 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45835) Make gitHub labeler more accurate and remove outdated comments
[ https://issues.apache.org/jira/browse/SPARK-45835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45835: - Assignee: BingKun Pan > Make gitHub labeler more accurate and remove outdated comments > -- > > Key: SPARK-45835 > URL: https://issues.apache.org/jira/browse/SPARK-45835 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45835) Make gitHub labeler more accurate and remove outdated comments
[ https://issues.apache.org/jira/browse/SPARK-45835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45835. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43716 [https://github.com/apache/spark/pull/43716] > Make gitHub labeler more accurate and remove outdated comments > -- > > Key: SPARK-45835 > URL: https://issues.apache.org/jira/browse/SPARK-45835 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45842) Refactor Catalog Function APIs to use analyzer
[ https://issues.apache.org/jira/browse/SPARK-45842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45842. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43720 [https://github.com/apache/spark/pull/43720] > Refactor Catalog Function APIs to use analyzer > -- > > Key: SPARK-45842 > URL: https://issues.apache.org/jira/browse/SPARK-45842 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45842) Refactor Catalog Function APIs to use analyzer
[ https://issues.apache.org/jira/browse/SPARK-45842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45842: - Assignee: Yihong He > Refactor Catalog Function APIs to use analyzer > -- > > Key: SPARK-45842 > URL: https://issues.apache.org/jira/browse/SPARK-45842 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45846) spark.sql.optimizeNullAwareAntiJoin should respect spark.sql.autoBroadcastJoinThreshold
Chao Sun created SPARK-45846: Summary: spark.sql.optimizeNullAwareAntiJoin should respect spark.sql.autoBroadcastJoinThreshold Key: SPARK-45846 URL: https://issues.apache.org/jira/browse/SPARK-45846 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Chao Sun Normally broadcast join can be disabled when users set {{spark.sql.autoBroadcastJoinThreshold}} to -1. However this doesn't apply to {{spark.sql.optimizeNullAwareAntiJoin}}: {code} case j @ ExtractSingleColumnNullAwareAntiJoin(leftKeys, rightKeys) => Seq(joins.BroadcastHashJoinExec(leftKeys, rightKeys, LeftAnti, BuildRight, None, planLater(j.left), planLater(j.right), isNullAwareAntiJoin = true)) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
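A hedged reproduction of the behavior described above, assuming a spark-shell `spark` session; the view names and sample data are made up, and the exact plan text varies by version.

{code:scala}
// Broadcast joins are nominally disabled...
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
// ...but the null-aware anti join optimization does not consult the threshold.
spark.conf.set("spark.sql.optimizeNullAwareAntiJoin", "true")  // default value

// Nullable columns, so the NOT IN rewrite keeps its null-aware form.
spark.sql("SELECT * FROM VALUES (1), (2), (CAST(NULL AS INT)) AS v(id)")
  .createOrReplaceTempView("t1")
spark.sql("SELECT * FROM VALUES (1), (CAST(NULL AS INT)) AS v(id)")
  .createOrReplaceTempView("t2")

// A single-column NOT IN subquery becomes a null-aware LeftAnti join, which the
// strategy snippet in the description plans directly as BroadcastHashJoinExec.
spark.sql("SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2)").explain()
// Expected plan fragment despite the -1 threshold:
//   BroadcastHashJoin ... LeftAnti, BuildRight, ..., isNullAwareAntiJoin=true
{code}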
[jira] [Updated] (SPARK-45845) Streaming UI add number of evicted state rows
[ https://issues.apache.org/jira/browse/SPARK-45845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45845: --- Labels: pull-request-available (was: ) > Streaming UI add number of evicted state rows > - > > Key: SPARK-45845 > URL: https://issues.apache.org/jira/browse/SPARK-45845 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > The UI is missing this chart, and people always confuse "aggregated number of > rows dropped by watermark" with this newly added metric -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45845) Streaming UI add number of evicted state rows
Wei Liu created SPARK-45845: --- Summary: Streaming UI add number of evicted state rows Key: SPARK-45845 URL: https://issues.apache.org/jira/browse/SPARK-45845 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu The UI is missing this chart, and people always confuse "aggregated number of rows dropped by watermark" with this newly added metric -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
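A small sketch of a stateful query where the missing chart would apply, using the built-in rate source; the window and watermark durations are arbitrary.

{code:scala}
import org.apache.spark.sql.functions.{col, window}

// Windowed aggregation with a watermark: state for windows that fall behind the
// watermark gets finalized and evicted.
val counts = spark.readStream.format("rate").load()
  .withWatermark("timestamp", "10 seconds")
  .groupBy(window(col("timestamp"), "5 seconds"))
  .count()

val query = counts.writeStream
  .outputMode("append")
  .format("console")
  .start()
// "Number of evicted state rows" counts state removed when windows close, which
// is distinct from "aggregated number of rows dropped by watermark" (late input
// rows discarded before aggregation) that the Streaming UI already charts.
{code}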
[jira] [Assigned] (SPARK-45843) Support `killAll` in REST Submission API
[ https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45843: - Assignee: Dongjoon Hyun > Support `killAll` in REST Submission API > > > Key: SPARK-45843 > URL: https://issues.apache.org/jira/browse/SPARK-45843 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45843) Support `killAll` in REST Submission API
[ https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45843. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43721 [https://github.com/apache/spark/pull/43721] > Support `killAll` in REST Submission API > > > Key: SPARK-45843 > URL: https://issues.apache.org/jira/browse/SPARK-45843 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45843) Support `killall` in REST Submission API
[ https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45843: -- Summary: Support `killall` in REST Submission API (was: Support `killAll` in REST Submission API) > Support `killall` in REST Submission API > > > Key: SPARK-45843 > URL: https://issues.apache.org/jira/browse/SPARK-45843 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42821) Remove unused parameters in splitFiles methods
[ https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42821. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 40454 [https://github.com/apache/spark/pull/40454] > Remove unused parameters in splitFiles methods > -- > > Key: SPARK-42821 > URL: https://issues.apache.org/jira/browse/SPARK-42821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42821) Remove unused parameters in splitFiles methods
[ https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-42821: Assignee: BingKun Pan > Remove unused parameters in splitFiles methods > -- > > Key: SPARK-42821 > URL: https://issues.apache.org/jira/browse/SPARK-42821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45844) Implement case insensitivity for XML
[ https://issues.apache.org/jira/browse/SPARK-45844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45844: --- Labels: pull-request-available (was: ) > Implement case insensitivity for XML > > > Key: SPARK-45844 > URL: https://issues.apache.org/jira/browse/SPARK-45844 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Shujing Yang >Priority: Major > Labels: pull-request-available > > Currently, we don't follow the `SQLConf` of case insensitivity in XML, which > is inconsistent with other file formats. This PR implements the > case-insensitive behavior for schema inference and file reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45844) Implement case insensitivity for XML
Shujing Yang created SPARK-45844: Summary: Implement case insensitivity for XML Key: SPARK-45844 URL: https://issues.apache.org/jira/browse/SPARK-45844 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Shujing Yang Currently, we don't follow the `SQLConf` of case insensitivity in XML, which is inconsistent with other file formats. This PR implements the case-insensitive behavior for schema inference and file reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
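A hedged example of the intended behavior, assuming the Spark XML data source with its usual rowTag option; the input path is hypothetical.

{code:scala}
// With spark.sql.caseSensitive=false (the default), elements differing only in
// case, e.g. <Title> and <title>, should resolve to a single schema field during
// schema inference and while reading rows.
spark.conf.set("spark.sql.caseSensitive", "false")

val books = spark.read
  .format("xml")
  .option("rowTag", "book")   // each <book> element becomes one row
  .load("/tmp/books.xml")     // hypothetical input path

books.printSchema()
{code}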
[jira] [Updated] (SPARK-45827) Add variant data type in Spark
[ https://issues.apache.org/jira/browse/SPARK-45827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45827: --- Labels: pull-request-available (was: ) > Add variant data type in Spark > -- > > Key: SPARK-45827 > URL: https://issues.apache.org/jira/browse/SPARK-45827 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45843) Support `killAll` in REST Submission API
[ https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45843: --- Labels: pull-request-available (was: ) > Support `killAll` in REST Submission API > > > Key: SPARK-45843 > URL: https://issues.apache.org/jira/browse/SPARK-45843 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45843) Support `killAll` in REST Submission API
Dongjoon Hyun created SPARK-45843: - Summary: Support `killAll` in REST Submission API Key: SPARK-45843 URL: https://issues.apache.org/jira/browse/SPARK-45843 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45639) Support loading Python data sources in DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-45639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45639. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43630 [https://github.com/apache/spark/pull/43630] > Support loading Python data sources in DataFrameReader > -- > > Key: SPARK-45639 > URL: https://issues.apache.org/jira/browse/SPARK-45639 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Allow users to read from a Python data source using > `spark.read.format(...).load()` in PySpark. For example > Users can extend the DataSource and the DataSourceReader classes to create > their own Python data source reader and use them in PySpark: > {code:java} > class MyReader(DataSourceReader): > def read(self, partition): > yield (0, 1) > class MyDataSource(DataSource): > def schema(self): > return "id INT, value INT" > > def reader(self, schema): > return MyReader() > df = spark.read.format("MyDataSource").load() > df.show() > +---+-+ > | id|value| > +---+-+ > | 0| 1| > +---+-+ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45639) Support loading Python data sources in DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-45639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45639: Assignee: Allison Wang > Support loading Python data sources in DataFrameReader > -- > > Key: SPARK-45639 > URL: https://issues.apache.org/jira/browse/SPARK-45639 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Allow users to read from a Python data source using > `spark.read.format(...).load()` in PySpark. For example > Users can extend the DataSource and the DataSourceReader classes to create > their own Python data source reader and use them in PySpark: > {code:java} > class MyReader(DataSourceReader): > def read(self, partition): > yield (0, 1) > class MyDataSource(DataSource): > def schema(self): > return "id INT, value INT" > > def reader(self, schema): > return MyReader() > df = spark.read.format("MyDataSource").load() > df.show() > +---+-+ > | id|value| > +---+-+ > | 0| 1| > +---+-+ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45828) Remove deprecated method in dsl
[ https://issues.apache.org/jira/browse/SPARK-45828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45828. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43708 [https://github.com/apache/spark/pull/43708] > Remove deprecated method in dsl > --- > > Key: SPARK-45828 > URL: https://issues.apache.org/jira/browse/SPARK-45828 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784115#comment-17784115 ] koert kuipers commented on SPARK-45282: --- it does look like same issue and partitioning being the cause makes sense too > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Blocker > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exhanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on distributed cluster these settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 100).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! > println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 100 > number of right 100 > number of (left join right) 100 > number of left1 100 > number of right1 100 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784109#comment-17784109 ] Emil Ejbyfeldt commented on SPARK-45282: The code reproducing the bug looks quite similar to https://issues.apache.org/jira/browse/SPARK-45592 I wonder if the fix for that might also have solved this bug as I could not reproduce this issue on a build from the master branch. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Blocker > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exhanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on distributed cluster these settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 100).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! > println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 100 > number of right 100 > number of (left join right) 100 > number of left1 100 > number of right1 100 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
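Not a confirmed fix, but since the report above only reproduces with the AQE cached-plan repartitioning flag enabled, reverting that flag is one possible mitigation on affected versions.

{code:scala}
// Possible workaround implied by the reproduction settings above (assumption,
// not an official fix): stop AQE from changing the output partitioning of
// cached plans, i.e. restore the pre-3.4 behavior.
spark.conf.set("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "false")
{code}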
[jira] [Updated] (SPARK-45826) Add a SQL config for extra stack traces in Origin
[ https://issues.apache.org/jira/browse/SPARK-45826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45826: --- Labels: pull-request-available (was: ) > Add a SQL config for extra stack traces in Origin > - > > Key: SPARK-45826 > URL: https://issues.apache.org/jira/browse/SPARK-45826 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Add a SQL config to control how many extra stack traces should be captured in > the withOrigin method. This should improve user experience in troubleshooting > issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45842) Refactor Catalog Function APIs to use analyzer
[ https://issues.apache.org/jira/browse/SPARK-45842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45842: --- Labels: pull-request-available (was: ) > Refactor Catalog Function APIs to use analyzer > -- > > Key: SPARK-45842 > URL: https://issues.apache.org/jira/browse/SPARK-45842 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45842) Refactor Catalog Function APIs to use analyzer
Yihong He created SPARK-45842: - Summary: Refactor Catalog Function APIs to use analyzer Key: SPARK-45842 URL: https://issues.apache.org/jira/browse/SPARK-45842 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Yihong He -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45841) Expose stack trace by DataFrameQueryContext
[ https://issues.apache.org/jira/browse/SPARK-45841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45841. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43703 [https://github.com/apache/spark/pull/43703] > Expose stack trace by DataFrameQueryContext > --- > > Key: SPARK-45841 > URL: https://issues.apache.org/jira/browse/SPARK-45841 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Modify DataFrameQueryContext and expose stack traces to users. This should > allow easily troubleshoot issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45841) Expose stack trace by DataFrameQueryContext
[ https://issues.apache.org/jira/browse/SPARK-45841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45841: --- Labels: pull-request-available (was: ) > Expose stack trace by DataFrameQueryContext > --- > > Key: SPARK-45841 > URL: https://issues.apache.org/jira/browse/SPARK-45841 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Modify DataFrameQueryContext and expose stack traces to users. This should > allow easily troubleshoot issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45837) Report underlying error in scala client
[ https://issues.apache.org/jira/browse/SPARK-45837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45837: --- Labels: pull-request-available (was: ) > Report underlying error in scala client > --- > > Key: SPARK-45837 > URL: https://issues.apache.org/jira/browse/SPARK-45837 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Alice Sayutina >Priority: Minor > Labels: pull-request-available > > When there is retry-worthy error, we need to not just throw RetryException, > but also -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45841) Expose stack trace by DataFrameQueryContext
Max Gekk created SPARK-45841: Summary: Expose stack trace by DataFrameQueryContext Key: SPARK-45841 URL: https://issues.apache.org/jira/browse/SPARK-45841 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Modify DataFrameQueryContext and expose stack traces to users. This should allow easily troubleshoot issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45840) Fix these issue in module sql/hive, sql/hive-thriftserver
[ https://issues.apache.org/jira/browse/SPARK-45840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45840: --- Summary: Fix these issue in module sql/hive, sql/hive-thriftserver (was: Fix these issue in module sql/hive) > Fix these issue in module sql/hive, sql/hive-thriftserver > - > > Key: SPARK-45840 > URL: https://issues.apache.org/jira/browse/SPARK-45840 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45840) Fix these issue in module sql/hive
Jiaan Geng created SPARK-45840: -- Summary: Fix these issue in module sql/hive Key: SPARK-45840 URL: https://issues.apache.org/jira/browse/SPARK-45840 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45839) Fix these issue in module sql/api
Jiaan Geng created SPARK-45839: -- Summary: Fix these issue in module sql/api Key: SPARK-45839 URL: https://issues.apache.org/jira/browse/SPARK-45839 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45838) Fix these issue in module sql/core
Jiaan Geng created SPARK-45838: -- Summary: Fix these issue in module sql/core Key: SPARK-45838 URL: https://issues.apache.org/jira/browse/SPARK-45838 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45825) Fix these issue in module sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45825: --- Summary: Fix these issue in module sql/catalyst (was: Fix these issue in package sql/catalyst) > Fix these issue in module sql/catalyst > -- > > Key: SPARK-45825 > URL: https://issues.apache.org/jira/browse/SPARK-45825 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45816. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43694 [https://github.com/apache/spark/pull/43694] > Return null when overflowing during casting from timestamp to integers > -- > > Key: SPARK-45816 > URL: https://issues.apache.org/jira/browse/SPARK-45816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark cast works in two modes: ansi and non-ansi. When overflowing during > casting, the common behavior under non-ansi mode is to return null. However, > casting from Timestamp to Int/Short/Byte returns a wrapping value now. The > behavior to silently overflow doesn't make sense. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
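A short illustration of the overflow case described above in non-ANSI mode; the timestamp literal is arbitrary, chosen so its epoch-second value exceeds Int.MaxValue.

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "false")

// CAST(timestamp AS INT) yields epoch seconds; for 2100-01-01 that is roughly
// 4.1 billion, which does not fit in a 32-bit integer.
spark.sql("SELECT CAST(TIMESTAMP'2100-01-01 00:00:00' AS INT) AS v").show()
// Before this change: a silently wrapped (negative) integer.
// After this change: NULL, matching how other non-ANSI cast overflows behave.
{code}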
[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng reassigned SPARK-45816: -- Assignee: L. C. Hsieh > Return null when overflowing during casting from timestamp to integers > -- > > Key: SPARK-45816 > URL: https://issues.apache.org/jira/browse/SPARK-45816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > > Spark cast works in two modes: ansi and non-ansi. When overflowing during > casting, the common behavior under non-ansi mode is to return null. However, > casting from Timestamp to Int/Short/Byte returns a wrapping value now. The > behavior to silently overflow doesn't make sense. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45837) Report underlying error in scala client
Alice Sayutina created SPARK-45837: -- Summary: Report underlying error in scala client Key: SPARK-45837 URL: https://issues.apache.org/jira/browse/SPARK-45837 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Alice Sayutina When there is retry-worthy error, we need to not just throw RetryException, but also -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45606) Release restrictions on multi-layer runtime filter
[ https://issues.apache.org/jira/browse/SPARK-45606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45606. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43449 [https://github.com/apache/spark/pull/43449] > Release restrictions on multi-layer runtime filter > -- > > Key: SPARK-45606 > URL: https://issues.apache.org/jira/browse/SPARK-45606 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Before https://issues.apache.org/jira/browse/SPARK-41674, Spark only supports > insert runtime filter for application side of shuffle join on single-layer. > Considered it's not worth to insert more runtime filter if one side of the > shuffle join already exists runtime filter, Spark restricts it. > After https://issues.apache.org/jira/browse/SPARK-41674, Spark supports > insert runtime filter for one side of any shuffle join on multi-layer. But > the restrictions on multi-layer runtime filter looks outdated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
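An illustrative query shape for the multi-layer case described above; the table and column names (fact, dim1, dim2, k1, k2) are made up and assumed to exist.

{code:scala}
// Two shuffle joins against the same fact table. Before this change, once the
// fact side already carried one injected runtime filter (say on f.k1), Spark
// declined to add another; lifting the restriction also allows one on f.k2.
spark.sql(
  """SELECT *
    |FROM fact f
    |JOIN dim1 d1 ON f.k1 = d1.k1
    |JOIN dim2 d2 ON f.k2 = d2.k2
    |""".stripMargin).explain()
{code}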
[jira] [Resolved] (SPARK-45829) The default value of 'spark.executor.logs.rolling.maxSize' on the official website is incorrect
[ https://issues.apache.org/jira/browse/SPARK-45829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-45829. -- Fix Version/s: 3.3.4 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43712 [https://github.com/apache/spark/pull/43712] > The default value of ‘spark.executor.logs.rolling.maxSize' on the official > website is incorrect > --- > > Key: SPARK-45829 > URL: https://issues.apache.org/jira/browse/SPARK-45829 > Project: Spark > Issue Type: Improvement > Components: Spark Core, UI >Affects Versions: 3.5.0 >Reporter: chenyu >Assignee: chenyu >Priority: Trivial > Labels: pull-request-available > Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2 > > Attachments: the default value.png, the value on the website.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45829) The default value of 'spark.executor.logs.rolling.maxSize' on the official website is incorrect
[ https://issues.apache.org/jira/browse/SPARK-45829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-45829: Assignee: chenyu > The default value of ‘spark.executor.logs.rolling.maxSize' on the official > website is incorrect > --- > > Key: SPARK-45829 > URL: https://issues.apache.org/jira/browse/SPARK-45829 > Project: Spark > Issue Type: Improvement > Components: Spark Core, UI >Affects Versions: 3.5.0 >Reporter: chenyu >Assignee: chenyu >Priority: Trivial > Labels: pull-request-available > Attachments: the default value.png, the value on the website.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45825) Fix these issue in package sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45825: --- Labels: pull-request-available (was: ) > Fix these issue in package sql/catalyst > --- > > Key: SPARK-45825 > URL: https://issues.apache.org/jira/browse/SPARK-45825 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43341) StructType.toDDL does not pick up on non-nullability of column in nested struct
[ https://issues.apache.org/jira/browse/SPARK-43341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43341: --- Labels: pull-request-available (was: ) > StructType.toDDL does not pick up on non-nullability of column in nested > struct > --- > > Key: SPARK-43341 > URL: https://issues.apache.org/jira/browse/SPARK-43341 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2 >Reporter: Bram Boogaarts >Priority: Major > Labels: pull-request-available > > h2. The problem > When converting a StructType instance containing a nested StructType column > which in turn contains a column for which {{nullable = false}} to a DDL > string using {{{}.toDDL{}}}, the resulting DDL string does not include this > non-nullability. For example: > {code:java} > val testschema = StructType(List( > StructField("key", IntegerType, false), > StructField("value", StringType, true), > StructField("nestedCols", StructType(List( > StructField("nestedKey", IntegerType, false), > StructField("nestedValue", StringType, true) > )), false) > )) > println(testschema.toDDL) > println(StructType.fromDDL(testschema.toDDL)){code} > gives: > {code:java} > key INT NOT NULL,value STRING,nestedCols STRUCT STRING> NOT NULL > StructType( > StructField(key,IntegerType,false), > StructField(value,StringType,true), > StructField(nestedCols,StructType( > StructField(nestedKey,IntegerType,true), > StructField(nestedValue,StringType,true) > ),false) > ){code} > > This is due to the fact that {{StructType.toDDL}} calls {{StructField.toDDL}} > for its fields, which in turn calls {{.sql}} for its {{{}dataType{}}}. If > {{dataType}} is a {{{}StructType{}}}, the call to {{.sql}} in turn calls > {{.sql}} for all the nested fields, and this last method does not include the > nullability of the field in its output. > h2. Proposed solution > {{StructField.toDDL}} should call {{dataType.toDDL}} for a > {{{}StructType{}}}, since this will include information about nullability of > nested columns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45816: -- Assignee: Apache Spark > Return null when overflowing during casting from timestamp to integers > -- > > Key: SPARK-45816 > URL: https://issues.apache.org/jira/browse/SPARK-45816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Spark cast works in two modes: ansi and non-ansi. When overflowing during > casting, the common behavior under non-ansi mode is to return null. However, > casting from Timestamp to Int/Short/Byte returns a wrapping value now. The > behavior to silently overflow doesn't make sense. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45816: -- Assignee: (was: Apache Spark) > Return null when overflowing during casting from timestamp to integers > -- > > Key: SPARK-45816 > URL: https://issues.apache.org/jira/browse/SPARK-45816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > > Spark cast works in two modes: ansi and non-ansi. When overflowing during > casting, the common behavior under non-ansi mode is to return null. However, > casting from Timestamp to Int/Short/Byte returns a wrapping value now. The > behavior to silently overflow doesn't make sense. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-42821) Remove unused parameters in splitFiles methods
[ https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reopened SPARK-42821: -- > Remove unused parameters in splitFiles methods > -- > > Key: SPARK-42821 > URL: https://issues.apache.org/jira/browse/SPARK-42821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42821) Remove unused parameters in splitFiles methods
[ https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42821: - Affects Version/s: 4.0.0 (was: 3.5.0) > Remove unused parameters in splitFiles methods > -- > > Key: SPARK-42821 > URL: https://issues.apache.org/jira/browse/SPARK-42821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45824) Enforce error class in ParseException
[ https://issues.apache.org/jira/browse/SPARK-45824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45824. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43702 [https://github.com/apache/spark/pull/43702] > Enforce error class in ParseException > - > > Key: SPARK-45824 > URL: https://issues.apache.org/jira/browse/SPARK-45824 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Make the error class in ParseException mandatory to enforce callers to always > set it. This simplifies migration on error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42821) Remove unused parameters in splitFiles methods
[ https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42821: --- Labels: pull-request-available (was: ) > Remove unused parameters in splitFiles methods > -- > > Key: SPARK-42821 > URL: https://issues.apache.org/jira/browse/SPARK-42821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45835) Make gitHub labeler more accurate and remove outdated comments
[ https://issues.apache.org/jira/browse/SPARK-45835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45835: --- Labels: pull-request-available (was: ) > Make gitHub labeler more accurate and remove outdated comments > -- > > Key: SPARK-45835 > URL: https://issues.apache.org/jira/browse/SPARK-45835 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org