[jira] [Commented] (SPARK-42151) Align UPDATE assignments with table attributes
[ https://issues.apache.org/jira/browse/SPARK-42151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697153#comment-17697153 ] Apache Spark commented on SPARK-42151: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40308 > Align UPDATE assignments with table attributes > -- > > Key: SPARK-42151 > URL: https://issues.apache.org/jira/browse/SPARK-42151 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > Assignments in UPDATE commands should be aligned with table attributes prior > to rewriting those UPDATE commands. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
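The "alignment" the ticket asks for can be illustrated with a small sketch. This is a hypothetical illustration, not Spark's internal representation: the idea is that user-written SET clauses, in whatever order they appear, are keyed and reordered by the table's attribute order, with unassigned columns keeping their current value.

```scala
// Hypothetical sketch of aligning UPDATE SET clauses with the table's
// attribute order; names and types are illustrative, not Spark's API.
object AlignAssignments {
  // tableAttrs: column names in table order; assignments: user-written SET pairs.
  def align(tableAttrs: Seq[String],
            assignments: Map[String, String]): Seq[(String, String)] =
    // Every table column gets an entry, in table order; columns without an
    // explicit assignment keep their current value (represented by their name).
    tableAttrs.map(attr => attr -> assignments.getOrElse(attr, attr))
}

// User wrote: UPDATE t SET age = 30, name = 'x'  (out of table order)
val aligned = AlignAssignments.align(
  Seq("id", "name", "age"), Map("age" -> "30", "name" -> "'x'"))
println(aligned.mkString(", "))
```

After alignment the assignments follow the table schema (`id`, `name`, `age`) regardless of the order the user wrote them in, which is what a rewrite rule downstream can then rely on.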
[jira] [Commented] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
[ https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697121#comment-17697121 ] Apache Spark commented on SPARK-42689: -- User 'mridulm' has created a pull request for this issue: https://github.com/apache/spark/pull/40307 > Allow ShuffleDriverComponent to declare if shuffle data is reliably stored > -- > > Key: SPARK-42689 > URL: https://issues.apache.org/jira/browse/SPARK-42689 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0 >Reporter: Mridul Muralidharan >Priority: Major > > Currently, if there is an executor node loss, we assume the shuffle data on > that node is also lost. This is not necessarily the case if there is a > shuffle component managing the shuffle data and reliably maintaining it (for > example, in a distributed filesystem or in a disaggregated shuffle cluster). > Downstream projects have patches to Apache Spark to work around this > issue; for example, Apache Celeborn has > [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
[jira] [Assigned] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
[ https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42689: Assignee: Apache Spark > Allow ShuffleDriverComponent to declare if shuffle data is reliably stored > -- > > Key: SPARK-42689 > URL: https://issues.apache.org/jira/browse/SPARK-42689 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0 >Reporter: Mridul Muralidharan >Assignee: Apache Spark >Priority: Major > > Currently, if there is an executor node loss, we assume the shuffle data on > that node is also lost. This is not necessarily the case if there is a > shuffle component managing the shuffle data and reliably maintaining it (for > example, in a distributed filesystem or in a disaggregated shuffle cluster). > Downstream projects have patches to Apache Spark to work around this > issue; for example, Apache Celeborn has > [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
[jira] [Assigned] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
[ https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42689: Assignee: (was: Apache Spark) > Allow ShuffleDriverComponent to declare if shuffle data is reliably stored > -- > > Key: SPARK-42689 > URL: https://issues.apache.org/jira/browse/SPARK-42689 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0 >Reporter: Mridul Muralidharan >Priority: Major > > Currently, if there is an executor node loss, we assume the shuffle data on > that node is also lost. This is not necessarily the case if there is a > shuffle component managing the shuffle data and reliably maintaining it (for > example, in a distributed filesystem or in a disaggregated shuffle cluster). > Downstream projects have patches to Apache Spark to work around this > issue; for example, Apache Celeborn has > [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
[jira] [Assigned] (SPARK-42687) Better error message for unsupported `Pivot` operator in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-42687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42687: Assignee: (was: Apache Spark) > Better error message for unsupported `Pivot` operator in Structured Streaming > --- > > Key: SPARK-42687 > URL: https://issues.apache.org/jira/browse/SPARK-42687 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > {{pivot}} is an unsupported operation in structured streaming but produces a > bad error message that is quite misleading. > > The following is the current error message for {{pivot}} in SS: > {{AnalysisException: Queries with streaming sources must be executed with > writeStream.start();}}
[jira] [Commented] (SPARK-42687) Better error message for unsupported `Pivot` operator in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-42687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697100#comment-17697100 ] Apache Spark commented on SPARK-42687: -- User 'huanliwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40306 > Better error message for unsupported `Pivot` operator in Structured Streaming > --- > > Key: SPARK-42687 > URL: https://issues.apache.org/jira/browse/SPARK-42687 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > {{pivot}} is an unsupported operation in structured streaming but produces a > bad error message that is quite misleading. > > The following is the current error message for {{pivot}} in SS: > {{AnalysisException: Queries with streaming sources must be executed with > writeStream.start();}}
[jira] [Assigned] (SPARK-42687) Better error message for unsupported `Pivot` operator in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-42687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42687: Assignee: Apache Spark > Better error message for unsupported `Pivot` operator in Structured Streaming > --- > > Key: SPARK-42687 > URL: https://issues.apache.org/jira/browse/SPARK-42687 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Assignee: Apache Spark >Priority: Minor > > {{pivot}} is an unsupported operation in structured streaming but produces a > bad error message that is quite misleading. > > The following is the current error message for {{pivot}} in SS: > {{AnalysisException: Queries with streaming sources must be executed with > writeStream.start();}}
[jira] [Commented] (SPARK-42665) `simple udf` test failed using Maven
[ https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697043#comment-17697043 ] Apache Spark commented on SPARK-42665: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40304 > `simple udf` test failed using Maven > - > > Key: SPARK-42665 > URL: https://issues.apache.org/jira/browse/SPARK-42665 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > simple udf *** FAILED *** > io.grpc.StatusRuntimeException: INTERNAL: > org.apache.spark.sql.ClientE2ETestSuite > at io.grpc.Status.asRuntimeException(Status.java:535) > at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > at > org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61) > at > org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106) > at > org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123) > at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426) > at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425) > at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > {code}
[jira] [Assigned] (SPARK-42665) `simple udf` test failed using Maven
[ https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42665: Assignee: Apache Spark > `simple udf` test failed using Maven > - > > Key: SPARK-42665 > URL: https://issues.apache.org/jira/browse/SPARK-42665 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > {code:java} > simple udf *** FAILED *** > io.grpc.StatusRuntimeException: INTERNAL: > org.apache.spark.sql.ClientE2ETestSuite > at io.grpc.Status.asRuntimeException(Status.java:535) > at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > at > org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61) > at > org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106) > at > org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123) > at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426) > at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425) > at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > {code}
[jira] [Assigned] (SPARK-42665) `simple udf` test failed using Maven
[ https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42665: Assignee: (was: Apache Spark) > `simple udf` test failed using Maven > - > > Key: SPARK-42665 > URL: https://issues.apache.org/jira/browse/SPARK-42665 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > simple udf *** FAILED *** > io.grpc.StatusRuntimeException: INTERNAL: > org.apache.spark.sql.ClientE2ETestSuite > at io.grpc.Status.asRuntimeException(Status.java:535) > at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > at > org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61) > at > org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106) > at > org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123) > at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426) > at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425) > at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > {code}
[jira] [Commented] (SPARK-42685) optimize byteToString routines
[ https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697036#comment-17697036 ] Apache Spark commented on SPARK-42685: -- User 'alkis' has created a pull request for this issue: https://github.com/apache/spark/pull/40301 > optimize byteToString routines > -- > > Key: SPARK-42685 > URL: https://issues.apache.org/jira/browse/SPARK-42685 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: Alkis Evlogimenos >Priority: Major > > {{Utils.byteToString routines are slow because they use BigInt and > BigDecimal. This is causing visible CPU usage (1-2% in scan benchmarks).}}
[jira] [Assigned] (SPARK-42685) optimize byteToString routines
[ https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42685: Assignee: (was: Apache Spark) > optimize byteToString routines > -- > > Key: SPARK-42685 > URL: https://issues.apache.org/jira/browse/SPARK-42685 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: Alkis Evlogimenos >Priority: Major > > {{Utils.byteToString routines are slow because they use BigInt and > BigDecimal. This is causing visible CPU usage (1-2% in scan benchmarks).}}
[jira] [Assigned] (SPARK-42685) optimize byteToString routines
[ https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42685: Assignee: Apache Spark > optimize byteToString routines > -- > > Key: SPARK-42685 > URL: https://issues.apache.org/jira/browse/SPARK-42685 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: Alkis Evlogimenos >Assignee: Apache Spark >Priority: Major > > {{Utils.byteToString routines are slow because they use BigInt and > BigDecimal. This is causing visible CPU usage (1-2% in scan benchmarks).}}
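To make the concern concrete, here is a minimal sketch of a bytes-to-human-readable-string formatter that stays on primitive `Long`/`Double` arithmetic instead of allocating `BigInt`/`BigDecimal` per call. This is an illustration of the optimization idea only; it is not Spark's actual `Utils.bytesToString` implementation.

```scala
// Sketch (not Spark's actual code): format a byte count without BigDecimal,
// using only primitive arithmetic and a fixed unit table.
object ByteFormat {
  private val units = Array("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB")

  def bytesToString(size: Long): String = {
    var value = size.toDouble
    var i = 0
    // Divide down by 1024 until the value fits the current unit.
    while (value >= 1024.0 && i < units.length - 1) {
      value /= 1024.0
      i += 1
    }
    if (i == 0) s"$size B" else f"$value%.1f ${units(i)}"
  }
}

println(ByteFormat.bytesToString(512))  // exact byte counts pass through
println(ByteFormat.bytesToString(1536)) // e.g. "1.5 KiB" in an English locale
```

Note that the `f` interpolator is locale-sensitive for the decimal separator; a production version would pin a locale or format manually.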
[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script
[ https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697033#comment-17697033 ] Apache Spark commented on SPARK-42656: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40303 > Spark Connect Scala Client Shell Script > --- > > Key: SPARK-42656 > URL: https://issues.apache.org/jira/browse/SPARK-42656 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Adding a shell script to run the Scala client in a Scala REPL, allowing users to > connect to Spark Connect.
[jira] [Assigned] (SPARK-42686) TaskMemoryManager debug logging is expensive
[ https://issues.apache.org/jira/browse/SPARK-42686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42686: Assignee: Apache Spark > TaskMemoryManager debug logging is expensive > > > Key: SPARK-42686 > URL: https://issues.apache.org/jira/browse/SPARK-42686 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: Alkis Evlogimenos >Assignee: Apache Spark >Priority: Major > > TaskMemoryManager debug logging is expensive, mostly because formatting > operations are done eagerly and some of them are quite expensive (e.g. > humanized strings). This causes visible CPU usage in scan benchmarks, for > example.
[jira] [Assigned] (SPARK-42686) TaskMemoryManager debug logging is expensive
[ https://issues.apache.org/jira/browse/SPARK-42686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42686: Assignee: (was: Apache Spark) > TaskMemoryManager debug logging is expensive > > > Key: SPARK-42686 > URL: https://issues.apache.org/jira/browse/SPARK-42686 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: Alkis Evlogimenos >Priority: Major > > TaskMemoryManager debug logging is expensive, mostly because formatting > operations are done eagerly and some of them are quite expensive (e.g. > humanized strings). This causes visible CPU usage in scan benchmarks, for > example.
[jira] [Commented] (SPARK-42686) TaskMemoryManager debug logging is expensive
[ https://issues.apache.org/jira/browse/SPARK-42686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697032#comment-17697032 ] Apache Spark commented on SPARK-42686: -- User 'alkis' has created a pull request for this issue: https://github.com/apache/spark/pull/40302 > TaskMemoryManager debug logging is expensive > > > Key: SPARK-42686 > URL: https://issues.apache.org/jira/browse/SPARK-42686 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: Alkis Evlogimenos >Priority: Major > > TaskMemoryManager debug logging is expensive, mostly because formatting > operations are done eagerly and some of them are quite expensive (e.g. > humanized strings). This causes visible CPU usage in scan benchmarks, for > example.
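The eager-formatting cost described above can be sketched in a few lines. The names here are illustrative (real code would use an `isDebugEnabled` guard or a logging API that takes a lambda/`Supplier`); the point is that a by-name (deferred) message argument skips the expensive formatting entirely when DEBUG is off.

```scala
// Sketch of eager vs. deferred log-message construction.
var formatCalls = 0
// Stand-in for an expensive "humanized" formatter like a bytes-to-string call.
def humanize(bytes: Long): String = { formatCalls += 1; s"$bytes bytes" }

val debugEnabled = false // DEBUG is off, as in a typical production run

// Eager: the message (and humanize) is evaluated before the call is made.
def debugEager(msg: String): Unit = if (debugEnabled) println(msg)
// Deferred: by-name parameter, evaluated only if DEBUG is actually enabled.
def debugDeferred(msg: => String): Unit = if (debugEnabled) println(msg)

debugEager(s"acquired ${humanize(1 << 20)}")    // pays the formatting cost anyway
debugDeferred(s"acquired ${humanize(1 << 20)}") // skips formatting entirely
println(s"formatCalls = $formatCalls")          // only the eager call formatted
```

With logging disabled, the eager variant still performs the formatting work on every call site, which is exactly the overhead the ticket attributes to TaskMemoryManager's debug logging.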
[jira] [Assigned] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns
[ https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42683: Assignee: Apache Spark > Automatically rename metadata columns that conflict with data schema columns > > > Key: SPARK-42683 > URL: https://issues.apache.org/jira/browse/SPARK-42683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Ryan Johnson >Assignee: Apache Spark >Priority: Major > > Today, if a datasource already has a column called `_metadata`, queries > cannot access the file-source metadata column that normally carries that > name. We can address this conflict with two changes to metadata column > handling: > # Automatically rename any metadata column whose name conflicts with a data > schema column > # Add a facility to reliably find metadata columns by their original/logical > name, even if they were renamed.
[jira] [Assigned] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns
[ https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42683: Assignee: (was: Apache Spark) > Automatically rename metadata columns that conflict with data schema columns > > > Key: SPARK-42683 > URL: https://issues.apache.org/jira/browse/SPARK-42683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Ryan Johnson >Priority: Major > > Today, if a datasource already has a column called `_metadata`, queries > cannot access the file-source metadata column that normally carries that > name. We can address this conflict with two changes to metadata column > handling: > # Automatically rename any metadata column whose name conflicts with a data > schema column > # Add a facility to reliably find metadata columns by their original/logical > name, even if they were renamed.
[jira] [Commented] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns
[ https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697017#comment-17697017 ] Apache Spark commented on SPARK-42683: -- User 'ryan-johnson-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/40300 > Automatically rename metadata columns that conflict with data schema columns > > > Key: SPARK-42683 > URL: https://issues.apache.org/jira/browse/SPARK-42683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Ryan Johnson >Priority: Major > > Today, if a datasource already has a column called `_metadata`, queries > cannot access the file-source metadata column that normally carries that > name. We can address this conflict with two changes to metadata column > handling: > # Automatically rename any metadata column whose name conflicts with a data > schema column > # Add a facility to reliably find metadata columns by their original/logical > name, even if they were renamed.
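The first of the two changes (automatic renaming) can be sketched as a small conflict-avoidance helper. This is purely illustrative, not the logic from the actual PR: it just shows one way to derive a fresh name for the file-source metadata column when the data schema already claims `_metadata`.

```scala
// Illustrative sketch (not Spark's implementation) of choosing a
// conflict-free name for the file-source metadata column.
def freshMetadataName(dataColumns: Set[String],
                      base: String = "_metadata"): String =
  if (!dataColumns.contains(base)) base
  else
    // Append a numeric suffix until the name no longer clashes
    // with any data schema column.
    Iterator.from(0).map(i => s"${base}_$i")
      .find(n => !dataColumns.contains(n)).get

println(freshMetadataName(Set("id", "value")))     // no conflict: keep "_metadata"
println(freshMetadataName(Set("id", "_metadata"))) // conflict: pick a fresh name
```

The second change in the ticket (looking a metadata column up by its original/logical name even after renaming) would then map the logical name to whatever fresh name this step produced.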
[jira] [Assigned] (SPARK-42684) v2 catalog should not allow column default value by default
[ https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42684: Assignee: Apache Spark > v2 catalog should not allow column default value by default > --- > > Key: SPARK-42684 > URL: https://issues.apache.org/jira/browse/SPARK-42684 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-42684) v2 catalog should not allow column default value by default
[ https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42684: Assignee: (was: Apache Spark) > v2 catalog should not allow column default value by default > --- > > Key: SPARK-42684 > URL: https://issues.apache.org/jira/browse/SPARK-42684 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Assigned] (SPARK-42595) Support querying inserted partitions after inserting data into a table when hive.exec.dynamic.partition=true
[ https://issues.apache.org/jira/browse/SPARK-42595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42595: Assignee: (was: Apache Spark)

> Support query inserted partitions after insert data into table when
> hive.exec.dynamic.partition=true
>
> Key: SPARK-42595
> URL: https://issues.apache.org/jira/browse/SPARK-42595
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: zhang haoyan
> Priority: Major
>
> When hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict, we can insert into a table with SQL like 'insert overwrite table aaa partition(dt) select ...'. Of course we can tell which partitions were inserted from the SQL statement itself, but for common tooling we need a programmatic way to get the inserted partitions, for example:
>
> spark.sql("insert overwrite table aaa partition(dt) select ...")  // insert table
> val partitions = getInsertedPartitions()  // need some way to get inserted partitions
> monitorInsertedPartitions(partitions)     // do something for common use
>
> Since an insert statement should not return any data, this ticket proposes to introduce spark.hive.exec.dynamic.partition.savePartitions=true (default false) and spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix=hive_dynamic_inserted_partitions. When spark.hive.exec.dynamic.partition.savePartitions=true we save the partitions to the temporary view $spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix_$dbName_$tableName, which allows the user to do this:
>
> scala> spark.conf.set("hive.exec.dynamic.partition", true)
> scala> spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
> scala> spark.conf.set("spark.hive.exec.dynamic.partition.savePartitions", true)
> scala> spark.sql("insert overwrite table db1.test_partition_table partition (dt) select 1, '2023-02-22'").show(false)
> ++
> ||
> ++
> ++
>
> scala> spark.sql("select * from hive_dynamic_inserted_partitions_db1_test_partition_table").show(false)
> +----------+
> |dt        |
> +----------+
> |2023-02-22|
> +----------+

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
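The proposal above boils down to recording the partitions touched by a dynamic-partition insert under a deterministic name built from a configured prefix, the database, and the table. A rough illustration in plain Python (the in-memory dict is a hypothetical stand-in for the proposed temporary view, and the helper names are made up for this sketch):

```python
# Toy sketch of SPARK-42595: after a dynamic-partition insert, record the
# inserted partition values under '<prefix>_<db>_<table>' so callers can
# look them up afterwards. The dict stands in for the proposed temp view.

PREFIX = "hive_dynamic_inserted_partitions"  # savePartitions.tableNamePrefix

_registry = {}

def record_inserted_partitions(db, table, partitions):
    """Save the partition rows under '<prefix>_<db>_<table>'."""
    _registry[f"{PREFIX}_{db}_{table}"] = list(partitions)

def get_inserted_partitions(db, table):
    """What 'select * from <prefix>_<db>_<table>' would return."""
    return _registry.get(f"{PREFIX}_{db}_{table}", [])

# Simulate: insert overwrite table db1.test_partition_table partition (dt) ...
record_inserted_partitions("db1", "test_partition_table", [{"dt": "2023-02-22"}])
```

In the real proposal the lookup would of course go through `spark.sql(...)` against the temporary view rather than a dict.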
[jira] [Commented] (SPARK-42595) Support query inserted partitions after insert data into table when hive.exec.dynamic.partition=true
[ https://issues.apache.org/jira/browse/SPARK-42595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697007#comment-17697007 ] Apache Spark commented on SPARK-42595: -- User 'haoyanzhang' has created a pull request for this issue: https://github.com/apache/spark/pull/40298
[jira] [Assigned] (SPARK-42595) Support query inserted partitions after insert data into table when hive.exec.dynamic.partition=true
[ https://issues.apache.org/jira/browse/SPARK-42595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42595: Assignee: Apache Spark
[jira] [Commented] (SPARK-42684) v2 catalog should not allow column default value by default
[ https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697006#comment-17697006 ] Apache Spark commented on SPARK-42684: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40299

> v2 catalog should not allow column default value by default
>
> Key: SPARK-42684
> URL: https://issues.apache.org/jira/browse/SPARK-42684
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Wenchen Fan
> Priority: Major

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42412) Initial prototype implementation for PySparkML
[ https://issues.apache.org/jira/browse/SPARK-42412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696957#comment-17696957 ] Apache Spark commented on SPARK-42412: -- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/40297

> Initial prototype implementation for PySparkML
>
> Key: SPARK-42412
> URL: https://issues.apache.org/jira/browse/SPARK-42412
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Weichen Xu
> Assignee: Weichen Xu
> Priority: Major

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42680) Create the helper function withSQLConf for connect's test
[ https://issues.apache.org/jira/browse/SPARK-42680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42680: Assignee: (was: Apache Spark)

> Create the helper function withSQLConf for connect's test
>
> Key: SPARK-42680
> URL: https://issues.apache.org/jira/browse/SPARK-42680
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: jiaan.geng
> Priority: Major
>
> Spark SQL has the helper function withSQLConf, which makes it easy to change SQL configs within a test and simplifies test code.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
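The helper the ticket asks for is essentially a scoped config override: set some configs, run the test body, then restore the previous values. A minimal Python sketch of the same pattern (a plain dict stands in for the session conf here; the real withSQLConf lives in Spark's Scala test utilities):

```python
from contextlib import contextmanager

# Stand-in for a SparkSession's runtime conf.
conf = {"spark.sql.shuffle.partitions": "200"}

@contextmanager
def with_sql_conf(pairs):
    """Set the given config keys for the duration of the block, then
    restore the previous values (removing keys that were unset before)."""
    old = {k: conf.get(k) for k in pairs}
    conf.update(pairs)
    try:
        yield
    finally:
        for k, v in old.items():
            if v is None:
                conf.pop(k, None)
            else:
                conf[k] = v

with with_sql_conf({"spark.sql.shuffle.partitions": "5"}):
    assert conf["spark.sql.shuffle.partitions"] == "5"
assert conf["spark.sql.shuffle.partitions"] == "200"  # restored
```

The try/finally is the important part: the original values come back even if the test body throws.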
[jira] [Assigned] (SPARK-42680) Create the helper function withSQLConf for connect's test
[ https://issues.apache.org/jira/browse/SPARK-42680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42680: Assignee: Apache Spark
[jira] [Commented] (SPARK-42680) Create the helper function withSQLConf for connect's test
[ https://issues.apache.org/jira/browse/SPARK-42680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696824#comment-17696824 ] Apache Spark commented on SPARK-42680: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40296
[jira] [Assigned] (SPARK-42681) Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
[ https://issues.apache.org/jira/browse/SPARK-42681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42681: Assignee: Apache Spark

> Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
>
> Key: SPARK-42681
> URL: https://issues.apache.org/jira/browse/SPARK-42681
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vitalii Li
> Assignee: Apache Spark
> Priority: Major
> Fix For: 3.5.0
>
> Currently the grammar for ALTER TABLE ADD|REPLACE column is:
>
> qualifiedColTypeWithPosition
>     : name=multipartIdentifier dataType (NOT NULL)? defaultExpression? commentSpec? colPosition?
>     ;
>
> This enforces a fixed order on the options (NOT NULL, DEFAULT value, COMMENT value, FIRST|AFTER value). We can update the grammar to allow these options in any order instead, to improve usability.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
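To see why looping over a set of optional clauses relaxes the ordering, here is an illustrative toy parser in Python (not the actual ANTLR grammar): instead of matching the options in one fixed sequence, it repeatedly tries each option parser until the input is consumed, so any order is accepted while duplicates are still rejected.

```python
import re

# One regex per column option; the names and patterns are simplified
# stand-ins for the real grammar rules.
OPTION_PATTERNS = {
    "not_null": r"NOT NULL",
    "default": r"DEFAULT (\S+)",
    "comment": r"COMMENT '([^']*)'",
    "position": r"(FIRST|AFTER \S+)",
}

def parse_col_options(text):
    """Parse column options in any order; each option may appear once."""
    opts = {}
    rest = text.strip()
    while rest:
        for name, pat in OPTION_PATTERNS.items():
            m = re.match(pat, rest)
            if m and name not in opts:          # first match wins, no repeats
                opts[name] = m.group(1) if m.groups() else True
                rest = rest[m.end():].strip()
                break
        else:
            raise ValueError(f"unrecognized option near: {rest!r}")
    return opts

# Either order parses to the same set of options:
assert parse_col_options("NOT NULL DEFAULT 0 COMMENT 'c'") == \
       parse_col_options("COMMENT 'c' NOT NULL DEFAULT 0")
```

In ANTLR the analogous relaxation is typically a repeated alternation (e.g. `(notNull | defaultExpression | commentSpec | colPosition)*`) with duplicate checks moved into the analyzer.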
[jira] [Assigned] (SPARK-42681) Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
[ https://issues.apache.org/jira/browse/SPARK-42681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42681: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42681) Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
[ https://issues.apache.org/jira/browse/SPARK-42681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696767#comment-17696767 ] Apache Spark commented on SPARK-42681: -- User 'vitaliili-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40295
[jira] [Commented] (SPARK-40610) Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause
[ https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696716#comment-17696716 ] Apache Spark commented on SPARK-40610: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/40294

> Spark fall back to use getPartitions instead of getPartitionsByFilter when
> date_add functions used in where clause
>
> Key: SPARK-40610
> URL: https://issues.apache.org/jira/browse/SPARK-40610
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Environment: edw.tmp_test_metastore_usage_source is a big table with 1000 partitions and hundreds of columns
> Reporter: icyjhl
> Priority: Major
> Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
> When I run an insert overwrite statement, I get an error saying:
>
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. listPartitions
> {code}
>
> This is odd, as the query only selects about 3 partitions. So I reran the SQL and checked the metastore, and found it was fetching all columns of all partitions:
>
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where "CD_ID"
> in (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……)
> {code}
>
> After testing, I found the problem is the date_add function in the where clause: if I remove it, the SQL works fine; otherwise the metastore fetches all columns of all partitions.
>
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
>       ,SUBSTR(sendtime,1,10) AS creation_date
>       ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS bh_esdate_deltadays_max
>       ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
>       ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS bh_esdate_deltadays_min
>       ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
>       ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
>       ,json_bh_industryphyname AS bh_industryphyname
>       ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
>       ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS bh_industryphyname_nunique
>       ,cast(current_timestamp() as string) as dw_cre_date
>       ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
>     SELECT userid
>           ,sendtime
>           ,json_bh_esdate_deltadays_max
>           ,json_bh_qiye_industryphyname
>           ,json_bh_esdate_deltadays_min
>           ,json_bh_subconam_min
>           ,json_bh_qiye_regcap_min
>           ,json_bh_industryphyname
>           ,json_bh_subconam_mean
>           ,json_bh_industryphyname_nunique
>           ,row_number() OVER (PARTITION BY userid, dt ORDER BY sendtime DESC) rn
>     FROM edw.tmp_test_metastore_usage_source
>     WHERE dt >= date_add('2022-09-22', -3)
>       AND json_bizid IN ('6101')
>       AND json_dingid IN ('611')
> ) t
> WHERE rn = 1
> {code}
>
> By the way, 2.4.7 works fine.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
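The failure mode in the report above — a predicate that cannot be serialized into a metastore filter string forcing a full partition listing — can be modeled in a few lines. This is a toy in plain Python; the function names are illustrative, not Spark's or Hive's actual API:

```python
# Toy model of metastore partition pruning: predicates whose value is a
# plain literal can be turned into a filter string and pushed down
# (getPartitionsByFilter); if the value is still an unevaluated expression
# such as date_add('2022-09-22', -3), conversion fails and the client
# falls back to listing every partition (getPartitions).

ALL_PARTITIONS = [{"dt": f"2022-09-{d:02d}"} for d in range(1, 31)]

def to_metastore_filter(column, op, value):
    """Serialize a predicate; only literal string values are convertible."""
    if not isinstance(value, str):
        raise ValueError("unconvertible predicate")
    return f'{column} {op} "{value}"'

def get_partitions(column, op, value):
    try:
        to_metastore_filter(column, op, value)   # can we push down?
    except ValueError:
        return list(ALL_PARTITIONS)              # fallback: fetch everything
    assert op == ">="                            # only >= modeled in this toy
    return [p for p in ALL_PARTITIONS if p[column] >= value]

pruned = get_partitions("dt", ">=", "2022-09-22")                      # literal
fallback = get_partitions("dt", ">=", ("date_add", "2022-09-22", -3))  # expression
```

The fix direction is to fold such date expressions to literals before the metastore call so the pruned path is taken.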
[jira] [Assigned] (SPARK-40610) Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause
[ https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40610: Assignee: Apache Spark
[jira] [Assigned] (SPARK-40610) Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause
[ https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40610: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-40610) Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause
[ https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696715#comment-17696715 ] Apache Spark commented on SPARK-40610: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/40294
[jira] [Commented] (SPARK-42677) Fix the invalid tests for broadcast hint
[ https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696687#comment-17696687 ] Apache Spark commented on SPARK-42677: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40293

> Fix the invalid tests for broadcast hint
>
> Key: SPARK-42677
> URL: https://issues.apache.org/jira/browse/SPARK-42677
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> Currently, many of the test cases for the broadcast hint are invalid, because the data size is already smaller than the broadcast threshold (so the hint is not actually exercised).

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42677) Fix the invalid tests for broadcast hint
[ https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42677: Assignee: Apache Spark
[jira] [Commented] (SPARK-42677) Fix the invalid tests for broadcast hint
[ https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696686#comment-17696686 ] Apache Spark commented on SPARK-42677: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40293
[jira] [Assigned] (SPARK-42677) Fix the invalid tests for broadcast hint
[ https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42677: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently
[ https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696680#comment-17696680 ] Apache Spark commented on SPARK-42676: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40292

> Write temp checkpoints for streaming queries to local filesystem even if
> default FS is set differently
>
> Key: SPARK-42676
> URL: https://issues.apache.org/jira/browse/SPARK-42676
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 3.4.0
> Reporter: Anish Shrigondekar
> Priority: Major
>
> Write temp checkpoints for streaming queries to the local filesystem even if the default FS is set differently.
>
> We have seen cases where the default FS is a remote file system; since the path for streaming checkpoints is not specified explicitly, temporary checkpoints can pile up there in two cases:
> * the query exits with an exception and the flag to force checkpoint removal is not set
> * the driver/cluster terminates without the query being terminated gracefully

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
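The change described above amounts to qualifying the temporary checkpoint path with the local file:// scheme explicitly, instead of leaving an unqualified path to resolve against fs.defaultFS (which may be HDFS or S3). A sketch, assuming illustrative function names (not Spark's):

```python
import tempfile
import uuid
from urllib.parse import urlparse

def temp_checkpoint_location():
    """Return a temporary checkpoint path explicitly qualified with the
    local file:// scheme, so it never resolves against the default FS."""
    local_dir = tempfile.mkdtemp(prefix="temporary-")
    return f"file://{local_dir}/{uuid.uuid4()}"

loc = temp_checkpoint_location()
assert urlparse(loc).scheme == "file"  # always local, whatever fs.defaultFS is
```

Because the scheme is spelled out, orphaned temp checkpoints from ungraceful exits land on local disk, which the OS or cluster teardown reclaims, rather than accumulating on the remote filesystem.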
[jira] [Commented] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently
[ https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696678#comment-17696678 ] Apache Spark commented on SPARK-42676: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40292
[jira] [Assigned] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently
[ https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42676: Assignee: (was: Apache Spark) > Write temp checkpoints for streaming queries to local filesystem even if > default FS is set differently > -- > > Key: SPARK-42676 > URL: https://issues.apache.org/jira/browse/SPARK-42676 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Write temp checkpoints for streaming queries to local filesystem even if > default FS is set differently > > We have seen cases where the default FS could be a remote file system and > since the path for streaming checkpoints is not specified explicitly, this > could cause checkpoint pileup in two cases: > * query exits with exception and the flag to force checkpoint removal is not > set > * driver/cluster terminates without query being terminated gracefully -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently
[ https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42676: Assignee: Apache Spark > Write temp checkpoints for streaming queries to local filesystem even if > default FS is set differently > -- > > Key: SPARK-42676 > URL: https://issues.apache.org/jira/browse/SPARK-42676 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Apache Spark >Priority: Major > > Write temp checkpoints for streaming queries to local filesystem even if > default FS is set differently > > We have seen cases where the default FS could be a remote file system and > since the path for streaming checkpoints is not specified explicitly, this > could cause checkpoint pileup in two cases: > * query exits with exception and the flag to force checkpoint removal is not > set > * driver/cluster terminates without query being terminated gracefully -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
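The change discussed above pins temporary streaming checkpoints to the local filesystem instead of letting the path inherit the configured default FS. A minimal plain-Python sketch of that resolution idea (the function name `resolve_temp_checkpoint` is hypothetical, not Spark's actual API):

```python
import tempfile
from urllib.parse import urlparse

def resolve_temp_checkpoint(default_fs: str) -> str:
    """Return a temp checkpoint path pinned to the local filesystem,
    regardless of what the default filesystem is configured to be."""
    local_dir = tempfile.mkdtemp(prefix="temporary-")
    # Prefix with the file:// scheme so Hadoop-style path resolution
    # does not fall back to the configured default FS (e.g. hdfs://...).
    return f"file://{local_dir}"

# Even with a remote default FS, the temp checkpoint stays local.
path = resolve_temp_checkpoint("hdfs://namenode:8020")
assert urlparse(path).scheme == "file"
```

Because the temp directory lives on local disk, a query that dies without graceful termination leaves debris only on the driver host, not piling up on a remote filesystem.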
[jira] [Assigned] (SPARK-42578) Add JDBC to DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-42578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42578: Assignee: (was: Apache Spark) > Add JDBC to DataFrameWriter > --- > > Key: SPARK-42578 > URL: https://issues.apache.org/jira/browse/SPARK-42578 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42578) Add JDBC to DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-42578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42578: Assignee: Apache Spark > Add JDBC to DataFrameWriter > --- > > Key: SPARK-42578 > URL: https://issues.apache.org/jira/browse/SPARK-42578 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42578) Add JDBC to DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-42578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696671#comment-17696671 ] Apache Spark commented on SPARK-42578: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40291 > Add JDBC to DataFrameWriter > --- > > Key: SPARK-42578 > URL: https://issues.apache.org/jira/browse/SPARK-42578 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42478) Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
[ https://issues.apache.org/jira/browse/SPARK-42478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696668#comment-17696668 ] Apache Spark commented on SPARK-42478: -- User 'Yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/40289 > Make a serializable jobTrackerId instead of a non-serializable JobID in > FileWriterFactory > - > > Key: SPARK-42478 > URL: https://issues.apache.org/jira/browse/SPARK-42478 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: yikaifei >Assignee: yikaifei >Priority: Major > Fix For: 3.4.0 > > > https://issues.apache.org/jira/browse/SPARK-41448 made MR job IDs consistent > in FileBatchWriter and FileFormatWriter, but it introduced a serialization > issue: JobID is non-serializable -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42478) Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
[ https://issues.apache.org/jira/browse/SPARK-42478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696667#comment-17696667 ] Apache Spark commented on SPARK-42478: -- User 'Yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/40290 > Make a serializable jobTrackerId instead of a non-serializable JobID in > FileWriterFactory > - > > Key: SPARK-42478 > URL: https://issues.apache.org/jira/browse/SPARK-42478 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: yikaifei >Assignee: yikaifei >Priority: Major > Fix For: 3.4.0 > > > https://issues.apache.org/jira/browse/SPARK-41448 made MR job IDs consistent > in FileBatchWriter and FileFormatWriter, but it introduced a serialization > issue: JobID is non-serializable -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42478) Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
[ https://issues.apache.org/jira/browse/SPARK-42478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1769#comment-1769 ] Apache Spark commented on SPARK-42478: -- User 'Yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/40289 > Make a serializable jobTrackerId instead of a non-serializable JobID in > FileWriterFactory > - > > Key: SPARK-42478 > URL: https://issues.apache.org/jira/browse/SPARK-42478 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: yikaifei >Assignee: yikaifei >Priority: Major > Fix For: 3.4.0 > > > https://issues.apache.org/jira/browse/SPARK-41448 made MR job IDs consistent > in FileBatchWriter and FileFormatWriter, but it introduced a serialization > issue: JobID is non-serializable -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
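The fix pattern here is to hold only a serializable string (`jobTrackerId`) in the factory and rebuild the non-serializable `JobID` lazily where it is needed. A simplified Python sketch of that pattern (class and method names are illustrative stand-ins; the actual change is in Spark's Scala code and Hadoop's `JobID`):

```python
import pickle

class JobID:
    """Stand-in for Hadoop's non-serializable JobID: constructed from
    a tracker-id string plus a job number, for illustration only."""
    def __init__(self, job_tracker_id: str, job_id: int):
        self.job_tracker_id = job_tracker_id
        self.job_id = job_id

class FileWriterFactory:
    """Sketch of the pattern: ship only the serializable string across
    the wire and construct the JobID on demand on the receiving side."""
    def __init__(self, job_tracker_id: str):
        self.job_tracker_id = job_tracker_id  # plain string: serializable

    def create_job_id(self, job_number: int) -> JobID:
        # Rebuilding here keeps IDs consistent without serializing JobID.
        return JobID(self.job_tracker_id, job_number)

# The factory round-trips through serialization because it holds no JobID.
factory = pickle.loads(pickle.dumps(FileWriterFactory("20230306120000")))
job_id = factory.create_job_id(0)
```

The design choice is the usual one for non-serializable members: serialize the minimal data needed to reconstruct the object, not the object itself.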
[jira] [Assigned] (SPARK-42496) Introduction Spark Connect at main page.
[ https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42496: Assignee: (was: Apache Spark) > Introduction Spark Connect at main page. > > > Key: SPARK-42496 > URL: https://issues.apache.org/jira/browse/SPARK-42496 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should document the introduction of Spark Connect at PySpark main > documentation page to give a summary to users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42496) Introduction Spark Connect at main page.
[ https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42496: Assignee: Apache Spark > Introduction Spark Connect at main page. > > > Key: SPARK-42496 > URL: https://issues.apache.org/jira/browse/SPARK-42496 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should document the introduction of Spark Connect at PySpark main > documentation page to give a summary to users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42496) Introduction Spark Connect at main page.
[ https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696655#comment-17696655 ] Apache Spark commented on SPARK-42496: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40288 > Introduction Spark Connect at main page. > > > Key: SPARK-42496 > URL: https://issues.apache.org/jira/browse/SPARK-42496 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should document the introduction of Spark Connect at PySpark main > documentation page to give a summary to users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names
[ https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42562: Assignee: (was: Apache Spark) > UnresolvedLambdaVariables in python do not need unique names > > > Key: SPARK-42562 > URL: https://issues.apache.org/jira/browse/SPARK-42562 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > UnresolvedLambdaVariables do not need unique names in python. We already did > this for the scala client, and it is good to have parity between the two > implementations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names
[ https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696647#comment-17696647 ] Apache Spark commented on SPARK-42562: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40287 > UnresolvedLambdaVariables in python do not need unique names > > > Key: SPARK-42562 > URL: https://issues.apache.org/jira/browse/SPARK-42562 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > UnresolvedLambdaVariables do not need unique names in python. We already did > this for the scala client, and it is good to have parity between the two > implementations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names
[ https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42562: Assignee: Apache Spark > UnresolvedLambdaVariables in python do not need unique names > > > Key: SPARK-42562 > URL: https://issues.apache.org/jira/browse/SPARK-42562 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > UnresolvedLambdaVariables do not need unique names in python. We already did > this for the scala client, and it is good to have parity between the two > implementations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696570#comment-17696570 ] Apache Spark commented on SPARK-42577: -- User 'ivoson' has created a pull request for this issue: https://github.com/apache/spark/pull/40286 > A large stage could run indefinitely due to executor lost > - > > Key: SPARK-42577 > URL: https://issues.apache.org/jira/browse/SPARK-42577 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2 >Reporter: wuyi >Priority: Major > > When a stage is extremely large and Spark runs on spot instances or > problematic clusters with frequent worker/executor loss, the stage could run > indefinitely due to task reruns caused by executor loss. This happens when > the external shuffle service is on and the large stage takes hours to > complete: when Spark tries to submit a child stage, it finds that the parent > stage - the large one - is missing some partitions, so the large stage has to > rerun. When it completes again, it finds new missing partitions for the > same reason. > We should add an attempt limit for this kind of scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42577: Assignee: (was: Apache Spark) > A large stage could run indefinitely due to executor lost > - > > Key: SPARK-42577 > URL: https://issues.apache.org/jira/browse/SPARK-42577 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2 >Reporter: wuyi >Priority: Major > > When a stage is extremely large and Spark runs on spot instances or > problematic clusters with frequent worker/executor loss, the stage could run > indefinitely due to task reruns caused by executor loss. This happens when > the external shuffle service is on and the large stage takes hours to > complete: when Spark tries to submit a child stage, it finds that the parent > stage - the large one - is missing some partitions, so the large stage has to > rerun. When it completes again, it finds new missing partitions for the > same reason. > We should add an attempt limit for this kind of scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696569#comment-17696569 ] Apache Spark commented on SPARK-42577: -- User 'ivoson' has created a pull request for this issue: https://github.com/apache/spark/pull/40286 > A large stage could run indefinitely due to executor lost > - > > Key: SPARK-42577 > URL: https://issues.apache.org/jira/browse/SPARK-42577 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2 >Reporter: wuyi >Priority: Major > > When a stage is extremely large and Spark runs on spot instances or > problematic clusters with frequent worker/executor loss, the stage could run > indefinitely due to task reruns caused by executor loss. This happens when > the external shuffle service is on and the large stage takes hours to > complete: when Spark tries to submit a child stage, it finds that the parent > stage - the large one - is missing some partitions, so the large stage has to > rerun. When it completes again, it finds new missing partitions for the > same reason. > We should add an attempt limit for this kind of scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42577: Assignee: Apache Spark > A large stage could run indefinitely due to executor lost > - > > Key: SPARK-42577 > URL: https://issues.apache.org/jira/browse/SPARK-42577 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > When a stage is extremely large and Spark runs on spot instances or > problematic clusters with frequent worker/executor loss, the stage could run > indefinitely due to task reruns caused by executor loss. This happens when > the external shuffle service is on and the large stage takes hours to > complete: when Spark tries to submit a child stage, it finds that the parent > stage - the large one - is missing some partitions, so the large stage has to > rerun. When it completes again, it finds new missing partitions for the > same reason. > We should add an attempt limit for this kind of scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
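The proposed guard amounts to capping how many times a stage may be resubmitted before the job is aborted rather than retried forever. A plain-Python sketch of such a limit (the class name and default value are illustrative, not Spark's actual scheduler code or configuration):

```python
class StageAttemptTracker:
    """Sketch of an attempt cap: once a stage has been resubmitted more
    times than allowed, signal that the job should abort instead of
    rerunning the (possibly hours-long) stage yet again."""
    def __init__(self, max_attempts: int = 4):
        self.max_attempts = max_attempts
        self.attempts: dict[int, int] = {}

    def record_attempt(self, stage_id: int) -> bool:
        """Return True if the stage may run (again), False if it should abort."""
        self.attempts[stage_id] = self.attempts.get(stage_id, 0) + 1
        return self.attempts[stage_id] <= self.max_attempts

tracker = StageAttemptTracker(max_attempts=2)
assert tracker.record_attempt(7) is True   # attempt 1: run
assert tracker.record_attempt(7) is True   # attempt 2: rerun after partition loss
assert tracker.record_attempt(7) is False  # exceeds the cap: abort the job
```

The trade-off is that a legitimate long-running stage on a very flaky cluster can now fail the job, which is why the limit is a configurable cap rather than a hard-coded constant.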
[jira] [Assigned] (SPARK-42675) Should clean up temp view after test
[ https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42675: Assignee: (was: Apache Spark) > Should clean up temp view after test > > > Key: SPARK-42675 > URL: https://issues.apache.org/jira/browse/SPARK-42675 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42675) Should clean up temp view after test
[ https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696564#comment-17696564 ] Apache Spark commented on SPARK-42675: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40285 > Should clean up temp view after test > > > Key: SPARK-42675 > URL: https://issues.apache.org/jira/browse/SPARK-42675 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42675) Should clean up temp view after test
[ https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42675: Assignee: Apache Spark > Should clean up temp view after test > > > Key: SPARK-42675 > URL: https://issues.apache.org/jira/browse/SPARK-42675 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
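The cleanup being asked for above is the usual register-then-drop pattern around temp views in tests. A plain-Python sketch of that pattern with a stand-in session (`FakeSession`, `create_view`, and `drop_view` are illustrative stand-ins for Spark's `createOrReplaceTempView`/`dropTempView`):

```python
from contextlib import contextmanager

@contextmanager
def temp_view(session, name, df):
    """Register a temp view for the duration of a test and guarantee
    cleanup, mirroring a try/finally around create/drop calls."""
    session.create_view(name, df)
    try:
        yield name
    finally:
        session.drop_view(name)  # runs even if the test body raises

class FakeSession:
    """Minimal stand-in so the pattern can run without Spark."""
    def __init__(self):
        self.views = {}
    def create_view(self, name, df):
        self.views[name] = df
    def drop_view(self, name):
        self.views.pop(name, None)

session = FakeSession()
with temp_view(session, "tv", object()):
    assert "tv" in session.views   # visible inside the test
assert "tv" not in session.views   # cleaned up afterwards
```

Leaking views between tests is what this guards against: a later test silently reads a view registered by an earlier one.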
[jira] [Assigned] (SPARK-42674) Upgrade scalafmt from 3.7.1 to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-42674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42674: Assignee: (was: Apache Spark) > Upgrade scalafmt from 3.7.1 to 3.7.2 > - > > Key: SPARK-42674 > URL: https://issues.apache.org/jira/browse/SPARK-42674 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42674) Upgrade scalafmt from 3.7.1 to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-42674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42674: Assignee: Apache Spark > Upgrade scalafmt from 3.7.1 to 3.7.2 > - > > Key: SPARK-42674 > URL: https://issues.apache.org/jira/browse/SPARK-42674 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42674) Upgrade scalafmt from 3.7.1 to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-42674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696539#comment-17696539 ] Apache Spark commented on SPARK-42674: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40284 > Upgrade scalafmt from 3.7.1 to 3.7.2 > - > > Key: SPARK-42674 > URL: https://issues.apache.org/jira/browse/SPARK-42674 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42673) Ban maven 3.9.x for Spark Build
[ https://issues.apache.org/jira/browse/SPARK-42673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42673: Assignee: (was: Apache Spark) > Ban maven 3.9.x for Spark Build > --- > > Key: SPARK-42673 > URL: https://issues.apache.org/jira/browse/SPARK-42673 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > [ERROR] An error occurred attempting to read POM > org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml > decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen version="1.0" encoding="ISO-8859-1"... @1:42) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion > (MXParser.java:3423) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl > (MXParser.java:3345) > at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197) > at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog > (MXParser.java:1828) > at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl > (MXParser.java:1757) > at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:3940) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:612) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:627) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:759) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:746) > at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject > (BaseCycloneDxMojo.java:694) > at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata > (BaseCycloneDxMojo.java:524) > at org.cyclonedx.maven.BaseCycloneDxMojo.convert > (BaseCycloneDxMojo.java:481) > at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70) > at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:126) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 > (MojoExecutor.java:342) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute > (MojoExecutor.java:330) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:213) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:175) > at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 > (MojoExecutor.java:76) > at org.apache.maven.lifecycle.internal.MojoExecutor$1.run > (MojoExecutor.java:163) > at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute > (DefaultMojosExecutionStrategy.java:39) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:160) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:105) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:73) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:53) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:118) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:192) > at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:498) > at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:282) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:225) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:406) > at org.codehaus.plexus.classworlds.launcher.Launcher.main > (Launcher.java:347) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42673) Ban maven 3.9.x for Spark Build
[ https://issues.apache.org/jira/browse/SPARK-42673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42673:
------------------------------------

    Assignee: Apache Spark

> Ban maven 3.9.x for Spark Build
> -------------------------------
>
>                 Key: SPARK-42673
>                 URL: https://issues.apache.org/jira/browse/SPARK-42673
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:java}
> [ERROR] An error occurred attempting to read POM
> org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen <?xml version="1.0" encoding="ISO-8859-1"... @1:42)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion (MXParser.java:3423)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl (MXParser.java:3345)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog (MXParser.java:1828)
>     at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl (MXParser.java:1757)
>     at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375)
>     at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:3940)
>     at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:612)
>     at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:627)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.readPom (BaseCycloneDxMojo.java:759)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.readPom (BaseCycloneDxMojo.java:746)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject (BaseCycloneDxMojo.java:694)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata (BaseCycloneDxMojo.java:524)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.convert (BaseCycloneDxMojo.java:481)
>     at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70)
>     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:342)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:330)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:213)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:175)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:76)
>     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:163)
>     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:160)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
>     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
>     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
>     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260)
>     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172)
>     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100)
>     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821)
>     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270)
>     at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke (Method.java:498)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
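The `XmlPullParserException` above boils down to a self-contradictory document: the POM bytes start with a UTF-8 byte-order mark, while the XML declaration claims ISO-8859-1, and a strict parser rejects the combination. The following is a minimal plain-Python sketch of that check, not the Maven/plexus-utils code; the function name and error wording are illustrative only.

```python
# Sketch: detect the UTF-8-BOM-vs-declared-encoding conflict that MXParser
# reports when reading such a POM. Hypothetical helper, not the real parser.
UTF8_BOM = b"\xef\xbb\xbf"

def check_xml_encoding(raw: bytes) -> str:
    """Return the declared encoding, or raise if it contradicts a UTF-8 BOM."""
    has_bom = raw.startswith(UTF8_BOM)
    text = raw[len(UTF8_BOM):] if has_bom else raw
    # Look only at the XML declaration (everything before the closing "?>").
    decl = text.decode("ascii", errors="replace").split("?>", 1)[0]
    declared = "UTF-8"  # XML default when no encoding attribute is present
    marker = 'encoding="'
    if marker in decl:
        declared = decl.split(marker, 1)[1].split('"', 1)[0]
    if has_bom and declared.upper() not in ("UTF-8", "UTF8"):
        raise ValueError(f"UTF-8 BOM plus xml decl of {declared} is incompatible")
    return declared

# A POM-like document with a UTF-8 BOM but an ISO-8859-1 declaration (rejected),
# and the same declaration without the BOM (accepted).
bad = UTF8_BOM + b'<?xml version="1.0" encoding="ISO-8859-1"?><project/>'
good = b'<?xml version="1.0" encoding="ISO-8859-1"?><project/>'
```

Older Maven versions tolerated the mismatch; the stricter parser shipped with Maven 3.9.x does not, which is why the ticket proposes banning 3.9.x until the offending POMs (or the plugin) are fixed.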
[jira] [Commented] (SPARK-42673) Ban maven 3.9.x for Spark Build
[ https://issues.apache.org/jira/browse/SPARK-42673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696537#comment-17696537 ]

Apache Spark commented on SPARK-42673:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40283

> Ban maven 3.9.x for Spark Build
> -------------------------------
>
>                 Key: SPARK-42673
>                 URL: https://issues.apache.org/jira/browse/SPARK-42673
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Yang Jie
>            Priority: Major
>
> {code:java}
> [ERROR] An error occurred attempting to read POM
> org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen <?xml version="1.0" encoding="ISO-8859-1"... @1:42)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion (MXParser.java:3423)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl (MXParser.java:3345)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197)
>     at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog (MXParser.java:1828)
>     at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl (MXParser.java:1757)
>     at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375)
>     at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:3940)
>     at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:612)
>     at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:627)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.readPom (BaseCycloneDxMojo.java:759)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.readPom (BaseCycloneDxMojo.java:746)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject (BaseCycloneDxMojo.java:694)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata (BaseCycloneDxMojo.java:524)
>     at org.cyclonedx.maven.BaseCycloneDxMojo.convert (BaseCycloneDxMojo.java:481)
>     at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70)
>     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:342)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:330)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:213)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:175)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:76)
>     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:163)
>     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:160)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
>     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
>     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
>     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260)
>     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172)
>     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100)
>     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821)
>     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270)
>     at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke (Method.java:498)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42672) Document error class list
[ https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696536#comment-17696536 ] Apache Spark commented on SPARK-42672: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40282 > Document error class list > - > > Key: SPARK-42672 > URL: https://issues.apache.org/jira/browse/SPARK-42672 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42672) Document error class list
[ https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696535#comment-17696535 ] Apache Spark commented on SPARK-42672: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40282 > Document error class list > - > > Key: SPARK-42672 > URL: https://issues.apache.org/jira/browse/SPARK-42672 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42672) Document error class list
[ https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42672: Assignee: (was: Apache Spark) > Document error class list > - > > Key: SPARK-42672 > URL: https://issues.apache.org/jira/browse/SPARK-42672 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42672) Document error class list
[ https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42672: Assignee: Apache Spark > Document error class list > - > > Key: SPARK-42672 > URL: https://issues.apache.org/jira/browse/SPARK-42672 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache
[ https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696526#comment-17696526 ]

Apache Spark commented on SPARK-41497:
--------------------------------------

User 'ivoson' has created a pull request for this issue:
https://github.com/apache/spark/pull/40281

> Accumulator undercounting in the case of retry task with rdd cache
> ------------------------------------------------------------------
>
>                 Key: SPARK-41497
>                 URL: https://issues.apache.org/jira/browse/SPARK-41497
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1
>            Reporter: wuyi
>            Assignee: Tengfei Huang
>            Priority: Major
>             Fix For: 3.5.0
>
> An accumulator can be undercounted when a retried task has an RDD cache. See the example below; a complete, reproducible example is at
> [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc]
>
> {code:scala}
> test("SPARK-XXX") {
>   // Set up a cluster with 2 executors.
>   val conf = new SparkConf()
>     .setMaster("local-cluster[2, 1, 1024]").setAppName("TaskSchedulerImplSuite")
>   sc = new SparkContext(conf)
>   // Set up a custom task scheduler. The scheduler will fail the first task attempt of the
>   // job submitted below. In particular, the failed first attempt succeeds in its computation
>   // (accumulator accounting, result caching) but fails to report its success status due to
>   // the concurrent executor loss. The second task attempt will succeed.
>   taskScheduler = setupSchedulerWithCustomStatusUpdate(sc)
>   val myAcc = sc.longAccumulator("myAcc")
>   // Create an RDD with only one partition so there's only one task, and specify the storage
>   // level MEMORY_ONLY_2 so that the RDD result is cached on both executors.
>   val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter =>
>     myAcc.add(100)
>     iter.map(x => x + 1)
>   }.persist(StorageLevel.MEMORY_ONLY_2)
>   // This will pass since the second task attempt will succeed.
>   assert(rdd.count() === 10)
>   // This will fail because `myAcc.add(100)` is not executed during the second task
>   // attempt: the second attempt loads the RDD cache directly instead of executing the
>   // task function, so `myAcc.add(100)` is skipped.
>   assert(myAcc.value === 100)
> } {code}
>
> We can also hit this issue with decommissioning, even if the RDD has only one copy. For example, decommissioning could migrate the RDD cache block to another executor (the result is effectively the same as having 2 copies), and the decommissioned executor could be lost before the task reports its success status to the driver.
>
> The issue is more complicated to fix than expected. I have tried several fixes, but none of them is ideal:
> Option 1: Clean up any RDD cache related to the failed task. In practice, this already fixes the issue in most cases. However, due to asynchronous communication, an RDD cache block could theoretically be reported to the driver right after the driver cleans up the failed task's caches, so this option can't resolve the issue thoroughly.
> Option 2: Disallow RDD cache reuse across attempts of the same task. This fixes the issue completely, but it also affects cases where the cache could safely be reused across attempts (e.g., when there is no accumulator operation in the task), which can cause a performance regression.
> Option 3: Introduce an accumulator cache. First, this requires a new framework for supporting accumulator caching; second, the driver needs to distinguish whether a cached accumulator value should be reported to the user, to avoid overcounting. For example, in the case of a task retry the value should be reported, but in the case of RDD cache reuse it shouldn't be (should it?).
> Option 4: Validate task success when a task tries to load the RDD cache; this defines an RDD cache as valid/accessible only if the producing task succeeded. This could be either overkill or somewhat complex, because Spark currently cleans up task state once a task finishes, so we would need to maintain a structure recording whether a task ever succeeded.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
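The race described in SPARK-41497 can be boiled down to three ingredients: a task with a side effect (the accumulator update), a cache block that outlives a failed attempt, and a driver that only counts updates from attempts that reported success. The following plain-Python simulation (no Spark involved; all names are illustrative) reproduces the undercount:

```python
# Simulation of the SPARK-41497 scenario: attempt 1 computes, caches its
# result, but its success report is lost; attempt 2 is served from the cache
# and never re-runs the task body, so the accumulator update vanishes.
class SimulatedStage:
    def __init__(self):
        self.accumulator = 0   # driver-side accumulator value
        self.cache = {}        # partition id -> cached partition result

    def run_attempt(self, partition, report_success=True):
        if partition in self.cache:
            # Retry hits the surviving cache block: the task body is skipped.
            return self.cache[partition]
        local_update = 100                        # myAcc.add(100) inside the task
        result = [x + 1 for x in range(10)]
        self.cache[partition] = result            # cache is written regardless
        if report_success:
            # The driver only merges accumulator updates from attempts
            # whose success status actually arrived.
            self.accumulator += local_update
        return result

stage = SimulatedStage()
stage.run_attempt(0, report_success=False)  # attempt 1: computed, cached, report lost
out = stage.run_attempt(0)                  # attempt 2: cache hit, no accumulator update
assert out == list(range(1, 11))
assert stage.accumulator == 0               # user expected 100: the update was lost
```

This also makes the trade-off behind the listed options concrete: Option 1 deletes `stage.cache[0]` on failure (attempt 2 recomputes, restoring the update), while Option 2 makes `run_attempt` ignore the cache for retried tasks at the cost of recomputation even when no accumulator is involved.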
[jira] [Commented] (SPARK-42671) Fix bug for createDataFrame from complex type schema
[ https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696504#comment-17696504 ] Apache Spark commented on SPARK-42671: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40280 > Fix bug for createDataFrame from complex type schema > > > Key: SPARK-42671 > URL: https://issues.apache.org/jira/browse/SPARK-42671 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42671) Fix bug for createDataFrame from complex type schema
[ https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42671: Assignee: (was: Apache Spark) > Fix bug for createDataFrame from complex type schema > > > Key: SPARK-42671 > URL: https://issues.apache.org/jira/browse/SPARK-42671 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42671) Fix bug for createDataFrame from complex type schema
[ https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696503#comment-17696503 ] Apache Spark commented on SPARK-42671: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40280 > Fix bug for createDataFrame from complex type schema > > > Key: SPARK-42671 > URL: https://issues.apache.org/jira/browse/SPARK-42671 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42671) Fix bug for createDataFrame from complex type schema
[ https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42671: Assignee: Apache Spark > Fix bug for createDataFrame from complex type schema > > > Key: SPARK-42671 > URL: https://issues.apache.org/jira/browse/SPARK-42671 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42670) Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings
[ https://issues.apache.org/jira/browse/SPARK-42670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42670: Assignee: (was: Apache Spark) > Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings > > > Key: SPARK-42670 > URL: https://issues.apache.org/jira/browse/SPARK-42670 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42670) Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings
[ https://issues.apache.org/jira/browse/SPARK-42670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42670: Assignee: Apache Spark > Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings > > > Key: SPARK-42670 > URL: https://issues.apache.org/jira/browse/SPARK-42670 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42670) Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings
[ https://issues.apache.org/jira/browse/SPARK-42670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696450#comment-17696450 ] Apache Spark commented on SPARK-42670: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40278 > Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings > > > Key: SPARK-42670 > URL: https://issues.apache.org/jira/browse/SPARK-42670 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42555) Add JDBC to DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696410#comment-17696410 ] Apache Spark commented on SPARK-42555: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40277 > Add JDBC to DataFrameReader > --- > > Key: SPARK-42555 > URL: https://issues.apache.org/jira/browse/SPARK-42555 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696392#comment-17696392 ] Apache Spark commented on SPARK-42630: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40276 > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696391#comment-17696391 ] Apache Spark commented on SPARK-42630: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40276 > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42557) Add Broadcast to functions
[ https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42557: Assignee: (was: Apache Spark) > Add Broadcast to functions > -- > > Key: SPARK-42557 > URL: https://issues.apache.org/jira/browse/SPARK-42557 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > Add the {{broadcast}} function to functions.scala. Please check if we can get > the same semantics as the current implementation using unresolved hints. > https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42557) Add Broadcast to functions
[ https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42557: Assignee: Apache Spark > Add Broadcast to functions > -- > > Key: SPARK-42557 > URL: https://issues.apache.org/jira/browse/SPARK-42557 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > Add the {{broadcast}} function to functions.scala. Please check if we can get > the same semantics as the current implementation using unresolved hints. > https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42557) Add Broadcast to functions
[ https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696387#comment-17696387 ] Apache Spark commented on SPARK-42557: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40275 > Add Broadcast to functions > -- > > Key: SPARK-42557 > URL: https://issues.apache.org/jira/browse/SPARK-42557 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > Add the {{broadcast}} function to functions.scala. Please check if we can get > the same semantics as the current implementation using unresolved hints. > https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42215) Better Scala Client Integration test
[ https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42215: Assignee: (was: Apache Spark) > Better Scala Client Integration test > > > Key: SPARK-42215 > URL: https://issues.apache.org/jira/browse/SPARK-42215 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > The current Scala client has a few integration tests that require a build > before the client tests can run. This is inconvenient for Maven developers, > who cannot run a simple `mvn clean install` to execute all tests. > > Look into marking these tests as ITs, and into other, better ways for Maven to > run tests after the packages are built. > > Make sure the tests run in SBT as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42215) Better Scala Client Integration test
[ https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696378#comment-17696378 ] Apache Spark commented on SPARK-42215: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40274 > Better Scala Client Integration test > > > Key: SPARK-42215 > URL: https://issues.apache.org/jira/browse/SPARK-42215 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > The current Scala client has a few integration tests that require a build > before the client tests can run. This is inconvenient for Maven developers, > who cannot run a simple `mvn clean install` to execute all tests. > > Look into marking these tests as ITs, and into other, better ways for Maven to > run tests after the packages are built. > > Make sure the tests run in SBT as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42215) Better Scala Client Integration test
[ https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42215: Assignee: Apache Spark > Better Scala Client Integration test > > > Key: SPARK-42215 > URL: https://issues.apache.org/jira/browse/SPARK-42215 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > The current Scala client has a few integration tests that require a build > before the client tests can run. This is inconvenient for Maven developers, > who cannot run a simple `mvn clean install` to execute all tests. > > Look into marking these tests as ITs, and into other, better ways for Maven to > run tests after the packages are built. > > Make sure the tests run in SBT as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort
[ https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696374#comment-17696374 ] Apache Spark commented on SPARK-42668: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40273 > Catch exception while trying to close compressed stream in > HDFSStateStoreProvider abort > --- > > Key: SPARK-42668 > URL: https://issues.apache.org/jira/browse/SPARK-42668 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Catch exception while trying to close compressed stream in > HDFSStateStoreProvider abort > We have seen some cases where the task exits as cancelled/failed which > triggers the abort in the task completion listener for > HDFSStateStoreProvider. As part of this, we cancel the backing stream and > close the compressed stream. However, different stores such as Azure blob > store could throw exceptions which are not caught in the current path, > leading to job failures. This change proposes to fix this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort
[ https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42668: Assignee: (was: Apache Spark) > Catch exception while trying to close compressed stream in > HDFSStateStoreProvider abort > --- > > Key: SPARK-42668 > URL: https://issues.apache.org/jira/browse/SPARK-42668 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Catch exception while trying to close compressed stream in > HDFSStateStoreProvider abort > We have seen some cases where the task exits as cancelled/failed which > triggers the abort in the task completion listener for > HDFSStateStoreProvider. As part of this, we cancel the backing stream and > close the compressed stream. However, different stores such as Azure blob > store could throw exceptions which are not caught in the current path, > leading to job failures. This change proposes to fix this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort
[ https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42668: Assignee: Apache Spark > Catch exception while trying to close compressed stream in > HDFSStateStoreProvider abort > --- > > Key: SPARK-42668 > URL: https://issues.apache.org/jira/browse/SPARK-42668 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Apache Spark >Priority: Major > > Catch exception while trying to close compressed stream in > HDFSStateStoreProvider abort > We have seen some cases where the task exits as cancelled/failed which > triggers the abort in the task completion listener for > HDFSStateStoreProvider. As part of this, we cancel the backing stream and > close the compressed stream. However, different stores such as Azure blob > store could throw exceptions which are not caught in the current path, > leading to job failures. This change proposes to fix this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
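The SPARK-42668 fix described above amounts to making stream close best-effort in the abort path: a failure while closing a compressed stream backed by a remote store (e.g. Azure Blob Storage) should be logged, not propagated as a task failure. A plain-Python sketch of that pattern, with hypothetical class and function names rather than the actual HDFSStateStoreProvider code:

```python
# Sketch: swallow and log close() failures during abort, so cleanup of an
# already-cancelled write can never fail the task. Names are illustrative.
import logging

log = logging.getLogger("state-store")

class FlakyStream:
    """Stand-in for a compressed output stream whose close() can throw."""
    def __init__(self, fail_on_close):
        self.fail_on_close = fail_on_close
        self.closed = False

    def close(self):
        if self.fail_on_close:
            raise IOError("connection reset by blob store")
        self.closed = True

def abort(stream):
    """Cancel the write; never let a close() failure escape the abort path."""
    try:
        stream.close()
    except Exception as exc:  # best-effort cleanup: log and move on
        log.warning("Error closing stream during abort: %s", exc)

abort(FlakyStream(fail_on_close=True))  # does not raise
```

The design choice is the usual one for cleanup-on-failure paths: the task is already being torn down, so secondary exceptions from cleanup carry no actionable signal and would only mask the original cancellation or failure.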
[jira] [Commented] (SPARK-42667) Spark Connect: newSession API
[ https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696350#comment-17696350 ] Apache Spark commented on SPARK-42667: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40272 > Spark Connect: newSession API > - > > Key: SPARK-42667 > URL: https://issues.apache.org/jira/browse/SPARK-42667 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org