[jira] [Resolved] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48523.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46862 [https://github.com/apache/spark/pull/46862]

Key: SPARK-48523
URL: https://issues.apache.org/jira/browse/SPARK-48523
Project: Spark
Issue Type: Improvement
Components: Connect, Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
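For context, a Spark Connect client connection string packs options as semicolon-separated `key=value` pairs after the `sc://host:port/` prefix, and `grpc_max_message_size` is one such option (it caps the size of gRPC messages the client accepts). A minimal sketch of how such a string decomposes, using a hypothetical stdlib-only helper rather than Spark's actual parser:

```python
def parse_connect_params(conn: str) -> dict:
    """Split 'sc://host:port/;k1=v1;k2=v2' into its option dict.

    Toy illustration of the connection-string shape documented in
    client-connection-string.md; not the real Spark Connect parser.
    """
    _, _, rest = conn.partition("://")
    # Everything after the first ';' is the semicolon-separated option list.
    _, _, param_str = rest.partition(";")
    return dict(
        pair.partition("=")[::2]  # ("key", "=", "value") -> ("key", "value")
        for pair in param_str.split(";")
        if pair
    )

params = parse_connect_params(
    "sc://localhost:15002/;grpc_max_message_size=134217728;use_ssl=true"
)
```

Here `params["grpc_max_message_size"]` comes back as the string `"134217728"` (128 MiB); the real client converts and applies it when building the gRPC channel.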
[jira] [Assigned] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48523:

Assignee: BingKun Pan

Key: SPARK-48523
URL: https://issues.apache.org/jira/browse/SPARK-48523
Project: Spark
Issue Type: Improvement
Components: Connect, Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
[ https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48485.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46819 [https://github.com/apache/spark/pull/46819]

Key: SPARK-48485
URL: https://issues.apache.org/jira/browse/SPARK-48485
Project: Spark
Issue Type: Improvement
Components: Connect, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
Assignee: Hyukjin Kwon
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48482:

Assignee: Wei Liu

Key: SPARK-48482
URL: https://issues.apache.org/jira/browse/SPARK-48482
Project: Spark
Issue Type: New Feature
Components: PySpark
Affects Versions: 4.0.0
Reporter: Wei Liu
Assignee: Wei Liu
Priority: Major
Labels: pull-request-available
[jira] [Resolved] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48482.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46817 [https://github.com/apache/spark/pull/46817]

Key: SPARK-48482
URL: https://issues.apache.org/jira/browse/SPARK-48482
Project: Spark
Issue Type: New Feature
Components: PySpark
Affects Versions: 4.0.0
Reporter: Wei Liu
Assignee: Wei Liu
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
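Accepting varargs alongside the existing list form is a small API-normalization pattern. A sketch of the idea in plain Python (hypothetical helper name; the real change lives inside PySpark's `DataFrame` methods):

```python
def normalize_subset(*subset):
    """Accept both call styles for a column subset:

        normalize_subset("a", "b")      # varargs, the new style
        normalize_subset(["a", "b"])    # single list, the old style
    """
    if len(subset) == 1 and isinstance(subset[0], (list, tuple)):
        # A single list/tuple argument is unpacked for backward compatibility.
        return list(subset[0])
    return list(subset)
```

With this in place, `dropDuplicates("a", "b")` and `dropDuplicates(["a", "b"])` can resolve to the same column subset.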
[jira] [Assigned] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48508:

Assignee: Ruifeng Zheng

Key: SPARK-48508
URL: https://issues.apache.org/jira/browse/SPARK-48508
Project: Spark
Issue Type: Umbrella
Components: Connect
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
[jira] [Resolved] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48508.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46848 [https://github.com/apache/spark/pull/46848]

Key: SPARK-48508
URL: https://issues.apache.org/jira/browse/SPARK-48508
Project: Spark
Issue Type: Umbrella
Components: Connect
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48507.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46846 [https://github.com/apache/spark/pull/46846]

Key: SPARK-48507
URL: https://issues.apache.org/jira/browse/SPARK-48507
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48507:

Assignee: BingKun Pan

Key: SPARK-48507
URL: https://issues.apache.org/jira/browse/SPARK-48507
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Assigned] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48504:

Assignee: Ruifeng Zheng

Key: SPARK-48504
URL: https://issues.apache.org/jira/browse/SPARK-48504
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
[jira] [Resolved] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48504.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46841 [https://github.com/apache/spark/pull/46841]

Key: SPARK-48504
URL: https://issues.apache.org/jira/browse/SPARK-48504
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48496) Use static regex Pattern instances in common/utils JavaUtils
[ https://issues.apache.org/jira/browse/SPARK-48496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48496.

Fix Version/s: 4.0.0
Resolution: Fixed
Fixed in https://github.com/apache/spark/pull/46829

Key: SPARK-48496
URL: https://issues.apache.org/jira/browse/SPARK-48496
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

Some methods in JavaUtils.java are recompiling regexes on every invocation; we should instead store a single cached Pattern. This is a minor performance issue that I spotted in the context of other profiling. Not a huge bottleneck in the grand scheme of things, but simple and straightforward to fix.
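The pattern being fixed is generic: compile a regex once at load time instead of on every call. JavaUtils is Java, so the sketch below is only a Python analogue with a hypothetical pattern and function name (Python's `re` module additionally keeps its own internal compile cache, which Java's `Pattern.compile` does not):

```python
import re

# Compiled once at module load and reused on every call -- the "static
# Pattern instance" idea from the fix, translated to Python.
_TIME_SUFFIX = re.compile(r"(-?[0-9]+)([a-z]+)?")

def parse_time_string(s: str):
    """Parse strings like '10s' or '-5' into (value, unit_suffix)."""
    m = _TIME_SUFFIX.fullmatch(s.strip().lower())
    if not m:
        raise ValueError(f"invalid time string: {s!r}")
    value, suffix = m.groups()
    return int(value), suffix

# The anti-pattern removed by the fix would be calling re.compile(...)
# (or Pattern.compile(...) in Java) inside the function body, paying the
# compilation cost on every invocation.
```

Hoisting the compiled pattern to a module-level (or `static final`, in Java) constant makes the per-call cost just a match, not a compile.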
[jira] [Resolved] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48489.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46823 [https://github.com/apache/spark/pull/46823]

Key: SPARK-48489
URL: https://issues.apache.org/jira/browse/SPARK-48489
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.3
Reporter: Stefan Bukorovic
Assignee: Stefan Bukorovic
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0

The text DataSource produces a table schema with only 1 column, but it is possible to try to create a table with a schema having multiple columns. Currently, when a user tries this, an assert in the code fails and throws an internal Spark error. We should throw a better user-facing error.
[jira] [Assigned] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48489:

Assignee: Stefan Bukorovic

Key: SPARK-48489
URL: https://issues.apache.org/jira/browse/SPARK-48489
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.3
Reporter: Stefan Bukorovic
Assignee: Stefan Bukorovic
Priority: Minor
Labels: pull-request-available

The text DataSource produces a table schema with only 1 column, but it is possible to try to create a table with a schema having multiple columns. Currently, when a user tries this, an assert in the code fails and throws an internal Spark error. We should throw a better user-facing error.
[jira] [Assigned] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48374:

Assignee: Ian Cook

Key: SPARK-48374
URL: https://issues.apache.org/jira/browse/SPARK-48374
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available

SPARK-48220 adds support for passing a PyArrow Table to {{createDataFrame()}}, but there are a few PyArrow column types that are not yet supported:
 * fixed-size binary
 * fixed-size list
 * large list
[jira] [Resolved] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48374.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46688 [https://github.com/apache/spark/pull/46688]

Key: SPARK-48374
URL: https://issues.apache.org/jira/browse/SPARK-48374
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

SPARK-48220 adds support for passing a PyArrow Table to {{createDataFrame()}}, but there are a few PyArrow column types that are not yet supported:
 * fixed-size binary
 * fixed-size list
 * large list
[jira] [Assigned] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48220:

Assignee: Ian Cook

Key: SPARK-48220
URL: https://issues.apache.org/jira/browse/SPARK-48220
Project: Spark
Issue Type: Sub-task
Components: Connect, Input/Output, PySpark, SQL
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available

SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. It would be nice if we could also go in the opposite direction, enabling users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow Table to {{spark.createDataFrame()}}.
[jira] [Resolved] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48220.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46529 [https://github.com/apache/spark/pull/46529]

Key: SPARK-48220
URL: https://issues.apache.org/jira/browse/SPARK-48220
Project: Spark
Issue Type: Sub-task
Components: Connect, Input/Output, PySpark, SQL
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. It would be nice if we could also go in the opposite direction, enabling users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow Table to {{spark.createDataFrame()}}.
[jira] [Created] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
Hyukjin Kwon created SPARK-48485:

Summary: Support interruptTag and interruptAll in streaming queries
Key: SPARK-48485
URL: https://issues.apache.org/jira/browse/SPARK-48485
Project: Spark
Issue Type: Improvement
Components: Connect, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48474:

Assignee: BingKun Pan

Key: SPARK-48474
URL: https://issues.apache.org/jira/browse/SPARK-48474
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48474.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46808 [https://github.com/apache/spark/pull/46808]

Key: SPARK-48474
URL: https://issues.apache.org/jira/browse/SPARK-48474
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48467:

Assignee: BingKun Pan

Key: SPARK-48467
URL: https://issues.apache.org/jira/browse/SPARK-48467
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48467.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46798 [https://github.com/apache/spark/pull/46798]

Key: SPARK-48467
URL: https://issues.apache.org/jira/browse/SPARK-48467
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47716:

Assignee: Jack Chen

Key: SPARK-47716
URL: https://issues.apache.org/jira/browse/SPARK-47716
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jack Chen
Assignee: Jack Chen
Priority: Major
Labels: pull-request-available

In SQLQueryTestSuite, the test case "Test logic for determining whether a query is semantically sorted" can sometimes fail with the error {{Cannot create table or view `main`.`default`.`t1` because it already exists.}} if run concurrently with other SQL test cases that also create tables with the same name.
[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47716.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 45855 [https://github.com/apache/spark/pull/45855]

Key: SPARK-47716
URL: https://issues.apache.org/jira/browse/SPARK-47716
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jack Chen
Assignee: Jack Chen
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

In SQLQueryTestSuite, the test case "Test logic for determining whether a query is semantically sorted" can sometimes fail with the error {{Cannot create table or view `main`.`default`.`t1` because it already exists.}} if run concurrently with other SQL test cases that also create tables with the same name.
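A common remedy for this class of flakiness is to give each concurrent test run a unique object name instead of a shared literal like `t1`. The sketch below shows that idea in plain Python with a hypothetical helper; it illustrates the general technique only, and may not match how the linked pull request actually fixes the suite:

```python
import uuid

def unique_view_name(prefix: str = "t") -> str:
    """Generate a collision-free table/view name for concurrent test runs.

    uuid4 gives 122 bits of randomness, so two concurrently running test
    cases will not pick the same name in practice.
    """
    return f"{prefix}_{uuid.uuid4().hex}"

# Each test run gets its own name, so CREATE TABLE never collides.
name_a = unique_view_name()
name_b = unique_view_name()
```

The generated name would then be used in the test's `CREATE TABLE`/`DROP TABLE` statements in place of the fixed `t1`.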
[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48461.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46793 [https://github.com/apache/spark/pull/46793]

Key: SPARK-48461
URL: https://issues.apache.org/jira/browse/SPARK-48461
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Daniel
Assignee: Daniel
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

[Code location here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]
[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48446:

Assignee: Yuchen Liu

Key: SPARK-48446
URL: https://issues.apache.org/jira/browse/SPARK-48446
Project: Spark
Issue Type: Documentation
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Yuchen Liu
Assignee: Yuchen Liu
Priority: Minor
Labels: easyfix, pull-request-available
Original Estimate: 1h
Remaining Estimate: 1h

For dropDuplicates, the example on [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] is out of date compared with [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. The argument should be a list. The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48446.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46797 [https://github.com/apache/spark/pull/46797]

Key: SPARK-48446
URL: https://issues.apache.org/jira/browse/SPARK-48446
Project: Spark
Issue Type: Documentation
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Yuchen Liu
Assignee: Yuchen Liu
Priority: Minor
Labels: easyfix, pull-request-available
Fix For: 4.0.0
Original Estimate: 1h
Remaining Estimate: 1h

For dropDuplicates, the example on [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] is out of date compared with [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. The argument should be a list. The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.
[ https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48475.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46809 [https://github.com/apache/spark/pull/46809]

Key: SPARK-48475
URL: https://issues.apache.org/jira/browse/SPARK-48475
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 4.0.0
Reporter: Chenhao Li
Assignee: Chenhao Li
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48464) Refactor SQLConfSuite and StatisticsSuite
[ https://issues.apache.org/jira/browse/SPARK-48464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48464.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46796 [https://github.com/apache/spark/pull/46796]

Key: SPARK-48464
URL: https://issues.apache.org/jira/browse/SPARK-48464
Project: Spark
Issue Type: Sub-task
Components: Tests
Affects Versions: 4.0.0
Reporter: Rui Wang
Assignee: Rui Wang
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48454.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46785 [https://github.com/apache/spark/pull/46785]

Key: SPARK-48454
URL: https://issues.apache.org/jira/browse/SPARK-48454
Project: Spark
Issue Type: Improvement
Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48454:

Assignee: Ruifeng Zheng

Key: SPARK-48454
URL: https://issues.apache.org/jira/browse/SPARK-48454
Project: Spark
Issue Type: Improvement
Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48442) Add parenthesis to awaitTermination call
[ https://issues.apache.org/jira/browse/SPARK-48442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48442.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46779 [https://github.com/apache/spark/pull/46779]

Key: SPARK-48442
URL: https://issues.apache.org/jira/browse/SPARK-48442
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.4.3
Reporter: Riya Verma
Assignee: Riya Verma
Priority: Trivial
Labels: correctness, pull-request-available, starter
Fix For: 4.0.0

In {{test_stream_reader}} and {{test_stream_writer}} of *test_python_streaming_datasource.py*, the call {{q.awaitTermination}} does not invoke a function call as intended, but instead returns a Python function object. The fix is to change this to {{q.awaitTermination()}}.
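This bug class is easy to reproduce in plain Python: referencing a method without parentheses yields the bound method object, which is always truthy, so a test that meant to block on termination silently does nothing. A small stand-in class (hypothetical, not Spark's `StreamingQuery`) shows the difference:

```python
class FakeQuery:
    """Stand-in for a streaming query handle, for illustration only."""

    def awaitTermination(self, timeout=None):
        # The real method blocks until the query stops; here we just
        # return a sentinel so the two call styles can be compared.
        return "terminated"

q = FakeQuery()
no_call = q.awaitTermination    # missing (): a bound method object, nothing runs
result = q.awaitTermination()   # actual invocation
```

Because `no_call` is truthy, an `assert q.awaitTermination` would pass without ever waiting, which is why the issue is labeled `correctness`.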
[jira] [Created] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
Hyukjin Kwon created SPARK-48459: Summary: Implement DataFrameQueryContext in Spark Connect Key: SPARK-48459 URL: https://issues.apache.org/jira/browse/SPARK-48459 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Implements the same https://github.com/apache/spark/pull/45377 in Spark Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48445. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46780 [https://github.com/apache/spark/pull/46780] > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
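The duplication risk SPARK-48445 describes can be seen with a plain Python stand-in: collapsing two projections inlines the intermediate expression into each consumer, so a non-cheap computation runs once per use instead of once overall. This is a conceptual sketch of the cost argument, not Catalyst code:

```python
calls = {"n": 0}

def expensive(x):
    """Stand-in for a non-cheap child expression inside a UDF."""
    calls["n"] += 1
    return x * 2

# Before collapsing: the intermediate column is computed once.
a = expensive(21)
b, c = a + 1, a - 1
assert calls["n"] == 1

calls["n"] = 0
# After naive inlining (what happens when CollapseProject.isCheap
# misjudges the expression): the non-cheap call is duplicated.
b, c = expensive(21) + 1, expensive(21) - 1
assert calls["n"] == 2
```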
[jira] [Assigned] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48445: Assignee: Kelvin Jiang > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850230#comment-17850230 ] Hyukjin Kwon commented on SPARK-23015: -- Fixed in https://github.com/apache/spark/pull/43706 > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
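The collision hypothesis above is a textbook birthday problem: with only 32768 possible %RANDOM% values, simultaneous launches clash far sooner than intuition suggests. A quick estimate, plus the kind of collision-resistant name (here via `uuid`, an illustrative choice, not the actual fix) that avoids the issue:

```python
import uuid

def collision_probability(jobs: int, space: int = 32768) -> float:
    """Birthday-problem estimate: chance that two of `jobs` parallel
    launches draw the same value from `space` possibilities."""
    p_unique = 1.0
    for i in range(jobs):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

# Even modest parallelism makes a clash plausible:
# 10 jobs -> roughly 0.1%, 200 jobs -> roughly 45%.

# A collision-resistant alternative to %RANDOM%:
unique_name = f"spark-class-launcher-output-{uuid.uuid4().hex}.txt"
```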
[jira] [Reopened] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-23015: -- > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23015. -- Fix Version/s: 4.0.0 Resolution: Fixed > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42965: Assignee: Ruifeng Zheng > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because metadata is > different something like `\{'__autoGeneratedAlias': 'true'}` but they have > same name, type and nullable, so the function just works well. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48322. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because metadata is > different something like `\{'__autoGeneratedAlias': 'true'}` but they have > same name, type and nullable, so the function just works well. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
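The fix for the pair of issues above amounts to filtering internal-only entries such as `__autoGeneratedAlias` out of field metadata before exposing the schema. A simplified, dict-based sketch of that filtering (real Spark works with `StructField` objects, and the set of internal keys here is illustrative):

```python
INTERNAL_METADATA_KEYS = {"__autoGeneratedAlias"}  # illustrative; the real list may differ

def strip_internal_metadata(fields):
    """Return schema fields with internal-only metadata keys removed."""
    return [
        {**f, "metadata": {k: v for k, v in f.get("metadata", {}).items()
                           if k not in INTERNAL_METADATA_KEYS}}
        for f in fields
    ]

fields = [{"name": "bool", "type": "long", "nullable": False,
           "metadata": {"__autoGeneratedAlias": "true"}}]
cleaned = strip_internal_metadata(fields)
assert cleaned[0]["metadata"] == {}
assert cleaned[0]["name"] == "bool"  # everything else is preserved
```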
[jira] [Assigned] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48322: Assignee: Ruifeng Zheng > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48438) Directly use the parent column class
[ https://issues.apache.org/jira/browse/SPARK-48438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48438. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46775 [https://github.com/apache/spark/pull/46775] > Directly use the parent column class > > > Key: SPARK-48438 > URL: https://issues.apache.org/jira/browse/SPARK-48438 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48434. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46764 [https://github.com/apache/spark/pull/46764] > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
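The motivation for SPARK-48434 is avoiding a redundant round trip: once the client has fetched the schema, `printSchema` can reuse it. A sketch of that caching pattern with hypothetical names, not the actual Spark Connect implementation:

```python
class Frame:
    """Simplified client-side frame that caches its schema so that
    printing it avoids a second server round trip (hypothetical names)."""
    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema  # stand-in for an RPC to the server
        self._cached_schema = None
        self.rpc_calls = 0

    @property
    def schema(self):
        if self._cached_schema is None:
            self.rpc_calls += 1
            self._cached_schema = self._fetch_schema()
        return self._cached_schema

    def print_schema(self):
        # Reuses the cached schema instead of issuing a new request.
        print(self.schema)

f = Frame(lambda: "root\n |-- id: long (nullable = true)")
f.print_schema()
f.print_schema()
assert f.rpc_calls == 1  # schema fetched once, reused afterwards
```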
[jira] [Assigned] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48434: Assignee: Ruifeng Zheng > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48432: Assignee: Vladimir Golubev > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48432. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46759 [https://github.com/apache/spark/pull/46759] > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48425. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46751 [https://github.com/apache/spark/pull/46751] > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
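The renaming behavior described above boils down to the sdist filename normalization that newer setuptools applies: runs of dashes, underscores, and dots in the project name become a single underscore. A rough approximation of that normalization, not the setuptools source itself:

```python
import re

def normalize_sdist_name(name: str) -> str:
    """Approximation of the PEP 625-style normalization newer setuptools
    applies to sdist filenames: runs of -, _ and . collapse to one underscore."""
    return re.sub(r"[-_.]+", "_", name)

assert normalize_sdist_name("pyspark-connect") == "pyspark_connect"
assert normalize_sdist_name("some.pkg-name") == "some_pkg_name"
```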
[jira] [Assigned] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48425: Assignee: Hyukjin Kwon > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48425: - Description: The issue is at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 was: The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
Hyukjin Kwon created SPARK-48425: Summary: Replaces pyspark-connect to pyspark_connect for its output name Key: SPARK-48425 URL: https://issues.apache.org/jira/browse/SPARK-48425 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48424. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46749 [https://github.com/apache/spark/pull/46749] > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
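The behavior this issue asks for is fail-open change detection: if the detection script itself crashes, CI should assume something changed and run the build anyway. The pattern can be sketched as follows (hypothetical helper, not the actual dev/is-changed.py code):

```python
def is_changed_fail_open(detect_changes) -> bool:
    """Run a change-detection callable; if it raises, err on the side
    of running the build by reporting True (fail-open)."""
    try:
        return detect_changes()
    except Exception:
        return True

# Normal operation passes the result through...
assert is_changed_fail_open(lambda: False) is False
assert is_changed_fail_open(lambda: True) is True

# ...while a crash (e.g. a failed git query) defaults to "changed".
def boom():
    raise RuntimeError("git query failed")
assert is_changed_fail_open(boom) is True
```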
[jira] [Assigned] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48424: Assignee: Hyukjin Kwon > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48424) Make dev/is-changed.py return true if it fails
Hyukjin Kwon created SPARK-48424: Summary: Make dev/is-changed.py return true if it fails Key: SPARK-48424 URL: https://issues.apache.org/jira/browse/SPARK-48424 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0, 3.5.2 Reporter: Hyukjin Kwon e.g., https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48370. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46683 [https://github.com/apache/spark/pull/46683] > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48370: Assignee: Hyukjin Kwon > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48393: Assignee: Ruifeng Zheng > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48393. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46710 [https://github.com/apache/spark/pull/46710] > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48379: -- Assignee: (was: Stefan Kandic) Reverted in https://github.com/apache/spark/commit/9fd85d9acc5acf455d0ad910ef2848695576242b > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48379: - Fix Version/s: (was: 4.0.0) > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48389. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46703 [https://github.com/apache/spark/pull/46703] > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > After https://github.com/apache/spark/pull/46689, we don't need this anymore
[jira] [Assigned] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48389: Assignee: Hyukjin Kwon > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > After https://github.com/apache/spark/pull/46689, we don't need this anymore
[jira] [Created] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
Hyukjin Kwon created SPARK-48389: Summary: Remove obsolete workflow cancel_duplicate_workflow_runs Key: SPARK-48389 URL: https://issues.apache.org/jira/browse/SPARK-48389 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon After https://github.com/apache/spark/pull/46689, we don't need this anymore
[jira] [Assigned] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48379: Assignee: Stefan Kandic > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds.
[jira] [Resolved] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48379. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46689 [https://github.com/apache/spark/pull/46689] > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds.
[jira] [Resolved] (SPARK-48341) Allow Spark Connect plugins to use QueryTest in their tests
[ https://issues.apache.org/jira/browse/SPARK-48341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48341. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46667 [https://github.com/apache/spark/pull/46667] > Allow Spark Connect plugins to use QueryTest in their tests > --- > > Key: SPARK-48341 > URL: https://issues.apache.org/jira/browse/SPARK-48341 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
Hyukjin Kwon created SPARK-48370: Summary: Checkpoint and localCheckpoint in Scala Spark Connect client Key: SPARK-48370 URL: https://issues.apache.org/jira/browse/SPARK-48370 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark Connect client. We should do it in Scala too.
[jira] [Updated] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48370: - Issue Type: Improvement (was: Bug) > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too.
[jira] [Resolved] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48367. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46679 [https://github.com/apache/spark/pull/46679] > Fix lint-scala for scalafmt to detect properly > -- > > Key: SPARK-48367 > URL: https://issues.apache.org/jira/browse/SPARK-48367 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > ./build/mvn \ > -Pscala-2.13 \ > scalafmt:format \ > -Dscalafmt.skip=false \ > -Dscalafmt.validateOnly=true \ > -Dscalafmt.changedOnly=false \ > -pl connector/connect/common \ > -pl connector/connect/server \ > -pl connector/connect/client/jvm > {code} > fails as below: > {code} > [INFO] Scalafmt results: 1 of 36 were unformatted > [INFO] Details: > [INFO] - Requires formatting: ConnectProtoUtils.scala > [INFO] - Formatted: UdfUtils.scala > [INFO] - Formatted: DataTypeProtoConverter.scala > [INFO] - Formatted: ConnectCommon.scala > [INFO] - Formatted: ProtoUtils.scala > [INFO] - Formatted: Abbreviator.scala > [INFO] - Formatted: ProtoDataTypes.scala > [INFO] - Formatted: LiteralValueProtoConverter.scala > [INFO] - Formatted: InvalidPlanInput.scala > [INFO] - Formatted: ForeachWriterPacket.scala > [INFO] - Formatted: StreamingListenerPacket.scala > [INFO] - Formatted: StorageLevelProtoConverter.scala > [INFO] - Formatted: UdfPacket.scala > [INFO] - Formatted: ClassFinder.scala > [INFO] - Formatted: SparkConnectClient.scala > [INFO] - Formatted: GrpcRetryHandler.scala > [INFO] - Formatted: GrpcExceptionConverter.scala > [INFO] - Formatted: ArrowEncoderUtils.scala > [INFO] - Formatted: ScalaCollectionUtils.scala > [INFO] - Formatted: ArrowDeserializer.scala > [INFO] - Formatted: ArrowVectorReader.scala > [INFO] - Formatted: ArrowSerializer.scala > [INFO] - 
Formatted: ConcatenatingArrowStreamReader.scala > [INFO] - Formatted: RetryPolicy.scala > [INFO] - Formatted: SparkConnectStubState.scala > [INFO] - Formatted: ArtifactManager.scala > [INFO] - Formatted: SparkResult.scala > [INFO] - Formatted: RetriesExceeded.scala > [INFO] - Formatted: CloseableIterator.scala > [INFO] - Formatted: package.scala > [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala > [INFO] - Formatted: ResponseValidator.scala > [INFO] - Formatted: SparkConnectClientParser.scala > [INFO] - Formatted: CustomSparkConnectStub.scala > [INFO] - Formatted: CustomSparkConnectBlockingStub.scala > [INFO] - Formatted: TestUDFs.scala > {code} > This is because the output format has changed due to scalafmt version upgrade.
[jira] [Assigned] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48367: Assignee: Hyukjin Kwon > Fix lint-scala for scalafmt to detect properly > -- > > Key: SPARK-48367 > URL: https://issues.apache.org/jira/browse/SPARK-48367 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > ./build/mvn \ > -Pscala-2.13 \ > scalafmt:format \ > -Dscalafmt.skip=false \ > -Dscalafmt.validateOnly=true \ > -Dscalafmt.changedOnly=false \ > -pl connector/connect/common \ > -pl connector/connect/server \ > -pl connector/connect/client/jvm > {code} > fails as below: > {code} > [INFO] Scalafmt results: 1 of 36 were unformatted > [INFO] Details: > [INFO] - Requires formatting: ConnectProtoUtils.scala > [INFO] - Formatted: UdfUtils.scala > [INFO] - Formatted: DataTypeProtoConverter.scala > [INFO] - Formatted: ConnectCommon.scala > [INFO] - Formatted: ProtoUtils.scala > [INFO] - Formatted: Abbreviator.scala > [INFO] - Formatted: ProtoDataTypes.scala > [INFO] - Formatted: LiteralValueProtoConverter.scala > [INFO] - Formatted: InvalidPlanInput.scala > [INFO] - Formatted: ForeachWriterPacket.scala > [INFO] - Formatted: StreamingListenerPacket.scala > [INFO] - Formatted: StorageLevelProtoConverter.scala > [INFO] - Formatted: UdfPacket.scala > [INFO] - Formatted: ClassFinder.scala > [INFO] - Formatted: SparkConnectClient.scala > [INFO] - Formatted: GrpcRetryHandler.scala > [INFO] - Formatted: GrpcExceptionConverter.scala > [INFO] - Formatted: ArrowEncoderUtils.scala > [INFO] - Formatted: ScalaCollectionUtils.scala > [INFO] - Formatted: ArrowDeserializer.scala > [INFO] - Formatted: ArrowVectorReader.scala > [INFO] - Formatted: ArrowSerializer.scala > [INFO] - Formatted: ConcatenatingArrowStreamReader.scala > [INFO] - Formatted: RetryPolicy.scala > [INFO] - Formatted: 
SparkConnectStubState.scala > [INFO] - Formatted: ArtifactManager.scala > [INFO] - Formatted: SparkResult.scala > [INFO] - Formatted: RetriesExceeded.scala > [INFO] - Formatted: CloseableIterator.scala > [INFO] - Formatted: package.scala > [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala > [INFO] - Formatted: ResponseValidator.scala > [INFO] - Formatted: SparkConnectClientParser.scala > [INFO] - Formatted: CustomSparkConnectStub.scala > [INFO] - Formatted: CustomSparkConnectBlockingStub.scala > [INFO] - Formatted: TestUDFs.scala > {code} > This is because the output format has changed due to scalafmt version upgrade.
[jira] [Created] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
Hyukjin Kwon created SPARK-48367: Summary: Fix lint-scala for scalafmt to detect properly Key: SPARK-48367 URL: https://issues.apache.org/jira/browse/SPARK-48367 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} ./build/mvn \ -Pscala-2.13 \ scalafmt:format \ -Dscalafmt.skip=false \ -Dscalafmt.validateOnly=true \ -Dscalafmt.changedOnly=false \ -pl connector/connect/common \ -pl connector/connect/server \ -pl connector/connect/client/jvm {code} fails as below: {code} [INFO] Scalafmt results: 1 of 36 were unformatted [INFO] Details: [INFO] - Requires formatting: ConnectProtoUtils.scala [INFO] - Formatted: UdfUtils.scala [INFO] - Formatted: DataTypeProtoConverter.scala [INFO] - Formatted: ConnectCommon.scala [INFO] - Formatted: ProtoUtils.scala [INFO] - Formatted: Abbreviator.scala [INFO] - Formatted: ProtoDataTypes.scala [INFO] - Formatted: LiteralValueProtoConverter.scala [INFO] - Formatted: InvalidPlanInput.scala [INFO] - Formatted: ForeachWriterPacket.scala [INFO] - Formatted: StreamingListenerPacket.scala [INFO] - Formatted: StorageLevelProtoConverter.scala [INFO] - Formatted: UdfPacket.scala [INFO] - Formatted: ClassFinder.scala [INFO] - Formatted: SparkConnectClient.scala [INFO] - Formatted: GrpcRetryHandler.scala [INFO] - Formatted: GrpcExceptionConverter.scala [INFO] - Formatted: ArrowEncoderUtils.scala [INFO] - Formatted: ScalaCollectionUtils.scala [INFO] - Formatted: ArrowDeserializer.scala [INFO] - Formatted: ArrowVectorReader.scala [INFO] - Formatted: ArrowSerializer.scala [INFO] - Formatted: ConcatenatingArrowStreamReader.scala [INFO] - Formatted: RetryPolicy.scala [INFO] - Formatted: SparkConnectStubState.scala [INFO] - Formatted: ArtifactManager.scala [INFO] - Formatted: SparkResult.scala [INFO] - Formatted: RetriesExceeded.scala [INFO] - Formatted: CloseableIterator.scala [INFO] - Formatted: package.scala [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala [INFO] - Formatted: 
ResponseValidator.scala [INFO] - Formatted: SparkConnectClientParser.scala [INFO] - Formatted: CustomSparkConnectStub.scala [INFO] - Formatted: CustomSparkConnectBlockingStub.scala [INFO] - Formatted: TestUDFs.scala {code} This is because the output format has changed due to scalafmt version upgrade.
[jira] [Resolved] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48363. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46674 [https://github.com/apache/spark/pull/46674] > Cleanup some redundant codes in `from_xml` > -- > > Key: SPARK-48363 > URL: https://issues.apache.org/jira/browse/SPARK-48363 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48363: Assignee: BingKun Pan > Cleanup some redundant codes in `from_xml` > -- > > Key: SPARK-48363 > URL: https://issues.apache.org/jira/browse/SPARK-48363 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48340. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 4 [https://github.com/apache/spark/pull/4] > Support TimestampNTZ infer schema miss prefer_timestamp_ntz > > > Key: SPARK-48340 > URL: https://issues.apache.org/jira/browse/SPARK-48340 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-05-20-18-38-39-769.png > > > !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Assigned] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48340: Assignee: angerszhu > Support TimestampNTZ infer schema miss prefer_timestamp_ntz > > > Key: SPARK-48340 > URL: https://issues.apache.org/jira/browse/SPARK-48340 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Attachments: image-2024-05-20-18-38-39-769.png > > > !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Resolved] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
[ https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48258. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46570 [https://github.com/apache/spark/pull/46570] > Implement DataFrame.checkpoint and DataFrame.localCheckpoint > > > Key: SPARK-48258 > URL: https://issues.apache.org/jira/browse/SPARK-48258 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature > parity.
[jira] [Resolved] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`
[ https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48333. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46654 [https://github.com/apache/spark/pull/46654] > Test `test_sorting_functions_with_column` with same `Column` > > > Key: SPARK-48333 > URL: https://issues.apache.org/jira/browse/SPARK-48333 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`
[ https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48333: Assignee: Ruifeng Zheng > Test `test_sorting_functions_with_column` with same `Column` > > > Key: SPARK-48333 > URL: https://issues.apache.org/jira/browse/SPARK-48333 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48319: Assignee: Ruifeng Zheng > Test `assert_true` and `raise_error` with the same error class as Spark > Classic > --- > > Key: SPARK-48319 > URL: https://issues.apache.org/jira/browse/SPARK-48319 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48319. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46633 [https://github.com/apache/spark/pull/46633] > Test `assert_true` and `raise_error` with the same error class as Spark > Classic > --- > > Key: SPARK-48319 > URL: https://issues.apache.org/jira/browse/SPARK-48319 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
[ https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48317: Assignee: Hyukjin Kwon > Enable test_udtf_with_analyze_using_archive and > test_udtf_with_analyze_using_file > - > > Key: SPARK-48317 > URL: https://issues.apache.org/jira/browse/SPARK-48317 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
[ https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48317. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46632 [https://github.com/apache/spark/pull/46632] > Enable test_udtf_with_analyze_using_archive and > test_udtf_with_analyze_using_file > - > > Key: SPARK-48317 > URL: https://issues.apache.org/jira/browse/SPARK-48317 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48316: Assignee: Hyukjin Kwon > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48316. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46629 [https://github.com/apache/spark/pull/46629] > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48316: - Summary: Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition (was: Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition) > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
Hyukjin Kwon created SPARK-48317: Summary: Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file Key: SPARK-48317 URL: https://issues.apache.org/jira/browse/SPARK-48317 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48238: - Parent: (was: SPARK-47970) Issue Type: Bug (was: Sub-task) > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > Labels: pull-request-available > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
> org.sparkproject.jetty.util.MultiException: Multiple exceptions > at > org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) > ~[scala-library-2.13.13.jar:?] > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) > ~[scala-library-2.13.13.jar:?] > at scala.collection.AbstractIterable.foreach(Iterable.scala:935) > ~[scala-library-2.13.13.jar:?] 
> at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.SparkContext.(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) > ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) > [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) >
[jira] [Created] (SPARK-48316) Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition
Hyukjin Kwon created SPARK-48316:
------------------------------------

             Summary: Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition
                 Key: SPARK-48316
                 URL: https://issues.apache.org/jira/browse/SPARK-48316
             Project: Spark
          Issue Type: Sub-task
          Components: Pandas API on Spark, PySpark, Tests
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48310) Cached Properties Should return copies instead of values
[ https://issues.apache.org/jira/browse/SPARK-48310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48310.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46621
[https://github.com/apache/spark/pull/46621]

> Cached Properties Should return copies instead of values
> --------------------------------------------------------
>
>                 Key: SPARK-48310
>                 URL: https://issues.apache.org/jira/browse/SPARK-48310
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Martin Grund
>            Assignee: Martin Grund
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When returning cached properties for schema and columns a user might
> incidentally modify the cached values.
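The fix above concerns Spark Connect's client-side caches for schema and columns. The pattern it protects against can be sketched in plain Python (a hypothetical stand-in class, not the actual Spark code): cache the expensive result once, but hand each caller a copy so mutating the returned value cannot corrupt the cache.

```python
from functools import cached_property
from typing import List


class Relation:
    """Illustrative stand-in for a Connect DataFrame (hypothetical)."""

    def _fetch_columns(self) -> List[str]:
        # Imagine an expensive RPC to the Connect server here.
        return ["id", "name"]

    @cached_property
    def _columns_cached(self) -> List[str]:
        # Computed once, then stored on the instance.
        return self._fetch_columns()

    @property
    def columns(self) -> List[str]:
        # Return a copy so a caller mutating the result cannot
        # corrupt the cached value (the failure mode this issue fixes).
        return list(self._columns_cached)


r = Relation()
cols = r.columns
cols.append("oops")                  # mutate the returned list...
assert r.columns == ["id", "name"]   # ...the cache is unaffected
```

The copy costs O(n) per access, which is the usual trade-off for keeping a cached mutable value safe from callers.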
[jira] [Resolved] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48268.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46571
[https://github.com/apache/spark/pull/46571]

> Add a configuration for SparkContext.setCheckpointDir
> -----------------------------------------------------
>
>                 Key: SPARK-48268
>                 URL: https://issues.apache.org/jira/browse/SPARK-48268
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Would be great to have it
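A configuration lets the checkpoint directory be set at submit time instead of calling `SparkContext.setCheckpointDir` in application code. A sketch of a `spark-defaults.conf` entry follows; the key name `spark.checkpoint.dir` is an assumption here, so verify it against the merged pull request before relying on it:

```
# spark-defaults.conf sketch (key name is an assumption; check PR 46571)
spark.checkpoint.dir    hdfs:///user/spark/checkpoints
```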
[jira] [Assigned] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48268:
------------------------------------

    Assignee: Hyukjin Kwon

> Add a configuration for SparkContext.setCheckpointDir
> -----------------------------------------------------
>
>                 Key: SPARK-48268
>                 URL: https://issues.apache.org/jira/browse/SPARK-48268
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
>
> Would be great to have it
[jira] [Resolved] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48295.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46602
[https://github.com/apache/spark/pull/46602]

> Turn on compute.ops_on_diff_frames by default
> ---------------------------------------------
>
>                 Key: SPARK-48295
>                 URL: https://issues.apache.org/jira/browse/SPARK-48295
>             Project: Spark
>          Issue Type: Improvement
>          Components: PS
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Assigned] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48295:
------------------------------------

    Assignee: Ruifeng Zheng

> Turn on compute.ops_on_diff_frames by default
> ---------------------------------------------
>
>                 Key: SPARK-48295
>                 URL: https://issues.apache.org/jira/browse/SPARK-48295
>             Project: Spark
>          Issue Type: Improvement
>          Components: PS
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Assigned] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema
[ https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48100:
------------------------------------

    Assignee: Shujing Yang

> [SQL][XML] Fix issues in skipping nested structure fields not selected in
> schema
> -------------------------------------------------------------------------
>
>                 Key: SPARK-48100
>                 URL: https://issues.apache.org/jira/browse/SPARK-48100
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Shujing Yang
>            Assignee: Shujing Yang
>            Priority: Major
>              Labels: pull-request-available
>
>
> Previously, the XML parser couldn't effectively skip nested structure data
> fields when they were not selected in the schema. For instance, in the
> example below, `df.select("struct2").collect()` returns `Seq(null)` because
> `struct1` wasn't effectively skipped. This PR fixes the issue.
> {code:java}
> 1
> 2
> {code}
>
[jira] [Resolved] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema
[ https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48100.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46348
[https://github.com/apache/spark/pull/46348]

> [SQL][XML] Fix issues in skipping nested structure fields not selected in
> schema
> -------------------------------------------------------------------------
>
>                 Key: SPARK-48100
>                 URL: https://issues.apache.org/jira/browse/SPARK-48100
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Shujing Yang
>            Assignee: Shujing Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Previously, the XML parser couldn't effectively skip nested structure data
> fields when they were not selected in the schema. For instance, in the
> example below, `df.select("struct2").collect()` returns `Seq(null)` because
> `struct1` wasn't effectively skipped. This PR fixes the issue.
> {code:java}
> 1
> 2
> {code}
>
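The XML tags in the issue's example were stripped by the mail archive, so only the values `1` and `2` survive. The behavior being fixed can be illustrated with a small pure-Python sketch using stdlib `xml.etree` (not Spark's XML parser; the element names `struct1`, `struct2`, and `innerField` are hypothetical): projecting only `struct2` must still walk past the unselected `struct1` subtree rather than losing its place and yielding null.

```python
import xml.etree.ElementTree as ET

# Hypothetical row in the spirit of the (garbled) issue example:
# struct1 is a nested field NOT selected in the schema; struct2 is selected.
doc = """
<ROW>
  <struct1><innerField>1</innerField></struct1>
  <struct2>2</struct2>
</ROW>
"""


def project(xml_text, selected):
    """Parse one row, keeping only the selected top-level fields while
    skipping (but still fully consuming) every other subtree."""
    row = ET.fromstring(xml_text)
    out = {}
    for child in row:
        if child.tag in selected:
            out[child.tag] = child.text.strip() if child.text else None
        # else: the whole subtree (struct1/innerField) is skipped intact
    return out


assert project(doc, {"struct2"}) == {"struct2": "2"}
```

The bug report describes the opposite outcome in the pre-fix parser: the unselected nested field was not consumed cleanly, so the selected field came back as null.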
[jira] [Resolved] (SPARK-48247) Use all values in a python dict when inferring MapType schema
[ https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48247.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46547
[https://github.com/apache/spark/pull/46547]

> Use all values in a python dict when inferring MapType schema
> -------------------------------------------------------------
>
>                 Key: SPARK-48247
>                 URL: https://issues.apache.org/jira/browse/SPARK-48247
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Similar to SPARK-39168
[jira] [Assigned] (SPARK-48247) Use all values in a python dict when inferring MapType schema
[ https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48247:
------------------------------------

    Assignee: Hyukjin Kwon

> Use all values in a python dict when inferring MapType schema
> -------------------------------------------------------------
>
>                 Key: SPARK-48247
>                 URL: https://issues.apache.org/jira/browse/SPARK-48247
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
>
> Similar to SPARK-39168
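The point of SPARK-48247 is that inferring a MapType's value type from only the first value of a dict produces the wrong type when later values differ. A toy pure-Python sketch of the idea follows (this is not PySpark's actual inference code, and the merge rules here are deliberately simplified: identical types stay, int/float widen to float, anything else widens to str):

```python
def merge_type(acc, t):
    """Toy type merge: identical types stay, int/float widen to float,
    mixed types otherwise widen to str. (PySpark's real rules are richer.)"""
    if acc is None or acc is t:
        return t
    if {acc, t} == {int, float}:
        return float
    return str


def infer_map_value_type(d):
    """Inspect EVERY value of the dict, not just the first one, so a dict
    like {"a": 1, "b": "x"} does not get inferred as an int-valued map."""
    inferred = None
    for v in d.values():
        inferred = merge_type(inferred, type(v))
    return inferred


assert infer_map_value_type({"a": 1, "b": 2}) is int
assert infer_map_value_type({"a": 1, "b": 2.5}) is float
assert infer_map_value_type({"a": 1, "b": "x"}) is str
```

Looking only at the first value would have returned `int` in all three cases; merging across every value is what the fix (like SPARK-39168 before it for other container types) achieves.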
[jira] [Resolved] (SPARK-48266) Move o.a.spark.sql.connect.dsl to test dir
[ https://issues.apache.org/jira/browse/SPARK-48266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48266.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46567
[https://github.com/apache/spark/pull/46567]

> Move o.a.spark.sql.connect.dsl to test dir
> ------------------------------------------
>
>                 Key: SPARK-48266
>                 URL: https://issues.apache.org/jira/browse/SPARK-48266
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Created] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
Hyukjin Kwon created SPARK-48268:
------------------------------------

             Summary: Add a configuration for SparkContext.setCheckpointDir
                 Key: SPARK-48268
                 URL: https://issues.apache.org/jira/browse/SPARK-48268
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

Would be great to have it