[jira] [Assigned] (SPARK-43150) Remove workaround for PARQUET-2160
[ https://issues.apache.org/jira/browse/SPARK-43150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun reassigned SPARK-43150:
--------------------------------

Assignee: Cheng Pan

> Remove workaround for PARQUET-2160
> ----------------------------------
>
> Key: SPARK-43150
> URL: https://issues.apache.org/jira/browse/SPARK-43150
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43150) Remove workaround for PARQUET-2160
[ https://issues.apache.org/jira/browse/SPARK-43150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun resolved SPARK-43150.
------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40802
[https://github.com/apache/spark/pull/40802]

> Remove workaround for PARQUET-2160
> ----------------------------------
>
> Key: SPARK-43150
> URL: https://issues.apache.org/jira/browse/SPARK-43150
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Fix For: 3.5.0
>
[jira] [Resolved] (SPARK-43050) Fix construct aggregate expressions by replacing grouping functions
[ https://issues.apache.org/jira/browse/SPARK-43050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-43050.
---------------------------------
Fix Version/s: 3.3.3
               3.4.1
               3.5.0
Assignee: Yuming Wang
Resolution: Fixed

Issue resolved by pull request 40685
https://github.com/apache/spark/pull/40685

> Fix construct aggregate expressions by replacing grouping functions
> -------------------------------------------------------------------
>
> Key: SPARK-43050
> URL: https://issues.apache.org/jira/browse/SPARK-43050
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.3.3, 3.4.1, 3.5.0
>
> {code:sql}
> CREATE TEMPORARY VIEW grouping AS SELECT * FROM VALUES
>   ("1", "2", "3", 1),
>   ("4", "5", "6", 1),
>   ("7", "8", "9", 1)
> as grouping(a, b, c, d);
> {code}
> {noformat}
> spark-sql (default)> SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
>                    > FROM grouping
>                    > GROUP BY GROUPING SETS (a, b, c);
> [MISSING_AGGREGATION] The non-aggregating expression "b" is based on columns which are not participating in the GROUP BY clause.
> {noformat}
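The repro in this report groups by three single-column grouping sets. As a rough illustration of what `GROUP BY GROUPING SETS (a, b, c)` computes, here is a minimal pure-Python sketch (no Spark involved); the helper name, the NULL padding via `None`, and the restriction to `COUNT` are this sketch's own conventions, not Spark's implementation:

```python
# Toy rows mirroring the view in the report: columns a, b, c, d.
rows = [
    ("1", "2", "3", 1),
    ("4", "5", "6", 1),
    ("7", "8", "9", 1),
]
columns = ["a", "b", "c", "d"]

def grouping_sets_count(rows, sets, agg_col):
    """Expand GROUPING SETS into one GROUP BY per set and union the results.

    Columns outside the current grouping set come back as None, matching
    SQL's NULL padding of grouping-set output. The aggregate is COUNT(agg_col).
    """
    agg_idx = columns.index(agg_col)
    out = []
    for keys in sets:
        idx = [columns.index(k) for k in keys]
        buckets = {}
        for r in rows:
            key = tuple(r[i] for i in idx)
            if r[agg_idx] is not None:          # COUNT skips NULLs
                buckets[key] = buckets.get(key, 0) + 1
        for key, cnt in buckets.items():
            padded = dict.fromkeys(("a", "b", "c"))   # all None to start
            padded.update(zip(keys, key))
            out.append((padded["a"], padded["b"], padded["c"], cnt))
    return out

result = grouping_sets_count(rows, [["a"], ["b"], ["c"]], "d")
print(len(result))   # three groups per grouping set, nine rows total
```

Each output row keeps only the column of its own grouping set and pads the other two with NULL, which is why expressions like `CASE WHEN a IS NULL THEN count(b) ... END` are meaningful over grouping-set output.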
[jira] [Created] (SPARK-43150) Remove workaround for PARQUET-2160
Cheng Pan created SPARK-43150:
---------------------------------

Summary: Remove workaround for PARQUET-2160
Key: SPARK-43150
URL: https://issues.apache.org/jira/browse/SPARK-43150
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Cheng Pan
[jira] [Created] (SPARK-43149) When CREATE USING fails to store metadata in metastore, data gets left around
Bruce Robbins created SPARK-43149:
-------------------------------------

Summary: When CREATE USING fails to store metadata in metastore, data gets left around
Key: SPARK-43149
URL: https://issues.apache.org/jira/browse/SPARK-43149
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.0
Reporter: Bruce Robbins

For example:
{noformat}
drop table if exists parquet_ds1;

-- try creating table with invalid column name
-- use 'using parquet' to designate the data source
create table parquet_ds1 using parquet as
select id, date'2018-01-01' + make_dt_interval(0, id)
from range(0, 10);

Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`parquet_ds1`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00)

-- show that table did not get created
show tables;

-- try again with valid column name
-- spark will complain that directory already exists
create table parquet_ds1 using parquet as
select id, date'2018-01-01' + make_dt_interval(0, id) as ts
from range(0, 10);

[LOCATION_ALREADY_EXISTS] Cannot name the managed table as `spark_catalog`.`default`.`parquet_ds1`, as its associated location 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already exists. Please pick a different table name, or remove the existing location first.
org.apache.spark.SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name the managed table as `spark_catalog`.`default`.`parquet_ds1`, as its associated location 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already exists. Please pick a different table name, or remove the existing location first.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.locationAlreadyExists(QueryExecutionErrors.scala:2804)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:414)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
	...
{noformat}
One must manually remove the directory {{spark-warehouse/parquet_ds1}} before the {{create table}} command will succeed.

It seems that datasource table creation runs the data-creation job first, then stores the metadata into the metastore.

When using Spark to create Hive tables, the issue does not happen:
{noformat}
drop table if exists parquet_hive1;

-- try creating table with invalid column name,
-- but use 'stored as parquet' instead of 'using'
create table parquet_hive1 stored as parquet as
select id, date'2018-01-01' + make_dt_interval(0, id)
from range(0, 10);

Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`parquet_hive1`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00)

-- try again with valid column name. This will succeed;
create table parquet_hive1 stored as parquet as
select id, date'2018-01-01' + make_dt_interval(0, id) as ts
from range(0, 10);
{noformat}
It seems that Hive table creation stores metadata into the metastore first, then runs the data-creation job.
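The ordering difference described in this report (data job before vs. after the metastore write) can be sketched with a toy model. Nothing below is Spark's actual code path; the function names, the comma-in-column-name check, and the in-memory "metastore" are invented for illustration:

```python
import os
import shutil
import tempfile

class MetastoreError(Exception):
    pass

def store_metadata(metastore, name, columns):
    # Mimic the metastore restriction from the report: column names
    # containing commas are rejected.
    if any("," in c for c in columns):
        raise MetastoreError("column name contains commas")
    metastore[name] = columns

def write_data(warehouse, name):
    # Stand-in for the data-creation job: materialize the table directory.
    path = os.path.join(warehouse, name)
    os.makedirs(path)
    with open(os.path.join(path, "part-00000"), "w") as f:
        f.write("rows\n")

def ctas_data_first(warehouse, metastore, name, columns):
    """Datasource-style ordering (toy model): data job, then metadata."""
    write_data(warehouse, name)
    store_metadata(metastore, name, columns)   # failure strands the directory

def ctas_metadata_first(warehouse, metastore, name, columns):
    """Hive-style ordering (toy model): metadata, then data job."""
    store_metadata(metastore, name, columns)   # fails before any data exists
    write_data(warehouse, name)

warehouse = tempfile.mkdtemp()
metastore = {}
bad_columns = ["id", "DATE '2018-01-01' + make_dt_interval(0, id)"]

leftover = {}
for ctas, name in [(ctas_data_first, "parquet_ds1"),
                   (ctas_metadata_first, "parquet_hive1")]:
    try:
        ctas(warehouse, metastore, name, bad_columns)
    except MetastoreError:
        pass
    leftover[name] = os.path.exists(os.path.join(warehouse, name))

shutil.rmtree(warehouse)
print(leftover)   # data-first strands a directory; metadata-first does not
```

Running the two orderings against the same failing metadata write shows why only the data-first variant leaves a directory behind for the next CTAS attempt to trip over.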
[jira] [Updated] (SPARK-43149) When CTAS with USING fails to store metadata in metastore, data gets left around
[ https://issues.apache.org/jira/browse/SPARK-43149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce Robbins updated SPARK-43149:
----------------------------------
Summary: When CTAS with USING fails to store metadata in metastore, data gets left around  (was: When CREATE USING fails to store metadata in metastore, data gets left around)

> When CTAS with USING fails to store metadata in metastore, data gets left around
> --------------------------------------------------------------------------------
>
> Key: SPARK-43149
> URL: https://issues.apache.org/jira/browse/SPARK-43149
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Bruce Robbins
> Priority: Major
>
> For example:
> {noformat}
> drop table if exists parquet_ds1;
>
> -- try creating table with invalid column name
> -- use 'using parquet' to designate the data source
> create table parquet_ds1 using parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id)
> from range(0, 10);
>
> Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`parquet_ds1`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00)
>
> -- show that table did not get created
> show tables;
>
> -- try again with valid column name
> -- spark will complain that directory already exists
> create table parquet_ds1 using parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id) as ts
> from range(0, 10);
>
> [LOCATION_ALREADY_EXISTS] Cannot name the managed table as `spark_catalog`.`default`.`parquet_ds1`, as its associated location 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already exists. Please pick a different table name, or remove the existing location first.
> org.apache.spark.SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name the managed table as `spark_catalog`.`default`.`parquet_ds1`, as its associated location 'file:/Users/bruce/github/spark_upstream/spark-warehouse/parquet_ds1' already exists. Please pick a different table name, or remove the existing location first.
>     at org.apache.spark.sql.errors.QueryExecutionErrors$.locationAlreadyExists(QueryExecutionErrors.scala:2804)
>     at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:414)
>     at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
>     ...
> {noformat}
> One must manually remove the directory {{spark-warehouse/parquet_ds1}} before the {{create table}} command will succeed.
> It seems that datasource table creation runs the data-creation job first, then stores the metadata into the metastore.
> When using Spark to create Hive tables, the issue does not happen:
> {noformat}
> drop table if exists parquet_hive1;
>
> -- try creating table with invalid column name,
> -- but use 'stored as parquet' instead of 'using'
> create table parquet_hive1 stored as parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id)
> from range(0, 10);
>
> Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`parquet_hive1`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00)
>
> -- try again with valid column name. This will succeed;
> create table parquet_hive1 stored as parquet as
> select id, date'2018-01-01' + make_dt_interval(0, id) as ts
> from range(0, 10);
> {noformat}
> It seems that Hive table creation stores metadata into the metastore first, then runs the data-creation job.
[jira] [Resolved] (SPARK-43095) Avoid Once strategy's idempotence is broken for batch: Infer Filters
[ https://issues.apache.org/jira/browse/SPARK-43095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-43095.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40742
[https://github.com/apache/spark/pull/40742]

> Avoid Once strategy's idempotence is broken for batch: Infer Filters
> --------------------------------------------------------------------
>
> Key: SPARK-43095
> URL: https://issues.apache.org/jira/browse/SPARK-43095
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.5.0
>
[jira] [Assigned] (SPARK-43095) Avoid Once strategy's idempotence is broken for batch: Infer Filters
[ https://issues.apache.org/jira/browse/SPARK-43095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-43095:
-----------------------------------

Assignee: Yuming Wang

> Avoid Once strategy's idempotence is broken for batch: Infer Filters
> --------------------------------------------------------------------
>
> Key: SPARK-43095
> URL: https://issues.apache.org/jira/browse/SPARK-43095
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
[jira] [Resolved] (SPARK-42926) Upgrade Parquet to 1.13.0
[ https://issues.apache.org/jira/browse/SPARK-42926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-42926.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40555
[https://github.com/apache/spark/pull/40555]

> Upgrade Parquet to 1.13.0
> -------------------------
>
> Key: SPARK-42926
> URL: https://issues.apache.org/jira/browse/SPARK-42926
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.5.0
>
> This release includes PARQUET-2160. So we no longer need SPARK-41952.
[jira] [Assigned] (SPARK-42926) Upgrade Parquet to 1.13.0
[ https://issues.apache.org/jira/browse/SPARK-42926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-42926:
-----------------------------------

Assignee: Yuming Wang

> Upgrade Parquet to 1.13.0
> -------------------------
>
> Key: SPARK-42926
> URL: https://issues.apache.org/jira/browse/SPARK-42926
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
> This release includes PARQUET-2160. So we no longer need SPARK-41952.
[jira] [Resolved] (SPARK-43107) Coalesce buckets in join applied on broadcast join stream side
[ https://issues.apache.org/jira/browse/SPARK-43107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-43107.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40756
[https://github.com/apache/spark/pull/40756]

> Coalesce buckets in join applied on broadcast join stream side
> --------------------------------------------------------------
>
> Key: SPARK-43107
> URL: https://issues.apache.org/jira/browse/SPARK-43107
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.5.0
>
[jira] [Created] (SPARK-43148) Add official image dockerfile for Spark v3.4.0
Yikun Jiang created SPARK-43148:
-----------------------------------

Summary: Add official image dockerfile for Spark v3.4.0
Key: SPARK-43148
URL: https://issues.apache.org/jira/browse/SPARK-43148
Project: Spark
Issue Type: Sub-task
Components: Spark Docker
Affects Versions: 3.5.0
Reporter: Yikun Jiang
[jira] [Assigned] (SPARK-43107) Coalesce buckets in join applied on broadcast join stream side
[ https://issues.apache.org/jira/browse/SPARK-43107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-43107:
-----------------------------------

Assignee: Yuming Wang

> Coalesce buckets in join applied on broadcast join stream side
> --------------------------------------------------------------
>
> Key: SPARK-43107
> URL: https://issues.apache.org/jira/browse/SPARK-43107
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
[jira] [Created] (SPARK-43147) Python lint local config
Wei Liu created SPARK-43147:
-------------------------------

Summary: Python lint local config
Key: SPARK-43147
URL: https://issues.apache.org/jira/browse/SPARK-43147
Project: Spark
Issue Type: Task
Components: PySpark, python
Affects Versions: 3.5.0
Reporter: Wei Liu
[jira] [Created] (SPARK-43146) Implement eager evaluation.
Takuya Ueshin created SPARK-43146:
-------------------------------------

Summary: Implement eager evaluation.
Key: SPARK-43146
URL: https://issues.apache.org/jira/browse/SPARK-43146
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin
[jira] [Commented] (SPARK-43145) Reduce ClassNotFound of hive storage handler table
[ https://issues.apache.org/jira/browse/SPARK-43145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712569#comment-17712569 ]

Yi Zhang commented on SPARK-43145:
----------------------------------

PR https://github.com/apache/spark/pull/40799

> Reduce ClassNotFound of hive storage handler table
> --------------------------------------------------
>
> Key: SPARK-43145
> URL: https://issues.apache.org/jira/browse/SPARK-43145
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.2
> Reporter: Yi Zhang
> Priority: Minor
>
> For desc table, show create table, or other cases that only need to load the HiveTableRelation, there is no need to initialize the storage handler class.
[jira] [Created] (SPARK-43145) Reduce ClassNotFound of hive storage handler table
Yi Zhang created SPARK-43145:
--------------------------------

Summary: Reduce ClassNotFound of hive storage handler table
Key: SPARK-43145
URL: https://issues.apache.org/jira/browse/SPARK-43145
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.2
Reporter: Yi Zhang

For desc table, show create table, or other cases that only need to load the HiveTableRelation, there is no need to initialize the storage handler class.
[jira] [Created] (SPARK-43144) Scala: DataStreamReader table() API
Raghu Angadi created SPARK-43144:
------------------------------------

Summary: Scala: DataStreamReader table() API
Key: SPARK-43144
URL: https://issues.apache.org/jira/browse/SPARK-43144
Project: Spark
Issue Type: Task
Components: Connect, Structured Streaming
Affects Versions: 3.5.0
Reporter: Raghu Angadi
[jira] [Created] (SPARK-43143) Scala: Add StreamingQuery awaitTermination() API
Raghu Angadi created SPARK-43143:
------------------------------------

Summary: Scala: Add StreamingQuery awaitTermination() API
Key: SPARK-43143
URL: https://issues.apache.org/jira/browse/SPARK-43143
Project: Spark
Issue Type: Task
Components: Connect, Structured Streaming
Affects Versions: 3.5.0
Reporter: Raghu Angadi
[jira] [Commented] (SPARK-43022) protobuf functions
[ https://issues.apache.org/jira/browse/SPARK-43022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712468#comment-17712468 ]

Ignite TC Bot commented on SPARK-43022:
---------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40654

> protobuf functions
> ------------------
>
> Key: SPARK-43022
> URL: https://issues.apache.org/jira/browse/SPARK-43022
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
>
[jira] [Updated] (SPARK-39892) Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
[ https://issues.apache.org/jira/browse/SPARK-39892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-39892:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-39892
> URL: https://issues.apache.org/jira/browse/SPARK-39892
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
>
> [warn] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49: [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType | origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal.<init> | version=] constructor Decimal in class Decimal is deprecated
[jira] [Updated] (SPARK-41259) Spark-sql cli query results should correspond to schema
[ https://issues.apache.org/jira/browse/SPARK-41259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-41259:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> Spark-sql cli query results should correspond to schema
> -------------------------------------------------------
>
> Key: SPARK-41259
> URL: https://issues.apache.org/jira/browse/SPARK-41259
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: yikaifei
> Priority: Minor
> Fix For: 3.5.0
>
> When using the spark-sql cli, Spark outputs only one column in the `show tables` and `show views` commands to be compatible with Hive output, but the output schema is still the three columns of Spark.
[jira] [Updated] (SPARK-39814) Use AmazonKinesisClientBuilder.withCredentials instead of new AmazonKinesisClient(credentials)
[ https://issues.apache.org/jira/browse/SPARK-39814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-39814:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> Use AmazonKinesisClientBuilder.withCredentials instead of new AmazonKinesisClient(credentials)
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-39814
> URL: https://issues.apache.org/jira/browse/SPARK-39814
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
>
> [warn] /home/runner/work/spark/spark/connector/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala:108:25: [deprecation @ org.apache.spark.examples.streaming.KinesisWordCountASL.main.kinesisClient | origin=com.amazonaws.services.kinesis.AmazonKinesisClient.<init> | version=] constructor AmazonKinesisClient in class AmazonKinesisClient is deprecated
> [warn] /home/runner/work/spark/spark/connector/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala:224:25: [deprecation @ org.apache.spark.examples.streaming.KinesisWordProducerASL.generate.kinesisClient | origin=com.amazonaws.services.kinesis.AmazonKinesisClient.<init> | version=] constructor AmazonKinesisClient in class AmazonKinesisClient is deprecated
> [warn] /home/runner/work/spark/spark/connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:142:24: [deprecation @ org.apache.spark.streaming.kinesis.KinesisSequenceRangeIterator.client | origin=com.amazonaws.services.kinesis.AmazonKinesisClient.<init> | version=] constructor AmazonKinesisClient in class AmazonKinesisClient is deprecated
> [warn] /home/runner/work/spark/spark/connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:58:18: [deprecation @ org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient.client | origin=com.amazonaws.services.kinesis.AmazonKinesisClient.<init> | version=] constructor AmazonKinesisClient in class AmazonKinesisClient is deprecated
[jira] [Updated] (SPARK-39136) JDBCTable support properties
[ https://issues.apache.org/jira/browse/SPARK-39136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-39136:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> JDBCTable support properties
> ----------------------------
>
> Key: SPARK-39136
> URL: https://issues.apache.org/jira/browse/SPARK-39136
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: angerszhu
> Priority: Major
> Fix For: 3.5.0
>
> {code:java}
> desc formatted jdbc.test.people;
> NAME                string
> ID                  int
>
> # Partitioning
> Not partitioned
>
> # Detailed Table Information
> Name                test.people
> Table Properties    []
> Time taken: 0.048 seconds, Fetched 9 row(s)
> {code}
[jira] [Updated] (SPARK-37935) Migrate onto error classes
[ https://issues.apache.org/jira/browse/SPARK-37935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-37935:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> Migrate onto error classes
> --------------------------
>
> Key: SPARK-37935
> URL: https://issues.apache.org/jira/browse/SPARK-37935
> Project: Spark
> Issue Type: Umbrella
> Components: Spark Core, SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Fix For: 3.5.0
>
> The PR https://github.com/apache/spark/pull/32850 introduced error classes as a part of the error messages framework (https://issues.apache.org/jira/browse/SPARK-33539). We need to migrate all exceptions from QueryExecutionErrors, QueryCompilationErrors and QueryParsingErrors onto the error classes using instances of SparkThrowable, and carefully test every error class by writing tests in dedicated test suites:
> * QueryExecutionErrorsSuite for the errors that occur during query execution
> * QueryCompilationErrorsSuite ... query compilation or eagerly executing commands
> * QueryParsingErrorsSuite ... parsing errors
>
> Here is an example https://github.com/apache/spark/pull/35157 of how an existing Java exception can be replaced, along with testing of the related error classes. At the end, we should migrate all exceptions from the files Query.*Errors.scala and cover all error classes from the error-classes.json file by tests.
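As a rough sketch of the error-class idea this umbrella describes (a machine-readable class name plus a parameterized message template, carried on the exception), here is a minimal Python model. The registry entries and the class name `SparkThrowableSketch` are illustrative only; they are not Spark's actual error-classes.json contents or the SparkThrowable API:

```python
# Toy error-class registry in the spirit of error-classes.json.
# The templates below are illustrative, not Spark's actual entries.
ERROR_CLASSES = {
    "LOCATION_ALREADY_EXISTS": (
        "Cannot name the managed table as {table}, as its associated "
        "location {location} already exists."
    ),
    "MISSING_AGGREGATION": (
        "The non-aggregating expression {expr} is based on columns which "
        "are not participating in the GROUP BY clause."
    ),
}

class SparkThrowableSketch(Exception):
    """Exception carrying a machine-readable error class plus parameters."""

    def __init__(self, error_class, **params):
        self.error_class = error_class
        self.params = params
        template = ERROR_CLASSES[error_class]   # unknown class fails fast
        super().__init__(f"[{error_class}] " + template.format(**params))

err = SparkThrowableSketch("MISSING_AGGREGATION", expr='"b"')
print(err.error_class)   # tests can branch on the class, not the text
print(err)
```

The point of the migration is visible even in the toy: test suites can assert on `error_class` and `params` instead of brittle full-message string matching.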
[jira] [Updated] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)
[ https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-42169:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> Implement code generation for `to_csv` function (StructsToCsv)
> --------------------------------------------------------------
>
> Key: SPARK-42169
> URL: https://issues.apache.org/jira/browse/SPARK-42169
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Narek Karapetian
> Priority: Minor
> Labels: csv, sql
> Fix For: 3.5.0
>
> Implement code generation for the `to_csv` function instead of extending it from the CodegenFallback trait.
> {code:java}
> org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code}
>
> This is good to have from a performance point of view.
[jira] [Updated] (SPARK-38945) simply KEYTAB and PRINCIPAL in KerberosConfDriverFeatureStep
[ https://issues.apache.org/jira/browse/SPARK-38945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-38945:
---------------------------------
Fix Version/s: 3.5.0
               (was: 3.4.0)

> simply KEYTAB and PRINCIPAL in KerberosConfDriverFeatureStep
> ------------------------------------------------------------
>
> Key: SPARK-38945
> URL: https://issues.apache.org/jira/browse/SPARK-38945
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.2.1
> Reporter: Qian Sun
> Priority: Minor
> Fix For: 3.5.0
>
> Simplify KEYTAB and PRINCIPAL in KerberosConfDriverFeatureStep, because they are already imported.
[jira] [Resolved] (SPARK-43064) Spark SQL CLI SQL tab should only show once statement once
[ https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun resolved SPARK-43064.
------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40701
[https://github.com/apache/spark/pull/40701]

> Spark SQL CLI SQL tab should only show once statement once
> ----------------------------------------------------------
>
> Key: SPARK-43064
> URL: https://issues.apache.org/jira/browse/SPARK-43064
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Fix For: 3.5.0
>
> Attachments: screenshot-1.png
>
> !screenshot-1.png|width=996,height=554!
[jira] [Assigned] (SPARK-43064) Spark SQL CLI SQL tab should only show once statement once
[ https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun reassigned SPARK-43064:
--------------------------------

Assignee: angerszhu

> Spark SQL CLI SQL tab should only show once statement once
> ----------------------------------------------------------
>
> Key: SPARK-43064
> URL: https://issues.apache.org/jira/browse/SPARK-43064
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
>
> Attachments: screenshot-1.png
>
> !screenshot-1.png|width=996,height=554!
[jira] [Assigned] (SPARK-43104) Set `shadeTestJar` of protobuf module to false
[ https://issues.apache.org/jira/browse/SPARK-43104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun reassigned SPARK-43104:
--------------------------------

Assignee: Yang Jie

> Set `shadeTestJar` of protobuf module to false
> ----------------------------------------------
>
> Key: SPARK-43104
> URL: https://issues.apache.org/jira/browse/SPARK-43104
> Project: Spark
> Issue Type: Improvement
> Components: Protobuf
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
>
[jira] [Resolved] (SPARK-43104) Set `shadeTestJar` of protobuf module to false
[ https://issues.apache.org/jira/browse/SPARK-43104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun resolved SPARK-43104.
------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40753
[https://github.com/apache/spark/pull/40753]

> Set `shadeTestJar` of protobuf module to false
> ----------------------------------------------
>
> Key: SPARK-43104
> URL: https://issues.apache.org/jira/browse/SPARK-43104
> Project: Spark
> Issue Type: Improvement
> Components: Protobuf
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
> Fix For: 3.5.0
>
[jira] [Commented] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712395#comment-17712395 ] Willi Raschkowski commented on SPARK-43142: --- https://github.com/apache/spark/pull/40794 > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Priority: Major > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712383#comment-17712383 ] Willi Raschkowski commented on SPARK-43142: --- The solution I'd propose is to have {{DslAttr.attr}} return the attribute it's wrapping instead of creating a new attribute. > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Priority: Major > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712383#comment-17712383 ] Willi Raschkowski edited comment on SPARK-43142 at 4/14/23 1:18 PM: The solution I'd propose is to have {{DslAttr.attr}} return the attribute it's wrapping instead of creating a new attribute. I'll put up a PR. was (Author: raschkowski): The solution I'd propose is to have {{DslAttr.attr}} return the attribute it's wrapping instead of creating a new attribute. > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Priority: Major > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712382#comment-17712382 ] Willi Raschkowski commented on SPARK-43142: --- Here's what's happening: {{ImplicitOperators}} methods like {{asc}} rely on a call to {{expr}} [(Github)|https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L149]. The {{UnresolvedAttribute}} returned by {{.attr}} is implicitly converted to {{DslAttr}}. But {{DslAttr}} does not implement {{expr}} by returning the attribute it's already wrapping. Instead, it only implements how to convert the attribute it's wrapping to a string name [(Github)|https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L273-L275]. Returning an attribute for an implicitly wrapped attribute is implemented on the super class {{ImplicitAttribute}} by creating a new {{UnresolvedAttribute}} on the string name return by {{DslAttr}} (the method call {{s}}, [Github|https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L278-L280]). The problem is that this string name returned by {{DslAttr}} no longer has the quotes and thus the new {{UnresolvedAttribute}} parses an unquoted identifier. {code} scala> "`col/slash`".attr.name res1: String = col/slash {code} > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Priority: Major > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. 
They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {<EOF>, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43142) DSL expressions fail on attribute with special characters
Willi Raschkowski created SPARK-43142: - Summary: DSL expressions fail on attribute with special characters Key: SPARK-43142 URL: https://issues.apache.org/jira/browse/SPARK-43142 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Willi Raschkowski Expressions on implicitly converted attributes fail if the attributes have names containing special characters. They fail even if the attributes are backtick-quoted: {code:java} scala> import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.catalyst.dsl.expressions._ scala> "`slashed/col`".attr res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'slashed/col scala> "`slashed/col`".attr.asc org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '/' expecting {<EOF>, '.', '-'}(line 1, pos 7) == SQL == slashed/col ---^^^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
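The mechanism described in the comments above (quotes consumed at parse time, then lost when the name is re-parsed) can be modeled in a few lines. This is a toy Python sketch, not the real code: the actual classes are the Scala `DslAttr`/`ImplicitAttribute` pair in catalyst's dsl package, and `parse_attribute` here is a stand-in for the identifier parser.

```python
import re

# Unquoted identifiers must be plain word characters; backtick-quoted
# names may contain anything.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*$")

def parse_attribute(s):
    if s.startswith("`") and s.endswith("`"):
        return {"name": s[1:-1]}   # quotes consumed; bare name stored
    if not IDENT.match(s):
        raise ValueError("mismatched input in %r" % s)
    return {"name": s}

attr = parse_attribute("`slashed/col`")        # quoted form parses fine

# Buggy round-trip: rebuild the attribute from its *unquoted* name, the
# way ImplicitAttribute does -- the special character no longer parses.
try:
    parse_attribute(attr["name"])
    round_trip_ok = True
except ValueError:
    round_trip_ok = False

# Proposed fix from the thread: return the attribute already being
# wrapped instead of re-parsing its name.
def expr(wrapped_attr):
    return wrapped_attr

assert attr["name"] == "slashed/col"
assert not round_trip_ok
assert expr(attr)["name"] == "slashed/col"
```

The sketch reproduces both halves of the report: the quoted form parses, the round-trip through the bare name fails, and returning the wrapped attribute directly avoids the second parse entirely.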
[jira] [Created] (SPARK-43141) Ignore generated Java files in checkstyle
Hyukjin Kwon created SPARK-43141: Summary: Ignore generated Java files in checkstyle Key: SPARK-43141 URL: https://issues.apache.org/jira/browse/SPARK-43141 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.4.1 Reporter: Hyukjin Kwon Files such as {{.../spark/core/target/scala-2.12/src_managed/main/org/apache/spark/status/protobuf/StoreTypes.java}} are checked in checkstyle. We shouldn't check them in the linter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30552) Chained spark column expressions with distinct windows specs produce inefficient DAG
[ https://issues.apache.org/jira/browse/SPARK-30552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712339#comment-17712339 ] Dovi Joel commented on SPARK-30552: --- Is this resolved with [SPARK-41805] Reuse expressions in WindowSpecDefinition - ASF JIRA (apache.org)? > Chained spark column expressions with distinct windows specs produce > inefficient DAG > > > Key: SPARK-30552 > URL: https://issues.apache.org/jira/browse/SPARK-30552 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 2.4.4 > Environment: python : 3.6.9.final.0 > python-bits : 64 > OS : Windows > OS-release : 10 > machine : AMD64 > processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel > pyspark: 2.4.4 > pandas : 0.25.3 > numpy : 1.17.4 > pyarrow : 0.15.1 >Reporter: Franz >Priority: Major > > h2. Context > Let's say you deal with time series data. Your desired outcome relies on > multiple window functions with distinct window specifications. The result may > resemble a single spark column expression, like an identifier for intervals. > h2. Status Quo > Usually, I don't store intermediate results with `df.withColumn` but rather > chain/stack column expressions and trust Spark to find the most effective DAG > (when dealing with DataFrame). > h2. Reproducible example > However, in the following example (PySpark 2.4.4 standalone), storing an > intermediate result with `df.withColumn` reduces the DAG complexity. 
Let's > consider the following test setup: > {code:python} > import pandas as pd > import numpy as np > from pyspark.sql import SparkSession, Window > from pyspark.sql import functions as F > spark = SparkSession.builder.getOrCreate() > dfp = pd.DataFrame( > { > "col1": np.random.randint(0, 5, size=100), > "col2": np.random.randint(0, 5, size=100), > "col3": np.random.randint(0, 5, size=100), > "col4": np.random.randint(0, 5, size=100), > } > ) > df = spark.createDataFrame(dfp) > df.show(5) > +++++ > |col1|col2|col3|col4| > +++++ > | 1| 2| 4| 1| > | 0| 2| 3| 0| > | 2| 0| 1| 0| > | 4| 1| 1| 2| > | 1| 3| 0| 4| > +++++ > only showing top 5 rows > {code} > The computation is arbitrary. Basically we have 2 window specs and 3 > computational steps. The 3 computational steps are dependent on each other > and use alternating window specs: > {code:python} > w1 = Window.partitionBy("col1").orderBy("col2") > w2 = Window.partitionBy("col3").orderBy("col4") > # first step, arbitrary window func over 1st window > step1 = F.lag("col3").over(w1) > # second step, arbitrary window func over 2nd window with step 1 > step2 = F.lag(step1).over(w2) > # third step, arbitrary window func over 1st window with step 2 > step3 = F.when(step2 > 1, F.max(step2).over(w1)) > df_result = df.withColumn("result", step3) > {code} > Inspecting the physical plan via `df_result.explain()` reveals 4 exchanges > and sorts! However, only 3 should be necessary here because we change the > window spec only twice. 
> {code:python} > df_result.explain() > == Physical Plan == > *(7) Project [col1#0L, col2#1L, col3#2L, col4#3L, CASE WHEN (_we0#25L > 1) > THEN _we1#26L END AS result#22L] > +- Window [lag(_w0#23L, 1, null) windowspecdefinition(col3#2L, col4#3L ASC > NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS _we0#25L], [col3#2L], > [col4#3L ASC NULLS FIRST] >+- *(6) Sort [col3#2L ASC NULLS FIRST, col4#3L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(col3#2L, 200) > +- *(5) Project [col1#0L, col2#1L, col3#2L, col4#3L, _w0#23L, > _we1#26L] > +- Window [max(_w1#24L) windowspecdefinition(col1#0L, col2#1L ASC > NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), > currentrow$())) AS _we1#26L], [col1#0L], [col2#1L ASC NULLS FIRST] >+- *(4) Sort [col1#0L ASC NULLS FIRST, col2#1L ASC NULLS > FIRST], false, 0 > +- Exchange hashpartitioning(col1#0L, 200) > +- *(3) Project [col1#0L, col2#1L, col3#2L, col4#3L, > _w0#23L, _w1#24L] > +- Window [lag(_w0#27L, 1, null) > windowspecdefinition(col3#2L, col4#3L ASC NULLS FIRST, > specifiedwindowframe(RowFrame, -1, -1)) AS _w1#24L], [col3#2L], [col4#3L ASC > NULLS FIRST] >+- *(2) Sort [col3#2L ASC NULLS FIRST, col4#3L ASC > NULLS FIRST], false, 0 > +- Exchange hashpartitioning(col3#2L, 200) > +- Window [lag(col3#2L, 1, null) > windowspecdefinition(col1#0L, col2#1L ASC NULLS FIRST, > specifiedwindowframe(RowFrame, -1, -1)) AS _w0#27L, lag(col3#2L, 1, null) >
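The reporter's claim that three exchanges should suffice follows from counting window-spec changes: each step whose partitioning/ordering spec differs from the previous step's forces a new exchange and sort. A small illustrative helper (plain Python, not Spark code) makes that arithmetic explicit:

```python
def min_exchanges(window_specs):
    """Count the exchanges a chain of window steps needs: a new
    exchange+sort is required each time the spec differs from the
    one used by the immediately preceding step."""
    count = 0
    prev = None
    for spec in window_specs:
        if spec != prev:
            count += 1
            prev = spec
    return count

# The example uses specs w1, w2, w1 for its three steps: the spec
# changes twice after the first, so three exchanges should suffice --
# yet the physical plan above contains four.
assert min_exchanges(["w1", "w2", "w1"]) == 3
```

Consecutive steps sharing a spec reuse the existing partitioning, which is why `min_exchanges(["w1", "w1", "w2"])` is 2, not 3.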
[jira] [Commented] (SPARK-43140) Override computeStats in DummyLeafNode
[ https://issues.apache.org/jira/browse/SPARK-43140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712315#comment-17712315 ] Yuming Wang commented on SPARK-43140: - https://github.com/apache/spark/pull/40791 > Override computeStats in DummyLeafNode > -- > > Key: SPARK-43140 > URL: https://issues.apache.org/jira/browse/SPARK-43140 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43140) Override computeStats in DummyLeafNode
Yuming Wang created SPARK-43140: --- Summary: Override computeStats in DummyLeafNode Key: SPARK-43140 URL: https://issues.apache.org/jira/browse/SPARK-43140 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.5.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43123) special internal field metadata should not be leaked to catalogs
[ https://issues.apache.org/jira/browse/SPARK-43123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43123. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40776 [https://github.com/apache/spark/pull/40776] > special internal field metadata should not be leaked to catalogs > > > Key: SPARK-43123 > URL: https://issues.apache.org/jira/browse/SPARK-43123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43123) special internal field metadata should not be leaked to catalogs
[ https://issues.apache.org/jira/browse/SPARK-43123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43123: --- Assignee: Wenchen Fan > special internal field metadata should not be leaked to catalogs > > > Key: SPARK-43123 > URL: https://issues.apache.org/jira/browse/SPARK-43123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43139) Bug in INSERT INTO documentation
Bjorn Olsen created SPARK-43139: --- Summary: Bug in INSERT INTO documentation Key: SPARK-43139 URL: https://issues.apache.org/jira/browse/SPARK-43139 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.3.2 Reporter: Bjorn Olsen I think there is a bug in this page [https://spark.apache.org/docs/3.1.2/sql-ref-syntax-dml-insert-into.html] The following SQL statement does not look valid based on the contents of the "applicants" table. {code:java} INSERT INTO students FROM applicants SELECT name, address, id applicants WHERE qualified = true; {code} Specifically, "id applicants" should possibly be changed to "student_id" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
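With the reporter's suggestion applied, the documented statement would read `INSERT INTO students SELECT name, address, student_id FROM applicants WHERE qualified = true;`. As a quick sanity check, the same shape runs against SQLite (column names are assumed from the doc page's tables; SQLite spells the boolean as 1):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE students (name TEXT, address TEXT, student_id INTEGER)")
cur.execute("CREATE TABLE applicants"
            " (name TEXT, address TEXT, student_id INTEGER, qualified INTEGER)")
cur.executemany("INSERT INTO applicants VALUES (?, ?, ?, ?)",
                [("Ada", "1 Main St", 101, 1), ("Bob", "2 Elm St", 102, 0)])

# The doc example with the stray "id applicants" replaced by the
# student_id column and the FROM clause in its usual position:
cur.execute("INSERT INTO students"
            " SELECT name, address, student_id FROM applicants"
            " WHERE qualified = 1")

rows = cur.execute("SELECT name, student_id FROM students").fetchall()
assert rows == [("Ada", 101)]
```

Only the qualified applicant is copied, confirming the corrected statement is well-formed `INSERT INTO ... SELECT` syntax.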
[jira] [Updated] (SPARK-43138) ClassNotFoundException during RDD block replication/migration
[ https://issues.apache.org/jira/browse/SPARK-43138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Ejbyfeldt updated SPARK-43138: --- Summary: ClassNotFoundException during RDD block replication/migration (was: ClassNotFound during RDD block replication/migration) > ClassNotFoundException during RDD block replication/migration > - > > Key: SPARK-43138 > URL: https://issues.apache.org/jira/browse/SPARK-43138 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Emil Ejbyfeldt >Priority: Major > > During RDD block migration during decommissioning we are seeing > `ClassNotFoundException` on the receiving Executor. This seems to happen when > the blocks contain classes that are from the user jars. > ``` > 2023-04-08 04:15:11,791 ERROR server.TransportRequestHandler: Error while > invoking RpcHandler#receive() on RPC id 6425687122551756860 > java.lang.ClassNotFoundException: com.class.from.user.jar.ClassName > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at > org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71) > at > java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003) > at > java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1870) > at > java.base/java.io.ObjectInputStream.readClass(ObjectInputStream.java:1833) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1658) > at > java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496) > at > java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390) > 
at > java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687) > at > java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496) > at > java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390) > at > java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687) > at > java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:489) > at > java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:447) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:123) > at > org.apache.spark.network.netty.NettyBlockRpcServer.deserializeMetadata(NettyBlockRpcServer.scala:180) > at > org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:119) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at >
[jira] [Commented] (SPARK-43138) ClassNotFound during RDD block replication/migration
[ https://issues.apache.org/jira/browse/SPARK-43138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712212#comment-17712212 ] Yuming Wang commented on SPARK-43138: - Did you set some config to `com.class.from.user.jar.ClassName`? > ClassNotFound during RDD block replication/migration > > > Key: SPARK-43138 > URL: https://issues.apache.org/jira/browse/SPARK-43138 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Emil Ejbyfeldt >Priority: Major > > During RDD block migration during decommissioning we are seeing > `ClassNotFoundException` on the receiving Executor. This seems to happen when > the blocks contain classes that are from the user jars. > ``` > 2023-04-08 04:15:11,791 ERROR server.TransportRequestHandler: Error while > invoking RpcHandler#receive() on RPC id 6425687122551756860 > java.lang.ClassNotFoundException: com.class.from.user.jar.ClassName > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at > org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71) > at > java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003) > at > java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1870) > at > java.base/java.io.ObjectInputStream.readClass(ObjectInputStream.java:1833) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1658) > at > java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496) > at > java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390) > at > 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687) > at > java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496) > at > java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390) > at > java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687) > at > java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:489) > at > java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:447) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:123) > at > org.apache.spark.network.netty.NettyBlockRpcServer.deserializeMetadata(NettyBlockRpcServer.scala:180) > at > org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:119) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at >
[jira] [Created] (SPARK-43138) ClassNotFound during RDD block replication/migration
Emil Ejbyfeldt created SPARK-43138: -- Summary: ClassNotFound during RDD block replication/migration Key: SPARK-43138 URL: https://issues.apache.org/jira/browse/SPARK-43138 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.2, 3.4.0, 3.5.0 Reporter: Emil Ejbyfeldt During RDD block migration during decommissioning we are seeing `ClassNotFoundException` on the receiving Executor. This seems to happen when the blocks contain classes that are from the user jars. ``` 2023-04-08 04:15:11,791 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6425687122551756860 java.lang.ClassNotFoundException: com.class.from.user.jar.ClassName at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:398) at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71) at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003) at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1870) at java.base/java.io.ObjectInputStream.readClass(ObjectInputStream.java:1833) at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1658) at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496) at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390) at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228) at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687) at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496) at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390) at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228) at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687) at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:489) at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:447) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:123) at org.apache.spark.network.netty.NettyBlockRpcServer.deserializeMetadata(NettyBlockRpcServer.scala:180) at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:119) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at
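One plausible reading of the stack trace above is a classloader mismatch: `JavaDeserializationStream` resolves the class through the executor's application classloader (`ClassLoaders$AppClassLoader` in the trace), which cannot see classes that exist only in the user jar. A toy Python model of delegating loaders (purely illustrative; the real objects are JVM classloaders) shows why that lookup fails:

```python
class Loader:
    """Minimal parent-delegating classloader model: ask the parent
    first, then fall back to this loader's own classes."""
    def __init__(self, classes, parent=None):
        self.classes = classes
        self.parent = parent

    def load(self, name):
        if self.parent is not None:
            try:
                return self.parent.load(name)
            except KeyError:
                pass
        if name in self.classes:
            return self.classes[name]
        raise KeyError(name)  # stands in for ClassNotFoundException

system = Loader({"java.lang.String": str})
user_jar = Loader({"com.class.from.user.jar.ClassName": object}, parent=system)

# Resolving through the system loader alone (what the stack trace shows)
# fails for a class that only lives in the user jar:
try:
    system.load("com.class.from.user.jar.ClassName")
    found = True
except KeyError:
    found = False
assert not found

# Resolving through a loader that includes the user jar succeeds:
assert user_jar.load("com.class.from.user.jar.ClassName") is object
```

If this reading is right, the fix would be for the block-transfer deserialization path to resolve classes through a loader that includes the user jars, not the plain application classloader; that is an inference from the trace, not a confirmed diagnosis.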
[jira] [Comment Edited] (SPARK-43113) Codegen error when full outer join's bound condition has multiple references to the same stream-side column
[ https://issues.apache.org/jira/browse/SPARK-43113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711614#comment-17711614 ] Bruce Robbins edited comment on SPARK-43113 at 4/14/23 6:02 AM: PR here: https://github.com/apache/spark/pull/40766 was (Author: bersprockets): PR here: https://github.com/apache/spark/pull/40766/files > Codegen error when full outer join's bound condition has multiple references > to the same stream-side column > --- > > Key: SPARK-43113 > URL: https://issues.apache.org/jira/browse/SPARK-43113 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Example # 1 (sort merge join): > {noformat} > create or replace temp view v1 as > select * from values > (1, 1), > (2, 2), > (3, 1) > as v1(key, value); > create or replace temp view v2 as > select * from values > (1, 22, 22), > (3, -1, -1), > (7, null, null) > as v2(a, b, c); > select * > from v1 > full outer join v2 > on key = a > and value > b > and value > c; > {noformat} > The join's generated code causes the following compilation error: > {noformat} > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 277, Column 9: Redefinition of local variable "smj_isNull_7" > {noformat} > Example #2 (shuffle hash join): > {noformat} > select /*+ SHUFFLE_HASH(v2) */ * > from v1 > full outer join v2 > on key = a > and value > b > and value > c; > {noformat} > The shuffle hash join's generated code causes the following compilation error: > {noformat} > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 174, Column 5: Redefinition of local variable "shj_value_1" > {noformat} > With default configuration, both queries end up succeeding, since Spark falls > back to running each query with whole-stage codegen disabled. > The issue happens only when the join's bound condition refers to the same > stream-side column more than once. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
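The "Redefinition of local variable" failures above suggest the generated code declares the same variable twice when a bound condition references one stream-side column more than once. The standard codegen defense is a fresh-name counter per base name (Spark's `CodegenContext.freshName` follows this pattern); a minimal sketch:

```python
from collections import defaultdict

class CodegenContext:
    """Toy fresh-name generator: every request for a base name gets a
    unique numeric suffix, so repeated references to the same column
    never redeclare one local variable."""
    def __init__(self):
        self.counts = defaultdict(int)

    def fresh_name(self, base):
        n = self.counts[base]
        self.counts[base] += 1
        return "%s_%d" % (base, n)

ctx = CodegenContext()
# Two references to the same stream-side column now yield distinct
# locals instead of two declarations of e.g. "smj_isNull_7":
a = ctx.fresh_name("smj_isNull")
b = ctx.fresh_name("smj_isNull")
assert (a, b) == ("smj_isNull_0", "smj_isNull_1")
```

This is only a model of the technique; whether the actual PR fixes the bug this way is not stated in the thread.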