[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395858#comment-17395858 ]

Apache Spark commented on SPARK-36086:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33686

> The case of the delta table is inconsistent with parquet
> --------------------------------------------------------
>
>                 Key: SPARK-36086
>                 URL: https://issues.apache.org/jira/browse/SPARK-36086
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Yuming Wang
>            Assignee: angerszhu
>            Priority: Major
>             Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> +----------------------------+--------------------------------------------------------------------------+-------+
> |col_name                    |data_type                                                                 |comment|
> +----------------------------+--------------------------------------------------------------------------+-------+
> |lower_id                    |bigint                                                                    |       |
> |id                          |bigint                                                                    |       |
> |                            |                                                                          |       |
> |# Partitioning              |                                                                          |       |
> |Part 0                      |lower_id                                                                  |       |
> |                            |                                                                          |       |
> |# Detailed Table Information|                                                                          |       |
> |Name                        |default.t2                                                                |       |
> |Location                    |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|       |
> |Provider                    |delta                                                                     |       |
> |Table Properties            |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]          |       |
> +----------------------------+--------------------------------------------------------------------------+-------+
>
> scala> spark.sql("desc extended t3").show(false)
> +----------------------------+---------+-------+
> |col_name                    |data_type|comment|
> +----------------------------+---------+-------+
> |ID                          |bigint   |null   |
> |LOWER_ID                    |bigint   |null   |
> |# Partition Information     |         |       |
> |# col_name                  |data_type|comment|
> |LOWER_ID                    |bigint   |null   |
> |                            |         |       |
> |# Detailed Table Information|         |       |
> |Database                    |default  |       |
> |Table                       |t3       |       |
> |Owner                       |yumwang
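The inconsistency above is a case-preservation question: both tables are created from the same view, yet the Delta path reports lowercased column names while the Parquet path keeps the case used in the query. As a hedged, language-neutral sketch (not Spark's actual resolver code), case-insensitive column resolution that still preserves the source schema's spelling looks like:

```python
# Hypothetical sketch, not Spark's code: resolve a column case-insensitively
# but return the name exactly as spelled in the table schema, so metadata
# commands like DESC report a consistent case.
def resolve_preserving_case(schema_cols, requested):
    lookup = {c.lower(): c for c in schema_cols}  # lowercase key -> original spelling
    if requested.lower() not in lookup:
        raise KeyError(f"Column not found: {requested}")
    return lookup[requested.lower()]

cols = ["LOWER_ID", "ID"]
print(resolve_preserving_case(cols, "lower_id"))  # LOWER_ID
```

The bug class in the issue is the opposite behavior: resolving case-insensitively but then reporting a normalized (lowercased) copy of the name instead of the original spelling.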
[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395856#comment-17395856 ]

Apache Spark commented on SPARK-36086:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33685
[jira] [Commented] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395857#comment-17395857 ]

Apache Spark commented on SPARK-36429:
--------------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/33684

> JacksonParser should throw exception when data type unsupported.
> ----------------------------------------------------------------
>
>                 Key: SPARK-36429
>                 URL: https://issues.apache.org/jira/browse/SPARK-36429
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jiaan.geng
>            Assignee: jiaan.geng
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Currently, when spark.sql.timestampType=TIMESTAMP_NTZ is set, the behavior
> differs between from_json and from_csv.
> {code:java}
> -- !query
> select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
> -- !query schema
> struct>
> -- !query output
> {"t":null}
> {code}
> {code:java}
> -- !query
> select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/'))
> -- !query schema
> struct<>
> -- !query output
> java.lang.Exception
> Unsupported type: timestamp_ntz
> {code}
> We should make from_json throw an exception too.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
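The fix requested above is a fail-fast policy: instead of silently producing null for an unsupported data type, from_json should raise when the converter is built, matching from_csv. A minimal hypothetical sketch of the two policies (names invented for illustration, not Spark's JacksonParser):

```python
# Hypothetical converter factories, not Spark's actual code.
CONVERTERS = {"string": str, "bigint": int}

def make_converter_lenient(data_type):
    # Old from_json-like behavior: an unsupported type silently yields None,
    # so bad schemas surface as null values at runtime.
    return CONVERTERS.get(data_type, lambda value: None)

def make_converter_strict(data_type):
    # from_csv-like behavior, and what the issue asks for:
    # fail fast when the requested type is unsupported.
    if data_type not in CONVERTERS:
        raise ValueError(f"Unsupported type: {data_type}")
    return CONVERTERS[data_type]

print(make_converter_lenient("timestamp_ntz")("26/October/2015"))  # None
```

The strict variant turns a silent data-quality problem into an immediate, actionable error.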
[jira] [Commented] (SPARK-36457) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395848#comment-17395848 ]

Gengliang Wang commented on SPARK-36457:
----------------------------------------

[~beliefer][~linhongliu-db] are you interested in this one?

> Review and fix issues in API docs
> ---------------------------------
>
>                 Key: SPARK-36457
>                 URL: https://issues.apache.org/jira/browse/SPARK-36457
>             Project: Spark
>          Issue Type: Improvement
>          Components: docs
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Priority: Blocker
>
> Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the
> following issues:
> * Add missing `Since` annotation for new APIs
> * Remove the leaking class/object in API doc
[jira] [Comment Edited] (SPARK-36457) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395848#comment-17395848 ]

Gengliang Wang edited comment on SPARK-36457 at 8/9/21, 6:21 AM:
-----------------------------------------------------------------

[~beliefer] [~linhongliu-db] are you interested in this one?

was (Author: gengliang.wang):
[~beliefer][~linhongliu-db] are you interested in this one?
[jira] [Updated] (SPARK-36457) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-36457:
----------------------------
    Target Version/s: 3.2.0
[jira] [Updated] (SPARK-36457) Review and fix issues in API docs
[ https://issues.apache.org/jira/browse/SPARK-36457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-36457:
----------------------------
    Priority: Blocker  (was: Major)
[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395841#comment-17395841 ]

Apache Spark commented on SPARK-36041:
--------------------------------------

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33683

> Introduce the RocksDBStateStoreProvider in the programming guide
> ----------------------------------------------------------------
>
>                 Key: SPARK-36041
>                 URL: https://issues.apache.org/jira/browse/SPARK-36041
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: Yuanjian Li
>            Priority: Blocker
[jira] [Assigned] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36041:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395840#comment-17395840 ]

Apache Spark commented on SPARK-36041:
--------------------------------------

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33683
[jira] [Updated] (SPARK-35050) Deprecate Apache Mesos as resource manager
[ https://issues.apache.org/jira/browse/SPARK-35050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-35050:
----------------------------
    Labels: release-notes  (was: )

> Deprecate Apache Mesos as resource manager
> ------------------------------------------
>
>                 Key: SPARK-35050
>                 URL: https://issues.apache.org/jira/browse/SPARK-35050
>             Project: Spark
>          Issue Type: Task
>          Components: Mesos, Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Sean R. Owen
>            Assignee: Sean R. Owen
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.2.0
>
>
> As highlighted in
> https://lists.apache.org/thread.html/rab2a820507f7c846e54a847398ab20f47698ec5bce0c8e182bfe51ba%40%3Cdev.mesos.apache.org%3E ,
> Apache Mesos is moving to the attic and ceasing development.
> We can/should maintain support for some time, but, can probably go ahead and
> deprecate it now.
[jira] [Assigned] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36041:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Resolved] (SPARK-29330) Allow users to choose the name of Spark Shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-29330.
-----------------------------
    Fix Version/s: 3.2.0
       Resolution: Duplicate

> Allow users to choose the name of Spark Shuffle service
> -------------------------------------------------------
>
>                 Key: SPARK-29330
>                 URL: https://issues.apache.org/jira/browse/SPARK-29330
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 3.1.0
>            Reporter: Alexander Bessonov
>            Priority: Minor
>             Fix For: 3.2.0
>
>
> As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of the
> Shuffle Service.
> The HDP distribution of Spark, on the other hand, uses
> [{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
> This is done to allow running both Spark 1.6 and Spark 2.x on the same
> Hadoop cluster.
> Running vanilla Spark on an HDP cluster with only the Spark 2.x shuffle
> service (HDP flavor) running becomes impossible due to the shuffle service
> name mismatch.
[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395839#comment-17395839 ]

Yuanjian Li commented on SPARK-36041:
-------------------------------------

[~Gengliang.Wang] Thanks for reminding, PR submitted.
[jira] [Updated] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides
[ https://issues.apache.org/jira/browse/SPARK-34828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-34828:
----------------------------
    Labels: release-notes  (was: )

> YARN Shuffle Service: Support configurability of aux service name and
> service-specific config overrides
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34828
>                 URL: https://issues.apache.org/jira/browse/SPARK-34828
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, YARN
>    Affects Versions: 3.1.1
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.2.0
>
>
> In some cases it may be desirable to run multiple instances of the Spark
> Shuffle Service which use different versions of Spark. This can be
> helpful, for example, when running a YARN cluster with a mixed workload of
> applications running multiple Spark versions, since a given version of the
> shuffle service is not always compatible with other versions of Spark. (See
> SPARK-27780 for more detail on this.)
> YARN versions since 2.9.0 support running shuffle services within
> an isolated classloader (see YARN-4577), meaning multiple Spark versions can
> coexist within a single NodeManager.
> To support this from the Spark side, we need two enhancements:
> * Make the name of the shuffle service configurable. Currently it is
> hard-coded to {{spark_shuffle}} on both the client and server side. The
> server-side name is not actually used anywhere, as the value within
> {{yarn.nodemanager.aux-services}} is what the NodeManager considers the
> definitive name. However, if you change that in the configs, the hard-coded
> name within the client will no longer match, so this needs to be configurable.
> * Add a way to separately configure the two shuffle service instances. Since
> configurations such as the port number are taken from the NodeManager
> config, both instances would try to use the same port, which obviously won't
> work. So, we need to provide a way to selectively configure the two shuffle
> service instances. I will go into details on my proposal for how to achieve
> this within the PR.
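The "service-specific config overrides" idea above can be illustrated abstractly: a key scoped by the service instance's name takes precedence over the shared key. The key names below are invented for the sketch and are not Spark's actual configuration names:

```python
# Hypothetical scoped-config lookup (key names invented for illustration):
# a "<service>.<key>" entry overrides the shared "<key>" entry, letting two
# shuffle service instances with different names use, e.g., different ports.
def effective_value(conf, service_name, key):
    return conf.get(f"{service_name}.{key}", conf.get(key))

conf = {
    "port": "7337",                 # shared default
    "spark2_shuffle.port": "7447",  # override for the second instance
}
print(effective_value(conf, "spark2_shuffle", "port"))  # 7447
print(effective_value(conf, "spark_shuffle", "port"))   # 7337
```

This fallback shape is what lets two instances share most settings while diverging only where they must.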
[jira] [Commented] (SPARK-34105) In addition to killing excluded/flakey executors which should support decommissioning
[ https://issues.apache.org/jira/browse/SPARK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395833#comment-17395833 ]

Gengliang Wang commented on SPARK-34105:
----------------------------------------

[~holden] [~hyukjin.kwon] Shall we mark this as done?

> In addition to killing excluded/flakey executors which should support
> decommissioning
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-34105
>                 URL: https://issues.apache.org/jira/browse/SPARK-34105
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Major
>
> Decommissioning will give the executor a chance to migrate its files to a
> more stable node.
>
> Note: we want SPARK-34104 to be integrated as well so that flaky executors
> which can not decommission are eventually killed.
[jira] [Commented] (SPARK-34104) Allow users to specify a maximum decommissioning time
[ https://issues.apache.org/jira/browse/SPARK-34104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395832#comment-17395832 ]

Gengliang Wang commented on SPARK-34104:
----------------------------------------

[~holden] [~hyukjin.kwon] Shall we mark this as done?

> Allow users to specify a maximum decommissioning time
> -----------------------------------------------------
>
>                 Key: SPARK-34104
>                 URL: https://issues.apache.org/jira/browse/SPARK-34104
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0, 3.1.1, 3.2.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Major
>
> We currently have the ability for users to set the predicted time at which the
> cluster manager or cloud provider will terminate a decommissioning executor,
> but for nodes where Spark itself is triggering decommissioning we should
> add the ability for users to specify a maximum time we want to allow the
> executor to decommission.
>
> This is important especially if we start to use decommissioning in more places
> (like with excluded executors that are found to be flaky, which may or may not
> be able to decommission successfully).
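The request above amounts to a deadline check: once an executor has been decommissioning longer than a user-configured maximum, Spark should stop waiting and kill it. A minimal hypothetical sketch (function and parameter names invented, not Spark's implementation):

```python
# Hypothetical deadline check, not Spark's actual code.
def should_force_kill(decommission_start_s, now_s, max_decommission_s):
    """True once the executor has exceeded the allowed decommissioning time."""
    return (now_s - decommission_start_s) >= max_decommission_s

start = 100.0
print(should_force_kill(start, start + 30.0, 60.0))  # False: still migrating
print(should_force_kill(start, start + 90.0, 60.0))  # True: kill the executor
```

The value of the bound is exactly the flaky-executor case the issue mentions: an executor that can never finish migration should not block forever.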
[jira] [Created] (SPARK-36457) Review and fix issues in API docs
Gengliang Wang created SPARK-36457:
--------------------------------------

             Summary: Review and fix issues in API docs
                 Key: SPARK-36457
                 URL: https://issues.apache.org/jira/browse/SPARK-36457
             Project: Spark
          Issue Type: Improvement
          Components: docs
    Affects Versions: 3.2.0
            Reporter: Gengliang Wang


Compare the 3.2.0 API doc with the latest release version 3.1.2. Fix the
following issues:
* Add missing `Since` annotation for new APIs
* Remove the leaking class/object in API doc
[jira] [Commented] (SPARK-34198) Add RocksDB StateStore implementation
[ https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395822#comment-17395822 ]

Yuanjian Li commented on SPARK-34198:
-------------------------------------

[~Gengliang.Wang] Thanks for reminding. I'll submit the document PR now.

> Add RocksDB StateStore implementation
> -------------------------------------
>
>                 Key: SPARK-34198
>                 URL: https://issues.apache.org/jira/browse/SPARK-34198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Priority: Major
>
> Currently Spark SS only has one built-in StateStore implementation,
> HDFSBackedStateStore, which actually uses an in-memory map to store state
> rows. As there are more and more streaming applications, some of them require
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state
> management, so it is proven to be a good choice for large-state usage. But
> Spark SS still lacks a built-in state store for this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore
> to Spark SS.
[jira] [Commented] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
[ https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395737#comment-17395737 ]

Apache Spark commented on SPARK-36456:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33682

> Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
> --------------------------------------------------------------
>
>                 Key: SPARK-36456
>                 URL: https://issues.apache.org/jira/browse/SPARK-36456
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 3.1.2
>            Reporter: Yang Jie
>            Priority: Minor
>
> Compilation warnings related to `method closeQuietly in class IOUtils is
> deprecated` are as follows:
> {code:java}
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: [deprecation @ org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: [deprecation @ org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: [deprecation @ org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: [deprecation @ org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: [deprecation @ org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated
> [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: [deprecation @ org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is depreca
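The warnings above arise because the commons-io version Spark builds against deprecates `IOUtils.closeQuietly`, with the deprecation notice pointing to structured resource management (Java's try-with-resources). As a language-neutral before/after sketch in Python, where `with` plays the role of try-with-resources — the `close_quietly` helper below is an analogue written for illustration, not the commons-io method itself:

```python
import contextlib
import io

def close_quietly(resource):
    # Analogue of the deprecated helper: close and swallow any error.
    try:
        if resource is not None:
            resource.close()
    except Exception:
        pass

# Old pattern: manual cleanup in finally that silently hides close() failures.
buf = io.StringIO("data")
try:
    first = buf.read(1)
finally:
    close_quietly(buf)

# Preferred pattern: a with-block (the analogue of Java's try-with-resources)
# closes the resource automatically, and failures are surfaced rather than
# silently swallowed.
with contextlib.closing(io.StringIO("data")) as buf2:
    second = buf2.read(1)

print(first, second)  # d d
```

The structured form is both shorter and safer: the resource is always closed on every exit path, without an exception-swallowing helper.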
[jira] [Assigned] (SPARK-36456) Clean up the deprecated usage of o.a.c.io.IOUtils.closeQuietly
[ https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36456: Assignee: (was: Apache Spark) > Clean up the depredation usage of o.a.c.io.IOUtils.closeQuietly > > > Key: SPARK-36456 > URL: https://issues.apache.org/jira/browse/SPARK-36456 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2 >Reporter: Yang Jie >Priority: Minor > > Compilation warnings related to `method closeQuietly in class IOUtils is > deprecated` are as follows: > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: > [deprecation @ > org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: > [deprecation @ > org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: > [deprecation @ > org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: > [deprecation @ > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: > [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileMan
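Each of the warnings above comes from a call to the single-argument `org.apache.commons.io.IOUtils.closeQuietly(Closeable)`, which Commons IO has deprecated in favor of try-with-resources. One way the call sites could be cleaned up is to inline the old method's semantics in a small local helper; the sketch below is illustrative only (`CloseUtils` is a hypothetical name, not the helper actually introduced by the fix):

```scala
import java.io.Closeable

// Hypothetical helper reproducing the semantics of the deprecated
// IOUtils.closeQuietly(Closeable): tolerate null and swallow any
// exception thrown while closing.
object CloseUtils {
  def closeQuietly(closeable: Closeable): Unit = {
    if (closeable != null) {
      try {
        closeable.close()
      } catch {
        case _: Exception => // ignore, matching the old closeQuietly behavior
      }
    }
  }
}
```

Call sites such as `IOUtils.closeQuietly(in)` would then become `CloseUtils.closeQuietly(in)` (or plain try/finally where a resource scope already exists), eliminating the deprecation warning without changing cleanup behavior.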
[jira] [Assigned] (SPARK-36456) Clean up the deprecation usage of o.a.c.io.IOUtils.closeQuietly
[ https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36456: Assignee: Apache Spark > Clean up the depredation usage of o.a.c.io.IOUtils.closeQuietly > > > Key: SPARK-36456 > URL: https://issues.apache.org/jira/browse/SPARK-36456 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Compilation warnings related to `method closeQuietly in class IOUtils is > deprecated` are as follows: > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: > [deprecation @ > org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: > [deprecation @ > org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: > [deprecation @ > org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: > [deprecation @ > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: > [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/strea
[jira] [Assigned] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36455: Assignee: (was: Apache Spark) > Provide an example of complex session window via flatMapGroupsWithState > --- > > Key: SPARK-36455 > URL: https://issues.apache.org/jira/browse/SPARK-36455 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > Now that we replaced sessionization example with native support of session > window, we may want to provide another example of session window which can > only be dealt with flatMapGroupsWithState. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
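The kind of example the ticket asks for might look like the following minimal sessionization sketch. It is an assumption-laden illustration, not the example actually contributed: the socket source, the case-class names, and the 10-second session gap are all invented for the sketch.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(user: String, time: Timestamp)
case class SessionState(count: Long)
case class SessionOutput(user: String, events: Long, closed: Boolean)

object SessionWindowSketch {
  def sessions(spark: SparkSession): Unit = {
    import spark.implicits._

    // Hypothetical input: one user id per line from a socket source.
    val events = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .map(u => Event(u, new Timestamp(System.currentTimeMillis())))

    // Per-user session: count events, and close the session once no
    // event arrives within the (assumed) 10-second gap.
    val out = events
      .groupByKey(_.user)
      .flatMapGroupsWithState(OutputMode.Update,
          GroupStateTimeout.ProcessingTimeTimeout) {
        (user: String, rows: Iterator[Event], state: GroupState[SessionState]) =>
          if (state.hasTimedOut) {
            // Gap elapsed with no new events: emit and drop the session.
            val finished = SessionOutput(user, state.get.count, closed = true)
            state.remove()
            Iterator(finished)
          } else {
            val updated =
              SessionState(state.getOption.map(_.count).getOrElse(0L) + rows.size)
            state.update(updated)
            state.setTimeoutDuration("10 seconds") // session gap, assumed value
            Iterator(SessionOutput(user, updated.count, closed = false))
          }
      }

    out.writeStream.outputMode("update").format("console").start()
  }
}
```

A "complex" session window in the ticket's sense would extend this state function with logic the native session window cannot express, such as custom rules for merging or splitting sessions.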
[jira] [Commented] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395736#comment-17395736 ] Apache Spark commented on SPARK-36455: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/33681 > Provide an example of complex session window via flatMapGroupsWithState > --- > > Key: SPARK-36455 > URL: https://issues.apache.org/jira/browse/SPARK-36455 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > Now that we replaced sessionization example with native support of session > window, we may want to provide another example of session window which can > only be dealt with flatMapGroupsWithState. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36455: Assignee: Apache Spark > Provide an example of complex session window via flatMapGroupsWithState > --- > > Key: SPARK-36455 > URL: https://issues.apache.org/jira/browse/SPARK-36455 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > Now that we replaced sessionization example with native support of session > window, we may want to provide another example of session window which can > only be dealt with flatMapGroupsWithState. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36456) Clean up the deprecation usage of o.a.c.io.IOUtils.closeQuietly
[ https://issues.apache.org/jira/browse/SPARK-36456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-36456: - Summary: Clean up the depredation usage of o.a.c.io.IOUtils.closeQuietly (was: Clean up the depredation use of o.a.c.io.IOUtils.closeQuietly) > Clean up the depredation usage of o.a.c.io.IOUtils.closeQuietly > > > Key: SPARK-36456 > URL: https://issues.apache.org/jira/browse/SPARK-36456 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2 >Reporter: Yang Jie >Priority: Minor > > Compilation warnings related to `method closeQuietly in class IOUtils is > deprecated` are as follows: > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: > [deprecation @ > org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: > [deprecation @ > org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: > [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > 
/spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: > [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: > [deprecation @ > org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: > [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | > origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: > [deprecation @ > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: > [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read > | 
origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: > [deprecation @ > org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile > | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method > closeQuietly in class IOUtils is deprecated > [WARNING] > /s
[jira] [Created] (SPARK-36456) Clean up the deprecation use of o.a.c.io.IOUtils.closeQuietly
Yang Jie created SPARK-36456: Summary: Clean up the depredation use of o.a.c.io.IOUtils.closeQuietly Key: SPARK-36456 URL: https://issues.apache.org/jira/browse/SPARK-36456 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.1.2 Reporter: Yang Jie Compilation warnings related to `method closeQuietly in class IOUtils is deprecated` are as follows: {code:java} [WARNING] /spark-source/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:344: [deprecation @ org.apache.spark.storage.BlockManager.BlockStoreUpdater.saveDeserializedValuesToMemoryStore | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:1307: [deprecation @ org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3142: [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:3143: [deprecation @ org.apache.spark.util.Utils.unzipFilesFromFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:97: [deprecation @ org.apache.spark.util.logging.RollingFileAppender.rotateFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala:98: [deprecation @ 
org.apache.spark.util.logging.RollingFileAppender.rotateFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala:383: [deprecation @ org.apache.spark.util.FileAppenderSuite.testRolling.allText.$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:248: [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala:249: [deprecation @ org.apache.spark.util.UtilsSuite..$anonfun | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala:150: [deprecation @ org.apache.spark.sql.execution.streaming.HDFSMetadataLog.applyFnToBatchByStream | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala:66: [deprecation @ org.apache.spark.sql.execution.streaming.StreamMetadata.read | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:545: [deprecation @ org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.cancelDeltaFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] 
/spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:461: [deprecation @ org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated [WARNING] /spark-source/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:462: [deprecation @ org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile | origin=org.apache.commons.io.IOUtils.closeQuietly | version=] method closeQuietly in class IOUtils is deprecated {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.
[jira] [Updated] (SPARK-36455) Provide an example of complex session window via flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-36455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-36455: - Summary: Provide an example of complex session window via flatMapGroupsWithState (was: Provide an example of complex session window) > Provide an example of complex session window via flatMapGroupsWithState > --- > > Key: SPARK-36455 > URL: https://issues.apache.org/jira/browse/SPARK-36455 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > Now that we replaced sessionization example with native support of session > window, we may want to provide another example of session window which can > only be dealt with flatMapGroupsWithState. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36455) Provide an example of complex session window
Jungtaek Lim created SPARK-36455: Summary: Provide an example of complex session window Key: SPARK-36455 URL: https://issues.apache.org/jira/browse/SPARK-36455 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.2.0 Reporter: Jungtaek Lim Now that we replaced sessionization example with native support of session window, we may want to provide another example of session window which can only be dealt with flatMapGroupsWithState. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36369) Fix Index.union to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36369. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33634 [https://github.com/apache/spark/pull/33634] > Fix Index.union to follow pandas 1.3 > > > Key: SPARK-36369 > URL: https://issues.apache.org/jira/browse/SPARK-36369 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36369) Fix Index.union to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36369: Assignee: Haejoon Lee > Fix Index.union to follow pandas 1.3 > > > Key: SPARK-36369 > URL: https://issues.apache.org/jira/browse/SPARK-36369 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32953) Lower memory usage in toPandas with Arrow self_destruct
[ https://issues.apache.org/jira/browse/SPARK-32953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-32953: Description: As described on the mailing list: [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Reducing-memory-usage-of-toPandas-with-Arrow-quot-self-destruct-quot-option-td30149.html] [https://lists.apache.org/thread.html/r581d7c82ada1c2ac3f0584615785cc60cf5ac231e1f29737d3a6569f%40%3Cdev.spark.apache.org%3E] toPandas() can as much as double memory usage as both Arrow and Pandas retain a copy of a dataframe in memory during the conversion. Arrow >= 0.16 offers a self_destruct mode that avoids this with some caveats. was: As described on the mailing list: [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Reducing-memory-usage-of-toPandas-with-Arrow-quot-self-destruct-quot-option-td30149.html] toPandas() can as much as double memory usage as both Arrow and Pandas retain a copy of a dataframe in memory during the conversion. Arrow >= 0.16 offers a self_destruct mode that avoids this with some caveats. > Lower memory usage in toPandas with Arrow self_destruct > --- > > Key: SPARK-32953 > URL: https://issues.apache.org/jira/browse/SPARK-32953 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.1 >Reporter: David Li >Assignee: David Li >Priority: Major > Fix For: 3.2.0 > > > As described on the mailing list: > > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Reducing-memory-usage-of-toPandas-with-Arrow-quot-self-destruct-quot-option-td30149.html] > > [https://lists.apache.org/thread.html/r581d7c82ada1c2ac3f0584615785cc60cf5ac231e1f29737d3a6569f%40%3Cdev.spark.apache.org%3E] > toPandas() can as much as double memory usage as both Arrow and Pandas retain > a copy of a dataframe in memory during the conversion. Arrow >= 0.16 offers a > self_destruct mode that avoids this with some caveats. 
[jira] [Assigned] (SPARK-36454) Not push down partition filter to ORCScan for DSv2
[ https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36454: Assignee: Apache Spark > Not push down partition filter to ORCScan for DSv2 > -- > > Key: SPARK-36454 > URL: https://issues.apache.org/jira/browse/SPARK-36454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > Seems to me that partition filter is only used for partition pruning and > shouldn't be pushed down to ORCScan. We don't push down partition filter to > ORCScan in DSv1, and we don't push down partition filter for parquet in both > DSv1 and DSv2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36454) Not push down partition filter to ORCScan for DSv2
[ https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395683#comment-17395683 ] Apache Spark commented on SPARK-36454: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/33680 > Not push down partition filter to ORCScan for DSv2 > -- > > Key: SPARK-36454 > URL: https://issues.apache.org/jira/browse/SPARK-36454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Minor > > Seems to me that partition filter is only used for partition pruning and > shouldn't be pushed down to ORCScan. We don't push down partition filter to > ORCScan in DSv1, and we don't push down partition filter for parquet in both > DSv1 and DSv2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36454) Not push down partition filter to ORCScan for DSv2
[ https://issues.apache.org/jira/browse/SPARK-36454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36454: Assignee: (was: Apache Spark) > Not push down partition filter to ORCScan for DSv2 > -- > > Key: SPARK-36454 > URL: https://issues.apache.org/jira/browse/SPARK-36454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Minor > > Seems to me that partition filter is only used for partition pruning and > shouldn't be pushed down to ORCScan. We don't push down partition filter to > ORCScan in DSv1, and we don't push down partition filter for parquet in both > DSv1 and DSv2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36432) Upgrade Jetty version to 9.4.43
[ https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36432: Assignee: Sajith A > Upgrade Jetty version to 9.4.43 > --- > > Key: SPARK-36432 > URL: https://issues.apache.org/jira/browse/SPARK-36432 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Sajith A >Assignee: Sajith A >Priority: Minor > Fix For: 3.2.0 > > > Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to > fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36432) Upgrade Jetty version to 9.4.43
[ https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36432. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33656 [https://github.com/apache/spark/pull/33656] > Upgrade Jetty version to 9.4.43 > --- > > Key: SPARK-36432 > URL: https://issues.apache.org/jira/browse/SPARK-36432 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Sajith A >Priority: Minor > Fix For: 3.2.0 > > > Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to > fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36454) Not push down partition filter to ORCScan for DSv2
Huaxin Gao created SPARK-36454: -- Summary: Not push down partition filter to ORCScan for DSv2 Key: SPARK-36454 URL: https://issues.apache.org/jira/browse/SPARK-36454 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao Seems to me that partition filter is only used for partition pruning and shouldn't be pushed down to ORCScan. We don't push down partition filter to ORCScan in DSv1, and we don't push down partition filter for parquet in both DSv1 and DSv2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
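The expected behavior can be checked from a REPL. The sketch below (table name `orc_t` is hypothetical) creates a DSv2 partitioned ORC table and inspects the physical plan: a predicate on the partition column should surface as partition pruning, not in the ORC scan's pushed data filters.

```scala
// Assumed table name; run inside spark-shell with a session catalog available.
spark.sql("CREATE TABLE orc_t (id BIGINT, p INT) USING orc PARTITIONED BY (p)")
spark.sql("INSERT INTO orc_t VALUES (0, 1), (1, 2)")

// Inspect the plan: the filter on p should appear among the partition
// filters of the scan node rather than among its pushed-down data filters.
spark.sql("SELECT * FROM orc_t WHERE p = 1").explain(true)
```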
[jira] [Resolved] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap
[ https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36425. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33652 [https://github.com/apache/spark/pull/33652] > PySpark: support CrossValidatorModel get standard deviation of metrics for > each paramMap > - > > Key: SPARK-36425 > URL: https://issues.apache.org/jira/browse/SPARK-36425 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Affects Versions: 3.2.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Fix For: 3.3.0 > > > PySpark: support CrossValidatorModel get standard deviation of metrics for > each paramMap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36453) Improve consistency processing floating point special literals
[ https://issues.apache.org/jira/browse/SPARK-36453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395658#comment-17395658 ] Pablo Langa Blanco commented on SPARK-36453: I'm working on it > Improve consistency processing floating point special literals > -- > > Key: SPARK-36453 > URL: https://issues.apache.org/jira/browse/SPARK-36453 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Pablo Langa Blanco >Priority: Minor > > Special literal in floating point are not consistent between cast and json > expressions > > {code:java} > scala> spark.sql("SELECT CAST('+Inf' as Double)").show > ++ > |CAST(+Inf AS DOUBLE)| > ++ > | Infinity| > ++ > {code} > > {code:java} > scala> val schema = StructType(StructField("a", DoubleType) :: Nil) > scala> Seq("""{"a" : > "+Inf"}""").toDF("col1").select(from_json(col("col1"),schema)).show > +---+ > |from_json(col1)| > +---+ > | {null}| > +---+ > scala> Seq("""{"a" : "+Inf"}""").toDF("col").withColumn("col", > from_json(col("col"), StructType.fromDDL("a > DOUBLE"))).write.json("/tmp/jsontests12345") > scala> > spark.read.schema(StructType(Seq(StructField("col",schema.json("/tmp/jsontests12345").show > +--+ > | col| > +--+ > |{null}| > +--+ > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36453) Improve consistency processing floating point special literals
Pablo Langa Blanco created SPARK-36453: -- Summary: Improve consistency processing floating point special literals Key: SPARK-36453 URL: https://issues.apache.org/jira/browse/SPARK-36453 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Pablo Langa Blanco Special floating point literals are not handled consistently between cast and JSON expressions: {code:java} scala> spark.sql("SELECT CAST('+Inf' as Double)").show +--------------------+ |CAST(+Inf AS DOUBLE)| +--------------------+ | Infinity| +--------------------+ {code} {code:java} scala> val schema = StructType(StructField("a", DoubleType) :: Nil) scala> Seq("""{"a" : "+Inf"}""").toDF("col1").select(from_json(col("col1"),schema)).show +---------------+ |from_json(col1)| +---------------+ | {null}| +---------------+ scala> Seq("""{"a" : "+Inf"}""").toDF("col").withColumn("col", from_json(col("col"), StructType.fromDDL("a DOUBLE"))).write.json("/tmp/jsontests12345") scala> spark.read.schema(StructType(Seq(StructField("col", schema)))).json("/tmp/jsontests12345").show +------+ | col| +------+ |{null}| +------+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
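The asymmetry is easy to reproduce outside Spark: a permissive string-to-double cast accepts tokens like `+Inf`, while a strict JSON reader does not recognize them as numbers, so the value comes back null unless the reader is taught to map the special tokens. A small Python sketch of that gap (illustrative only; Spark's actual cast and JSON parsers differ):

```python
import json

# A permissive cast, like SQL CAST('+Inf' AS DOUBLE): Python's float()
# already accepts "+Inf", "Infinity", "NaN" and friends.
assert float("+Inf") == float("inf")

def lenient_double(v):
    """Map a JSON value to a double, accepting special literals such as
    '+Inf'; return None (i.e. null) when the value is not parseable,
    mirroring what a strict reader does today."""
    try:
        return float(v)
    except (TypeError, ValueError):
        return None

# Strict JSON parses "+Inf" only as a string; the lenient post-step
# recovers the special double value that CAST would have produced.
row = json.loads('{"a": "+Inf"}')
parsed = {k: lenient_double(v) for k, v in row.items()}
```

The point of the ticket is that both paths (cast and `from_json`) should agree on whether such literals are valid doubles.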
[jira] [Updated] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saurabh Chawla updated SPARK-36452: --- Description: Add the support in Spark for having group by map datatype column for the scenario that works in Hive. In hive the below scenario works {code:java} describe extended complex2; OK id string c1 map Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) Working Scenario in Hive -: select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) Failed Scenario in Hive -: failed when map type is present in aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. 
{code} But in Spark this scenario fails: grouping by the map column is rejected even though the map column is only used in the select without any aggregation on it {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) {code} We need to support this scenario: a grouping expression may have map type as long as no aggregated expression references that map type. This helps users migrate from Hive to Spark. After the code change: {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show +---+-----------------+--------+ | id| c1|count(1)| +---+-----------------+--------+ | 1| {1 -> u}| 1| | 2|{1 -> u, 2 -> uo}| 1| | 1|{1 -> u, 2 -> uo}| 1| +---+-----------------+--------+ {code} was: Add the support in Spark for having group by map datatype column for the scenario that works in Hive. 
In hive the below scenario works {code:java} describe extended complex2; OK id string c1 map Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) failed when map type is present in aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. {code} But in spark this scenario where the group by map column failed for this scenario where the map column is used in the select without any aggregation {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at org.apache.spark.s
[jira] [Commented] (SPARK-34198) Add RocksDB StateStore implementation
[ https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395526#comment-17395526 ] Gengliang Wang commented on SPARK-34198: [~XuanYuan][~kabhwan][~viirya][~vkorukanti] Thanks for the great work. I will cut 3.2.0 RC1 next week. Please help add documentation for the feature and check if there is any remaining work before Spark 3.2.0. Thanks! > Add RocksDB StateStore implementation > - > > Key: SPARK-34198 > URL: https://issues.apache.org/jira/browse/SPARK-34198 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Priority: Major > > Currently Spark SS only has one built-in StateStore implementation, > HDFSBackedStateStore, which uses an in-memory map to store state rows. As > there are more and more streaming applications, some of them require large > state in stateful operations such as streaming aggregation and join. > Several other major streaming frameworks already use RocksDB for state > management, so it is a proven choice for large state usage. But Spark SS > still lacks a built-in state store for this requirement. > We would like to explore the possibility of adding a RocksDB-based StateStore > to Spark SS. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
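The StateStore contract being discussed boils down to a versioned key-value store: reads and staged writes against the current version, plus an atomic commit that produces the next version. A toy in-memory sketch of that shape (hypothetical API; not Spark's actual `StateStore` trait, which a RocksDB- or HDFS-backed provider implements):

```python
class ToyStateStore:
    """Versioned key-value state: mutations are staged and only become
    part of the committed state on commit(), loosely mirroring how a
    streaming state store provider behaves across micro-batches."""

    def __init__(self):
        self._committed = {}   # state as of the last committed version
        self._staged = {}      # uncommitted writes for the in-flight batch
        self.version = 0

    def get(self, key):
        # Staged writes shadow committed values within the same batch.
        return self._staged.get(key, self._committed.get(key))

    def put(self, key, value):
        self._staged[key] = value

    def commit(self):
        # Atomically fold staged writes into the committed state and
        # advance the version, as a batch completing would.
        self._committed.update(self._staged)
        self._staged.clear()
        self.version += 1
        return self.version

store = ToyStateStore()
store.put("count", 1)
new_version = store.commit()
```

A RocksDB-backed implementation keeps the same contract but spills the committed state to local disk instead of holding every row in an in-memory map.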
[jira] [Commented] (SPARK-36041) Introduce the RocksDBStateStoreProvider in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-36041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395525#comment-17395525 ] Gengliang Wang commented on SPARK-36041: [~XuanYuan][~kabhwan] What is the status of this one? > Introduce the RocksDBStateStoreProvider in the programming guide > > > Key: SPARK-36041 > URL: https://issues.apache.org/jira/browse/SPARK-36041 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395524#comment-17395524 ] Apache Spark commented on SPARK-36452: -- User 'SaurabhChawla100' has created a pull request for this issue: https://github.com/apache/spark/pull/33679 > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive > > > Key: SPARK-36452 > URL: https://issues.apache.org/jira/browse/SPARK-36452 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Saurabh Chawla >Priority: Major > > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive. > In hive the below scenario works > {code:java} > describe extended complex2; > OK > id string > c1 map > Detailed Table Information Table(tableName:complex2, dbName:default, > owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, > sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), > FieldSchema(name:c1, type:map, comment:null)], > location:/user/hive/warehouse/complex2, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=1}) > select * from complex2; > OK > 1 {1:"u"} > 2 {1:"u",2:"uo"} > 1 {1:"u",2:"uo"} > Time taken: 0.363 seconds, Fetched: 3 row(s) > select id, c1, count(*) from complex2 group by id, c1; > OK > 1 {1:"u"} 1 > 1 {1:"u",2:"uo"} 1 > 2 {1:"u",2:"uo"} 1 > Time taken: 1.621 seconds, Fetched: 3 row(s) > failed when map type is present in aggregated expression > select id, max(c1), count(*) from complex2 group by id, c1; > FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or > complex type containing map<>. 
> {code} > But in spark this scenario where the group by map column failed for this > scenario where the map column is used in the select without any aggregation > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > org.apache.spark.sql.AnalysisException: expression > spark_catalog.default.complex2.`c1` cannot be used as a grouping expression > because its data type map is not an orderable data type.; > Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] > +- SubqueryAlias spark_catalog.default.complex2 > +- HiveTableRelation [`default`.`complex2`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], > Partition Cols: []] > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) > {code} > There is need to add the this scenario where grouping expression can have map > type if aggregated expression does not have the that map type reference. This > helps in migrating the user from hive to Spark. > After the code change > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > +---+-++ > > | id| c1|count(1)| > +---+-++ > | 1| {1 -> u}| 1| > | 2|{1 -> u, 2 -> uo}| 1| > | 1|{1 -> u, 2 -> uo}| 1| > +---+-++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36452: Assignee: Apache Spark > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive > > > Key: SPARK-36452 > URL: https://issues.apache.org/jira/browse/SPARK-36452 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Saurabh Chawla >Assignee: Apache Spark >Priority: Major > > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive. > In hive the below scenario works > {code:java} > describe extended complex2; > OK > id string > c1 map > Detailed Table Information Table(tableName:complex2, dbName:default, > owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, > sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), > FieldSchema(name:c1, type:map, comment:null)], > location:/user/hive/warehouse/complex2, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=1}) > select * from complex2; > OK > 1 {1:"u"} > 2 {1:"u",2:"uo"} > 1 {1:"u",2:"uo"} > Time taken: 0.363 seconds, Fetched: 3 row(s) > select id, c1, count(*) from complex2 group by id, c1; > OK > 1 {1:"u"} 1 > 1 {1:"u",2:"uo"} 1 > 2 {1:"u",2:"uo"} 1 > Time taken: 1.621 seconds, Fetched: 3 row(s) > failed when map type is present in aggregated expression > select id, max(c1), count(*) from complex2 group by id, c1; > FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or > complex type containing map<>. 
> {code} > But in spark this scenario where the group by map column failed for this > scenario where the map column is used in the select without any aggregation > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > org.apache.spark.sql.AnalysisException: expression > spark_catalog.default.complex2.`c1` cannot be used as a grouping expression > because its data type map is not an orderable data type.; > Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] > +- SubqueryAlias spark_catalog.default.complex2 > +- HiveTableRelation [`default`.`complex2`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], > Partition Cols: []] > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) > {code} > There is need to add the this scenario where grouping expression can have map > type if aggregated expression does not have the that map type reference. This > helps in migrating the user from hive to Spark. > After the code change > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > +---+-++ > > | id| c1|count(1)| > +---+-++ > | 1| {1 -> u}| 1| > | 2|{1 -> u, 2 -> uo}| 1| > | 1|{1 -> u, 2 -> uo}| 1| > +---+-++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395523#comment-17395523 ] Apache Spark commented on SPARK-36452: -- User 'SaurabhChawla100' has created a pull request for this issue: https://github.com/apache/spark/pull/33679 > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive > > > Key: SPARK-36452 > URL: https://issues.apache.org/jira/browse/SPARK-36452 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Saurabh Chawla >Priority: Major > > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive. > In hive the below scenario works > {code:java} > describe extended complex2; > OK > id string > c1 map > Detailed Table Information Table(tableName:complex2, dbName:default, > owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, > sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), > FieldSchema(name:c1, type:map, comment:null)], > location:/user/hive/warehouse/complex2, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=1}) > select * from complex2; > OK > 1 {1:"u"} > 2 {1:"u",2:"uo"} > 1 {1:"u",2:"uo"} > Time taken: 0.363 seconds, Fetched: 3 row(s) > select id, c1, count(*) from complex2 group by id, c1; > OK > 1 {1:"u"} 1 > 1 {1:"u",2:"uo"} 1 > 2 {1:"u",2:"uo"} 1 > Time taken: 1.621 seconds, Fetched: 3 row(s) > failed when map type is present in aggregated expression > select id, max(c1), count(*) from complex2 group by id, c1; > FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or > complex type containing map<>. 
> {code} > But in spark this scenario where the group by map column failed for this > scenario where the map column is used in the select without any aggregation > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > org.apache.spark.sql.AnalysisException: expression > spark_catalog.default.complex2.`c1` cannot be used as a grouping expression > because its data type map is not an orderable data type.; > Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] > +- SubqueryAlias spark_catalog.default.complex2 > +- HiveTableRelation [`default`.`complex2`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], > Partition Cols: []] > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) > {code} > There is need to add the this scenario where grouping expression can have map > type if aggregated expression does not have the that map type reference. This > helps in migrating the user from hive to Spark. > After the code change > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > +---+-++ > > | id| c1|count(1)| > +---+-++ > | 1| {1 -> u}| 1| > | 2|{1 -> u, 2 -> uo}| 1| > | 1|{1 -> u, 2 -> uo}| 1| > +---+-++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36452: Assignee: (was: Apache Spark) > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive > > > Key: SPARK-36452 > URL: https://issues.apache.org/jira/browse/SPARK-36452 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Saurabh Chawla >Priority: Major > > Add the support in Spark for having group by map datatype column for the > scenario that works in Hive. > In hive the below scenario works > {code:java} > describe extended complex2; > OK > id string > c1 map > Detailed Table Information Table(tableName:complex2, dbName:default, > owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, > sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), > FieldSchema(name:c1, type:map, comment:null)], > location:/user/hive/warehouse/complex2, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=1}) > select * from complex2; > OK > 1 {1:"u"} > 2 {1:"u",2:"uo"} > 1 {1:"u",2:"uo"} > Time taken: 0.363 seconds, Fetched: 3 row(s) > select id, c1, count(*) from complex2 group by id, c1; > OK > 1 {1:"u"} 1 > 1 {1:"u",2:"uo"} 1 > 2 {1:"u",2:"uo"} 1 > Time taken: 1.621 seconds, Fetched: 3 row(s) > failed when map type is present in aggregated expression > select id, max(c1), count(*) from complex2 group by id, c1; > FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or > complex type containing map<>. 
> {code} > But in spark this scenario where the group by map column failed for this > scenario where the map column is used in the select without any aggregation > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > org.apache.spark.sql.AnalysisException: expression > spark_catalog.default.complex2.`c1` cannot be used as a grouping expression > because its data type map is not an orderable data type.; > Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] > +- SubqueryAlias spark_catalog.default.complex2 > +- HiveTableRelation [`default`.`complex2`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], > Partition Cols: []] > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) > {code} > There is need to add the this scenario where grouping expression can have map > type if aggregated expression does not have the that map type reference. This > helps in migrating the user from hive to Spark. > After the code change > {code:java} > scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show > +---+-++ > > | id| c1|count(1)| > +---+-++ > | 1| {1 -> u}| 1| > | 2|{1 -> u, 2 -> uo}| 1| > | 1|{1 -> u, 2 -> uo}| 1| > +---+-++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saurabh Chawla updated SPARK-36452: --- Description: Add the support in Spark for having group by map datatype column for the scenario that works in Hive. In hive the below scenario works {code:java} describe extended complex2; OK id string c1 map Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) failed when map type is present in aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. 
{code} But in spark this scenario where the group by map column failed for this scenario where the map column is used in the select without any aggregation {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) {code} There is need to add the this scenario where grouping expression can have map type if aggregated expression does not have the that map type reference. This helps in migrating the user from hive to Spark. After the code change {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show +---+-++ | id| c1|count(1)| +---+-++ | 1| {1 -> u}| 1| | 2|{1 -> u, 2 -> uo}| 1| | 1|{1 -> u, 2 -> uo}| 1| +---+-++ {code} was: Add the support in Spark for having group by map datatype column for the scenario that works in Hive. 
In hive the below scenario works {code:java} describe extended complex2; OK id string c1 map Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) failed when map type is present in aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. {code} But in spark this scenario where the group by map column failed for this scenario where the map column is used in the select without any aggregation {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at org.apache.spark.sql.catalyst.analysis.Che
[jira] [Updated] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
[ https://issues.apache.org/jira/browse/SPARK-36452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saurabh Chawla updated SPARK-36452: --- Description: Add the support in Spark for having group by map datatype column for the scenario that works in Hive. In hive the below scenario works {code:java} describe extended complex2; OK id string c1 map Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) failed when map type is present in aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. 
{code} But in spark this scenario where the group by map column failed for this scenario where the map column is used in the select without any aggregation {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) {code} There is need to add the this scenario where grouping expression can have map type if aggregated expression does not have the that map type reference. This helps in migrating the user from hive to Spark. After the code change {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show +---+-++ | id| c1|count(1)| +---+-++ | 1| {1 -> u}| 1| | 2|{1 -> u, 2 -> uo}| 1| | 1|{1 -> u, 2 -> uo}| 1| +---+-++ {code} was: Add the support in Spark for having group by map datatype column for the scenario that works in Hive. 
In hive the below scenario works {code:java} describe extended complex2; OK id string c1 map Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) failed when map type is present in aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. {code} But in spark this scenario where the group by map column failed for this scenario where the map column is used in the select without any aggregation {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at o
[jira] [Created] (SPARK-36452) Add the support in Spark for having group by map datatype column for the scenario that works in Hive
Saurabh Chawla created SPARK-36452: -- Summary: Add the support in Spark for having group by map datatype column for the scenario that works in Hive Key: SPARK-36452 URL: https://issues.apache.org/jira/browse/SPARK-36452 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2, 3.0.3, 3.2.0 Reporter: Saurabh Chawla Add the support in Spark for having group by map datatype column for the scenario that works in Hive. In Hive the below scenario works {code:java} describe extended complex2; OK id string c1 map<int,string> Detailed Table Information Table(tableName:complex2, dbName:default, owner:abc, createTime:1627994412, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:c1, type:map<int,string>, comment:null)], location:/user/hive/warehouse/complex2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}) select * from complex2; OK 1 {1:"u"} 2 {1:"u",2:"uo"} 1 {1:"u",2:"uo"} Time taken: 0.363 seconds, Fetched: 3 row(s) select id, c1, count(*) from complex2 group by id, c1; OK 1 {1:"u"} 1 1 {1:"u",2:"uo"} 1 2 {1:"u",2:"uo"} 1 Time taken: 1.621 seconds, Fetched: 3 row(s) -- the query fails only when the map type is present in an aggregated expression select id, max(c1), count(*) from complex2 group by id, c1; FAILED: UDFArgumentTypeException Cannot support comparison of map<> type or complex type containing map<>. 
{code} But in Spark this scenario fails even though the map column is only selected, not aggregated {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show org.apache.spark.sql.AnalysisException: expression spark_catalog.default.complex2.`c1` cannot be used as a grouping expression because its data type map<int,string> is not an orderable data type.; Aggregate [id#1, c1#2], [id#1, c1#2, count(1) AS count(1)#3L] +- SubqueryAlias spark_catalog.default.complex2 +- HiveTableRelation [`default`.`complex2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#1, c1#2], Partition Cols: []] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) {code} There is a need to support this scenario: a grouping expression should be allowed to have a map type as long as no aggregated expression references that map-typed column. This helps users migrating from Hive to Spark. After the code change {code:java} scala> spark.sql("select id,c1, count(*) from complex2 group by id, c1").show +---+-----------------+--------+ | id| c1|count(1)| +---+-----------------+--------+ | 1| {1 -> u}| 1| | 2|{1 -> u, 2 -> uo}| 1| | 1|{1 -> u, 2 -> uo}| 1| +---+-----------------+--------+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
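Until such support exists in the engine, a common application-side workaround is to group on an orderable, hashable canonical form of the map, such as a tuple of its sorted entries. A minimal pure-Python sketch of the idea (the rows are hypothetical and mirror the complex2 data above):

```python
from collections import Counter

# Rows mirroring the complex2 table above: (id, map column c1).
rows = [
    ("1", {1: "u"}),
    ("2", {1: "u", 2: "uo"}),
    ("1", {1: "u", 2: "uo"}),
]

def canonical(m):
    # Maps are unordered and unhashable, so group on a tuple of
    # sorted (key, value) entries instead -- an orderable stand-in.
    return tuple(sorted(m.items()))

# Equivalent of: select id, c1, count(*) from complex2 group by id, c1
counts = Counter((id_, canonical(c1)) for id_, c1 in rows)
for (id_, c1), n in counts.items():
    print(id_, dict(c1), n)
```

In Spark SQL the same trick can be applied by grouping on an orderable projection of the map (for example its sorted entries) rather than the map column itself.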
[jira] [Assigned] (SPARK-36451) Ivy skips looking for source and doc pom
[ https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36451: Assignee: (was: Apache Spark) > Ivy skips looking for source and doc pom > > > Key: SPARK-36451 > URL: https://issues.apache.org/jira/browse/SPARK-36451 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.2.0 >Reporter: dzcxzl >Priority: Trivial > > Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of > source and doc poms, but at present the remote repo is still queried. > > org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent > {code:java} > boolean sourcesLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.sources")); > boolean javadocLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc")); > if (!sourcesLookup && !javadocLookup) { > Message.debug("Sources and javadocs lookup disabled"); > return; > } > {code}
[jira] [Assigned] (SPARK-36451) Ivy skips looking for source and doc pom
[ https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36451: Assignee: Apache Spark > Ivy skips looking for source and doc pom > > > Key: SPARK-36451 > URL: https://issues.apache.org/jira/browse/SPARK-36451 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.2.0 >Reporter: dzcxzl >Assignee: Apache Spark >Priority: Trivial > > Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of > source and doc poms, but at present the remote repo is still queried. > > org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent > {code:java} > boolean sourcesLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.sources")); > boolean javadocLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc")); > if (!sourcesLookup && !javadocLookup) { > Message.debug("Sources and javadocs lookup disabled"); > return; > } > {code}
[jira] [Commented] (SPARK-36451) Ivy skips looking for source and doc pom
[ https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395516#comment-17395516 ] Apache Spark commented on SPARK-36451: -- User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/33678 > Ivy skips looking for source and doc pom > > > Key: SPARK-36451 > URL: https://issues.apache.org/jira/browse/SPARK-36451 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.2.0 >Reporter: dzcxzl >Priority: Trivial > > Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of > source and doc poms, but at present the remote repo is still queried. > > org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent > {code:java} > boolean sourcesLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.sources")); > boolean javadocLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc")); > if (!sourcesLookup && !javadocLookup) { > Message.debug("Sources and javadocs lookup disabled"); > return; > } > {code}
[jira] [Updated] (SPARK-36451) Ivy skips looking for source and doc pom
[ https://issues.apache.org/jira/browse/SPARK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated SPARK-36451: --- Description: Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of source and doc poms, but at present the remote repo is still queried. org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent {code:java} boolean sourcesLookup = !"false" .equals(ivySettings.getVariable("ivy.maven.lookup.sources")); boolean javadocLookup = !"false" .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc")); if (!sourcesLookup && !javadocLookup) { Message.debug("Sources and javadocs lookup disabled"); return; } {code} was:Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of source and doc poms, but at present the remote repo is still queried. > Ivy skips looking for source and doc pom > > > Key: SPARK-36451 > URL: https://issues.apache.org/jira/browse/SPARK-36451 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.2.0 >Reporter: dzcxzl >Priority: Trivial > > Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of > source and doc poms, but at present the remote repo is still queried. > > org.apache.ivy.plugins.parser.m2.PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent > {code:java} > boolean sourcesLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.sources")); > boolean javadocLookup = !"false" > .equals(ivySettings.getVariable("ivy.maven.lookup.javadoc")); > if (!sourcesLookup && !javadocLookup) { > Message.debug("Sources and javadocs lookup disabled"); > return; > } > {code}
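Note what the Java check quoted above implies: the early return fires only when both variables are exactly the string "false"; if either lookup remains enabled, the remote repo is still queried. A small Python sketch of that logic (the dict is a stand-in for IvySettings; the function name is illustrative, not an Ivy API):

```python
def sources_and_javadoc_lookup_enabled(ivy_vars):
    # Mirrors PomModuleDescriptorParser#addSourcesAndJavadocArtifactsIfPresent:
    # each lookup is on unless its variable is exactly the string "false"
    # (an unset variable, like a null in the Java code, leaves the lookup on).
    sources_lookup = ivy_vars.get("ivy.maven.lookup.sources") != "false"
    javadoc_lookup = ivy_vars.get("ivy.maven.lookup.javadoc") != "false"
    # Only when BOTH lookups are disabled does the parser skip entirely;
    # otherwise the remote repo is still queried for the extra artifacts.
    return sources_lookup or javadoc_lookup

print(sources_and_javadoc_lookup_enabled({}))
print(sources_and_javadoc_lookup_enabled({
    "ivy.maven.lookup.sources": "false",
    "ivy.maven.lookup.javadoc": "false",
}))
```

So to skip the extra remote queries entirely, both `ivy.maven.lookup.sources` and `ivy.maven.lookup.javadoc` must be set to "false" in the Ivy settings.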
[jira] [Created] (SPARK-36451) Ivy skips looking for source and doc pom
dzcxzl created SPARK-36451: -- Summary: Ivy skips looking for source and doc pom Key: SPARK-36451 URL: https://issues.apache.org/jira/browse/SPARK-36451 Project: Spark Issue Type: Improvement Components: Spark Submit Affects Versions: 3.2.0 Reporter: dzcxzl Since SPARK-35863 upgraded Ivy to 2.5.0, Ivy supports skipping the lookup of source and doc poms, but at present the remote repo is still queried.