[jira] [Resolved] (SPARK-46194) Clean up the TODO comments left in SPARK-33775
[ https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46194. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44103 [https://github.com/apache/spark/pull/44103] > Clean up the TODO comments left in SPARK-33775 > -- > > Key: SPARK-46194 > URL: https://issues.apache.org/jira/browse/SPARK-46194 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46194) Clean up the TODO comments left in SPARK-33775
[ https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46194: - Assignee: Yang Jie > Clean up the TODO comments left in SPARK-33775 > -- > > Key: SPARK-46194 > URL: https://issues.apache.org/jira/browse/SPARK-46194 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame
[ https://issues.apache.org/jira/browse/SPARK-46198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaliy Savkin updated SPARK-46198: --- Description: When a computation is based on cached DataFrames, I expect to see no Shuffle Reads, but they happen under certain circumstances.

*Reproduction*
{code:scala}
val ctx: SQLContext = // init context
val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce"

def populateAndRead(tag: String): DataFrame = {
  val path = s"$root/numbers_$tag"
  // import ctx.implicits._
  // import org.apache.spark.sql.functions.lit
  // (0 to 10 * 1000 * 1000)
  //   .toDF("id")
  //   .withColumn(tag, lit(tag.toUpperCase))
  //   .repartition(100)
  //   .write
  //   .option("header", "true")
  //   .mode("ignore")
  //   .csv(path)
  ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag + "_id")
}

val dfa = populateAndRead("a1")
val dfb = populateAndRead("b1")
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
    .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
    .cache()
println(res.count())
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}
Relevant configs
{code:scala}
spark.executor.instances=10
spark.executor.cores=7
spark.executor.memory=40g
spark.executor.memoryOverhead=5g
spark.shuffle.service.enabled=true
spark.sql.adaptive.enabled=false
spark.sql.autoBroadcastJoinThreshold=-1
{code}
The Spark plan says that the cache is used
{code:scala}
== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand (27)
+- Coalesce (26)
   +- InMemoryTableScan (1)
      +- InMemoryRelation (2)
         +- Union (25)
            :- * SortMergeJoin Inner (13)
            :  :- * Sort (7)
            :  :  +- Exchange (6)
            :  :     +- * Project (5)
            :  :        +- * Filter (4)
            :  :           +- Scan csv (3)
            :  +- * Sort (12)
            :     +- Exchange (11)
            :        +- * Project (10)
            :           +- * Filter (9)
            :              +- Scan csv (8)
            +- * SortMergeJoin Inner (24)
               :- * Sort (18)
               :  +- Exchange (17)
               :     +- * Project (16)
               :        +- * Filter (15)
               :           +- Scan csv (14)
               +- * Sort (23)
                  +- Exchange (22)
                     +- * Project (21)
                        +- * Filter (20)
                           +- Scan csv (19)
{code}
But when running on YARN, the csv job has shuffle reads.

!shuffle.png!

*Additional info*
- I was unable to reproduce it with local Spark.
- If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join conditions are changed to just {{"id"}}, the issue disappears!
- This behaviour is stable - it's not a result of failed instances.

*Production impact*
Without the cache, saving data in production takes much longer (30 seconds vs 18 seconds). To avoid shuffle reads, we had to add a {{repartition}} step before {{cache}} as a workaround, which reduced the time from 18 seconds to 10.
[jira] [Created] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame
Vitaliy Savkin created SPARK-46198: -- Summary: Unexpected Shuffle Read when using cached DataFrame Key: SPARK-46198 URL: https://issues.apache.org/jira/browse/SPARK-46198 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Vitaliy Savkin Attachments: shuffle.png

When a computation is based on cached DataFrames, I expect to see no Shuffle Reads, but they happen under certain circumstances.

*Reproduction*
{code:scala}
val ctx: SQLContext = // init context
val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce"

def populateAndRead(tag: String): DataFrame = {
  val path = s"$root/numbers_$tag"
  // import ctx.implicits._
  // import org.apache.spark.sql.functions.lit
  // (0 to 10 * 1000 * 1000)
  //   .toDF("id")
  //   .withColumn(tag, lit(tag.toUpperCase))
  //   .repartition(100)
  //   .write
  //   .option("header", "true")
  //   .mode("ignore")
  //   .csv(path)
  ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag + "_id")
}

val dfa = populateAndRead("a1")
val dfb = populateAndRead("b1")
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
    .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
    .cache()
println(res.count())
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}
Relevant configs
{code:scala}
spark.executor.instances=10
spark.executor.cores=7
spark.executor.memory=40g
spark.executor.memoryOverhead=5g
spark.shuffle.service.enabled=true
spark.sql.adaptive.enabled=false
spark.sql.autoBroadcastJoinThreshold=-1
{code}
The Spark plan says that the cache is used
{code:scala}
== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand (27)
+- Coalesce (26)
   +- InMemoryTableScan (1)
      +- InMemoryRelation (2)
         +- Union (25)
            :- * SortMergeJoin Inner (13)
            :  :- * Sort (7)
            :  :  +- Exchange (6)
            :  :     +- * Project (5)
            :  :        +- * Filter (4)
            :  :           +- Scan csv (3)
            :  +- * Sort (12)
            :     +- Exchange (11)
            :        +- * Project (10)
            :           +- * Filter (9)
            :              +- Scan csv (8)
            +- * SortMergeJoin Inner (24)
               :- * Sort (18)
               :  +- Exchange (17)
               :     +- * Project (16)
               :        +- * Filter (15)
               :           +- Scan csv (14)
               +- * Sort (23)
                  +- Exchange (22)
                     +- * Project (21)
                        +- * Filter (20)
                           +- Scan csv (19)
{code}
But when running on YARN, the csv job has shuffle reads.

!image-2023-12-01-09-27-39-463.png!

*Additional info*
- I was unable to reproduce it with local Spark.
- If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join conditions are changed to just {{"id"}}, the issue disappears!
- This behaviour is stable - it's not a result of failed instances.

*Production impact*
Without the cache, saving data in production takes much longer (30 seconds vs 18 seconds). To avoid shuffle reads, we had to add a {{repartition}} step before {{cache}} as a workaround, which reduced the time from 18 seconds to 10. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
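A minimal sketch of the workaround described under *Production impact*, reusing {{dfa}}, {{dfb}} and {{root}} from the reproduction above; the report does not spell the step out, and the partition count of 100 is an assumption, not taken from the ticket:
{code:scala}
// Workaround sketch: repartition before cache() so the cached plan already
// carries the partitioning the final write consumes, and the coalesce(1)
// write reads cached blocks without extra shuffle reads.
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
    .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
    .repartition(100) // added step; 100 is a placeholder value
    .cache()

println(res.count()) // materializes the cache
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}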
[jira] [Updated] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame
[ https://issues.apache.org/jira/browse/SPARK-46198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaliy Savkin updated SPARK-46198: --- Attachment: shuffle.png > Unexpected Shuffle Read when using cached DataFrame > --- > > Key: SPARK-46198 > URL: https://issues.apache.org/jira/browse/SPARK-46198 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Vitaliy Savkin >Priority: Major > Attachments: shuffle.png > > > When a computation is base on a cached data frames, I expect to see no > Shuffle Reads, but it happens under certain circumstances. > *Reproduction* > {code:scala} > val ctx: SQLContext = // init context > val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce" > def populateAndRead(tag: String): DataFrame = { > val path = s"$root/numbers_$tag" > // import ctx.implicits._ > // import org.apache.spark.sql.functions.lit > // (0 to 10 * 1000 * 1000) > //.toDF("id") > //.withColumn(tag, lit(tag.toUpperCase)) > //.repartition(100) > //.write > //.option("header", "true") > //.mode("ignore") > //.csv(path) > ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag > + "_id") > } > val dfa = populateAndRead("a1") > val dfb = populateAndRead("b1") > val res = > dfa.join(dfb, dfa("a1_id") === dfb("b1_id")) > .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1"))) > .cache() > println(res.count()) > res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers") > {code} > Relevant configs > {code:scala} > spark.executor.instances=10 > spark.executor.cores=7 > spark.executor.memory=40g > spark.executor.memoryOverhead=5g > spark.shuffle.service.enabled=true > spark.sql.adaptive.enabled=false > spark.sql.autoBroadcastJoinThreshold=-1 > {code} > Spark Plan says that cache is used > {code:scala} > == Physical Plan == > Execute InsertIntoHadoopFsRelationCommand (27) > +- Coalesce (26) > +- InMemoryTableScan (1) > +- InMemoryRelation (2) > +- Union (25) > :- * SortMergeJoin Inner (13) > : :- * Sort (7) > : : +- Exchange (6) > : : +- * Project (5) > : : +- * Filter (4) > : : +- Scan csv (3) > : +- * Sort (12) > : +- Exchange (11) > : +- * Project (10) > : +- * Filter (9) > : +- Scan csv (8) > +- * SortMergeJoin Inner (24) > :- * Sort (18) > : +- Exchange (17) > : +- * Project (16) > : +- * Filter (15) > : +- Scan csv (14) > +- * Sort (23) > +- Exchange (22) > +- * Project (21) > +- * Filter (20) > +- Scan csv (19) > {code} > But when running on YARN, the csv job has shuffle reads. > !image-2023-12-01-09-27-39-463.png! > *Additional info* > - I was unable to reproduce it with local Spark. > - If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join > conditions are changed to just {{{}"id"{}}}, the issue disappears! > - This behaviour is stable - it's not a result of failed instances. > *Production impact* > Without cache saving data in production takes much longer (30 seconds vs 18 > seconds). To avoid shuffle reads, we had to add a {{repartition}} step before > {{cache}} as a workaround, which reduced time from 18 seconds to 10. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46196) Add missing function descriptions
[ https://issues.apache.org/jira/browse/SPARK-46196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46196: --- Labels: pull-request-available (was: ) > Add missing function descriptions > - > > Key: SPARK-46196 > URL: https://issues.apache.org/jira/browse/SPARK-46196 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46196) Add missing function descriptions
Ruifeng Zheng created SPARK-46196: - Summary: Add missing function descriptions Key: SPARK-46196 URL: https://issues.apache.org/jira/browse/SPARK-46196 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33775) Suppress unimportant compilation warnings in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791920#comment-17791920 ] Snoot.io commented on SPARK-33775: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/44103 > Suppress unimportant compilation warnings in Scala 2.13 > > > Key: SPARK-33775 > URL: https://issues.apache.org/jira/browse/SPARK-33775 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.1.0 > > > There are too many compilation warnings in Scala 2.13; add some `-Wconf:msg=regex` > rules to suppress the unimportant ones. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
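For context, a `-Wconf:msg=regex` rule of the kind this ticket describes looks like the following sbt sketch; the regex and rules here are illustrative assumptions, not the exact entries from Spark's SparkBuild.scala:
{code:scala}
// build.sbt (illustrative sketch): selectively silence warnings whose
// message matches a regex, keeping every other warning visible.
scalacOptions ++= Seq(
  // silence one specific 2.13 deprecation message (example regex)
  "-Wconf:msg=Widening conversion from .* is deprecated:s",
  // keep the remaining deprecations as plain warnings, not errors
  "-Wconf:cat=deprecation:w"
)
{code}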
[jira] [Updated] (SPARK-46192) failed to insert the table using the default value of union
[ https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengxl updated SPARK-46192: --- Description: Obtain the following tables and data
{code:java}
create table test_spark(k string default null,v int default null) stored as orc;
create table test_spark_1(k string default null,v int default null) stored as orc;
insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
create table test_spark_2(k string default null,v int default null) stored as orc;
insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
{code}
Execute the following SQL
{code:java}
insert into table test_spark (k)
select k from test_spark_1
union
select k from test_spark_2
{code}
exception:
{code:java}
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 ,resolved :1 , i.query 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1
Error in query: `default`.`test_spark` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
{code}
[jira] [Created] (SPARK-46195) Supports parse multiple sql statements
melin created SPARK-46195: - Summary: Supports parse multiple sql statements Key: SPARK-46195 URL: https://issues.apache.org/jira/browse/SPARK-46195 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: melin

In the SqlBaseParser.g4 file, add the following code to support parsing multiple SQL statements. Note that `select * from (select * from test)` resolves into two statements; an alias needs to be added.
{code:java}
sqlStatements
    : singleStatement* EOF
    ;

singleStatement
    : statement SEMICOLON?
    ;
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
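As an illustration of what the proposed rule would accept, a single input string could then carry several statements (hypothetical example, not from the ticket):
{code:sql}
-- One parse call over this whole script would yield three statements;
-- the trailing semicolon is optional per `statement SEMICOLON?`.
CREATE TABLE t (id INT);
INSERT INTO t VALUES (1), (2);
SELECT * FROM t
{code}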
[jira] [Updated] (SPARK-46194) Clean up the TODO comments left in SPARK-33775
[ https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46194: --- Labels: pull-request-available (was: ) > Clean up the TODO comments left in SPARK-33775 > -- > > Key: SPARK-46194 > URL: https://issues.apache.org/jira/browse/SPARK-46194 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46194) Clean up the TODO comments left in SPARK-33775
[ https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-46194: - Summary: Clean up the TODO comments left in SPARK-33775 (was: Remove completed TODO(SPARK-33805) ) > Clean up the TODO comments left in SPARK-33775 > -- > > Key: SPARK-46194 > URL: https://issues.apache.org/jira/browse/SPARK-46194 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46194) Remove completed TODO(SPARK-33805)
Yang Jie created SPARK-46194: Summary: Remove completed TODO(SPARK-33805) Key: SPARK-46194 URL: https://issues.apache.org/jira/browse/SPARK-46194 Project: Spark Issue Type: Task Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33805) Eliminate deprecated usage since Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-33805. -- Resolution: Fixed > Eliminate deprecated usage since Scala 2.13 > --- > > Key: SPARK-33805 > URL: https://issues.apache.org/jira/browse/SPARK-33805 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > > SPARK-33775 Suppress compilation warnings about method, value, type, object, > trait, inheritance class deprecated usage since Scala 2.13 in SparkBuild.scala > > We should fix them step by step, and then remove the suppression rules. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46193) Add PersistenceEngineBenchmark
[ https://issues.apache.org/jira/browse/SPARK-46193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46193: - Assignee: Dongjoon Hyun > Add PersistenceEngineBenchmark > -- > > Key: SPARK-46193 > URL: https://issues.apache.org/jira/browse/SPARK-46193 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45629) Fix `Implicit definition should have explicit type`
[ https://issues.apache.org/jira/browse/SPARK-45629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45629. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43526 [https://github.com/apache/spark/pull/43526] > Fix `Implicit definition should have explicit type` > --- > > Key: SPARK-45629 > URL: https://issues.apache.org/jira/browse/SPARK-45629 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: tangjiafu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala:343:16: > Implicit definition should have explicit type (inferred > org.json4s.DefaultFormats.type) [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other-implicit-type, > site=org.apache.spark.deploy.TestMasterInfo.formats > [error] implicit val formats = org.json4s.DefaultFormats > [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
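The usual fix for this class of warning is to annotate the implicit explicitly; a minimal sketch (annotating as {{Formats}} is one of several valid choices, not necessarily the one the pull request used):
{code:scala}
import org.json4s.{DefaultFormats, Formats}

// Before: the inferred type DefaultFormats.type triggers the warning
// implicit val formats = org.json4s.DefaultFormats

// After: the implicit carries an explicit type annotation
implicit val formats: Formats = DefaultFormats
{code}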
[jira] [Updated] (SPARK-46193) Add PersistenceEngineBenchmark
[ https://issues.apache.org/jira/browse/SPARK-46193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46193: --- Labels: pull-request-available (was: ) > Add PersistenceEngineBenchmark > -- > > Key: SPARK-46193 > URL: https://issues.apache.org/jira/browse/SPARK-46193 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46193) Add PersistenceEngineBenchmark
Dongjoon Hyun created SPARK-46193: - Summary: Add PersistenceEngineBenchmark Key: SPARK-46193 URL: https://issues.apache.org/jira/browse/SPARK-46193 Project: Spark Issue Type: Sub-task Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46192) failed to insert the table using the default value of union
zengxl created SPARK-46192: -- Summary: failed to insert the table using the default value of union Key: SPARK-46192 URL: https://issues.apache.org/jira/browse/SPARK-46192 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1, 3.4.0 Reporter: zengxl

Obtain the following tables and data
{code:java}
create table test_spark(k string default null,v int default null) stored as orc;
create table test_spark_1(k string default null,v int default null) stored as orc;
insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
create table test_spark_2(k string default null,v int default null) stored as orc;
insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
{code}
Execute the following SQL
{code:java}
insert into table test_spark (k)
select k from test_spark_1
union
select k from test_spark_2
{code}
exception:
{code:java}
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 ,resolved :1 , i.query 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1
Error in query: `default`.`test_spark` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
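A possible workaround, offered as an assumption rather than something verified against the reported versions: supply a value for every target column so the user-specified column list is not needed at all.
{code:sql}
-- workaround sketch: provide v explicitly instead of relying on its
-- declared default of null
insert into table test_spark
select k, null as v from test_spark_1
union
select k, null as v from test_spark_2
{code}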
[jira] [Updated] (SPARK-46189) Various Pandas functions fail in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-46189: -- Fix Version/s: 3.4.3 > Various Pandas functions fail in interpreted mode > - > > Key: SPARK-46189 > URL: https://issues.apache.org/jira/browse/SPARK-46189 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1, 3.4.3 > > > Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and > {{stddev}}) fail with an unboxing-related exception when run in interpreted > mode. > Here are some reproduction cases for pyspark interactive mode: > {noformat} > spark.sql("set spark.sql.codegen.wholeStage=false") > spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > import numpy as np > import pandas as pd > import pyspark.pandas as ps > pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") > psser = ps.from_pandas(pser) > # each of the following actions gets an unboxing error > psser.kurt() > psser.var() > psser.skew() > # set up for covariance test > pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"]) > psdf = ps.from_pandas(pdf) > # this gets an unboxing error > psdf.cov() > # set up for stddev resr > from pyspark.pandas.spark import functions as SF > from pyspark.sql.functions import col > from pyspark.sql import Row > df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), > Row(a=8)]) > # this gets an unboxing error > df.select(SF.stddev(col("a"), 1)).collect() > {noformat} > Exception from the first case ({{psser.kurt()}}) is > {noformat} > java.lang.ClassCastException: class java.lang.Integer cannot be cast to class > java.lang.Double (java.lang.Integer and java.lang.Double are in module > java.base of loader 'bootstrap') > at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184) > at scala.math.Ordering.lt(Ordering.scala:98) > at scala.math.Ordering.lt$(Ordering.scala:98) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184) > at > org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46189) Various Pandas functions fail in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-46189: - Assignee: Bruce Robbins > Various Pandas functions fail in interpreted mode > - > > Key: SPARK-46189 > URL: https://issues.apache.org/jira/browse/SPARK-46189 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > > Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and > {{stddev}}) fail with an unboxing-related exception when run in interpreted > mode. > Here are some reproduction cases for pyspark interactive mode: > {noformat} > spark.sql("set spark.sql.codegen.wholeStage=false") > spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > import numpy as np > import pandas as pd > import pyspark.pandas as ps > pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") > psser = ps.from_pandas(pser) > # each of the following actions gets an unboxing error > psser.kurt() > psser.var() > psser.skew() > # set up for covariance test > pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"]) > psdf = ps.from_pandas(pdf) > # this gets an unboxing error > psdf.cov() > # set up for stddev resr > from pyspark.pandas.spark import functions as SF > from pyspark.sql.functions import col > from pyspark.sql import Row > df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), > Row(a=8)]) > # this gets an unboxing error > df.select(SF.stddev(col("a"), 1)).collect() > {noformat} > Exception from the first case ({{psser.kurt()}}) is > {noformat} > java.lang.ClassCastException: class java.lang.Integer cannot be cast to class > java.lang.Double (java.lang.Integer and java.lang.Double are in module > java.base of loader 'bootstrap') > at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184) > at scala.math.Ordering.lt(Ordering.scala:98) > at scala.math.Ordering.lt$(Ordering.scala:98) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184) > at > org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46189) Various Pandas functions fail in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-46189. --- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 44099 [https://github.com/apache/spark/pull/44099] > Various Pandas functions fail in interpreted mode > - > > Key: SPARK-46189 > URL: https://issues.apache.org/jira/browse/SPARK-46189 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and > {{stddev}}) fail with an unboxing-related exception when run in interpreted > mode. > Here are some reproduction cases for pyspark interactive mode: > {noformat} > spark.sql("set spark.sql.codegen.wholeStage=false") > spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > import numpy as np > import pandas as pd > import pyspark.pandas as ps > pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") > psser = ps.from_pandas(pser) > # each of the following actions gets an unboxing error > psser.kurt() > psser.var() > psser.skew() > # set up for covariance test > pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"]) > psdf = ps.from_pandas(pdf) > # this gets an unboxing error > psdf.cov() > # set up for stddev resr > from pyspark.pandas.spark import functions as SF > from pyspark.sql.functions import col > from pyspark.sql import Row > df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), > Row(a=8)]) > # this gets an unboxing error > df.select(SF.stddev(col("a"), 1)).collect() > {noformat} > Exception from the first case ({{psser.kurt()}}) is > {noformat} > java.lang.ClassCastException: class java.lang.Integer cannot be cast to class > java.lang.Double (java.lang.Integer and java.lang.Double are in module > java.base of loader 'bootstrap') > at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184) > at scala.math.Ordering.lt(Ordering.scala:98) > at scala.math.Ordering.lt$(Ordering.scala:98) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184) > at > org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
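The failure mode in the stack trace can be illustrated outside Spark: an {{Ordering[Double]}} applied to untyped values unboxes whatever it is given as a {{java.lang.Double}}. A standalone sketch (plain Scala, not Spark code, added here for illustration):
{code:scala}
// An Ordering[Double] cast to Ordering[Any] unboxes its arguments with
// BoxesRunTime.unboxToDouble, so a boxed java.lang.Integer fails exactly
// as in LessThan.nullSafeEval above.
val ord = Ordering.Double.TotalOrdering.asInstanceOf[Ordering[Any]]

ord.lt(Double.box(1.0), Double.box(2.0)) // fine: both are boxed Doubles
ord.lt(Int.box(1), Double.box(2.0))      // throws ClassCastException:
// class java.lang.Integer cannot be cast to class java.lang.Double
{code}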
[jira] [Resolved] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file
[ https://issues.apache.org/jira/browse/SPARK-46191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46191. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44100 [https://github.com/apache/spark/pull/44100] > Improve `FileSystemPersistenceEngine.persist` error message in case of the > existing file > > > Key: SPARK-46191 > URL: https://issues.apache.org/jira/browse/SPARK-46191 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46135) Fix table format error in ipynb docs
[ https://issues.apache.org/jira/browse/SPARK-46135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46135: Assignee: BingKun Pan > Fix table format error in ipynb docs > > > Key: SPARK-46135 > URL: https://issues.apache.org/jira/browse/SPARK-46135 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46135) Fix table format error in ipynb docs
[ https://issues.apache.org/jira/browse/SPARK-46135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46135. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44049 [https://github.com/apache/spark/pull/44049] > Fix table format error in ipynb docs > > > Key: SPARK-46135 > URL: https://issues.apache.org/jira/browse/SPARK-46135 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file
[ https://issues.apache.org/jira/browse/SPARK-46191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46191: - Assignee: Dongjoon Hyun > Improve `FileSystemPersistenceEngine.persist` error message in case of the > existing file > > > Key: SPARK-46191 > URL: https://issues.apache.org/jira/browse/SPARK-46191 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42551) Support more subexpression elimination cases
[ https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42551: --- Labels: pull-request-available (was: ) > Support more subexpression elimination cases > > > Key: SPARK-42551 > URL: https://issues.apache.org/jira/browse/SPARK-42551 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.2 >Reporter: Wan Kun >Priority: Major > Labels: pull-request-available > > h1. *Design Sketch* > h2. How to support more subexpression elimination cases > * Get all common expressions from the input expressions of the current physical > operator to the current CodeGenContext. Recursively visit all subexpressions > regardless of whether the current expression is a conditional expression. > * For each common expression: > ** Add a new boolean variable *subExprInit* to indicate whether it has > already been evaluated. > ** Add a new code block in the CodegenSupport trait, and reset those > *subExprInit* variables to *false* before the physical operators begin to > evaluate the input row. > ** Add a new wrapper subExpr function for each common subexpression. > |private void subExpr_n(${argList}) { > if (!subExprInit) { > ${eval.code} > subExprInit_n = true; > subExprIsNull_n = ${eval.isNull}; > subExprValue_n = ${eval.value}; > } > }| > > * When generating the input expression code, if the input expression is a > common expression, the expression code will be replaced with the > corresponding subExpr function. When the subExpr function is called for the > first time, *subExprInit* will be set to true, and the subsequent function > calls will do nothing. > h2. Why should we support whole-stage subexpression elimination > Right now each Spark physical operator shares nothing but the input row, so > the same expressions may be evaluated multiple times across different > operators. For example, the expression udf(c1, c2) in the plan Project [udf(c1, > c2)] - Filter [udf(c1, c2) > 0] - Relation will be evaluated both in the Project > and Filter operators. We can reuse the expression results across different > operators such as Project and Filter. > h2. How to support whole-stage subexpression elimination > * Add two properties to the CodegenSupport trait, the reusable expressions and > the output attributes; we can reuse the expression results only if the > output attributes are the same. > * Visit all operators from top to bottom, bind the candidate expressions > with the output attributes and add them to the current candidate reusable > expressions. > * Visit all operators from bottom to top, collect all the common expressions > for the current operator, and add the initialization code to the current operator > if the common expressions have not been initialized. > * Replace the common expression code when generating code for the > physical operators. > h1. *Newly supported subexpression elimination patterns* > * > h2. *Support subexpression elimination with conditional expressions* > {code:java} > SELECT case when v + 2 > 1 then 1 > when v + 1 > 2 then 2 > when v + 1 > 3 then 3 END vv > FROM values(1) as t2(v) > {code} > We can reuse the result of expression *v + 1* > {code:java} > SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) > min_bc > FROM values(1, 1, 1) as t(a, b, c) > GROUP BY a > {code} > We can reuse the result of expression b + c > * > h2. *Support subexpression elimination in FilterExec* > > {code:java} > SELECT * FROM ( > SELECT v * v + 1 v1 from values(1) as t2(v) > ) t > where v1 > 5 and v1 < 10 > {code} > We can reuse the result of expression *v* * *v* *+* *1* > * > h2. *Support subexpression elimination in JoinExec* > > {code:java} > SELECT * > FROM values(1, 1) as t1(a, b) > join values(1, 2) as t2(x, y) > ON b * y between 2 and 3{code} > > We can reuse the result of expression *b* * *y* > * > h2. *Support subexpression elimination in ExpandExec* > {code:java} > SELECT a, count(b), > count(distinct case when b > 1 then b + c else null end) as count_bc_1, > count(distinct case when b < 0 then b + c else null end) as count_bc_2 > FROM values(1, 1, 1) as t(a, b, c) > GROUP BY a > {code} > We can reuse the result of expression b + c -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
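To make the wrapper-function idea above concrete, here is a hand-expanded sketch for one common subexpression, {{v + 1}}. All identifiers are invented for illustration (the real names come from CodegenContext), and this is not code from the pull request:
{code:java}
final class GeneratedSubExprExample {
  // State for one common subexpression; reset before each input row.
  private boolean subExprInit_0 = false;
  private boolean subExprIsNull_0 = false;
  private int subExprValue_0 = 0;

  // Called by the operator before it starts evaluating a new input row.
  void resetSubExprs() {
    subExprInit_0 = false;
  }

  // Wrapper: the first call per row evaluates `v + 1` and caches the
  // result; subsequent calls are no-ops, so the work happens at most
  // once per row no matter how many consumers reference it.
  void subExpr_0(int v, boolean vIsNull) {
    if (!subExprInit_0) {
      subExprIsNull_0 = vIsNull;
      subExprValue_0 = vIsNull ? 0 : v + 1; // ${eval.code} expanded by hand
      subExprInit_0 = true;
    }
  }
}
{code}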
[jira] [Updated] (SPARK-43403) GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is closed
[ https://issues.apache.org/jira/browse/SPARK-43403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43403: --- Labels: pull-request-available (was: ) > GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is > closed > -- > > Key: SPARK-43403 > URL: https://issues.apache.org/jira/browse/SPARK-43403 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Zhou Yifan >Priority: Major > Labels: pull-request-available > Attachments: image-2023-05-08-11-33-13-634.png > > > !image-2023-05-08-11-33-13-634.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44773) Code-gen CodegenFallback expression in WholeStageCodegen if possible
[ https://issues.apache.org/jira/browse/SPARK-44773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44773: --- Labels: pull-request-available (was: ) > Code-gen CodegenFallback expression in WholeStageCodegen if possible > > > Key: SPARK-44773 > URL: https://issues.apache.org/jira/browse/SPARK-44773 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wan Kun >Priority: Major > Labels: pull-request-available > > Currently neither the WholeStageCodegen framework nor the SubExpressionElimination framework > supports CodegenFallback expressions, but a CodegenFallback > expression that implements the nullSafeEval method could be code-generated just like common > expressions. Today such expressions are always executed in a new > SpecificUnsafeProjection class, and we cannot eliminate their subexpressions. > For example: > SQL: > {code:sql} > SELECT from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').x, >from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').b > FROM values('{"a":1, "b":0.8}') t(s) > {code} > plan: > {code:java} > *(1) Project [from_json(StructField(x,IntegerType,true), > regexp_replace(s#218, a, x, 1), Some(America/Los_Angeles)).x AS > from_json(regexp_replace(s, a, x, 1)).x#219, > from_json(StructField(b,DoubleType,true), regexp_replace(s#218, a, x, 1), > Some(America/Los_Angeles)).b AS from_json(regexp_replace(s, a, x, 1)).b#220] > +- *(1) LocalTableScan [s#218] > {code} > Because org.apache.spark.sql.catalyst.expressions.JsonToStructs is a > CodegenFallback expression, we cannot reuse the result of > {*}regexp_replace(s, 'a', 'x'){*}. > If we support code generation for > org.apache.spark.sql.catalyst.expressions.JsonToStructs in the > WholeStageCodegen framework, we can then reuse the result of > {*}regexp_replace(s, 'a', 'x'){*}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
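For readers unfamiliar with the fallback path, here is a heavily simplified model of the pattern the ticket describes (plain Scala, abstracted from the real CodegenFallback/references machinery; all names are invented):
{code:scala}
// Simplified model of today's fallback: generated code never inlines the
// expression's logic; it calls back into the interpreted object stored in
// a references array, so the expression's inputs cannot participate in
// codegen or subexpression elimination.
trait Expr { def eval(input: Seq[Any]): Any }

class RegexpReplaceLike extends Expr {
  def eval(input: Seq[Any]): Any =
    input.head.toString.replaceAll("a", "x") // stands in for nullSafeEval
}

object FallbackProjection {
  // In Spark this role is played by CodegenContext.references.
  val references: Array[Expr] = Array(new RegexpReplaceLike)

  def apply(input: Seq[Any]): Any =
    references(0).eval(input) // interpreted call from "generated" code
}
{code}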
[jira] [Updated] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file
[ https://issues.apache.org/jira/browse/SPARK-46191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46191: --- Labels: pull-request-available (was: ) > Improve `FileSystemPersistenceEngine.persist` error message in case of the > existing file > > > Key: SPARK-46191 > URL: https://issues.apache.org/jira/browse/SPARK-46191 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file
Dongjoon Hyun created SPARK-46191: - Summary: Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file Key: SPARK-46191 URL: https://issues.apache.org/jira/browse/SPARK-46191 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45940) Add InputPartition to DataSourceReader interface
[ https://issues.apache.org/jira/browse/SPARK-45940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45940: Assignee: Allison Wang > Add InputPartition to DataSourceReader interface > > > Key: SPARK-45940 > URL: https://issues.apache.org/jira/browse/SPARK-45940 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Add InputPartition class and make the partitions method return a list of > input partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45940) Add InputPartition to DataSourceReader interface
[ https://issues.apache.org/jira/browse/SPARK-45940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45940. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44085 [https://github.com/apache/spark/pull/44085] > Add InputPartition to DataSourceReader interface > > > Key: SPARK-45940 > URL: https://issues.apache.org/jira/browse/SPARK-45940 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add InputPartition class and make the partitions method return a list of > input partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
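A sketch of the resulting API shape, based on the ticket description; class and module names follow the Python data source API on the 4.0 development line, so treat the details as provisional:
{code:python}
from pyspark.sql.datasource import DataSourceReader, InputPartition


class RangePartition(InputPartition):
    def __init__(self, start, end):
        self.start, self.end = start, end


class RangeReader(DataSourceReader):
    def partitions(self):
        # The new contract: return a list of InputPartition instances,
        # one per parallel read task.
        return [RangePartition(0, 5), RangePartition(5, 10)]

    def read(self, partition):
        # Invoked once per partition; yields rows as tuples.
        for i in range(partition.start, partition.end):
            yield (i,)
{code}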
[jira] [Updated] (SPARK-46189) Various Pandas functions fail in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46189: --- Labels: pull-request-available (was: ) > Various Pandas functions fail in interpreted mode > - > > Key: SPARK-46189 > URL: https://issues.apache.org/jira/browse/SPARK-46189 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: pull-request-available > > Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and > {{stddev}}) fail with an unboxing-related exception when run in interpreted > mode. > Here are some reproduction cases for pyspark interactive mode: > {noformat} > spark.sql("set spark.sql.codegen.wholeStage=false") > spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > import numpy as np > import pandas as pd > import pyspark.pandas as ps > pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") > psser = ps.from_pandas(pser) > # each of the following actions gets an unboxing error > psser.kurt() > psser.var() > psser.skew() > # set up for covariance test > pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"]) > psdf = ps.from_pandas(pdf) > # this gets an unboxing error > psdf.cov() > # set up for stddev resr > from pyspark.pandas.spark import functions as SF > from pyspark.sql.functions import col > from pyspark.sql import Row > df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), > Row(a=8)]) > # this gets an unboxing error > df.select(SF.stddev(col("a"), 1)).collect() > {noformat} > Exception from the first case ({{psser.kurt()}}) is > {noformat} > java.lang.ClassCastException: class java.lang.Integer cannot be cast to class > java.lang.Double (java.lang.Integer and java.lang.Double are in module > java.base of loader 'bootstrap') > at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184) > at scala.math.Ordering.lt(Ordering.scala:98) > at scala.math.Ordering.lt$(Ordering.scala:98) > at > org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184) > at > org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables
[ https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-46188: --- Fix Version/s: 4.0.0 > Fix the CSS of Spark doc's generated tables > --- > > Key: SPARK-46188 > URL: https://issues.apache.org/jira/browse/SPARK-46188 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 4.0.0, 3.5.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > Attachments: image-2023-11-30-13-11-01-796.png > > > After [https://github.com/apache/spark/pull/40269], there is no border in > the generated tables of the Spark doc. We should fix it. > !image-2023-11-30-13-11-01-796.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46188) Fix the CSS of Spark doc's generated tables
[ https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-46188. Fix Version/s: 3.5.1 Resolution: Fixed Issue resolved by pull request 44097 [https://github.com/apache/spark/pull/44097] > Fix the CSS of Spark doc's generated tables > --- > > Key: SPARK-46188 > URL: https://issues.apache.org/jira/browse/SPARK-46188 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 4.0.0, 3.5.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1 > > Attachments: image-2023-11-30-13-11-01-796.png > > > After [https://github.com/apache/spark/pull/40269], there is no border in > the generated tables of the Spark doc. We should fix it. > !image-2023-11-30-13-11-01-796.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46190) ANSI Double quoted identifiers do not work in Python threads
Max Payson created SPARK-46190: -- Summary: ANSI Double quoted identifiers do not work in Python threads Key: SPARK-46190 URL: https://issues.apache.org/jira/browse/SPARK-46190 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.0, 3.4.1, 3.4.0 Reporter: Max Payson

Enabling and using `spark.sql.ansi.doubleQuotedIdentifiers` does not work correctly in Python threads.

The following example shows how applying a filter, "\"status\" = 'Unchanged'", leads to empty results when run in a thread. I believe this is because the "status" field is interpreted as a literal in the thread, but as an attribute outside of it.
{code:python}
from concurrent import futures
from pyspark import sql

spark = (
    sql.SparkSession.builder.master("local[*]")
    .config("spark.sql.ansi.enabled", "true")
    .config("spark.sql.ansi.doubleQuotedIdentifiers", "true")
    .getOrCreate()
)

def demonstrate_issue(spark):
    # Path to JSON file with contents:
    # [{"status": "Unchanged"}, {"status": "Changed"}]
    df = spark.read.json("data/example.json")
    df.filter("\"status\" = 'Unchanged'").show()

# Shows 1 record, expected
demonstrate_issue(spark)

with futures.ThreadPoolExecutor(1) as executor:
    # Shows 0 records, unexpected
    executor.submit(demonstrate_issue, spark)
{code}
Additional testing notes:
* When parsing the expression with `sql.functions.expr` in Java via Py4J, the "status" field is interpreted as a literal value from the thread, not an attribute
* Using double quotes with `spark.sql` does work in the thread
* Using a dataframe created in memory does work in the thread
* Tested in versions 3.4.0, 3.4.1, & 3.5.0 on Windows and Mac

The original PR that added this option is here: [https://github.com/apache/spark/pull/38022]

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
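A possible mitigation, offered as an assumption rather than a verified fix: {{pyspark.InheritableThread}} mirrors the parent's JVM thread-local state (including the active session used for conf resolution) into the new thread, which plain {{ThreadPoolExecutor}} threads do not inherit.
{code:python}
from pyspark import InheritableThread

# Sketch: run the same function in a thread that inherits the parent's
# JVM-side thread locals; whether this covers the ANSI parsing conf is
# an assumption, not verified against the reported versions.
t = InheritableThread(target=demonstrate_issue, args=(spark,))
t.start()
t.join()
{code}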
[jira] [Updated] (SPARK-46189) Various Pandas functions fail in interpreted mode
[ https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-46189: -- Description: Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and {{stddev}}) fail with an unboxing-related exception when run in interpreted mode. Here are some reproduction cases for pyspark interactive mode:
{noformat}
spark.sql("set spark.sql.codegen.wholeStage=false")
spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

import numpy as np
import pandas as pd
import pyspark.pandas as ps

pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
psser = ps.from_pandas(pser)

# each of the following actions gets an unboxing error
psser.kurt()
psser.var()
psser.skew()

# set up for covariance test
pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
psdf = ps.from_pandas(pdf)

# this gets an unboxing error
psdf.cov()

# set up for stddev test
from pyspark.pandas.spark import functions as SF
from pyspark.sql.functions import col
from pyspark.sql import Row

df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), Row(a=8)])

# this gets an unboxing error
df.select(SF.stddev(col("a"), 1)).collect()
{noformat}
Exception from the first case ({{psser.kurt()}}) is
{noformat}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.Double (java.lang.Integer and java.lang.Double are in module java.base of loader 'bootstrap')
 at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
 at org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
 at scala.math.Ordering.lt(Ordering.scala:98)
 at scala.math.Ordering.lt$(Ordering.scala:98)
 at org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
 at org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
{noformat}
was: Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and {{stddev}}) fail with an unboxing-related exception when run in interpreted mode. 
Here are some reproduction cases for pyspark interactive mode: {noformat} sql("set spark.sql.codegen.wholeStage=false") spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") import numpy as np import pandas as pd import pyspark.pandas as ps pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") psser = ps.from_pandas(pser) # each of the following actions gets an unboxing error psser.kurt() psser.var() psser.skew() # set up for covariance test pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"]) psdf = ps.from_pandas(pdf) # this gets an unboxing error psdf.cov() # set up for stddev resr from pyspark.pandas.spark import functions as SF from pyspark.sql.functions import col from pyspark.sql import Row df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), Row(a=8)]) # this gets an unboxing error df.select(SF.stddev(col("a"), 1)).collect() {noformat} Exception from the first case ({{psser.kurt()}}) is {noformat} java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.Double (java.lang.Integer and java.lang.Double are in module java.base of loader 'bootstrap') at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112) at org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184) at scala.math.Ordering.lt(Ordering.scala:98) at scala.math.Ordering.lt$(Ordering.scala:98) at org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184) at org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196) {noformat} > Various Pandas functions fail in interpreted mode > - > > Key: SPARK-46189 > URL: https://issues.apache.org/jira/browse/SPARK-46189 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and > {{stddev}}) fail with an unboxing-related exception when run in interpreted > mode. > Here are some reproduction cases for pyspark interactive mode: > {noformat} > spark.sql("set spark.sql.codegen.wholeStage=false") > spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > import numpy as np > import pandas as pd > import pyspark.pandas as ps > pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") > psser = ps.from_pandas(pser) > # each of the following actions gets an unboxing error > psser.kurt() > psser.var() > psser.skew() > # set up for
[jira] [Created] (SPARK-46189) Various Pandas functions fail in interpreted mode
Bruce Robbins created SPARK-46189: - Summary: Various Pandas functions fail in interpreted mode Key: SPARK-46189 URL: https://issues.apache.org/jira/browse/SPARK-46189 Project: Spark Issue Type: Bug Components: Pandas API on Spark, SQL Affects Versions: 3.5.0, 3.4.1 Reporter: Bruce Robbins Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and {{stddev}}) fail with an unboxing-related exception when run in interpreted mode. Here are some reproduction cases for pyspark interactive mode:
{noformat}
spark.sql("set spark.sql.codegen.wholeStage=false")
spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

import numpy as np
import pandas as pd
import pyspark.pandas as ps

pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
psser = ps.from_pandas(pser)

# each of the following actions gets an unboxing error
psser.kurt()
psser.var()
psser.skew()

# set up for covariance test
pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
psdf = ps.from_pandas(pdf)

# this gets an unboxing error
psdf.cov()

# set up for stddev test
from pyspark.pandas.spark import functions as SF
from pyspark.sql.functions import col
from pyspark.sql import Row

df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), Row(a=8)])

# this gets an unboxing error
df.select(SF.stddev(col("a"), 1)).collect()
{noformat}
Exception from the first case ({{psser.kurt()}}) is
{noformat}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.Double (java.lang.Integer and java.lang.Double are in module java.base of loader 'bootstrap')
 at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
 at org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
 at scala.math.Ordering.lt(Ordering.scala:98)
 at scala.math.Ordering.lt$(Ordering.scala:98)
 at org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
 at org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
{noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
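For contrast, a minimal sketch (not from the report) of the configuration under which these same calls are expected to succeed: the issue title scopes the failure to interpreted mode, and the values below are believed to be the documented defaults, stated here as an assumption rather than a verified result.
{code:python}
from pyspark.sql import SparkSession
import pyspark.pandas as ps

spark = SparkSession.builder.getOrCreate()

# Believed defaults (assumption): whole-stage codegen on, and the codegen
# object factory in FALLBACK mode, which prefers compiled evaluation paths.
spark.conf.set("spark.sql.codegen.wholeStage", "true")
spark.conf.set("spark.sql.codegen.factoryMode", "FALLBACK")

psser = ps.Series([1, 2, 3, 7, 9, 8], name="a")
print(psser.kurt())  # expected to succeed outside interpreted mode
{code}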
[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables
[ https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46188: --- Labels: pull-request-available (was: ) > Fix the CSS of Spark doc's generated tables > --- > > Key: SPARK-46188 > URL: https://issues.apache.org/jira/browse/SPARK-46188 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 4.0.0, 3.5.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Attachments: image-2023-11-30-13-11-01-796.png > > > After [https://github.com/apache/spark/pull/40269], there is no border in > the generated tables of Spark doc. We should fix it. > !image-2023-11-30-13-11-01-796.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables
[ https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-46188: --- Description: After [https://github.com/apache/spark/pull/40269], there is no border in the generated tables of Spark doc. We should fix it. !image-2023-11-30-13-11-01-796.png! was: After https://github.com/apache/spark/pull/40269, there is no border in the generated tables of Spark doc. We should fix it. !image-2023-11-30-13-10-03-875.png! > Fix the CSS of Spark doc's generated tables > --- > > Key: SPARK-46188 > URL: https://issues.apache.org/jira/browse/SPARK-46188 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 4.0.0, 3.5.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Attachments: image-2023-11-30-13-11-01-796.png > > > After [https://github.com/apache/spark/pull/40269], there is no border in > the generated tables of Spark doc. We should fix it. > !image-2023-11-30-13-11-01-796.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables
[ https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-46188: --- Attachment: image-2023-11-30-13-11-01-796.png > Fix the CSS of Spark doc's generated tables > --- > > Key: SPARK-46188 > URL: https://issues.apache.org/jira/browse/SPARK-46188 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 4.0.0, 3.5.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Attachments: image-2023-11-30-13-11-01-796.png > > > After https://github.com/apache/spark/pull/40269, there is no border in the > generated tables of Spark doc. We should fix it. > !image-2023-11-30-13-10-03-875.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46188) Fix the CSS of Spark doc's generated tables
Gengliang Wang created SPARK-46188: -- Summary: Fix the CSS of Spark doc's generated tables Key: SPARK-46188 URL: https://issues.apache.org/jira/browse/SPARK-46188 Project: Spark Issue Type: Task Components: Documentation Affects Versions: 4.0.0, 3.5.1 Reporter: Gengliang Wang Assignee: Gengliang Wang After https://github.com/apache/spark/pull/40269, there is no border in the generated tables of Spark doc. We should fix it. !image-2023-11-30-13-10-03-875.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45315) Drop JDK 8/11 and make JDK 17 the default
[ https://issues.apache.org/jira/browse/SPARK-45315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45315. --- Fix Version/s: 4.0.0 Resolution: Fixed > Drop JDK 8/11 and make JDK 17 the default > > > Key: SPARK-45315 > URL: https://issues.apache.org/jira/browse/SPARK-45315 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > Labels: releasenotes > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started
[ https://issues.apache.org/jira/browse/SPARK-46186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46186: --- Labels: pull-request-available (was: ) > Invalid Spark Connect execution state transition if interrupted before thread > started > - > > Key: SPARK-46186 > URL: https://issues.apache.org/jira/browse/SPARK-46186 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > > Fix an edge case where interrupting execution before the ExecuteThreadRunner > started could lead to an illegal state transition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38473) Use error classes in org.apache.spark.scheduler
[ https://issues.apache.org/jira/browse/SPARK-38473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-38473: --- Labels: pull-request-available (was: ) > Use error classes in org.apache.spark.scheduler > --- > > Key: SPARK-38473 > URL: https://issues.apache.org/jira/browse/SPARK-38473 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44573. --- Resolution: Invalid Thank you for the confirmation, [~siddaraju.g.c]. BTW, Apache Spark 3.4.2 is released today with several correctness patches. - https://spark.apache.org/releases/spark-release-3-4-2.html > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Major > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at >
[jira] [Closed] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-44573. - > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Major > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349) > at >
[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791660#comment-17791660 ] Siddaraju G C commented on SPARK-44573: --- [~dongjoon] After pointing to correct IKS cluster endpoint, Spark is doing good. We can close this ticket for now. > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Major > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) >
[jira] [Commented] (SPARK-37358) Spark-on-K8S: Allow disabling of resources.limits.memory in executor pod spec
[ https://issues.apache.org/jira/browse/SPARK-37358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791638#comment-17791638 ] Björn Boschman commented on SPARK-37358: Anybody ever looking into this? We can provide a patch. > Spark-on-K8S: Allow disabling of resources.limits.memory in executor pod spec > - > > Key: SPARK-37358 > URL: https://issues.apache.org/jira/browse/SPARK-37358 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Andrew de Quincey >Priority: Major > > When Spark creates an executor pod on my Kubernetes cluster, it adds the > following resources definition: > {{ resources:}} > {{ limits:}} > {{ memory: 896Mi}} > {{ requests:}} > {{ cpu: '4'}} > {{ memory: 896Mi}} > Note that resources.limits.cpu is not set. This is controlled by the > spark.kubernetes.driver.limit.cores setting (which we intentionally do not > set). > We'd like to be able to omit the resources.limits.memory setting as well to > let the Spark worker expand its memory as necessary. > However, this isn't possible. The Scala code in > BasicExecutorFeatureStep.scala is as follows: > {{.editOrNewResources().addToRequests("memory", executorMemoryQuantity).addToLimits("memory", executorMemoryQuantity).addToRequests("cpu", executorCpuQuantity).addToLimits(executorResourceQuantities.asJava).endResources()}} > > i.e. it always adds the memory limit, and there's no way to stop it. > Oh - most of our code is in Python, so it is not bound by the JVM memory > settings. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started
Juliusz Sompolski created SPARK-46186: - Summary: Invalid Spark Connect execution state transition if interrupted before thread started Key: SPARK-46186 URL: https://issues.apache.org/jira/browse/SPARK-46186 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Juliusz Sompolski Fix an edge case where interrupting execution before the ExecuteThreadRunner started could lead to an illegal state transition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46185) Add Apache Spark 3.4.2 Dockerfiles
[ https://issues.apache.org/jira/browse/SPARK-46185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46185: --- Labels: pull-request-available (was: ) > Add Apache Spark 3.4.2 Dockerfiles > -- > > Key: SPARK-46185 > URL: https://issues.apache.org/jira/browse/SPARK-46185 > Project: Spark > Issue Type: Bug > Components: Spark Docker >Affects Versions: 3.4.2 >Reporter: Yikun Jiang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46185) Add Apache Spark 3.4.2 Dockerfiles
Yikun Jiang created SPARK-46185: --- Summary: Add Apache Spark 3.4.2 Dockerfiles Key: SPARK-46185 URL: https://issues.apache.org/jira/browse/SPARK-46185 Project: Spark Issue Type: Bug Components: Spark Docker Affects Versions: 3.4.2 Reporter: Yikun Jiang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791564#comment-17791564 ] Dongjoon Hyun commented on SPARK-44573: --- To [~siddaraju.g.c], could you try other Apache Spark binaries and let us know the result? If it consistently fails on your environment across multiple Apache Spark binaries, it could be a setting issue like [~dcoliversun] mentioned in the above. > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Major > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at >
[jira] [Updated] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44573: -- Priority: Major (was: Blocker) > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Major > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349) > at >
[jira] [Comment Edited] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534 ] Qian Sun edited comment on SPARK-44573 at 11/30/23 9:48 AM: Did you bind role with your serviceaccount? ref: [https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac] cc [~dongjoon] was (Author: dcoliversun): Did you bind role with your serviceaccount? ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Blocker > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at >
[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534 ] Qian Sun commented on SPARK-44573: -- Did you bind role with your serviceaccount? ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Blocker > > Spark-submit ( cluster mode on Kubernetes ) results error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3 nodes k8s > cluster. > Steps followed: > * using IBM cloud, created 3 Instances > * 1st Instance act as master node and another two acts as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copy spark-3.4.1-bin-hadoop3.tgz in to /opt/spark folder > * Ran spark by using below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And getting below error message. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) > at
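For reference on the RBAC suggestion in the comment above, a sketch of the service-account setup from the linked running-on-kubernetes documentation; the account name `spark`, the `edit` cluster role, and the `default` namespace are the documented example values, not details confirmed from this cluster:
{noformat}
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default
{noformat}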
[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website
[ https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-46183: - Summary: Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website (was: Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website) > Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website > -- > > Key: SPARK-46183 > URL: https://issues.apache.org/jira/browse/SPARK-46183 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Minor > Attachments: network.png > > > When I visit [https://spark.apache.org/docs/3.5.0/], > spark-hero-thin-light.jpg is not found, caused by > [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; > the path should be ../images/spark-hero-thin-light.jpg -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
Qian Sun created SPARK-46183: Summary: Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website Key: SPARK-46183 URL: https://issues.apache.org/jira/browse/SPARK-46183 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.5.0 Reporter: Qian Sun When I visit [https://spark.apache.org/docs/3.5.0/], spark-hero-thin-light.jpg is not found, caused by [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; the path should be ../images/spark-hero-thin-light.jpg -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
[ https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-46183: - Attachment: network.png > Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website > --- > > Key: SPARK-46183 > URL: https://issues.apache.org/jira/browse/SPARK-46183 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Minor > > When I visit [https://spark.apache.org/docs/3.5.0/], > spark-hero-thin-light.jpg is not found, caused by > [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; > the path should be ../images/spark-hero-thin-light.jpg -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
[ https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-46183: - Attachment: (was: L1VzZXJzL2hlbmd6aGVuLnNxL0xpYnJhcnkvQXBwbGljYXRpb24gU3VwcG9ydC9pRGluZ1RhbGsvNDUyMDQ5NjgwX3YyL0ltYWdlRmlsZXMvMTcwMTMzNjk5MjkzNF81QjRENEU2RC1FNUM2LTQxNEQtOERGRS0wOTIxRUUzMjY2OTcucG5n.png) > Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website > --- > > Key: SPARK-46183 > URL: https://issues.apache.org/jira/browse/SPARK-46183 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Minor > > When I visit [https://spark.apache.org/docs/3.5.0/], > spark-hero-thin-light.jpg is not found, caused by > [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; > the path should be ../images/spark-hero-thin-light.jpg -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32246) Have a way to optionally run streaming-kinesis-asl
[ https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-32246: -- Assignee: (was: Apache Spark) > Have a way to optionally run streaming-kinesis-asl > -- > > Key: SPARK-32246 > URL: https://issues.apache.org/jira/browse/SPARK-32246 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the > external Amazon Kinesis service. > We should have a way to run it optionally. Currently, this is not being run > in GitHub Actions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32246) Have a way to optionally run streaming-kinesis-asl
[ https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-32246: -- Assignee: Apache Spark > Have a way to optionally run streaming-kinesis-asl > -- > > Key: SPARK-32246 > URL: https://issues.apache.org/jira/browse/SPARK-32246 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the > external Amazon Kinesis service. > We should have a way to run it optionally. Currently, this is not being run > in GitHub Actions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
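For context, a sketch of how these tests are typically opted into locally; the `ENABLE_KINESIS_TESTS` environment variable and the sbt module name are recalled from the kinesis-asl test utilities and should be treated as assumptions, and note the suite runs against the real, billed AWS Kinesis service:
{noformat}
ENABLE_KINESIS_TESTS=1 build/sbt "streaming-kinesis-asl/test"
{noformat}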
[jira] [Assigned] (SPARK-46170) Support injecting adaptive query post-planner strategy rules in SparkSessionExtensions
[ https://issues.apache.org/jira/browse/SPARK-46170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You reassigned SPARK-46170: - Assignee: XiDuo You > Support injecting adaptive query post-planner strategy rules in > SparkSessionExtensions > --- > > Key: SPARK-46170 > URL: https://issues.apache.org/jira/browse/SPARK-46170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46170) Support injecting adaptive query post-planner strategy rules in SparkSessionExtensions
[ https://issues.apache.org/jira/browse/SPARK-46170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You resolved SPARK-46170. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44074 [https://github.com/apache/spark/pull/44074] > Support injecting adaptive query post-planner strategy rules in > SparkSessionExtensions > --- > > Key: SPARK-46170 > URL: https://issues.apache.org/jira/browse/SPARK-46170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
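For context, a sketch of what using the new injection point might look like from an extension. The method name injectQueryPostPlannerStrategyRule is inferred from the issue title by analogy with the existing injectQueryStagePrepRule; the authoritative signature is whatever pull request 44074 defines.
{code:scala}
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// A no-op Rule[SparkPlan] showing the shape such a rule would have: it sees
// the physical plan chosen by the planner strategy, before AQE builds query
// stages on top of it.
case class NoopPostPlannerRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Hypothetical injection point from SPARK-46170; see the PR for the real API.
    extensions.injectQueryPostPlannerStrategyRule(NoopPostPlannerRule)
  }
}
{code}
As with other extension points, such a class would be registered through the spark.sql.extensions configuration.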
[jira] [Assigned] (SPARK-45825) Fix these issues in module sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45825: -- Assignee: Apache Spark (was: Jiaan Geng) > Fix these issues in module sql/catalyst > -- > > Key: SPARK-45825 > URL: https://issues.apache.org/jira/browse/SPARK-45825 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45825) Fix these issues in module sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45825: -- Assignee: Jiaan Geng (was: Apache Spark) > Fix these issues in module sql/catalyst > -- > > Key: SPARK-45825 > URL: https://issues.apache.org/jira/browse/SPARK-45825 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12105) Add a DataFrame.show() with an argument for the output PrintStream
[ https://issues.apache.org/jira/browse/SPARK-12105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-12105: --- Labels: bulk-closed pull-request-available (was: bulk-closed) > Add a DataFrame.show() with an argument for the output PrintStream > --- > > Key: SPARK-12105 > URL: https://issues.apache.org/jira/browse/SPARK-12105 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2 >Reporter: Dean Wampler >Priority: Minor > Labels: bulk-closed, pull-request-available > > It would be nice to send the output of DataFrame.show(...) to a different > output stream than stdout, including just capturing the string itself. This > is useful, e.g., for testing. Actually, it would be sufficient, and perhaps > better, to just make DataFrame.showString a public method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
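Until showString is public or show() accepts a stream, the output can be captured by redirecting stdout for the duration of the call. A small sketch, relying on the fact that show() prints via println, which writes to Scala's Console.out; ShowCapture and captureShow are hypothetical names:
{code:scala}
import java.io.{ByteArrayOutputStream, PrintStream}

import org.apache.spark.sql.DataFrame

object ShowCapture {
  // Capture what df.show(numRows) would print and return it as a String,
  // e.g. for assertions in tests.
  def captureShow(df: DataFrame, numRows: Int = 20): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(new PrintStream(buffer, true, "UTF-8")) {
      df.show(numRows)
    }
    buffer.toString("UTF-8")
  }
}
{code}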
[jira] [Updated] (SPARK-46179) Generate golden files for SQLQueryTestSuites with Postgres
[ https://issues.apache.org/jira/browse/SPARK-46179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46179: --- Labels: pull-request-available (was: ) > Generate golden files for SQLQueryTestSuites with Postgres > -- > > Key: SPARK-46179 > URL: https://issues.apache.org/jira/browse/SPARK-46179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Andy Lam >Priority: Major > Labels: pull-request-available > > To check the correctness of our SQLQueryTestSuites, we want to run them > against another DBMS (such as Postgres) as a reference to generate golden > files. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
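The essence of the reference-DBMS idea is to execute the same query text against Postgres and record its answer. A hedged JDBC sketch; the connection URL, credentials, and tab-separated output format are placeholders, not what the actual test harness does:
{code:scala}
import java.sql.DriverManager

import scala.collection.mutable.ArrayBuffer

object PostgresGoldenSketch {
  // Run one query against a reference Postgres instance and render the result
  // set as tab-separated lines, suitable for writing into a golden file.
  def referenceResult(sql: String): String = {
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/testdb", "tester", "secret") // placeholders
    try {
      val rs = conn.createStatement().executeQuery(sql)
      val cols = rs.getMetaData.getColumnCount
      val rows = ArrayBuffer.empty[String]
      while (rs.next()) {
        rows += (1 to cols).map(i => rs.getString(i)).mkString("\t")
      }
      rows.mkString("\n")
    } finally {
      conn.close()
    }
  }
}
{code}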
[jira] [Updated] (SPARK-44881) Executor stuck retrying shuffle data fetches when a `java.lang.OutOfMemoryError: unable to create native thread` exception occurs.
[ https://issues.apache.org/jira/browse/SPARK-44881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44881: --- Labels: pull-request-available (was: ) > Executor stuck retrying shuffle data fetches when a > `java.lang.OutOfMemoryError: unable to create native thread` exception > occurs. > > > Key: SPARK-44881 > URL: https://issues.apache.org/jira/browse/SPARK-44881 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: hgs >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
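The general hazard behind this report: a retry loop that catches Throwable will retry fatal VM errors such as this OutOfMemoryError, so the executor spins instead of failing fast. A minimal sketch of the defensive pattern, with hypothetical names; Scala's NonFatal extractor deliberately excludes VirtualMachineError:
{code:scala}
import scala.util.control.NonFatal

object FetchRetry {
  // Retry transient failures, but let fatal errors (OutOfMemoryError and
  // other VirtualMachineErrors) propagate immediately instead of retrying.
  def fetchWithRetry[T](maxRetries: Int)(fetch: => T): T = {
    var attempt = 0
    while (true) {
      try {
        return fetch
      } catch {
        // NonFatal does not match VM errors, so an OutOfMemoryError escapes
        // this loop and can terminate the executor promptly.
        case NonFatal(_) if attempt < maxRetries => attempt += 1
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
{code}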