[jira] [Resolved] (SPARK-46194) Clean up the TODO comments left in SPARK-33775

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46194.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44103
[https://github.com/apache/spark/pull/44103]

> Clean up the TODO comments left in SPARK-33775
> --
>
> Key: SPARK-46194
> URL: https://issues.apache.org/jira/browse/SPARK-46194
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46194) Clean up the TODO comments left in SPARK-33775

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46194:
-

Assignee: Yang Jie

> Clean up the TODO comments left in SPARK-33775
> --
>
> Key: SPARK-46194
> URL: https://issues.apache.org/jira/browse/SPARK-46194
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame

2023-11-30 Thread Vitaliy Savkin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitaliy Savkin updated SPARK-46198:
---
Description: 
When a computation is based on cached data frames, I expect to see no Shuffle Reads, but they happen under certain circumstances.

*Reproduction*
{code:scala}
val ctx: SQLContext = // init context 
val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce"

def populateAndRead(tag: String): DataFrame = {
  val path = s"$root/numbers_$tag"
//  import ctx.implicits._
//  import org.apache.spark.sql.functions.lit
//  (0 to 10 * 1000 * 1000)
//.toDF("id")
//.withColumn(tag, lit(tag.toUpperCase))
//.repartition(100)
//.write
//.option("header", "true")
//.mode("ignore")
//.csv(path)

  ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag + 
"_id")
}

val dfa = populateAndRead("a1")
val dfb = populateAndRead("b1")
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
.unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
.cache()

println(res.count())
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}
Relevant configs
{code:scala}
spark.executor.instances=10
spark.executor.cores=7
spark.executor.memory=40g
spark.executor.memoryOverhead=5g

spark.shuffle.service.enabled=true
spark.sql.adaptive.enabled=false
spark.sql.autoBroadcastJoinThreshold=-1
{code}
The Spark plan shows that the cache is used
{code:scala}
== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand (27)
+- Coalesce (26)
   +- InMemoryTableScan (1)
         +- InMemoryRelation (2)
               +- Union (25)
                  :- * SortMergeJoin Inner (13)
                  :  :- * Sort (7)
                  :  :  +- Exchange (6)
                  :  :     +- * Project (5)
                  :  :        +- * Filter (4)
                  :  :           +- Scan csv  (3)
                  :  +- * Sort (12)
                  :     +- Exchange (11)
                  :        +- * Project (10)
                  :           +- * Filter (9)
                  :              +- Scan csv  (8)
                  +- * SortMergeJoin Inner (24)
                     :- * Sort (18)
                     :  +- Exchange (17)
                     :     +- * Project (16)
                     :        +- * Filter (15)
                     :           +- Scan csv  (14)
                     +- * Sort (23)
                        +- Exchange (22)
                           +- * Project (21)
                              +- * Filter (20)
                                 +- Scan csv  (19)
{code}
But when running on YARN, the csv job has shuffle reads.

!shuffle.png!

*Additional info*
 - I was unable to reproduce it with local Spark.
 - If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join conditions are changed to just {{"id"}}, the issue disappears!
 - This behaviour is stable - it's not a result of failed instances.

*Production impact*

Without cache, saving data in production takes much longer (30 seconds vs 18 seconds). To avoid shuffle reads, we had to add a {{repartition}} step before {{cache}} as a workaround, which reduced the time from 18 seconds to 10.
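A minimal sketch of that workaround (assumption: the extra {{repartition}} is applied to the unioned result before caching; the partition count is illustrative):
{code:scala}
// Hedged sketch of the workaround described above; the partition count is
// illustrative and not taken from the report.
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
    .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
    .repartition(100) // shuffle once up front so reads of the cached data need no further shuffle
    .cache()
{code}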


[jira] [Created] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame

2023-11-30 Thread Vitaliy Savkin (Jira)
Vitaliy Savkin created SPARK-46198:
--

 Summary: Unexpected Shuffle Read when using cached DataFrame
 Key: SPARK-46198
 URL: https://issues.apache.org/jira/browse/SPARK-46198
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1
Reporter: Vitaliy Savkin
 Attachments: shuffle.png

When a computation is based on cached data frames, I expect to see no Shuffle Reads, but they happen under certain circumstances.

*Reproduction*
{code:scala}
val ctx: SQLContext = // init context 
val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce"

def populateAndRead(tag: String): DataFrame = {
  val path = s"$root/numbers_$tag"
//  import ctx.implicits._
//  import org.apache.spark.sql.functions.lit
//  (0 to 10 * 1000 * 1000)
//.toDF("id")
//.withColumn(tag, lit(tag.toUpperCase))
//.repartition(100)
//.write
//.option("header", "true")
//.mode("ignore")
//.csv(path)

  ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag + 
"_id")
}

val dfa = populateAndRead("a1")
val dfb = populateAndRead("b1")
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
.unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
.cache()

println(res.count())
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}
Relevant configs
{code:scala}
spark.executor.instances=10
spark.executor.cores=7
spark.executor.memory=40g
spark.executor.memoryOverhead=5g

spark.shuffle.service.enabled=true
spark.sql.adaptive.enabled=false
spark.sql.autoBroadcastJoinThreshold=-1
{code}
The Spark plan shows that the cache is used
{code:scala}
== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand (27)
+- Coalesce (26)
   +- InMemoryTableScan (1)
         +- InMemoryRelation (2)
               +- Union (25)
                  :- * SortMergeJoin Inner (13)
                  :  :- * Sort (7)
                  :  :  +- Exchange (6)
                  :  :     +- * Project (5)
                  :  :        +- * Filter (4)
                  :  :           +- Scan csv  (3)
                  :  +- * Sort (12)
                  :     +- Exchange (11)
                  :        +- * Project (10)
                  :           +- * Filter (9)
                  :              +- Scan csv  (8)
                  +- * SortMergeJoin Inner (24)
                     :- * Sort (18)
                     :  +- Exchange (17)
                     :     +- * Project (16)
                     :        +- * Filter (15)
                     :           +- Scan csv  (14)
                     +- * Sort (23)
                        +- Exchange (22)
                           +- * Project (21)
                              +- * Filter (20)
                                 +- Scan csv  (19)
{code}
But when running on YARN, the csv job has shuffle reads.

!image-2023-12-01-09-27-39-463.png!

*Additional info*
 - I was unable to reproduce it with local Spark.
 - If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join conditions are changed to just {{"id"}}, the issue disappears!
 - This behaviour is stable - it's not a result of failed instances.

*Production impact*

Without cache, saving data in production takes much longer (30 seconds vs 18 seconds). To avoid shuffle reads, we had to add a {{repartition}} step before {{cache}} as a workaround, which reduced the time from 18 seconds to 10.






[jira] [Updated] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame

2023-11-30 Thread Vitaliy Savkin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitaliy Savkin updated SPARK-46198:
---
Attachment: shuffle.png

> Unexpected Shuffle Read when using cached DataFrame
> ---
>
> Key: SPARK-46198
> URL: https://issues.apache.org/jira/browse/SPARK-46198
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Vitaliy Savkin
>Priority: Major
> Attachments: shuffle.png
>
>
> When a computation is based on cached data frames, I expect to see no Shuffle Reads, but they happen under certain circumstances.
> *Reproduction*
> {code:scala}
> val ctx: SQLContext = // init context 
> val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce"
> def populateAndRead(tag: String): DataFrame = {
>   val path = s"$root/numbers_$tag"
> //  import ctx.implicits._
> //  import org.apache.spark.sql.functions.lit
> //  (0 to 10 * 1000 * 1000)
> //.toDF("id")
> //.withColumn(tag, lit(tag.toUpperCase))
> //.repartition(100)
> //.write
> //.option("header", "true")
> //.mode("ignore")
> //.csv(path)
>   ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag 
> + "_id")
> }
> val dfa = populateAndRead("a1")
> val dfb = populateAndRead("b1")
> val res =
>   dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
> .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
> .cache()
> println(res.count())
> res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
> {code}
> Relevant configs
> {code:scala}
> spark.executor.instances=10
> spark.executor.cores=7
> spark.executor.memory=40g
> spark.executor.memoryOverhead=5g
> spark.shuffle.service.enabled=true
> spark.sql.adaptive.enabled=false
> spark.sql.autoBroadcastJoinThreshold=-1
> {code}
> The Spark plan shows that the cache is used
> {code:scala}
> == Physical Plan ==
> Execute InsertIntoHadoopFsRelationCommand (27)
> +- Coalesce (26)
>    +- InMemoryTableScan (1)
>          +- InMemoryRelation (2)
>                +- Union (25)
>                   :- * SortMergeJoin Inner (13)
>                   :  :- * Sort (7)
>                   :  :  +- Exchange (6)
>                   :  :     +- * Project (5)
>                   :  :        +- * Filter (4)
>                   :  :           +- Scan csv  (3)
>                   :  +- * Sort (12)
>                   :     +- Exchange (11)
>                   :        +- * Project (10)
>                   :           +- * Filter (9)
>                   :              +- Scan csv  (8)
>                   +- * SortMergeJoin Inner (24)
>                      :- * Sort (18)
>                      :  +- Exchange (17)
>                      :     +- * Project (16)
>                      :        +- * Filter (15)
>                      :           +- Scan csv  (14)
>                      +- * Sort (23)
>                         +- Exchange (22)
>                            +- * Project (21)
>                               +- * Filter (20)
>                                  +- Scan csv  (19)
> {code}
> But when running on YARN, the csv job has shuffle reads.
> !image-2023-12-01-09-27-39-463.png!
> *Additional info*
>  - I was unable to reproduce it with local Spark.
>  - If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join conditions are changed to just {{"id"}}, the issue disappears!
>  - This behaviour is stable - it's not a result of failed instances.
> *Production impact*
> Without cache, saving data in production takes much longer (30 seconds vs 18 seconds). To avoid shuffle reads, we had to add a {{repartition}} step before {{cache}} as a workaround, which reduced the time from 18 seconds to 10.






[jira] [Updated] (SPARK-46196) Add missing function descriptions

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46196:
---
Labels: pull-request-available  (was: )

> Add missing function descriptions
> -
>
> Key: SPARK-46196
> URL: https://issues.apache.org/jira/browse/SPARK-46196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46196) Add missing function descriptions

2023-11-30 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46196:
-

 Summary: Add missing function descriptions
 Key: SPARK-46196
 URL: https://issues.apache.org/jira/browse/SPARK-46196
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Commented] (SPARK-33775) Suppress unimportant compilation warnings in Scala 2.13

2023-11-30 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791920#comment-17791920
 ] 

Snoot.io commented on SPARK-33775:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/44103

> Suppress unimportant compilation warnings in Scala 2.13 
> 
>
> Key: SPARK-33775
> URL: https://issues.apache.org/jira/browse/SPARK-33775
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.1.0
>
>
> There are too many compilation warnings in Scala 2.13; add some `-Wconf:msg=regex` rules to suppress the unimportant ones.
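> A hedged sketch of the kind of rule meant here (the regex is illustrative, not the exact rule added to SparkBuild.scala):
> {code:scala}
> // Added to scalacOptions for Scala 2.13; the trailing "s" action silences
> // every warning whose message matches the regex.
> "-Wconf:msg=method .* is deprecated:s"
> {code}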






[jira] [Updated] (SPARK-46192) failed to insert the table using the default value of union

2023-11-30 Thread zengxl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated SPARK-46192:
---
Description: 
 

Create the following tables and data
{code:java}
create table test_spark(k string default null,v int default null) stored as orc;
create table test_spark_1(k string default null,v int default null) stored as 
orc;
insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
create table test_spark_2(k string default null,v int default null) stored as 
orc; 
insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);

{code}
Execute the following SQL
{code:java}
insert into table test_spark (k) 
select k from test_spark_1
union
select k from test_spark_2 

{code}
exception:
{code:java}
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
CatalogAndIdentifier
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
CatalogAndIdentifier
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
CatalogAndIdentifier
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
i.userSpecifiedCols.size is 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
i.userSpecifiedCols.size is 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 
,resolved :1 , i.query 1
23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in query: 
`default`.`test_spark` requires that the data to be inserted have the same 
number of columns as the target table: target table has 2 column(s) but the 
inserted data has 1 column(s), including 0 partition column(s) having constant 
value(s). {code}
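A hedged sketch of a possible workaround (assumption, not verified against the affected versions: naming both columns and supplying the default value explicitly avoids the column-count mismatch):
{code:scala}
// Hypothetical workaround: list both target columns and provide the
// default value (null) explicitly, so the inserted data matches the
// target table's schema.
spark.sql("""
  insert into table test_spark (k, v)
  select k, null from test_spark_1
  union
  select k, null from test_spark_2
""")
{code}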
 

  was:
 

Create the following tables and data
{code:java}
create table test_spark(k string default null,v int default null) stored as orc;
create table test_spark_1(k string default null,v int default null) stored as 
orc;
insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
create table test_spark_2(k string default null,v int default null) stored as 
orc; 
insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);

{code}
Execute the following SQL
{code:java}
insert into table test_spark (k) 
select k from test_spark_1
union
select k from test_spark_2 

{code}
exception:
{code:java}
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
CatalogAndIdentifier23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: 
here is CatalogAndIdentifier23/12/01 10:44:25 INFO 
HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier23/12/01 10:44:26 
INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 
123/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
i.userSpecifiedCols.size is 123/12/01 10:44:26 INFO 
Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 ,resolved :1 , i.query 
123/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in query: 
`default`.`test_spark` requires that the data to be inserted have the same 
number of columns as the target table: target table has 2 column(s) but the 
inserted data has 1 column(s), including 0 partition column(s) having constant 
value(s). {code}
 


> failed to insert the table using the default value of union
> ---
>
> Key: SPARK-46192
> URL: https://issues.apache.org/jira/browse/SPARK-46192
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: zengxl
>Priority: Major
>
>  
> Create the following tables and data
> {code:java}
> create table test_spark(k string default null,v int default null) stored as 
> orc;
> create table test_spark_1(k string default null,v int default null) stored as 
> orc;
> insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
> create table test_spark_2(k string default null,v int default null) stored as 
> orc; 
> insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
> {code}
> Execute the following SQL
> {code:java}
> insert into table test_spark (k) 
> select k from test_spark_1
> union
> select k from test_spark_2 
> {code}
> exception:
> {code:java}
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 
> ,resolved :1 , i.query 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
> ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in query: 
> `default`.`test_spark` requires that the data to be inserted have the same 
> number of columns as the target table: target table has 2 column(s) but the 
> inserted data has 1 column(s), including 0 partition column(s) having constant 
> value(s). {code}

[jira] [Created] (SPARK-46195) Supports parse multiple sql statements

2023-11-30 Thread melin (Jira)
melin created SPARK-46195:
-

 Summary: Supports parse multiple sql statements
 Key: SPARK-46195
 URL: https://issues.apache.org/jira/browse/SPARK-46195
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: melin


 
In the SqlBaseParser.g4 file, add the following code to support parsing multiple SQL statements. For example, {{select * from (select * from test)}} resolves into two statements; an alias needs to be added.
{code:java}
sqlStatements
: singleStatement* EOF
;

singleStatement
: statement SEMICOLON?
; {code}
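A hedged sketch of how the proposed rule might be exercised from Scala (the generated class names and context methods follow ANTLR conventions and are assumptions, since the rule does not exist yet):
{code:scala}
// Hypothetical driver for the proposed sqlStatements rule. SqlBaseLexer and
// SqlBaseParser are the ANTLR-generated classes; sqlStatements() and
// singleStatement() would be generated from the rules above.
import org.antlr.v4.runtime.{CharStreams, CommonTokenStream}
import scala.jdk.CollectionConverters._

val sql = "select 1; select 2"
val lexer = new SqlBaseLexer(CharStreams.fromString(sql))
val parser = new SqlBaseParser(new CommonTokenStream(lexer))
parser.sqlStatements().singleStatement().asScala
  .foreach(stmt => println(stmt.getText)) // one line per parsed statement
{code}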
 






[jira] [Updated] (SPARK-46194) Clean up the TODO comments left in SPARK-33775

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46194:
---
Labels: pull-request-available  (was: )

> Clean up the TODO comments left in SPARK-33775
> --
>
> Key: SPARK-46194
> URL: https://issues.apache.org/jira/browse/SPARK-46194
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46194) Clean up the TODO comments left in SPARK-33775

2023-11-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-46194:
-
Summary: Clean up the TODO comments left in SPARK-33775  (was: Remove 
completed TODO(SPARK-33805) )

> Clean up the TODO comments left in SPARK-33775
> --
>
> Key: SPARK-46194
> URL: https://issues.apache.org/jira/browse/SPARK-46194
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Created] (SPARK-46194) Remove completed TODO(SPARK-33805)

2023-11-30 Thread Yang Jie (Jira)
Yang Jie created SPARK-46194:


 Summary: Remove completed TODO(SPARK-33805) 
 Key: SPARK-46194
 URL: https://issues.apache.org/jira/browse/SPARK-46194
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Resolved] (SPARK-33805) Eliminate deprecated usage since Scala 2.13

2023-11-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-33805.
--
Resolution: Fixed

> Eliminate deprecated usage since Scala 2.13
> ---
>
> Key: SPARK-33805
> URL: https://issues.apache.org/jira/browse/SPARK-33805
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> SPARK-33775 suppresses compilation warnings about deprecated usage of methods, values, types, objects, traits, and inheritance since Scala 2.13 in SparkBuild.scala.
>  
> We should fix them step by step, and then remove the suppression rules.
>  
>  
>  






[jira] [Assigned] (SPARK-46193) Add PersistenceEngineBenchmark

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46193:
-

Assignee: Dongjoon Hyun

> Add PersistenceEngineBenchmark
> --
>
> Key: SPARK-46193
> URL: https://issues.apache.org/jira/browse/SPARK-46193
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45629) Fix `Implicit definition should have explicit type`

2023-11-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45629.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43526
[https://github.com/apache/spark/pull/43526]

> Fix `Implicit definition should have explicit type`
> ---
>
> Key: SPARK-45629
> URL: https://issues.apache.org/jira/browse/SPARK-45629
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: tangjiafu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala:343:16:
>  Implicit definition should have explicit type (inferred 
> org.json4s.DefaultFormats.type) [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other-implicit-type, 
> site=org.apache.spark.deploy.TestMasterInfo.formats
> [error]   implicit val formats = org.json4s.DefaultFormats
> [error]   {code}
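> A minimal sketch of the kind of fix this issue calls for, assuming the json4s case above (an explicit type annotation replaces the inferred one):
> {code:scala}
> import org.json4s.{DefaultFormats, Formats}
> 
> // Before: implicit val formats = org.json4s.DefaultFormats  (type inferred)
> implicit val formats: Formats = DefaultFormats
> {code}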






[jira] [Updated] (SPARK-46193) Add PersistenceEngineBenchmark

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46193:
---
Labels: pull-request-available  (was: )

> Add PersistenceEngineBenchmark
> --
>
> Key: SPARK-46193
> URL: https://issues.apache.org/jira/browse/SPARK-46193
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46193) Add PersistenceEngineBenchmark

2023-11-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46193:
-

 Summary: Add PersistenceEngineBenchmark
 Key: SPARK-46193
 URL: https://issues.apache.org/jira/browse/SPARK-46193
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-46192) failed to insert the table using the default value of union

2023-11-30 Thread zengxl (Jira)
zengxl created SPARK-46192:
--

 Summary: failed to insert the table using the default value of 
union
 Key: SPARK-46192
 URL: https://issues.apache.org/jira/browse/SPARK-46192
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1, 3.4.0
Reporter: zengxl


 

Create the following tables and data
{code:java}
create table test_spark(k string default null,v int default null) stored as orc;
create table test_spark_1(k string default null,v int default null) stored as 
orc;
insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
create table test_spark_2(k string default null,v int default null) stored as 
orc; 
insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);

{code}
Execute the following SQL
{code:java}
insert into table test_spark (k) 
select k from test_spark_1
union
select k from test_spark_2 

{code}
exception:
{code:java}
23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
CatalogAndIdentifier23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: 
here is CatalogAndIdentifier23/12/01 10:44:25 INFO 
HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier23/12/01 10:44:26 
INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 
123/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
i.userSpecifiedCols.size is 123/12/01 10:44:26 INFO 
Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 ,resolved :1 , i.query 
123/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1Error in query: 
`default`.`test_spark` requires that the data to be inserted have the same 
number of columns as the target table: target table has 2 column(s) but the 
inserted data has 1 column(s), including 0 partition column(s) having constant 
value(s). {code}
 






[jira] [Updated] (SPARK-46189) Various Pandas functions fail in interpreted mode

2023-11-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-46189:
--
Fix Version/s: 3.4.3

> Various Pandas functions fail in interpreted mode
> -
>
> Key: SPARK-46189
> URL: https://issues.apache.org/jira/browse/SPARK-46189
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.4.3
>
>
> Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and 
> {{stddev}}) fail with an unboxing-related exception when run in interpreted 
> mode.
> Here are some reproduction cases for pyspark interactive mode:
> {noformat}
> spark.sql("set spark.sql.codegen.wholeStage=false")
> spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> import numpy as np
> import pandas as pd
> import pyspark.pandas as ps
> pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
> psser = ps.from_pandas(pser)
> # each of the following actions gets an unboxing error
> psser.kurt()
> psser.var()
> psser.skew()
> # set up for covariance test
> pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> # this gets an unboxing error
> psdf.cov()
> # set up for stddev test
> from pyspark.pandas.spark import functions as SF
> from pyspark.sql.functions import col
> from pyspark.sql import Row
> df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
> Row(a=8)])
> # this gets an unboxing error
> df.select(SF.stddev(col("a"), 1)).collect()
> {noformat}
> Exception from the first case ({{psser.kurt()}}) is
> {noformat}
> java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
> java.lang.Double (java.lang.Integer and java.lang.Double are in module 
> java.base of loader 'bootstrap')
>   at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
>   at scala.math.Ordering.lt(Ordering.scala:98)
>   at scala.math.Ordering.lt$(Ordering.scala:98)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
>   at 
> org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
> {noformat}






[jira] [Assigned] (SPARK-46189) Various Pandas functions fail in interpreted mode

2023-11-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-46189:
-

Assignee: Bruce Robbins

> Various Pandas functions fail in interpreted mode
> -
>
> Key: SPARK-46189
> URL: https://issues.apache.org/jira/browse/SPARK-46189
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and 
> {{stddev}}) fail with an unboxing-related exception when run in interpreted 
> mode.
> Here are some reproduction cases for pyspark interactive mode:
> {noformat}
> spark.sql("set spark.sql.codegen.wholeStage=false")
> spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> import numpy as np
> import pandas as pd
> import pyspark.pandas as ps
> pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
> psser = ps.from_pandas(pser)
> # each of the following actions gets an unboxing error
> psser.kurt()
> psser.var()
> psser.skew()
> # set up for covariance test
> pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> # this gets an unboxing error
> psdf.cov()
> # set up for stddev test
> from pyspark.pandas.spark import functions as SF
> from pyspark.sql.functions import col
> from pyspark.sql import Row
> df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
> Row(a=8)])
> # this gets an unboxing error
> df.select(SF.stddev(col("a"), 1)).collect()
> {noformat}
> Exception from the first case ({{psser.kurt()}}) is
> {noformat}
> java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
> java.lang.Double (java.lang.Integer and java.lang.Double are in module 
> java.base of loader 'bootstrap')
>   at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
>   at scala.math.Ordering.lt(Ordering.scala:98)
>   at scala.math.Ordering.lt$(Ordering.scala:98)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
>   at 
> org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
> {noformat}






[jira] [Resolved] (SPARK-46189) Various Pandas functions fail in interpreted mode

2023-11-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46189.
---
Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44099
[https://github.com/apache/spark/pull/44099]

> Various Pandas functions fail in interpreted mode
> -
>
> Key: SPARK-46189
> URL: https://issues.apache.org/jira/browse/SPARK-46189
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0
>
>
> Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and 
> {{stddev}}) fail with an unboxing-related exception when run in interpreted 
> mode.
> Here are some reproduction cases for pyspark interactive mode:
> {noformat}
> spark.sql("set spark.sql.codegen.wholeStage=false")
> spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> import numpy as np
> import pandas as pd
> import pyspark.pandas as ps
> pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
> psser = ps.from_pandas(pser)
> # each of the following actions gets an unboxing error
> psser.kurt()
> psser.var()
> psser.skew()
> # set up for covariance test
> pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> # this gets an unboxing error
> psdf.cov()
> # set up for stddev test
> from pyspark.pandas.spark import functions as SF
> from pyspark.sql.functions import col
> from pyspark.sql import Row
> df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
> Row(a=8)])
> # this gets an unboxing error
> df.select(SF.stddev(col("a"), 1)).collect()
> {noformat}
> Exception from the first case ({{psser.kurt()}}) is
> {noformat}
> java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
> java.lang.Double (java.lang.Integer and java.lang.Double are in module 
> java.base of loader 'bootstrap')
>   at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
>   at scala.math.Ordering.lt(Ordering.scala:98)
>   at scala.math.Ordering.lt$(Ordering.scala:98)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
>   at 
> org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
> {noformat}






[jira] [Resolved] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46191.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44100
[https://github.com/apache/spark/pull/44100]

> Improve `FileSystemPersistenceEngine.persist` error message in case of the 
> existing file
> 
>
> Key: SPARK-46191
> URL: https://issues.apache.org/jira/browse/SPARK-46191
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46135) Fix table format error in ipynb docs

2023-11-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46135:


Assignee: BingKun Pan

> Fix table format error in ipynb docs
> 
>
> Key: SPARK-46135
> URL: https://issues.apache.org/jira/browse/SPARK-46135
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46135) Fix table format error in ipynb docs

2023-11-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46135.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44049
[https://github.com/apache/spark/pull/44049]

> Fix table format error in ipynb docs
> 
>
> Key: SPARK-46135
> URL: https://issues.apache.org/jira/browse/SPARK-46135
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46191:
-

Assignee: Dongjoon Hyun

> Improve `FileSystemPersistenceEngine.persist` error message in case of the 
> existing file
> 
>
> Key: SPARK-46191
> URL: https://issues.apache.org/jira/browse/SPARK-46191
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-42551) Support more subexpression elimination cases

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42551:
---
Labels: pull-request-available  (was: )

> Support more subexpression elimination cases
> 
>
> Key: SPARK-42551
> URL: https://issues.apache.org/jira/browse/SPARK-42551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wan Kun
>Priority: Major
>  Labels: pull-request-available
>
> h1. *Design Sketch*
> h2. How to support more subexpression elimination cases
>  * Get all common expressions from the input expressions of the current physical operator into the current CodegenContext. Recursively visit all subexpressions regardless of whether the current expression is a conditional expression.
>  * For each common expression:
>  ** Add a new boolean variable *subExprInit* to indicate whether it has  
> already been evaluated. 
>  ** Add a new code block in the CodegenSupport trait, and reset those 
> *subExprInit* variables to *false* before the physical operators begin to 
> evaluate the input row.
>  ** Add a new wrapper subExpr function for each common subexpression.
> {code:java}
> private void subExpr_n(${argList}) {
>   if (!subExprInit_n) {
>     ${eval.code}
>     subExprInit_n = true;
>     subExprIsNull_n = ${eval.isNull};
>     subExprValue_n = ${eval.value};
>   }
> }
> {code}
>  
>  * When generating the input expression code,  if the input expression is a 
> common expression, the expression code will be replaced with the 
> corresponding subExpr function. When the subExpr function is called for the 
> first time, *subExprInit* will be set to true, and the subsequent function 
> calls will do nothing.
> h2. Why should we support whole-stage subexpression elimination
> Right now each Spark physical operator shares nothing but the input row, so 
> the same expressions may be evaluated multiple times across different 
> operators. For example, the expression udf(c1, c2) in plan Project [udf(c1, 
> c2)] - Filter [udf(c1, c2) > 0] - Relation will be evaluated both in Project 
> and Filter operators.  We can reuse the expression results across different 
> operators such as Project and Filter.
> h2. How to support whole-stage subexpression elimination
>  * Add two properties in the CodegenSupport trait: the reusable expressions and the output attributes. We can reuse the expression results only if the output attributes are the same.
>  * Visit all operators from top to bottom, bind the candidate expressions with the output attributes, and add them to the current candidate reusable expressions.
>  * Visit all operators from bottom to top, collect all the common expressions for the current operator, and add the initialization code to the current operator if the common expressions have not been initialized.
>  * Replace the common expression code when generating code for the physical operators.
> h1. *Newly supported subexpression elimination patterns*
> h2. *Support subexpression elimination with conditional expressions*
> {code:java}
> SELECT case when v + 2 > 1 then 1
> when v + 1 > 2 then 2
> when v + 1 > 3 then 3 END vv
> FROM values(1) as t2(v)
> {code}
> We can reuse the result of expression  *v + 1*
> {code:java}
> SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) 
> min_bc
> FROM values(1, 1, 1) as t(a, b, c)
> GROUP BY a
> {code}
> We can reuse the result of expression  b + c
> h2. *Support subexpression elimination in FilterExec*
>  
> {code:java}
> SELECT * FROM (
>   SELECT v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v1 > 5 and v1 < 10
> {code}
> We can reuse the result of expression *v * v + 1*
> h2. *Support subexpression elimination in JoinExec*
>  
> {code:java}
> SELECT * 
> FROM values(1, 1) as t1(a, b) 
> join values(1, 2) as t2(x, y)
> ON b * y between 2 and 3{code}
>  
> We can reuse the result of expression *b * y*
> h2. *Support subexpression elimination in ExpandExec*
> {code:java}
> SELECT a, count(b),
>   count(distinct case when b > 1 then b + c else null end) as count_bc_1,
>   count(distinct case when b < 0 then b + c else null end) as count_bc_2
> FROM values(1, 1, 1) as t(a, b, c)
> GROUP BY a
> {code}
> We can reuse the result of expression  b + c






[jira] [Updated] (SPARK-43403) GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is closed

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43403:
---
Labels: pull-request-available  (was: )

> GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is 
> closed
> --
>
> Key: SPARK-43403
> URL: https://issues.apache.org/jira/browse/SPARK-43403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Zhou Yifan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-05-08-11-33-13-634.png
>
>
> !image-2023-05-08-11-33-13-634.png!






[jira] [Updated] (SPARK-44773) Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44773:
---
Labels: pull-request-available  (was: )

> Code-gen CodegenFallback expression in WholeStageCodegen if possible
> 
>
> Key: SPARK-44773
> URL: https://issues.apache.org/jira/browse/SPARK-44773
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Wan Kun
>Priority: Major
>  Labels: pull-request-available
>
> Currently, neither the WholeStageCodegen framework nor the SubExpressionElimination framework supports CodegenFallback expressions. A CodegenFallback expression that implements the nullSafeEval method could generate code just like common expressions, but today such expressions are always executed in a new SpecificUnsafeProjection class, so we cannot eliminate the subexpressions.
> For example:
> SQL:
> {code:sql}
> SELECT from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').x,
>from_json(regexp_replace(s, 'a', 'x'), 'x INT, b DOUBLE').b
> FROM values('{"a":1, "b":0.8}') t(s)
> {code}
> plan:
> {code:java}
> *(1) Project [from_json(StructField(x,IntegerType,true), 
> regexp_replace(s#218, a, x, 1), Some(America/Los_Angeles)).x AS 
> from_json(regexp_replace(s, a, x, 1)).x#219, 
> from_json(StructField(b,DoubleType,true), regexp_replace(s#218, a, x, 1), 
> Some(America/Los_Angeles)).b AS from_json(regexp_replace(s, a, x, 1)).b#220]
> +- *(1) LocalTableScan [s#218]
> {code}
> Because org.apache.spark.sql.catalyst.expressions.JsonToStructs is a CodegenFallback expression, we cannot reuse the result of *regexp_replace(s, 'a', 'x')*.
> We can support code generation for org.apache.spark.sql.catalyst.expressions.JsonToStructs in the WholeStageCodegen framework, and then reuse the result of *regexp_replace(s, 'a', 'x')*.






[jira] [Updated] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46191:
---
Labels: pull-request-available  (was: )

> Improve `FileSystemPersistenceEngine.persist` error message in case of the 
> existing file
> 
>
> Key: SPARK-46191
> URL: https://issues.apache.org/jira/browse/SPARK-46191
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46191) Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file

2023-11-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46191:
-

 Summary: Improve `FileSystemPersistenceEngine.persist` error 
message in case of the existing file
 Key: SPARK-46191
 URL: https://issues.apache.org/jira/browse/SPARK-46191
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-45940) Add InputPartition to DataSourceReader interface

2023-11-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45940:


Assignee: Allison Wang

> Add InputPartition to DataSourceReader interface
> 
>
> Key: SPARK-45940
> URL: https://issues.apache.org/jira/browse/SPARK-45940
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Add InputPartition class and make the partitions method return a list of 
> input partitions.






[jira] [Resolved] (SPARK-45940) Add InputPartition to DataSourceReader interface

2023-11-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45940.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44085
[https://github.com/apache/spark/pull/44085]

> Add InputPartition to DataSourceReader interface
> 
>
> Key: SPARK-45940
> URL: https://issues.apache.org/jira/browse/SPARK-45940
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add InputPartition class and make the partitions method return a list of 
> input partitions.






[jira] [Updated] (SPARK-46189) Various Pandas functions fail in interpreted mode

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46189:
---
Labels: pull-request-available  (was: )

> Various Pandas functions fail in interpreted mode
> -
>
> Key: SPARK-46189
> URL: https://issues.apache.org/jira/browse/SPARK-46189
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and 
> {{stddev}}) fail with an unboxing-related exception when run in interpreted 
> mode.
> Here are some reproduction cases for pyspark interactive mode:
> {noformat}
> spark.sql("set spark.sql.codegen.wholeStage=false")
> spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> import numpy as np
> import pandas as pd
> import pyspark.pandas as ps
> pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
> psser = ps.from_pandas(pser)
> # each of the following actions gets an unboxing error
> psser.kurt()
> psser.var()
> psser.skew()
> # set up for covariance test
> pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> # this gets an unboxing error
> psdf.cov()
> # set up for stddev test
> from pyspark.pandas.spark import functions as SF
> from pyspark.sql.functions import col
> from pyspark.sql import Row
> df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
> Row(a=8)])
> # this gets an unboxing error
> df.select(SF.stddev(col("a"), 1)).collect()
> {noformat}
> Exception from the first case ({{psser.kurt()}}) is
> {noformat}
> java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
> java.lang.Double (java.lang.Integer and java.lang.Double are in module 
> java.base of loader 'bootstrap')
>   at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
>   at scala.math.Ordering.lt(Ordering.scala:98)
>   at scala.math.Ordering.lt$(Ordering.scala:98)
>   at 
> org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
>   at 
> org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
> {noformat}
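
For contrast, a hedged sketch: restoring codegen in the same session should make the same actions succeed (FALLBACK is assumed here to be the default factory mode), which localizes the bug to the interpreted evaluation path:

{code:python}
# Hedged sketch: undo the interpreted-mode settings from the reproduction above.
spark.sql("set spark.sql.codegen.wholeStage=true")
spark.sql("set spark.sql.codegen.factoryMode=FALLBACK")

# The same aggregation should now complete without the unboxing error.
psser.kurt()
{code}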



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables

2023-11-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-46188:
---
Fix Version/s: 4.0.0

> Fix the CSS of Spark doc's generated tables
> ---
>
> Key: SPARK-46188
> URL: https://issues.apache.org/jira/browse/SPARK-46188
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
> Attachments: image-2023-11-30-13-11-01-796.png
>
>
> After [https://github.com/apache/spark/pull/40269], there is no border in 
> the generated tables of the Spark docs. We should fix it.
> !image-2023-11-30-13-11-01-796.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46188) Fix the CSS of Spark doc's generated tables

2023-11-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-46188.

Fix Version/s: 3.5.1
   Resolution: Fixed

Issue resolved by pull request 44097
[https://github.com/apache/spark/pull/44097]

> Fix the CSS of Spark doc's generated tables
> ---
>
> Key: SPARK-46188
> URL: https://issues.apache.org/jira/browse/SPARK-46188
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1
>
> Attachments: image-2023-11-30-13-11-01-796.png
>
>
> After [https://github.com/apache/spark/pull/40269], there is no border in 
> the generated tables of the Spark docs. We should fix it.
> !image-2023-11-30-13-11-01-796.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46190) ANSI Double quoted identifiers do not work in Python threads

2023-11-30 Thread Max Payson (Jira)
Max Payson created SPARK-46190:
--

 Summary: ANSI Double quoted identifiers do not work in Python 
threads
 Key: SPARK-46190
 URL: https://issues.apache.org/jira/browse/SPARK-46190
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.0, 3.4.1, 3.4.0
Reporter: Max Payson


Enabling and using `spark.sql.ansi.doubleQuotedIdentifiers` does not work 
correctly in Python threads

The following example shows how applying a filter, "\"status\" = 'Unchanged'", 
leads to empty results when run in a thread. I believe this is because the 
"status" field is interpreted as a literal in the thread, but as an attribute 
outside of it.
{code:python}
from concurrent import futures
from pyspark import sql

spark = (
  sql.SparkSession.builder.master("local[*]")
  .config("spark.sql.ansi.enabled", "true")
  .config("spark.sql.ansi.doubleQuotedIdentifiers", "true")
  .getOrCreate()
)

def demonstrate_issue(spark):
  # Path to JSON file with contents:
  # [{"status": "Unchanged"}, {"status": "Changed"}]
  df = spark.read.json("data/example.json")
  df.filter("\"status\" = 'Unchanged'").show()

# Shows 1 record, expected
demonstrate_issue(spark)

with futures.ThreadPoolExecutor(1) as executor:
  # Shows 0 records, unexpected
  executor.submit(demonstrate_issue, spark)
 {code}
 

Additional testing notes:
 * When parsing the expression with `sql.functions.expr` in Java via Py4J, the 
"status" field is interpreted as a literal value from the thread, not an 
attribute
 * Using double quotes with `spark.sql` does work in the thread
 * Using a dataframe created in memory does work in the thread
 * Tested in versions 3.4.0, 3.4.1, & 3.5.0 on Windows and Mac

 

The original PR that added this option is here: 
[https://github.com/apache/spark/pull/38022]
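
Based on the testing notes above, a hedged sketch of a workaround, reusing spark and futures from the snippet above: route the double-quoted identifier through spark.sql, which does work inside threads:

{code:python}
# Hedged sketch of a workaround, derived from the notes above: spark.sql with
# double-quoted identifiers works in a thread where DataFrame.filter does not.
def demonstrate_workaround(spark):
  df = spark.read.json("data/example.json")
  df.createOrReplaceTempView("example")
  # Shows 1 record, as expected, even inside the thread
  spark.sql("SELECT * FROM example WHERE \"status\" = 'Unchanged'").show()

with futures.ThreadPoolExecutor(1) as executor:
  executor.submit(demonstrate_workaround, spark)
{code}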

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46189) Various Pandas functions fail in interpreted mode

2023-11-30 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-46189:
--
Description: 
Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and {{stddev}}) 
fail with an unboxing-related exception when run in interpreted mode.

Here are some reproduction cases for pyspark interactive mode:
{noformat}
spark.sql("set spark.sql.codegen.wholeStage=false")
spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

import numpy as np
import pandas as pd

import pyspark.pandas as ps

pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
psser = ps.from_pandas(pser)

# each of the following actions gets an unboxing error
psser.kurt()
psser.var()
psser.skew()

# set up for covariance test
pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
psdf = ps.from_pandas(pdf)

# this gets an unboxing error
psdf.cov()

# set up for stddev test
from pyspark.pandas.spark import functions as SF
from pyspark.sql.functions import col
from pyspark.sql import Row
df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
Row(a=8)])

# this gets an unboxing error
df.select(SF.stddev(col("a"), 1)).collect()
{noformat}
Exception from the first case ({{psser.kurt()}}) is
{noformat}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
java.lang.Double (java.lang.Integer and java.lang.Double are in module 
java.base of loader 'bootstrap')
at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
at 
org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
at scala.math.Ordering.lt(Ordering.scala:98)
at scala.math.Ordering.lt$(Ordering.scala:98)
at 
org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
at 
org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
{noformat}

  was:
Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and {{stddev}}) 
fail with an unboxing-related exception when run in interpreted mode.

Here are some reproduction cases for pyspark interactive mode:
{noformat}
sql("set spark.sql.codegen.wholeStage=false")
spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

import numpy as np
import pandas as pd

import pyspark.pandas as ps

pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
psser = ps.from_pandas(pser)

# each of the following actions gets an unboxing error
psser.kurt()
psser.var()
psser.skew()

# set up for covariance test
pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
psdf = ps.from_pandas(pdf)

# this gets an unboxing error
psdf.cov()

# set up for stddev test
from pyspark.pandas.spark import functions as SF
from pyspark.sql.functions import col
from pyspark.sql import Row
df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
Row(a=8)])

# this gets an unboxing error
df.select(SF.stddev(col("a"), 1)).collect()
{noformat}
Exception from the first case ({{psser.kurt()}}) is
{noformat}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
java.lang.Double (java.lang.Integer and java.lang.Double are in module 
java.base of loader 'bootstrap')
at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
at 
org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
at scala.math.Ordering.lt(Ordering.scala:98)
at scala.math.Ordering.lt$(Ordering.scala:98)
at 
org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
at 
org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
{noformat}


> Various Pandas functions fail in interpreted mode
> -
>
> Key: SPARK-46189
> URL: https://issues.apache.org/jira/browse/SPARK-46189
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>
> Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and 
> {{stddev}}) fail with an unboxing-related exception when run in interpreted 
> mode.
> Here are some reproduction cases for pyspark interactive mode:
> {noformat}
> spark.sql("set spark.sql.codegen.wholeStage=false")
> spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> import numpy as np
> import pandas as pd
> import pyspark.pandas as ps
> pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
> psser = ps.from_pandas(pser)
> # each of the following actions gets an unboxing error
> psser.kurt()
> psser.var()
> psser.skew()
> # set up for 

[jira] [Created] (SPARK-46189) Various Pandas functions fail in interpreted mode

2023-11-30 Thread Bruce Robbins (Jira)
Bruce Robbins created SPARK-46189:
-

 Summary: Various Pandas functions fail in interpreted mode
 Key: SPARK-46189
 URL: https://issues.apache.org/jira/browse/SPARK-46189
 Project: Spark
  Issue Type: Bug
  Components: Pandas API on Spark, SQL
Affects Versions: 3.5.0, 3.4.1
Reporter: Bruce Robbins


Various Pandas functions ({{kurt}}, {{var}}, {{skew}}, {{cov}}, and {{stddev}}) 
fail with an unboxing-related exception when run in interpreted mode.

Here are some reproduction cases for pyspark interactive mode:
{noformat}
sql("set spark.sql.codegen.wholeStage=false")
spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

import numpy as np
import pandas as pd

import pyspark.pandas as ps

pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
psser = ps.from_pandas(pser)

# each of the following actions gets an unboxing error
psser.kurt()
psser.var()
psser.skew()

# set up for covariance test
pdf = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["a", "b"])
psdf = ps.from_pandas(pdf)

# this gets an unboxing error
psdf.cov()

# set up for stddev test
from pyspark.pandas.spark import functions as SF
from pyspark.sql.functions import col
from pyspark.sql import Row
df = spark.createDataFrame([Row(a=1), Row(a=2), Row(a=3), Row(a=7), Row(a=9), 
Row(a=8)])

# this gets an unboxing error
df.select(SF.stddev(col("a"), 1)).collect()
{noformat}
Exception from the first case ({{psser.kurt()}}) is
{noformat}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
java.lang.Double (java.lang.Integer and java.lang.Double are in module 
java.base of loader 'bootstrap')
at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
at 
org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.compare(PhysicalDataType.scala:184)
at scala.math.Ordering.lt(Ordering.scala:98)
at scala.math.Ordering.lt$(Ordering.scala:98)
at 
org.apache.spark.sql.catalyst.types.PhysicalDoubleType$$anonfun$2.lt(PhysicalDataType.scala:184)
at 
org.apache.spark.sql.catalyst.expressions.LessThan.nullSafeEval(predicates.scala:1196)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46188:
---
Labels: pull-request-available  (was: )

> Fix the CSS of Spark doc's generated tables
> ---
>
> Key: SPARK-46188
> URL: https://issues.apache.org/jira/browse/SPARK-46188
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-11-30-13-11-01-796.png
>
>
> After [https://github.com/apache/spark/pull/40269], there is no border in 
> the generated tables of the Spark docs. We should fix it.
> !image-2023-11-30-13-11-01-796.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables

2023-11-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-46188:
---
Description: 
After [https://github.com/apache/spark/pull/40269], there is no border in the 
generated tables of the Spark docs. We should fix it.

!image-2023-11-30-13-11-01-796.png!

  was:
After https://github.com/apache/spark/pull/40269, there is no border in the 
generated tables of the Spark docs. We should fix it.

!image-2023-11-30-13-10-03-875.png!


> Fix the CSS of Spark doc's generated tables
> ---
>
> Key: SPARK-46188
> URL: https://issues.apache.org/jira/browse/SPARK-46188
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Attachments: image-2023-11-30-13-11-01-796.png
>
>
> After [https://github.com/apache/spark/pull/40269], there is no border in 
> the generated tables of the Spark docs. We should fix it.
> !image-2023-11-30-13-11-01-796.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46188) Fix the CSS of Spark doc's generated tables

2023-11-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-46188:
---
Attachment: image-2023-11-30-13-11-01-796.png

> Fix the CSS of Spark doc's generated tables
> ---
>
> Key: SPARK-46188
> URL: https://issues.apache.org/jira/browse/SPARK-46188
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Attachments: image-2023-11-30-13-11-01-796.png
>
>
> After https://github.com/apache/spark/pull/40269, there is no border in the 
> generated tables of the Spark docs. We should fix it.
> !image-2023-11-30-13-10-03-875.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46188) Fix the CSS of Spark doc's generated tables

2023-11-30 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-46188:
--

 Summary: Fix the CSS of Spark doc's generated tables
 Key: SPARK-46188
 URL: https://issues.apache.org/jira/browse/SPARK-46188
 Project: Spark
  Issue Type: Task
  Components: Documentation
Affects Versions: 4.0.0, 3.5.1
Reporter: Gengliang Wang
Assignee: Gengliang Wang


After https://github.com/apache/spark/pull/40269, there is no border in the 
generated tables of the Spark docs. We should fix it.

!image-2023-11-30-13-10-03-875.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45315) Drop JDK 8/11 and make JDK 17 the default

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45315.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Drop JDK 8/11 and make JDK 17 the default
> 
>
> Key: SPARK-45315
> URL: https://issues.apache.org/jira/browse/SPARK-45315
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
>  Labels: releasenotes
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46186:
---
Labels: pull-request-available  (was: )

> Invalid Spark Connect execution state transition if interrupted before thread 
> started
> -
>
> Key: SPARK-46186
> URL: https://issues.apache.org/jira/browse/SPARK-46186
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>
> Fix an edge case where interrupting the execution before the 
> ExecuteThreadRunner has started could lead to an illegal state transition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38473) Use error classes in org.apache.spark.scheduler

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-38473:
---
Labels: pull-request-available  (was: )

> Use error classes in org.apache.spark.scheduler
> ---
>
> Key: SPARK-38473
> URL: https://issues.apache.org/jira/browse/SPARK-38473
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44573.
---
Resolution: Invalid

Thank you for the confirmation, [~siddaraju.g.c].

BTW, Apache Spark 3.4.2 was released today with several correctness patches.
- https://spark.apache.org/releases/spark-release-3-4-2.html

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Major
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
>     at 
> 

[jira] [Closed] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-44573.
-

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Major
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349)
>     at 
> 

[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Siddaraju G C (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791660#comment-17791660
 ] 

Siddaraju G C commented on SPARK-44573:
---

[~dongjoon] After pointing to the correct IKS cluster endpoint, Spark is working fine.
We can close this ticket for now.

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Major
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
>     

[jira] [Commented] (SPARK-37358) Spark-on-K8S: Allow disabling of resources.limits.memory in executor pod spec

2023-11-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-37358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791638#comment-17791638
 ] 

Björn Boschman commented on SPARK-37358:


Is anybody looking into this?

We can provide a patch.

> Spark-on-K8S: Allow disabling of resources.limits.memory in executor pod spec
> -
>
> Key: SPARK-37358
> URL: https://issues.apache.org/jira/browse/SPARK-37358
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Andrew de Quincey
>Priority: Major
>
> When Spark creates an executor pod on my Kubernetes cluster, it adds the 
> following resources definition:
> {noformat}
> resources:
>   limits:
>     memory: 896Mi
>   requests:
>     cpu: '4'
>     memory: 896Mi
> {noformat}
> Note that resources.limits.cpu is not set. This is controlled by the 
> spark.kubernetes.driver.limit.cores setting (which we intentionally do not 
> set).
> We'd like to be able to omit the resources.limits.memory setting as well, to 
> let the Spark worker expand its memory as necessary.
> However, this isn't possible. The Scala code in 
> BasicExecutorFeatureStep.scala is as follows:
> {noformat}
> .editOrNewResources()
>   .addToRequests("memory", executorMemoryQuantity)
>   .addToLimits("memory", executorMemoryQuantity)
>   .addToRequests("cpu", executorCpuQuantity)
>   .addToLimits(executorResourceQuantities.asJava)
> .endResources()
> {noformat}
>  
> i.e. it always adds the memory limit, and there's no way to stop it.
> Note that most of our code is in Python, so it is not bound by the JVM memory 
> settings.
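
For illustration only, the executor pod resources section the reporter is asking for, assuming such an opt-out existed, would simply drop limits.memory:

{noformat}
resources:
  requests:
    cpu: '4'
    memory: 896Mi
{noformat}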



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started

2023-11-30 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-46186:
-

 Summary: Invalid Spark Connect execution state transition if 
interrupted before thread started
 Key: SPARK-46186
 URL: https://issues.apache.org/jira/browse/SPARK-46186
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Fix an edge case where interrupting the execution before the ExecuteThreadRunner 
has started could lead to an illegal state transition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46185) Add Apache Spark 3.4.2 Dockerfiles

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46185:
---
Labels: pull-request-available  (was: )

> Add Apache Spark 3.4.2 Dockerfiles
> --
>
> Key: SPARK-46185
> URL: https://issues.apache.org/jira/browse/SPARK-46185
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Docker
>Affects Versions: 3.4.2
>Reporter: Yikun Jiang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46185) Add Apache Spark 3.4.2 Dockerfiles

2023-11-30 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-46185:
---

 Summary: Add Apache Spark 3.4.2 Dockerfiles
 Key: SPARK-46185
 URL: https://issues.apache.org/jira/browse/SPARK-46185
 Project: Spark
  Issue Type: Bug
  Components: Spark Docker
Affects Versions: 3.4.2
Reporter: Yikun Jiang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791564#comment-17791564
 ] 

Dongjoon Hyun commented on SPARK-44573:
---

To [~siddaraju.g.c], could you try other Apache Spark binaries and let us know 
the result?
If it consistently fails in your environment across multiple Apache Spark 
binaries, it could be a configuration issue, as [~dcoliversun] mentioned 
above.

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Major
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> 

[jira] [Updated] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44573:
--
Priority: Major  (was: Blocker)

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Major
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349)
>     at 
> 

[jira] [Comment Edited] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534
 ] 

Qian Sun edited comment on SPARK-44573 at 11/30/23 9:48 AM:


Did you bind a role to your service account? 

ref: [https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]

cc [~dongjoon] 


was (Author: dcoliversun):
Did you bind a role to your service account? 

ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Blocker
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> 

[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-11-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534
 ] 

Qian Sun commented on SPARK-44573:
--

Did you bind a role to your service account? 

ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac
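
For reference, the minimal RBAC setup from that page looks like this (the service account name must match spark.kubernetes.authenticate.driver.serviceAccountName, and the namespace should be the one the driver pods run in):

{noformat}
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default
{noformat}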

> Couldn't submit Spark application to Kubernetes in version v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Blocker
>
> Spark-submit (cluster mode on Kubernetes) results in the error 
> *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready                     47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready                     47h   
> v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
>     at 
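> {noformat}

The ERROR line above ("Please check "kubectl auth can-i create pod" first. It should be yes.") points at missing RBAC permissions for the identity behind the submission token. A minimal sketch of the setup described in the Spark-on-Kubernetes docs, assuming the default namespace and the service account name spark from the --conf above:

{noformat}
# Verify the current identity may create pods at all
kubectl auth can-i create pod

# Create the service account referenced by
# spark.kubernetes.authenticate.driver.serviceAccountName=spark
kubectl create serviceaccount spark

# Grant it edit rights (assumes the default namespace)
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default
{noformat}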

[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-46183:
-
Summary: Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website 
 (was: Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website)

> Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website
> --
>
> Key: SPARK-46183
> URL: https://issues.apache.org/jira/browse/SPARK-46183
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: network.png
>
>
> When I visit [https://spark.apache.org/docs/3.5.0/], spark-hero-thin-light.jpg
> is not found. The 404 is caused by
> [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99];
> the path should be ../images/spark-hero-thin-light.jpg.
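
For concreteness, a sketch of the fix. Only the file, the line number, and the corrected path come from the report; the selector and property below are assumptions:

{noformat}
/* site/docs/3.5.0/css/custom.css, around L99 (selector assumed) */
.hero-banner {
  background-image: url("../images/spark-hero-thin-light.jpg");
}
{noformat}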



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)
Qian Sun created SPARK-46183:


 Summary: Incorrect path for spark-hero-thin-light.jpg for 
spark3.5.0 website
 Key: SPARK-46183
 URL: https://issues.apache.org/jira/browse/SPARK-46183
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Qian Sun


When I visit [https://spark.apache.org/docs/3.5.0/], spark-hero-thin-light.jpg
is not found. The 404 is caused by
[https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99];
the path should be ../images/spark-hero-thin-light.jpg.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-46183:
-
Attachment: network.png

> Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
> ---
>
> Key: SPARK-46183
> URL: https://issues.apache.org/jira/browse/SPARK-46183
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: network.png
>
>
> When I visit [https://spark.apache.org/docs/3.5.0/], spark-hero-thin-light.jpg
> is not found. The 404 is caused by
> [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99];
> the path should be ../images/spark-hero-thin-light.jpg.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-46183:
-
Attachment: (was: 
L1VzZXJzL2hlbmd6aGVuLnNxL0xpYnJhcnkvQXBwbGljYXRpb24gU3VwcG9ydC9pRGluZ1RhbGsvNDUyMDQ5NjgwX3YyL0ltYWdlRmlsZXMvMTcwMTMzNjk5MjkzNF81QjRENEU2RC1FNUM2LTQxNEQtOERGRS0wOTIxRUUzMjY2OTcucG5n.png)

> Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
> ---
>
> Key: SPARK-46183
> URL: https://issues.apache.org/jira/browse/SPARK-46183
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Minor
>
> When I visit [https://spark.apache.org/docs/3.5.0/], spark-hero-thin-light.jpg
> is not found. The 404 is caused by
> [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99];
> the path should be ../images/spark-hero-thin-light.jpg.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32246) Have a way to optionally run streaming-kinesis-asl

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-32246:
--

Assignee: (was: Apache Spark)

> Have a way to optionally run streaming-kinesis-asl
> --
>
> Key: SPARK-32246
> URL: https://issues.apache.org/jira/browse/SPARK-32246
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/HyukjinKwon/spark/pull/4. The Kinesis tests depend on
> the external Amazon Kinesis service.
> We should have a way to run them optionally. Currently, they are not run
> in GitHub Actions.
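
As a sketch of what "optionally" can look like in a ScalaTest suite — the ENABLE_KINESIS_TESTS=1 switch mirrors the convention in Spark's Kinesis test utilities, though the exact wiring here is illustrative:

{code:scala}
import org.scalatest.funsuite.AnyFunSuite

class KinesisOptionalSuite extends AnyFunSuite {
  // Opt-in flag: run tests that need the live AWS Kinesis service only
  // when the environment explicitly enables them.
  private val kinesisEnabled =
    sys.env.get("ENABLE_KINESIS_TESTS").contains("1")

  test("reads records from a Kinesis stream") {
    // assume() cancels the test instead of failing it when disabled,
    // so CI environments without AWS credentials stay green.
    assume(kinesisEnabled, "set ENABLE_KINESIS_TESTS=1 to run")
    // ... body exercising the live service would go here ...
  }
}
{code}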



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32246) Have a way to optionally run streaming-kinesis-asl

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-32246:
--

Assignee: Apache Spark

> Have a way to optionally run streaming-kinesis-asl
> --
>
> Key: SPARK-32246
> URL: https://issues.apache.org/jira/browse/SPARK-32246
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/HyukjinKwon/spark/pull/4. The Kinesis tests depend on
> the external Amazon Kinesis service.
> We should have a way to run them optionally. Currently, they are not run
> in GitHub Actions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46170) Support injecting adaptive query post planner strategy rules in SparkSessionExtensions

2023-11-30 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You reassigned SPARK-46170:
-

Assignee: XiDuo You

> Support injecting adaptive query post planner strategy rules in
> SparkSessionExtensions
> ---
>
> Key: SPARK-46170
> URL: https://issues.apache.org/jira/browse/SPARK-46170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>  Labels: pull-request-available
>
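
The ticket body is empty, but the title suggests a new hook alongside the existing inject* extension points. A hedged sketch of intended usage; the method name injectQueryPostPlannerStrategyRule and the rule class below are assumptions inferred from the title, so check the linked pull request for the final API:

{code:scala}
import org.apache.spark.sql.SparkSession

// MyPostPlannerRule is a hypothetical Rule[SparkPlan] defined elsewhere.
val spark = SparkSession.builder()
  .withExtensions { extensions =>
    // Assumed entry point, by analogy with injectQueryStagePrepRule.
    extensions.injectQueryPostPlannerStrategyRule(session => MyPostPlannerRule(session))
  }
  .getOrCreate()
{code}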




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46170) Support injecting adaptive query post planner strategy rules in SparkSessionExtensions

2023-11-30 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You resolved SPARK-46170.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44074
[https://github.com/apache/spark/pull/44074]

> Support injecting adaptive query post planner strategy rules in
> SparkSessionExtensions
> ---
>
> Key: SPARK-46170
> URL: https://issues.apache.org/jira/browse/SPARK-46170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45825) Fix these issues in module sql/catalyst

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45825:
--

Assignee: Apache Spark  (was: Jiaan Geng)

> Fix these issues in module sql/catalyst
> --
>
> Key: SPARK-45825
> URL: https://issues.apache.org/jira/browse/SPARK-45825
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45825) Fix these issues in module sql/catalyst

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45825:
--

Assignee: Jiaan Geng  (was: Apache Spark)

> Fix these issues in module sql/catalyst
> --
>
> Key: SPARK-45825
> URL: https://issues.apache.org/jira/browse/SPARK-45825
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12105) Add a DataFrame.show() with argument for output PrintStream

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-12105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-12105:
---
Labels: bulk-closed pull-request-available  (was: bulk-closed)

> Add a DataFrame.show() with argument for output PrintStream
> ---
>
> Key: SPARK-12105
> URL: https://issues.apache.org/jira/browse/SPARK-12105
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Dean Wampler
>Priority: Minor
>  Labels: bulk-closed, pull-request-available
>
> It would be nice to send the output of DataFrame.show(...) to a different
> output stream than stdout, including just capturing the string itself. This
> is useful, e.g., for testing. Actually, it would be sufficient, and perhaps
> better, to just make DataFrame.showString a public method.
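
Until then, the rendered table can be captured without an API change, because show() prints through Scala's Console. A minimal sketch, assuming df is any DataFrame already in scope:

{code:scala}
import java.io.{ByteArrayOutputStream, PrintStream}

// show() ends in println, which writes via Console.out, so redirecting
// Console captures the rendered table as a string.
val buffer = new ByteArrayOutputStream()
Console.withOut(new PrintStream(buffer, true, "UTF-8")) {
  df.show(20, truncate = false)
}
val rendered: String = buffer.toString("UTF-8")
// e.g. assert(rendered.contains("expected cell")) in a test
{code}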



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46179) Generate golden files for SQLQueryTestSuites with Postgres

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46179:
---
Labels: pull-request-available  (was: )

> Generate golden files for SQLQueryTestSuites with Postgres
> --
>
> Key: SPARK-46179
> URL: https://issues.apache.org/jira/browse/SPARK-46179
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Andy Lam
>Priority: Major
>  Labels: pull-request-available
>
> For correctness checking of our SQLQueryTestSuites, we want to run them
> against another DBMS, such as Postgres, as a reference to generate golden
> files.
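
A sketch of the mechanics: run the query against the reference DBMS over JDBC and persist its result as the golden file, which Spark's own output is later diffed against. The connection details, query, and output path below are illustrative only:

{code:scala}
import java.nio.file.{Files, Paths}
import java.sql.DriverManager
import scala.util.Using

// Assumes a local Postgres instance reachable with these credentials.
Using.resource(DriverManager.getConnection(
    "jdbc:postgresql://localhost:5432/postgres", "postgres", "")) { conn =>
  val rs = conn.createStatement().executeQuery("SELECT 1 + 1 AS two")
  val out = new StringBuilder
  while (rs.next()) out.append(rs.getString("two")).append('\n')
  // Persist the reference result as the golden file (path illustrative).
  Files.write(Paths.get("select_arithmetic.sql.out"),
    out.toString.getBytes("UTF-8"))
}
{code}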



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44881) Executor stuck on retrying to fetch shuffle data when a `java.lang.OutOfMemoryError: unable to create native thread` exception occurred.

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44881:
---
Labels: pull-request-available  (was: )

> Executor stuck on retrying to fetch shuffle data when a
> `java.lang.OutOfMemoryError: unable to create native thread` exception
> occurred.
> 
>
> Key: SPARK-44881
> URL: https://issues.apache.org/jira/browse/SPARK-44881
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: hgs
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org