[jira] [Updated] (SPARK-38700) Use error classes in the execution errors of save mode

2022-06-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38700:
-
Fix Version/s: 3.3.1

> Use error classes in the execution errors of save mode
> --
>
> Key: SPARK-38700
> URL: https://issues.apache.org/jira/browse/SPARK-38700
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: panbingkun
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
>
> Migrate the following errors in QueryExecutionErrors to use error classes:
> * unsupportedSaveModeError
> Throw an implementation of SparkThrowable, and write a test for every error 
> in QueryExecutionErrorsSuite.
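
For illustration, a minimal sketch of what the migrated method could look like. The error-class name and the exact constructor signature below are assumptions (both vary across Spark versions), not the merged change:

{code:scala}
// Hedged sketch only: assumes an "UNSUPPORTED_SAVE_MODE" entry in
// error-classes.json, and a SparkIllegalArgumentException constructor taking
// an error class plus message parameters (signatures differ by Spark version).
def unsupportedSaveModeError(saveMode: String): Throwable = {
  new SparkIllegalArgumentException(
    errorClass = "UNSUPPORTED_SAVE_MODE",
    messageParameters = Map("saveMode" -> saveMode))
}
{code}

A matching test in QueryExecutionErrorsSuite would then assert on the error class and message parameters rather than on a raw message string.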






[jira] [Commented] (SPARK-39457) Support pure IPV6 environment without IPV4

2022-06-13 Thread Ruslan Dautkhanov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553901#comment-17553901
 ] 

Ruslan Dautkhanov commented on SPARK-39457:
---

Is there a dependency on Hadoop to support IPv6 too? See HADOOP-11890.

> Support pure IPV6 environment without IPV4
> --
>
> Key: SPARK-39457
> URL: https://issues.apache.org/jira/browse/SPARK-39457
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: DB Tsai
>Priority: Major
>  Labels: releasenotes
>
> Spark doesn't fully work in a pure IPV6 environment that doesn't have IPV4 at 
> all. This is an umbrella JIRA tracking support for pure IPV6 deployment.






[jira] [Resolved] (SPARK-39454) failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate

2022-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39454.
--
Resolution: Duplicate

> failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" 
> predicate
> 
>
> Key: SPARK-39454
> URL: https://issues.apache.org/jira/browse/SPARK-39454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
> Environment: Spark 3.2.1, Standalone mode.
>  
> Spark shell start:
> {code:java}
> SPARK_HOME=/spark-3.2.1-bin-hadoop3.2
>  
> $SPARK_HOME/bin/pyspark --master local[*] \
>         --conf spark.executor.cores=12 \
>         --driver-memory 40G  \
>         --executor-memory 10G  \
>         --conf spark.driver.maxResultSize=8G \
>         --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
>         --conf 
> spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
>  \
>         --conf 
> spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>         --conf spark.sql.catalog.spark_catalog.type=hadoop \
>         --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
>         --conf spark.sql.catalog.local.type=hadoop \
>         --conf spark.sql.catalog.local.warehouse=$PWD/local-warehouse \
>         --conf spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse 
> {code}
>Reporter: Yanzhe Xu
>Priority: Major
> Attachments: catalog_returns_repro.tar.gz, 
> catalog_sales_repro.tar.gz, date_dim_repro.tar.gz
>
>
> When running a query with Iceberg:
> {code:java}
> spark.sql("drop table if exists catalog_returns")
> spark.sql("drop table if exists catalog_sales")
> spark.sql("drop table if exists date_dim")
>  
> spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
> spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
> spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")
>  
> spark.sql("create table if not exists catalog_returns using iceberg 
> partitioned by (cr_returned_date_sk) 
> tblproperties('write.parquet.compression-codec' = 'snappy') as select * from 
> temp_catalog_returns")
> spark.sql("create table if not exists catalog_sales using iceberg partitioned 
> by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 
> 'snappy') as select * from temp_catalog_sales")
> spark.sql("create table if not exists date_dim using iceberg 
> tblproperties('write.parquet.compression-codec' = 'snappy') as select * from 
> temp_date_dim")
> spark.sql("delete from catalog_returns where cr_order_number in (select 
> cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk 
> and d_date between '2000-05-20' and '2000-05-21');").explain(True) {code}
> Spark gives the following error:
> {code:java}
> : java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to 
> org.apache.spark.sql.execution.SparkPlan
>   at scala.collection.immutable.List.map(List.scala:293)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at 

[jira] [Resolved] (SPARK-39462) Contains function to check if a string is contained

2022-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39462.
--
Resolution: Duplicate

> Contains function to check if a string is contained
> ---
>
> Key: SPARK-39462
> URL: https://issues.apache.org/jira/browse/SPARK-39462
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: melin
>Priority: Major
>
> Returns TRUE if the first expression contains the second expression.
> {{SELECT CONTAINS('spark sql lakehouse', 'lake') -- True}}
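
Until such a function exists, the same check can be written with existing APIs. A hedged Scala sketch (Column.contains and the SQL instr() function are existing APIs; the spark-shell session value `spark` is assumed):

{code:scala}
// Existing ways to express the proposed CONTAINS check.
import org.apache.spark.sql.functions.{col, expr}
import spark.implicits._  // assumes the spark-shell session value `spark`

val df = Seq("spark sql lakehouse").toDF("s")
df.select(col("s").contains("lake")).show()     // true
df.select(expr("instr(s, 'lake') > 0")).show()  // true
{code}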






[jira] [Updated] (SPARK-39462) Contains function to check if a string is contained

2022-06-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39462:
-
Summary: Contains function to check if a string is contained  (was: 
[Support]add contains function)

> Contains function to check if a string is contained
> ---
>
> Key: SPARK-39462
> URL: https://issues.apache.org/jira/browse/SPARK-39462
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: melin
>Priority: Major
>
> Returns TRUE if the first expression contains the second expression.
> {{SELECT CONTAINS('spark sql lakehouse', 'lake') -- True}}






[jira] [Assigned] (SPARK-39339) Support TimestampNTZ in JDBC data source

2022-06-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-39339:
--

Assignee: Ivan Sadikov

> Support TimestampNTZ in JDBC data source
> 
>
> Key: SPARK-39339
> URL: https://issues.apache.org/jira/browse/SPARK-39339
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>







[jira] [Resolved] (SPARK-39339) Support TimestampNTZ in JDBC data source

2022-06-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-39339.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36726
[https://github.com/apache/spark/pull/36726]
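
For reference, a hedged usage sketch of the feature. The session conf spark.sql.timestampType does exist; whether it is the exact knob this change uses for JDBC reads is an assumption here, and the connection details are hypothetical:

{code:scala}
// With the default timestamp type set to NTZ, JDBC reads of
// TIMESTAMP WITHOUT TIME ZONE columns should surface as timestamp_ntz.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://localhost/testdb")  // hypothetical database
  .option("dbtable", "events")
  .load()
df.printSchema()  // timestamp columns expected as timestamp_ntz
{code}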

> Support TimestampNTZ in JDBC data source
> 
>
> Key: SPARK-39339
> URL: https://issues.apache.org/jira/browse/SPARK-39339
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-39463) Use UUID for test database location in JavaJdbcRDDSuite

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553885#comment-17553885
 ] 

Apache Spark commented on SPARK-39463:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36864

> Use UUID for test database location in JavaJdbcRDDSuite
> ---
>
> Key: SPARK-39463
> URL: https://issues.apache.org/jira/browse/SPARK-39463
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-39463) Use UUID for test database location in JavaJdbcRDDSuite

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553884#comment-17553884
 ] 

Apache Spark commented on SPARK-39463:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36864

> Use UUID for test database location in JavaJdbcRDDSuite
> ---
>
> Key: SPARK-39463
> URL: https://issues.apache.org/jira/browse/SPARK-39463
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-39463) Use UUID for test database location in JavaJdbcRDDSuite

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39463:


Assignee: Apache Spark

> Use UUID for test database location in JavaJdbcRDDSuite
> ---
>
> Key: SPARK-39463
> URL: https://issues.apache.org/jira/browse/SPARK-39463
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-39463) Use UUID for test database location in JavaJdbcRDDSuite

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39463:


Assignee: (was: Apache Spark)

> Use UUID for test database location in JavaJdbcRDDSuite
> ---
>
> Key: SPARK-39463
> URL: https://issues.apache.org/jira/browse/SPARK-39463
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-39463) Use UUID for test database location in JavaJdbcRDDSuite

2022-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39463:
-

 Summary: Use UUID for test database location in JavaJdbcRDDSuite
 Key: SPARK-39463
 URL: https://issues.apache.org/jira/browse/SPARK-39463
 Project: Spark
  Issue Type: Test
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun
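
For illustration, a hedged sketch of the idea (the actual suite is Java and uses an embedded Derby database; the path below is illustrative, not the suite's real location):

{code:scala}
// Put a UUID in the test database location so concurrent or repeated test
// runs don't collide on the same on-disk Derby directory.
import java.util.UUID

val dbDir = s"${System.getProperty("java.io.tmpdir")}/spark-jdbc-test-${UUID.randomUUID()}"
val url = s"jdbc:derby:$dbDir;create=true"
{code}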









[jira] [Updated] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39459:
--
Affects Version/s: 3.4.0
   (was: 3.2.1)

> LocalSchedulerBackend doesn't support IPV6
> --
>
> Key: SPARK-39459
> URL: https://issues.apache.org/jira/browse/SPARK-39459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: DB Tsai
>Priority: Major
>
> {code:java}
> ➜  ./bin/spark-shell
> 22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to 
> a loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead 
> (on interface en1)
> 22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
> enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
>   at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at 
> org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
>   at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
> {code}
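
The assertion comes from Utils.checkHost, which requires IPv6 literals to be enclosed in brackets. A minimal sketch of the kind of normalization involved (an illustration of the failing invariant, not necessarily what the eventual fix does):

{code:scala}
// Bracket a bare IPv6 literal so it passes a checkHost-style assertion.
def normalizeHost(host: String): String =
  if (host.contains(":") && !host.startsWith("[")) s"[$host]" else host

assert(normalizeHost("2600:1700:1151:11ef:0:0:0:2000") ==
  "[2600:1700:1151:11ef:0:0:0:2000]")
assert(normalizeHost("my-host.local") == "my-host.local")
{code}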






[jira] [Updated] (SPARK-39457) Support pure IPV6 environment without IPV4

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39457:
--
Labels: releasenotes  (was: )

> Support pure IPV6 environment without IPV4
> --
>
> Key: SPARK-39457
> URL: https://issues.apache.org/jira/browse/SPARK-39457
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: DB Tsai
>Priority: Major
>  Labels: releasenotes
>
> Spark doesn't fully work in a pure IPV6 environment that doesn't have IPV4 at 
> all. This is an umbrella JIRA tracking support for pure IPV6 deployment.






[jira] [Assigned] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39459:
-

Assignee: Dongjoon Hyun

> LocalSchedulerBackend doesn't support IPV6
> --
>
> Key: SPARK-39459
> URL: https://issues.apache.org/jira/browse/SPARK-39459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: DB Tsai
>Assignee: Dongjoon Hyun
>Priority: Major
>
> {code:java}
> ➜  ./bin/spark-shell
> 22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to 
> a loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead 
> (on interface en1)
> 22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
> enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
>   at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at 
> org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
>   at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
> {code}






[jira] [Commented] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553880#comment-17553880
 ] 

Apache Spark commented on SPARK-39459:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36863

> LocalSchedulerBackend doesn't support IPV6
> --
>
> Key: SPARK-39459
> URL: https://issues.apache.org/jira/browse/SPARK-39459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: DB Tsai
>Priority: Major
>
> {code:java}
> ➜  ./bin/spark-shell
> 22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to 
> a loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead 
> (on interface en1)
> 22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
> enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
>   at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at 
> org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
>   at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
> {code}






[jira] [Commented] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553879#comment-17553879
 ] 

Apache Spark commented on SPARK-39459:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36863

> LocalSchedulerBackend doesn't support IPV6
> --
>
> Key: SPARK-39459
> URL: https://issues.apache.org/jira/browse/SPARK-39459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: DB Tsai
>Priority: Major
>
> {code:java}
> ➜  ./bin/spark-shell
> 22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to 
> a loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead 
> (on interface en1)
> 22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
> enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
>   at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at 
> org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
>   at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
> {code}






[jira] [Assigned] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39459:


Assignee: (was: Apache Spark)

> LocalSchedulerBackend doesn't support IPV6
> --
>
> Key: SPARK-39459
> URL: https://issues.apache.org/jira/browse/SPARK-39459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: DB Tsai
>Priority: Major
>
> {code:java}
> ➜  ./bin/spark-shell
> 22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to 
> a loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead 
> (on interface en1)
> 22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
> enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
>   at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at 
> org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
>   at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
> {code}






[jira] [Assigned] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39459:


Assignee: Apache Spark

> LocalSchedulerBackend doesn't support IPV6
> --
>
> Key: SPARK-39459
> URL: https://issues.apache.org/jira/browse/SPARK-39459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: DB Tsai
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> ➜  ./bin/spark-shell
> 22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to 
> a loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead 
> (on interface en1)
> 22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
> enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
>   at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
> ~[spark-core_2.12-3.2.0.jar:3.2.0.37]
>   at 
> org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
>   at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>  ~[spark-core_2.12-3.2.0.jar:3.2.0]
> {code}






[jira] [Commented] (SPARK-39454) failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate

2022-06-13 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553874#comment-17553874
 ] 

XiDuo You commented on SPARK-39454:
---

[~allxu] this issue should be fixed by SPARK-37995

> failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" 
> predicate
> 
>
> Key: SPARK-39454
> URL: https://issues.apache.org/jira/browse/SPARK-39454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
> Environment: Spark 3.2.1, Standalone mode.
>  
> Spark shell start:
> {code:java}
> SPARK_HOME=/spark-3.2.1-bin-hadoop3.2
>  
> $SPARK_HOME/bin/pyspark --master local[*] \
>         --conf spark.executor.cores=12 \
>         --driver-memory 40G  \
>         --executor-memory 10G  \
>         --conf spark.driver.maxResultSize=8G \
>         --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
>         --conf 
> spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
>  \
>         --conf 
> spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>         --conf spark.sql.catalog.spark_catalog.type=hadoop \
>         --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
>         --conf spark.sql.catalog.local.type=hadoop \
>         --conf spark.sql.catalog.local.warehouse=$PWD/local-warehouse \
>         --conf spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse 
> {code}
>Reporter: Yanzhe Xu
>Priority: Major
> Attachments: catalog_returns_repro.tar.gz, 
> catalog_sales_repro.tar.gz, date_dim_repro.tar.gz
>
>
> When running a query with Iceberg:
> {code:java}
> spark.sql("drop table if exists catalog_returns")
> spark.sql("drop table if exists catalog_sales")
> spark.sql("drop table if exists date_dim")
>  
> spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
> spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
> spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")
>  
> spark.sql("create table if not exists catalog_returns using iceberg 
> partitioned by (cr_returned_date_sk) 
> tblproperties('write.parquet.compression-codec' = 'snappy') as select * from 
> temp_catalog_returns")
> spark.sql("create table if not exists catalog_sales using iceberg partitioned 
> by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 
> 'snappy') as select * from temp_catalog_sales")
> spark.sql("create table if not exists date_dim using iceberg 
> tblproperties('write.parquet.compression-codec' = 'snappy') as select * from 
> temp_date_dim")
> spark.sql("delete from catalog_returns where cr_order_number in (select 
> cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk 
> and d_date between '2000-05-20' and '2000-05-21');").explain(True) {code}
> Spark gives the following error:
> {code:java}
> : java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to 
> org.apache.spark.sql.execution.SparkPlan
>   at scala.collection.immutable.List.map(List.scala:293)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   

[jira] [Comment Edited] (SPARK-24815) Structured Streaming should support dynamic allocation

2022-06-13 Thread Ramiz Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553867#comment-17553867
 ] 

Ramiz Mehran edited comment on SPARK-24815 at 6/14/22 3:05 AM:
---

Guys, is this thread still alive?

I think dynamic scaling for SSS should be borrowed from spark-streaming itself. 
The logic of a "processing time / batch duration" ratio makes sense and removes 
any other dependency from the calculation. Also, the decision should use a moving 
average, and the number of batches in that moving average should be configurable.


was (Author: JIRAUSER290918):
Guys, is this thread still alive?

I think SSS for structure-streaming should be taken from spark-streaming 
itself. The logic of "processing/batch duration ratio" makes sense and removes 
any other dependency from the calculation. Also, there should be a moving 
average to calculate and this moving average batch count can be configurable.

> Structured Streaming should support dynamic allocation
> --
>
> Key: SPARK-24815
> URL: https://issues.apache.org/jira/browse/SPARK-24815
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core, Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Karthik Palaniappan
>Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled
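
To make the "processing/batch duration ratio" idea from the comment above concrete, a hedged sketch of such a scaling signal (names and thresholds are invented for illustration; this is not an actual Spark API):

{code:scala}
// Scale up when the moving average of processingTime / batchDuration nears 1
// (batches barely keep up), scale down when it stays well below 1.
class RatioScaler(windowSize: Int, up: Double = 0.9, down: Double = 0.5) {
  private val ratios = scala.collection.mutable.Queue.empty[Double]

  def record(processingMs: Long, batchMs: Long): Unit = {
    ratios.enqueue(processingMs.toDouble / batchMs)
    if (ratios.size > windowSize) ratios.dequeue()
  }

  // +1 = request executors, -1 = release executors, 0 = hold steady
  def decision: Int =
    if (ratios.isEmpty) 0
    else {
      val avg = ratios.sum / ratios.size
      if (avg > up) 1 else if (avg < down) -1 else 0
    }
}
{code}

Making windowSize configurable gives exactly the "moving average batch count" knob suggested in the comment.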






[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

2022-06-13 Thread Ramiz Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553867#comment-17553867
 ] 

Ramiz Mehran commented on SPARK-24815:
--

Guys, is this thread still alive?

I think SSS for structure-streaming should be taken from spark-streaming 
itself. The logic of "processing/batch duration ratio" makes sense and removes 
any other dependency from the calculation. Also, there should be a moving 
average to calculate and this moving average batch count can be configurable.

> Structured Streaming should support dynamic allocation
> --
>
> Key: SPARK-24815
> URL: https://issues.apache.org/jira/browse/SPARK-24815
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core, Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Karthik Palaniappan
>Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled






[jira] [Assigned] (SPARK-39461) Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39461:
-

Assignee: Dongjoon Hyun

> Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`
> --
>
> Key: SPARK-39461
> URL: https://issues.apache.org/jira/browse/SPARK-39461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Resolved] (SPARK-39461) Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39461.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36862
[https://github.com/apache/spark/pull/36862]

> Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`
> --
>
> Key: SPARK-39461
> URL: https://issues.apache.org/jira/browse/SPARK-39461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-30835) Add support for YARN decommissioning & pre-emption

2022-06-13 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-30835.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 35683
[https://github.com/apache/spark/pull/35683]

> Add support for YARN decommissioning & pre-emption
> --
>
> Key: SPARK-30835
> URL: https://issues.apache.org/jira/browse/SPARK-30835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Abhishek Dixit
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-30835) Add support for YARN decommissioning & pre-emption

2022-06-13 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-30835:
---

Assignee: Abhishek Dixit

> Add support for YARN decommissioning & pre-emption
> --
>
> Key: SPARK-30835
> URL: https://issues.apache.org/jira/browse/SPARK-30835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Abhishek Dixit
>Priority: Major
>







[jira] [Created] (SPARK-39462) [Support]add contains function

2022-06-13 Thread melin (Jira)
melin created SPARK-39462:
-

 Summary: [Support]add contains function
 Key: SPARK-39462
 URL: https://issues.apache.org/jira/browse/SPARK-39462
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: melin


Returns TRUE if the first expression contains the second expression.

{{SELECT CONTAINS('spark sql lakehouse', 'lake') -- True}}






[jira] [Resolved] (SPARK-39433) to_date function returns a null for the first week of the year

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39433.
--
Resolution: Duplicate

Oh, I think it's a subset of the linked issue 
https://issues.apache.org/jira/browse/SPARK-38571

> to_date function returns a null for the first week of the year
> --
>
> Key: SPARK-39433
> URL: https://issues.apache.org/jira/browse/SPARK-39433
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.2
>Reporter: CHARLES HOGG
>Priority: Minor
>
> When I use week-of-year in the to_date function, the first week of the year 
> returns null for many years.
> ```
> df = pyrasa.sparkSession.createDataFrame([["2013-01"], ["2013-02"], ["2017-01"], ["2018-01"]], ["input"])
> df.select(func.col("input"), func.to_date(func.col("input"), "yyyy-ww").alias("date")) \
>   .show()
> ```
> ```
> +-------+----------+
> |  input|      date|
> +-------+----------+
> |2013-01|      null|
> |2013-02|2013-01-06|
> |2017-01|2017-01-01|
> |2018-01|      null|
> +-------+----------+
> ```
> Why is this? Is it a bug in the to_date function?
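
A hedged Scala equivalent of the reproduction above (assuming the same session defaults; depending on the Spark version and spark.sql.legacy.timeParserPolicy, week-based patterns such as "ww" may return nulls or raise an error instead, see SPARK-38571):

{code:scala}
import org.apache.spark.sql.functions.{col, to_date}
import spark.implicits._  // assumes the spark-shell session value `spark`

val df = Seq("2013-01", "2013-02", "2017-01", "2018-01").toDF("input")
df.select(col("input"), to_date(col("input"), "yyyy-ww").as("date")).show()
{code}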






[jira] [Resolved] (SPARK-38812) When I clean data, I hope one RDD can be split into two RDDs according to a data cleaning rule

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38812.
--
Resolution: Invalid

> When I clean data, I hope one RDD can be split into two RDDs according to a data cleaning rule
> -
>
> Key: SPARK-38812
> URL: https://issues.apache.org/jira/browse/SPARK-38812
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: gaokui
>Priority: Major
>
> When I clean data, I want to split one RDD into two RDDs according to a 
> predicate (e.g. > or <): one output holds the error records, the other the 
> errorless records.
> Today I use filter twice, but that runs two Spark jobs, which costs too much.
> Roughly, I want some code like iterator.span(predicate) that returns one 
> tuple (iter1, iter2):
> one dataset would be split into two datasets in a single data-cleaning pass.
> I hope to compute once, not twice.
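
For context, a hedged sketch of the usual workaround available today (not a new Spark API): evaluate the predicate once, cache the tagged RDD, then derive both outputs from it.

{code:scala}
// Tag each record once, cache, then filter the cached result twice.
val data = sc.parallelize(1 to 100)             // assumes a SparkContext `sc`
val tagged = data.map(x => (x > 50, x)).cache() // predicate evaluated once
val errorless = tagged.filter(t => t._1).map(t => t._2)
val errors    = tagged.filter(t => !t._1).map(t => t._2)
{code}

The two filters still trigger two jobs, but with the cached tagged RDD the expensive cleaning logic itself runs only once.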






[jira] [Resolved] (SPARK-38925) update guava to 30.1.1-jre

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38925.
--
Fix Version/s: (was: 3.4.0)
   Resolution: Won't Fix

> update guava to 30.1.1-jre
> --
>
> Key: SPARK-38925
> URL: https://issues.apache.org/jira/browse/SPARK-38925
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: qian
>Assignee: Apache Spark
>Priority: Major
>
> Update guava to 30.1.1-jre
> guava 14.0.1 has known vulnerabilities:
>  * 
> [CVE-2020-8908|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8908]
>  * 
> [CVE-2018-10237|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-10237]
>  






[jira] [Resolved] (SPARK-38960) Spark should fail fast if initial memory too large(set by "spark.executor.extraJavaOptions") for executor to start

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38960.
--
Fix Version/s: (was: 3.4.0)
   Resolution: Won't Fix

> Spark should fail fast if initial memory too large(set by 
> "spark.executor.extraJavaOptions") for executor to start
> --
>
> Key: SPARK-38960
> URL: https://issues.apache.org/jira/browse/SPARK-38960
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Spark Submit, YARN
>Affects Versions: 3.4.0
>Reporter: panbingkun
>Priority: Minor
>
> If you set the initial memory (set by 
> "spark.executor.extraJavaOptions=-Xms{XXX}G") larger than the maximum 
> memory (set by "spark.executor.memory"), e.g.
>      *spark.executor.memory=1G*
>      *spark.executor.extraJavaOptions=-Xms2G*
>  
> then from the driver process you just see executor failures with no warning, 
> since the more meaningful errors are buried in the executor logs. 
> E.g., on YARN, you see:
> {noformat}
> Error occurred during initialization of VM
> Initial heap size set to a larger value than the maximum heap size{noformat}
> Instead we should just fail fast with a clear error message in the driver 
> logs.
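
For illustration, a hedged sketch of such a driver-side fail-fast check (an invented helper, not an actual Spark patch):

{code:scala}
import org.apache.spark.SparkConf

// Fail fast when an -Xms in spark.executor.extraJavaOptions exceeds
// spark.executor.memory, instead of letting executors die repeatedly.
def checkInitialHeap(conf: SparkConf): Unit = {
  def toMiB(v: String): Long = v match {
    case s if s.endsWith("G") => s.dropRight(1).toLong * 1024
    case s if s.endsWith("M") => s.dropRight(1).toLong
    case s if s.endsWith("K") => s.dropRight(1).toLong / 1024
    case s                    => s.toLong / (1024 * 1024) // bare bytes
  }
  val maxMiB = toMiB(conf.get("spark.executor.memory", "1G").toUpperCase)
  """-XMS(\d+[GMK]?)""".r
    .findFirstMatchIn(conf.get("spark.executor.extraJavaOptions", "").toUpperCase)
    .map(m => toMiB(m.group(1)))
    .filter(_ > maxMiB)
    .foreach { initMiB =>
      throw new IllegalArgumentException(
        s"Initial executor heap ${initMiB}MiB (-Xms) is larger than " +
          s"spark.executor.memory (${maxMiB}MiB)")
    }
}
{code}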






[jira] [Resolved] (SPARK-39005) Introduce 4 new functions to KVUtils

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39005.
--
Resolution: Won't Fix

> Introduce 4 new functions to KVUtils
> 
>
> Key: SPARK-39005
> URL: https://issues.apache.org/jira/browse/SPARK-39005
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Introduce 4 new functions:
>  * count: counts the number of elements in the KVStoreView which satisfy a 
> predicate.
>  * foreach: applies a function f to all values produced by the KVStoreView.
>  * mapToSeq: maps all values of the KVStoreView to a new Seq using a 
> transformation function.
>  * size: the size of the KVStoreView.
> Use the above functions to simplify the code related to `KVStoreView`; they 
> also release the underlying `LevelDB/RocksDBIterator` earlier.
>  
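
A hedged sketch of two of these helpers (signatures are assumptions, not the actual proposal; the point is closing the underlying iterator promptly):

{code:scala}
import scala.collection.JavaConverters._
import org.apache.spark.util.kvstore.KVStoreView

// Map every value of a KVStoreView, closing the LevelDB/RocksDB iterator
// as soon as the traversal finishes.
def mapToSeq[T, B](view: KVStoreView[T])(f: T => B): Seq[B] = {
  val iter = view.closeableIterator()
  try iter.asScala.map(f).toList finally iter.close()
}

// Count the values satisfying a predicate, again closing the iterator early.
def count[T](view: KVStoreView[T])(pred: T => Boolean): Int = {
  val iter = view.closeableIterator()
  try iter.asScala.count(pred) finally iter.close()
}
{code}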






[jira] [Resolved] (SPARK-39020) [CVE-2020-9480] Transitive dependency "unused" from spark-sql_2.12 highlight as vulnerable in dependency tracker

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39020.
--
Resolution: Not A Problem

This isn't a real artifact and can be ignored.

> [CVE-2020-9480] Transitive dependency  "unused"  from spark-sql_2.12  
> highlight as vulnerable in dependency tracker
> ---
>
> Key: SPARK-39020
> URL: https://issues.apache.org/jira/browse/SPARK-39020
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Sundar
>Priority: Minor
> Attachments: Dependency-Track.png
>
>
> I am using the spark-sql_2.12 dependency, version 3.2.1, in my project. My 
> dependency tracker highlights the transitive dependency "unused" from 
> spark-sql_2.12 as vulnerable. I checked, and there has been no update to this 
> artifact since 2014. Is the artifact used anywhere in Spark?
> To resolve this vulnerability, can I exclude this "unused" artifact from 
> spark-sql_2.12? Will it cause any issues in my project?






[jira] [Updated] (SPARK-39066) Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh in 3.8.5

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39066:
-
Summary: Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh 
in 3.8.5  (was: Run `./dev/test-dependencies.sh --replace-manifest ` use Maven 
3.8.5 produce wrong result)

> Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh in 3.8.5
> --
>
> Key: SPARK-39066
> URL: https://issues.apache.org/jira/browse/SPARK-39066
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
> Attachments: hadoop2-deps.txt, spark-deps-hadoop-2-hive-2.3
>
>
> Running `./dev/test-dependencies.sh --replace-manifest` with Maven 3.8.5 
> produces a wrong result:
> {code:java}
> diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
> b/dev/deps/spark-deps-hadoop-2-hive-2.3
> index b6df3ea5ce..e803aadcfc 100644
> --- a/dev/deps/spark-deps-hadoop-2-hive-2.3
> +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
> @@ -6,15 +6,14 @@ ST4/4.0.4//ST4-4.0.4.jar
>  activation/1.1.1//activation-1.1.1.jar
>  aircompressor/0.21//aircompressor-0.21.jar
>  algebra_2.12/2.0.1//algebra_2.12-2.0.1.jar
> +aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar
> +aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
> +aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
> +aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
>  annotations/17.0.0//annotations-17.0.0.jar
>  antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
>  antlr4-runtime/4.8//antlr4-runtime-4.8.jar
>  aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
> -aopalliance/1.0//aopalliance-1.0.jar
> -apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
> -api-util/1.0.0-M20//api-util-1.0.0-M20.jar
>  arpack/2.2.1//arpack-2.2.1.jar
>  arpack_combined_all/0.1//arpack_combined_all-0.1.jar
>  arrow-format/7.0.0//arrow-format-7.0.0.jar
> @@ -26,7 +25,10 @@ automaton/1.11-8//automaton-1.11-8.jar
>  avro-ipc/1.11.0//avro-ipc-1.11.0.jar
>  avro-mapred/1.11.0//avro-mapred-1.11.0.jar
>  avro/1.11.0//avro-1.11.0.jar
> -azure-storage/2.0.0//azure-storage-2.0.0.jar
> +aws-java-sdk-bundle/1.11.1026//aws-java-sdk-bundle-1.11.1026.jar
> +azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar
> +azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar
> +azure-storage/7.0.1//azure-storage-7.0.1.jar
>  blas/2.2.1//blas-2.2.1.jar
>  bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
>  breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
> @@ -34,28 +36,24 @@ breeze_2.12/1.2//breeze_2.12-1.2.jar
>  cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
>  chill-java/0.10.0//chill-java-0.10.0.jar
>  chill_2.12/0.10.0//chill_2.12-0.10.0.jar
> -commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
>  commons-cli/1.5.0//commons-cli-1.5.0.jar
>  commons-codec/1.15//commons-codec-1.15.jar
>  commons-collections/3.2.2//commons-collections-3.2.2.jar
>  commons-collections4/4.4//commons-collections4-4.4.jar
>  commons-compiler/3.0.16//commons-compiler-3.0.16.jar
>  commons-compress/1.21//commons-compress-1.21.jar
> -commons-configuration/1.6//commons-configuration-1.6.jar
>  commons-crypto/1.1.0//commons-crypto-1.1.0.jar
>  commons-dbcp/1.4//commons-dbcp-1.4.jar
> -commons-digester/1.8//commons-digester-1.8.jar
> -commons-httpclient/3.1//commons-httpclient-3.1.jar
>  commons-io/2.4//commons-io-2.4.jar
>  commons-lang/2.6//commons-lang-2.6.jar
>  commons-lang3/3.12.0//commons-lang3-3.12.0.jar
>  commons-logging/1.1.3//commons-logging-1.1.3.jar
>  commons-math3/3.6.1//commons-math3-3.6.1.jar
> -commons-net/3.1//commons-net-3.1.jar
>  commons-pool/1.5.4//commons-pool-1.5.4.jar
>  commons-text/1.9//commons-text-1.9.jar
>  compress-lzf/1.1//compress-lzf-1.1.jar
>  core/1.1.2//core-1.1.2.jar
> +cos_api-bundle/5.6.19//cos_api-bundle-5.6.19.jar
>  curator-client/2.7.1//curator-client-2.7.1.jar
>  curator-framework/2.7.1//curator-framework-2.7.1.jar
>  curator-recipes/2.7.1//curator-recipes-2.7.1.jar
> @@ -69,25 +67,17 @@ generex/1.0.2//generex-1.0.2.jar
>  gmetric4j/1.0.10//gmetric4j-1.0.10.jar
>  gson/2.2.4//gson-2.2.4.jar
>  guava/14.0.1//guava-14.0.1.jar
> -guice-servlet/3.0//guice-servlet-3.0.jar
> -guice/3.0//guice-3.0.jar
> -hadoop-annotations/2.7.4//hadoop-annotations-2.7.4.jar
> -hadoop-auth/2.7.4//hadoop-auth-2.7.4.jar
> -hadoop-aws/2.7.4//hadoop-aws-2.7.4.jar
> -hadoop-azure/2.7.4//hadoop-azure-2.7.4.jar
> -hadoop-client/2.7.4//hadoop-client-2.7.4.jar
> -hadoop-common/2.7.4//hadoop-common-2.7.4.jar
> -hadoop-hdfs/2.7.4//hadoop-hdfs-2.7.4.jar
> -hadoop-mapreduce-client-app/2.7.4//hadoop-mapreduce-client-app-2.7.4.jar
> 

[jira] [Updated] (SPARK-39066) Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh in 3.8.5

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39066:
-
Issue Type: Improvement  (was: Bug)

> Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh in 3.8.5
> --
>
> Key: SPARK-39066
> URL: https://issues.apache.org/jira/browse/SPARK-39066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
> Attachments: hadoop2-deps.txt, spark-deps-hadoop-2-hive-2.3
>
>
> Running `./dev/test-dependencies.sh --replace-manifest` with Maven 3.8.5 
> produces a wrong result:
> {code:java}
> diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
> b/dev/deps/spark-deps-hadoop-2-hive-2.3
> index b6df3ea5ce..e803aadcfc 100644
> --- a/dev/deps/spark-deps-hadoop-2-hive-2.3
> +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
> @@ -6,15 +6,14 @@ ST4/4.0.4//ST4-4.0.4.jar
>  activation/1.1.1//activation-1.1.1.jar
>  aircompressor/0.21//aircompressor-0.21.jar
>  algebra_2.12/2.0.1//algebra_2.12-2.0.1.jar
> +aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar
> +aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
> +aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
> +aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
>  annotations/17.0.0//annotations-17.0.0.jar
>  antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
>  antlr4-runtime/4.8//antlr4-runtime-4.8.jar
>  aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
> -aopalliance/1.0//aopalliance-1.0.jar
> -apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
> -api-util/1.0.0-M20//api-util-1.0.0-M20.jar
>  arpack/2.2.1//arpack-2.2.1.jar
>  arpack_combined_all/0.1//arpack_combined_all-0.1.jar
>  arrow-format/7.0.0//arrow-format-7.0.0.jar
> @@ -26,7 +25,10 @@ automaton/1.11-8//automaton-1.11-8.jar
>  avro-ipc/1.11.0//avro-ipc-1.11.0.jar
>  avro-mapred/1.11.0//avro-mapred-1.11.0.jar
>  avro/1.11.0//avro-1.11.0.jar
> -azure-storage/2.0.0//azure-storage-2.0.0.jar
> +aws-java-sdk-bundle/1.11.1026//aws-java-sdk-bundle-1.11.1026.jar
> +azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar
> +azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar
> +azure-storage/7.0.1//azure-storage-7.0.1.jar
>  blas/2.2.1//blas-2.2.1.jar
>  bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
>  breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
> @@ -34,28 +36,24 @@ breeze_2.12/1.2//breeze_2.12-1.2.jar
>  cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
>  chill-java/0.10.0//chill-java-0.10.0.jar
>  chill_2.12/0.10.0//chill_2.12-0.10.0.jar
> -commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
>  commons-cli/1.5.0//commons-cli-1.5.0.jar
>  commons-codec/1.15//commons-codec-1.15.jar
>  commons-collections/3.2.2//commons-collections-3.2.2.jar
>  commons-collections4/4.4//commons-collections4-4.4.jar
>  commons-compiler/3.0.16//commons-compiler-3.0.16.jar
>  commons-compress/1.21//commons-compress-1.21.jar
> -commons-configuration/1.6//commons-configuration-1.6.jar
>  commons-crypto/1.1.0//commons-crypto-1.1.0.jar
>  commons-dbcp/1.4//commons-dbcp-1.4.jar
> -commons-digester/1.8//commons-digester-1.8.jar
> -commons-httpclient/3.1//commons-httpclient-3.1.jar
>  commons-io/2.4//commons-io-2.4.jar
>  commons-lang/2.6//commons-lang-2.6.jar
>  commons-lang3/3.12.0//commons-lang3-3.12.0.jar
>  commons-logging/1.1.3//commons-logging-1.1.3.jar
>  commons-math3/3.6.1//commons-math3-3.6.1.jar
> -commons-net/3.1//commons-net-3.1.jar
>  commons-pool/1.5.4//commons-pool-1.5.4.jar
>  commons-text/1.9//commons-text-1.9.jar
>  compress-lzf/1.1//compress-lzf-1.1.jar
>  core/1.1.2//core-1.1.2.jar
> +cos_api-bundle/5.6.19//cos_api-bundle-5.6.19.jar
>  curator-client/2.7.1//curator-client-2.7.1.jar
>  curator-framework/2.7.1//curator-framework-2.7.1.jar
>  curator-recipes/2.7.1//curator-recipes-2.7.1.jar
> @@ -69,25 +67,17 @@ generex/1.0.2//generex-1.0.2.jar
>  gmetric4j/1.0.10//gmetric4j-1.0.10.jar
>  gson/2.2.4//gson-2.2.4.jar
>  guava/14.0.1//guava-14.0.1.jar
> -guice-servlet/3.0//guice-servlet-3.0.jar
> -guice/3.0//guice-3.0.jar
> -hadoop-annotations/2.7.4//hadoop-annotations-2.7.4.jar
> -hadoop-auth/2.7.4//hadoop-auth-2.7.4.jar
> -hadoop-aws/2.7.4//hadoop-aws-2.7.4.jar
> -hadoop-azure/2.7.4//hadoop-azure-2.7.4.jar
> -hadoop-client/2.7.4//hadoop-client-2.7.4.jar
> -hadoop-common/2.7.4//hadoop-common-2.7.4.jar
> -hadoop-hdfs/2.7.4//hadoop-hdfs-2.7.4.jar
> -hadoop-mapreduce-client-app/2.7.4//hadoop-mapreduce-client-app-2.7.4.jar
> -hadoop-mapreduce-client-common/2.7.4//hadoop-mapreduce-client-common-2.7.4.jar
> 

[jira] [Updated] (SPARK-39066) Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh in 3.8.5

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39066:
-
Priority: Minor  (was: Major)

> Update to Maven 3.8.6 / fix problem with dev/test-dependencies.sh in 3.8.5
> --
>
> Key: SPARK-39066
> URL: https://issues.apache.org/jira/browse/SPARK-39066
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
> Attachments: hadoop2-deps.txt, spark-deps-hadoop-2-hive-2.3
>
>
> Running `./dev/test-dependencies.sh --replace-manifest` with Maven 3.8.5 
> produces a wrong result:
> {code:java}
> diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
> b/dev/deps/spark-deps-hadoop-2-hive-2.3
> index b6df3ea5ce..e803aadcfc 100644
> --- a/dev/deps/spark-deps-hadoop-2-hive-2.3
> +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
> @@ -6,15 +6,14 @@ ST4/4.0.4//ST4-4.0.4.jar
>  activation/1.1.1//activation-1.1.1.jar
>  aircompressor/0.21//aircompressor-0.21.jar
>  algebra_2.12/2.0.1//algebra_2.12-2.0.1.jar
> +aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar
> +aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
> +aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
> +aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
>  annotations/17.0.0//annotations-17.0.0.jar
>  antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
>  antlr4-runtime/4.8//antlr4-runtime-4.8.jar
>  aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
> -aopalliance/1.0//aopalliance-1.0.jar
> -apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
> -api-util/1.0.0-M20//api-util-1.0.0-M20.jar
>  arpack/2.2.1//arpack-2.2.1.jar
>  arpack_combined_all/0.1//arpack_combined_all-0.1.jar
>  arrow-format/7.0.0//arrow-format-7.0.0.jar
> @@ -26,7 +25,10 @@ automaton/1.11-8//automaton-1.11-8.jar
>  avro-ipc/1.11.0//avro-ipc-1.11.0.jar
>  avro-mapred/1.11.0//avro-mapred-1.11.0.jar
>  avro/1.11.0//avro-1.11.0.jar
> -azure-storage/2.0.0//azure-storage-2.0.0.jar
> +aws-java-sdk-bundle/1.11.1026//aws-java-sdk-bundle-1.11.1026.jar
> +azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar
> +azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar
> +azure-storage/7.0.1//azure-storage-7.0.1.jar
>  blas/2.2.1//blas-2.2.1.jar
>  bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
>  breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
> @@ -34,28 +36,24 @@ breeze_2.12/1.2//breeze_2.12-1.2.jar
>  cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
>  chill-java/0.10.0//chill-java-0.10.0.jar
>  chill_2.12/0.10.0//chill_2.12-0.10.0.jar
> -commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
>  commons-cli/1.5.0//commons-cli-1.5.0.jar
>  commons-codec/1.15//commons-codec-1.15.jar
>  commons-collections/3.2.2//commons-collections-3.2.2.jar
>  commons-collections4/4.4//commons-collections4-4.4.jar
>  commons-compiler/3.0.16//commons-compiler-3.0.16.jar
>  commons-compress/1.21//commons-compress-1.21.jar
> -commons-configuration/1.6//commons-configuration-1.6.jar
>  commons-crypto/1.1.0//commons-crypto-1.1.0.jar
>  commons-dbcp/1.4//commons-dbcp-1.4.jar
> -commons-digester/1.8//commons-digester-1.8.jar
> -commons-httpclient/3.1//commons-httpclient-3.1.jar
>  commons-io/2.4//commons-io-2.4.jar
>  commons-lang/2.6//commons-lang-2.6.jar
>  commons-lang3/3.12.0//commons-lang3-3.12.0.jar
>  commons-logging/1.1.3//commons-logging-1.1.3.jar
>  commons-math3/3.6.1//commons-math3-3.6.1.jar
> -commons-net/3.1//commons-net-3.1.jar
>  commons-pool/1.5.4//commons-pool-1.5.4.jar
>  commons-text/1.9//commons-text-1.9.jar
>  compress-lzf/1.1//compress-lzf-1.1.jar
>  core/1.1.2//core-1.1.2.jar
> +cos_api-bundle/5.6.19//cos_api-bundle-5.6.19.jar
>  curator-client/2.7.1//curator-client-2.7.1.jar
>  curator-framework/2.7.1//curator-framework-2.7.1.jar
>  curator-recipes/2.7.1//curator-recipes-2.7.1.jar
> @@ -69,25 +67,17 @@ generex/1.0.2//generex-1.0.2.jar
>  gmetric4j/1.0.10//gmetric4j-1.0.10.jar
>  gson/2.2.4//gson-2.2.4.jar
>  guava/14.0.1//guava-14.0.1.jar
> -guice-servlet/3.0//guice-servlet-3.0.jar
> -guice/3.0//guice-3.0.jar
> -hadoop-annotations/2.7.4//hadoop-annotations-2.7.4.jar
> -hadoop-auth/2.7.4//hadoop-auth-2.7.4.jar
> -hadoop-aws/2.7.4//hadoop-aws-2.7.4.jar
> -hadoop-azure/2.7.4//hadoop-azure-2.7.4.jar
> -hadoop-client/2.7.4//hadoop-client-2.7.4.jar
> -hadoop-common/2.7.4//hadoop-common-2.7.4.jar
> -hadoop-hdfs/2.7.4//hadoop-hdfs-2.7.4.jar
> -hadoop-mapreduce-client-app/2.7.4//hadoop-mapreduce-client-app-2.7.4.jar
> -hadoop-mapreduce-client-common/2.7.4//hadoop-mapreduce-client-common-2.7.4.jar
> -hadoop-mapreduce-client-core/2.7.4//hadoop-mapreduce-client-core-2.7.4.jar
> 

[jira] [Resolved] (SPARK-39441) Speed up DeduplicateRelations

2022-06-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39441.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36837
[https://github.com/apache/spark/pull/36837]

> Speed up DeduplicateRelations
> -
>
> Key: SPARK-39441
> URL: https://issues.apache.org/jira/browse/SPARK-39441
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Speed up the Analyzer rule DeduplicateRelations



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39441) Speed up DeduplicateRelations

2022-06-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39441:
---

Assignee: Allison Wang

> Speed up DeduplicateRelations
> -
>
> Key: SPARK-39441
> URL: https://issues.apache.org/jira/browse/SPARK-39441
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Speed up the Analyzer rule DeduplicateRelations



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39460) Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39460.
---
Fix Version/s: 3.4.0
 Assignee: Dongjoon Hyun
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/36860

> Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations
> -
>
> Key: SPARK-39460
> URL: https://issues.apache.org/jira/browse/SPARK-39460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39151) Need Support for avro 1.11.0 in apache spark

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39151.
--
Resolution: Duplicate

Spark already uses 1.11.0

> Need Support for avro 1.11.0 in apache spark
> 
>
> Key: SPARK-39151
> URL: https://issues.apache.org/jira/browse/SPARK-39151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Amit Agarwal
>Priority: Minor
>
> There is an Apache Avro FOSS compliance issue in the older Avro version 1.8.2. 
> It is resolved from 1.10.0 on, but we need the newer version 1.11.0 to be 
> supported in Spark. Since Spark is based on 1.8.2, and 1.11.0 has major changes 
> with respect to the java.time.* package, our manual version upgrade is not 
> working. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39146) The singleton Jackson ObjectMapper should be preferred

2022-06-13 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553828#comment-17553828
 ] 

Sean R. Owen commented on SPARK-39146:
--

Sounds good, do you want to open a PR?

> The singleton Jackson ObjectMapper should be preferred
> --
>
> Key: SPARK-39146
> URL: https://issues.apache.org/jira/browse/SPARK-39146
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> I wrote a micro-benchmark to test Jackson ObjectWriter read and write:
> [https://github.com/LuciferYang/spark/blob/objectMapper/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/JacksonBenchmark.scala]
> and ran it using GA:
>  
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Test create ObjectMapper:       Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> Test create ObjectMapper                  648           652          4        0.0      64819.0      1.0X
> 
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Test write map to json:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> Test Multiple                            2116          2127         15        0.0     211556.5      1.0X
> Test Single                                 4             4          0        2.4        416.1    508.4X
> 
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Test read json to map:          Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> Test Multiple                            8848          8867         27        0.0     884776.2      1.0X
> Test Single   {code}
>  
>  
> From the test results, we should use a singleton Jackson ObjectMapper, because 
> creating a new ObjectMapper instance is expensive.
>  
> The following code in Spark does not use a singleton:
>  
> {code:java}
> common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
> core/src/main/scala/org/apache/spark/status/api/v1/JacksonMessageWriter.scala
> core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
> resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
>  {code}
> We can find the hot paths and fix them.
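
A minimal sketch of the singleton pattern proposed here (illustrative; the object and method names are hypothetical, not Spark's):

{code:scala}
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

object JsonUtils {
  // ObjectMapper is thread-safe once configured, so a single shared,
  // fully configured instance can replace per-call `new ObjectMapper()`.
  private val mapper: ObjectMapper =
    new ObjectMapper().registerModule(DefaultScalaModule)

  def toJson(value: Any): String = mapper.writeValueAsString(value)
}
{code}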



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39151) Need Support for avro 1.11.0 in apache spark

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39151:
-
Issue Type: Improvement  (was: Bug)
  Priority: Minor  (was: Major)

> Need Support for avro 1.11.0 in apache spark
> 
>
> Key: SPARK-39151
> URL: https://issues.apache.org/jira/browse/SPARK-39151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Amit Agarwal
>Priority: Minor
>
> There is an Apache Avro FOSS compliance issue in the older Avro version 1.8.2. 
> It is resolved from 1.10.0 on, but we need the newer version 1.11.0 to be 
> supported in Spark. Since Spark is based on 1.8.2, and 1.11.0 has major changes 
> with respect to the java.time.* package, our manual version upgrade is not 
> working. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39269) spark3.2.0 commit tmp file is not found when rename

2022-06-13 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553826#comment-17553826
 ] 

Sean R. Owen commented on SPARK-39269:
--

I think this just isn't actionable - not clear how it is reproduced or what you 
are saying the problem is

> spark3.2.0 commit tmp file is not found when rename 
> 
>
> Key: SPARK-39269
> URL: https://issues.apache.org/jira/browse/SPARK-39269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL, Structured Streaming
>Affects Versions: 3.2.0
> Environment: spark 3.2.0
> yarn
> 2 executors and 1 driver
> a job include of 4 stream query
>Reporter: cxb
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> A job includes 4 streaming queries, and one of the queries throws an "offset 
> tmp file is not found" error at runtime, which causes the job to exit.
> This has never happened to me when using Spark 3.0.0.
> I looked at the implementation in Spark 3.2, and it is not very different from 
> Spark 3.0.
> Maybe it is a problem with the new Jackson version?
>  
> {code:java}
> java.io.FileNotFoundException: rename source 
> /tmp/chenxiaobin/regist_gp_bmhb_v2/commits/.35362.b4684b94-c0bb-4d87-baf0-cd1a508d7be7.tmp
>  is not found.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.validateRenameSource(FSDirRenameOp.java:561)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:361)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:300)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:247)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3931)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename2(NameNodeRpcServer.java:1039)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename2(ClientNamenodeProtocolServerSideTranslatorPB.java:610)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1991)
>   at org.apache.hadoop.fs.Hdfs.renameInternal(Hdfs.java:341)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:690)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.renameTempFile(CheckpointFileManager.scala:346)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:154)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.$anonfun$addNewBatchByStream$2(HDFSMetadataLog.scala:176)
>   at 
> scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.addNewBatchByStream(HDFSMetadataLog.scala:171)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:116)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$18(MicroBatchExecution.scala:615)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:627)
>   at 
> 

[jira] [Commented] (SPARK-39302) Spark SQL - wrong field selection in group by

2022-06-13 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553825#comment-17553825
 ] 

Sean R. Owen commented on SPARK-39302:
--

I don't understand this - you didn't show a query with a group by?

> Spark SQL - wrong field selection in group by
> -
>
> Key: SPARK-39302
> URL: https://issues.apache.org/jira/browse/SPARK-39302
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: shaharl...@gmail.com
>Priority: Minor
>
> The SQL parser selects the wrong field to group by.
> See the following example.
> Sample data:
>  
> {code:java}
> spark.sql("""
>             select "US" as pv_countryCode
>             union all
>             select "IL" as pv_countryCode
>          """).createOrReplaceTempView("my_test_data")
> spark.sql("""
>             select -3 as id, "US" as countryCode
>          """).createOrReplaceTempView("country_codes_sample")
>           {code}
> code:
> {code:java}
> spark.sql("""
>             select "US" as pv_countryCode
>             union all
>             select "IL" as pv_countryCode
>          """).createOrReplaceTempView("my_test_data")
> spark.sql("""
>             select -3 as id, "US" as countryCode
>          """).createOrReplaceTempView("country_codes_sample")
>           {code}
> Error:
> {code:java}
> org.apache.spark.sql.AnalysisException: expression 'cct.`id`' is neither 
> present in the group by, nor is it an aggregate function. Add to group by or 
> wrap in first() (or first_value) if you don't care which value you get.;
> Aggregate [country_id#910], [coalesce(id#886, -3) AS country_id#908, count(1) 
> AS count#909L]
> +- Join LeftOuter, (countryCode#887 = country_id#910)
>:- SubqueryAlias BASE
>:  +- Project [pv_countryCode#883 AS country_id#910]
>: +- SubqueryAlias my_test_data
>:+- Union false, false
>:   :- Project [US AS pv_countryCode#883]
>:   :  +- OneRowRelation
>:   +- Project [IL AS pv_countryCode#884]
>:  +- OneRowRelation
>+- SubqueryAlias cct
>   +- SubqueryAlias country_codes_sample
>  +- Project [-3 AS id#886, US AS countryCode#887]
> +- OneRowRelation {code}
> I expected Spark to choose the selected country_id (country_id#908) instead of 
> country_id#910,
> or at least to throw an ambiguity exception when grouping by `country_id`.
> This leads developers to add `cct.id` to the group by, which produces 
> unexpected results.
> (In case country_id has both null and -3 values)
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39388) Reuse orcSchema when push down Orc predicates

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39388:
-
Summary: Reuse orcSchema when push down Orc predicates  (was: Reuse 
orcScheam when push down Orc predicates)

> Reuse orcSchema when push down Orc predicates
> -
>
> Key: SPARK-39388
> URL: https://issues.apache.org/jira/browse/SPARK-39388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> The `OrcUtils.readCatalystSchema` method triggers a file IO to read the 
> `orcSchema` when ORC pushes down predicates.
> After SPARK-37463:
> - For `OrcFileFormat`, we can reuse [an already-read 
> `orcSchema`](https://github.com/apache/spark/blob/cc0bf563b8caea21da5692f05e34b5f77e002ab9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L146)
>  to save this file read IO when ORC pushes down predicates.
> - For `OrcPartitionReaderFactory`, we can achieve the same goal by moving 
> `OrcFile.createReader` before `pushDownPredicates`.
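
For illustration, a hedged sketch of the idea using the public ORC reader API (the actual Spark call sites differ; the method name here is hypothetical):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.{OrcFile, Reader, TypeDescription}

// Open the ORC file once, read its schema once, and hand the same
// TypeDescription to both predicate pushdown and schema resolution,
// instead of re-opening the file for a second schema read.
def readSchemaOnce(conf: Configuration, file: Path): TypeDescription = {
  val reader: Reader = OrcFile.createReader(file, OrcFile.readerOptions(conf))
  reader.getSchema  // the single file IO; reuse this result downstream
}
{code}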



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39433) to_date function returns a null for the first week of the year

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39433:
-
Priority: Minor  (was: Major)

Not sure; "w" is not supported, it seems: 
https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Yet it sort of works. The cases that return null have the first day of the 
first week of the year before the start of the year (at least, that's what Java 
does with it).

[~maxgekk] do you happen to know?

> to_date function returns a null for the first week of the year
> --
>
> Key: SPARK-39433
> URL: https://issues.apache.org/jira/browse/SPARK-39433
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.2
>Reporter: CHARLES HOGG
>Priority: Minor
>
> When I use week of year in the to_date function, the first week of the year 
> returns a null for many years.
> ```
> df = pyrasa.sparkSession.createDataFrame([["2013-01"], ["2013-02"], ["2017-01"], ["2018-01"]], ["input"])
> df.select(func.col("input"), func.to_date(func.col("input"), "yyyy-ww").alias("date")) \
>   .show()
> ```
> ```
> +-------+----------+
> |  input|      date|
> +-------+----------+
> |2013-01|      null|
> |2013-02|2013-01-06|
> |2017-01|2017-01-01|
> |2018-01|      null|
> +-------+----------+
> ```
> Why is this? Is it a bug in the to_date function?
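
For anyone hitting this, a hedged workaround sketch: Spark 3 resolves datetime patterns with the Java 8 time parser, and the legacy (Spark 2.4, SimpleDateFormat-based) behavior can be restored per session. This sidesteps the week-pattern semantics rather than fixing them:

{code:scala}
// Restore the pre-3.0 parser for comparison/workaround purposes.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
{code}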



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39436) graph.connectedComponents(maxIterations) get ArrayIndexOutOfBoundsException: -1

2022-06-13 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553823#comment-17553823
 ] 

Sean R. Owen commented on SPARK-39436:
--

This is not actionable. There is no detail on the error or reproduction.

> graph.connectedComponents(maxIterations) get ArrayIndexOutOfBoundsException: 
> -1
> ---
>
> Key: SPARK-39436
> URL: https://issues.apache.org/jira/browse/SPARK-39436
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.4.3
> Environment: val graph = Graph(vertices, 
> edges).partitionBy(PartitionStrategy.RandomVertexCut)
> The following exception is reported during execution. There is no problem 
> when the data volume is small, but an error is reported when the data 
> volume is large:
> h1. java.lang.ArrayIndexOutOfBoundsException: -1
>Reporter: hanyingjun
>Priority: Major
>
> val graph = Graph(vertices, 
> edges).partitionBy(PartitionStrategy.RandomVertexCut)
> The following exception is reported during execution. There is no problem 
> when the data volume is small, but an error is reported when the data 
> volume is large:
> h1. java.lang.ArrayIndexOutOfBoundsException: -1



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39435) graph.connectedComponents(maxIterations) get ArrayIndexOutOfBoundsException: -1

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39435.
--
Resolution: Duplicate

> graph.connectedComponents(maxIterations) get ArrayIndexOutOfBoundsException: 
> -1
> ---
>
> Key: SPARK-39435
> URL: https://issues.apache.org/jira/browse/SPARK-39435
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.4.3
> Environment: val graph = Graph(vertices, 
> edges).partitionBy(PartitionStrategy.RandomVertexCut)
> The following exception is reported during execution. There is no problem 
> when the data volume is small, but an error is reported when the data 
> volume is large:
> h1. java.lang.ArrayIndexOutOfBoundsException: -1
>Reporter: hanyingjun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37865) Spark should not dedup the groupingExpressions when the first child of Union has duplicate columns

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-37865:
-
Fix Version/s: (was: 3.1.3)
   (was: 3.0.4)
   (was: 3.3.0)
   (was: 3.2.2)

> Spark should not dedup the groupingExpressions when the first child of Union 
> has duplicate columns
> --
>
> Key: SPARK-37865
> URL: https://issues.apache.org/jira/browse/SPARK-37865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Chao Gao
>Assignee: Karen Feng
>Priority: Major
>
> When the first child of a Union has duplicate columns, like select a, a from t1 
> union select a, b from t2, Spark only uses the first column to aggregate the 
> results, which makes the results incorrect; this behavior is 
> inconsistent with other engines like PostgreSQL and MySQL. We could alias the 
> attribute of the first child of the union to resolve this, or one could argue 
> that this is a feature of Spark SQL.
> sample query:
> select
> a,
> a
> from values (1, 1), (1, 2) as t1(a, b)
> UNION
> SELECT
> a,
> b
> from values (1, 1), (1, 2) as t2(a, b)
> The result in Spark is:
> (1,1)
> The result from PostgreSQL and MySQL is:
> (1,1)
> (1,2)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37865) Spark should not dedup the groupingExpressions when the first child of Union has duplicate columns

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-37865:
-
Fix Version/s: 3.1.3
   3.0.4
   3.3.0
   3.2.2

> Spark should not dedup the groupingExpressions when the first child of Union 
> has duplicate columns
> --
>
> Key: SPARK-37865
> URL: https://issues.apache.org/jira/browse/SPARK-37865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Chao Gao
>Assignee: Karen Feng
>Priority: Major
> Fix For: 3.1.3, 3.0.4, 3.3.0, 3.2.2
>
>
> When the first child of a Union has duplicate columns, like select a, a from t1 
> union select a, b from t2, Spark only uses the first column to aggregate the 
> results, which makes the results incorrect; this behavior is 
> inconsistent with other engines like PostgreSQL and MySQL. We could alias the 
> attribute of the first child of the union to resolve this, or one could argue 
> that this is a feature of Spark SQL.
> sample query:
> select
> a,
> a
> from values (1, 1), (1, 2) as t1(a, b)
> UNION
> SELECT
> a,
> b
> from values (1, 1), (1, 2) as t2(a, b)
> The result in Spark is:
> (1,1)
> The result from PostgreSQL and MySQL is:
> (1,1)
> (1,2)
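
A sketch of the aliasing workaround mentioned in the description, run against the sample query (illustrative; alias names are hypothetical):

{code:scala}
// Aliasing the duplicate columns in the first child keeps the two output
// columns distinct, so grouping-key deduplication can no longer collapse
// them; this should yield (1,1) and (1,2), matching PostgreSQL/MySQL.
spark.sql("""
  SELECT a AS a1, a AS a2 FROM VALUES (1, 1), (1, 2) AS t1(a, b)
  UNION
  SELECT a, b FROM VALUES (1, 1), (1, 2) AS t2(a, b)
""").show()
{code}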



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39345) Replace `filter` negation condition with `filterNot`

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39345.
--
Resolution: Won't Fix

> Replace `filter` negation condition with `filterNot`
> 
>
> Key: SPARK-39345
> URL: https://issues.apache.org/jira/browse/SPARK-39345
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, MLlib, Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> seq.filter(!condition) {code}
>  ->
> {code:java}
> seq.filterNot(condition)  {code}
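
In runnable form (the snippets above are shorthand for a predicate applied point-wise):

{code:scala}
val xs = Seq(1, 2, 3, 4)
val isEven = (n: Int) => n % 2 == 0
// A negated filter and filterNot are equivalent; the latter reads better.
assert(xs.filter(n => !isEven(n)) == xs.filterNot(isEven))  // both: List(1, 3)
{code}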



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39289) Replace map.getOrElse(false/true) with exists/forall

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39289.
--
Resolution: Won't Fix

> Replace map.getOrElse(false/true) with exists/forall
> 
>
> Key: SPARK-39289
> URL: https://issues.apache.org/jira/browse/SPARK-39289
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, Structured Streaming
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Trivial
>
> Replace _map(_.toBoolean).getOrElse(false)_ with _exists(_.toBoolean)_
> Replace _map(_.toBoolean).getOrElse(true)_ with _forall(_.toBoolean)_
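
In runnable form, showing why getOrElse(false) pairs with exists and getOrElse(true) with forall, including the empty case:

{code:scala}
val flag: Option[String] = Some("true")
assert(flag.map(_.toBoolean).getOrElse(false) == flag.exists(_.toBoolean))
assert(flag.map(_.toBoolean).getOrElse(true)  == flag.forall(_.toBoolean))
// On None the defaults diverge: exists is false, forall is true.
assert(!Option.empty[String].exists(_.toBoolean))
assert(Option.empty[String].forall(_.toBoolean))
{code}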



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39461) Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39461:


Assignee: (was: Apache Spark)

> Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`
> --
>
> Key: SPARK-39461
> URL: https://issues.apache.org/jira/browse/SPARK-39461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38422) Encryption algorithms should be used with secure mode and padding scheme

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38422.
--
Resolution: Not A Problem

> Encryption algorithms should be used with secure mode and padding scheme
> 
>
> Key: SPARK-38422
> URL: https://issues.apache.org/jira/browse/SPARK-38422
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> I have scanned the Java files with SonarQube, and found the following in 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java:
> {code:java}
> try {
>   if (mode.equalsIgnoreCase("ECB") &&
>   (padding.equalsIgnoreCase("PKCS") || 
> padding.equalsIgnoreCase("DEFAULT"))) {
> Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
> {code}
> Encryption operation mode and the padding scheme should be chosen 
> appropriately to guarantee data confidentiality, integrity and authenticity:
> For block cipher encryption algorithms (like AES):
> The GCM (Galois Counter Mode) mode which works internally with zero/no 
> padding scheme, is recommended, as it is designed to provide both data 
> authenticity (integrity) and confidentiality. Other similar modes are CCM, 
> CWC, EAX, IAPM and OCB.
> The CBC (Cipher Block Chaining) mode by itself provides only data 
> confidentiality; it’s recommended to use it along with a Message Authentication 
> Code or similar to achieve data authenticity (integrity) too, and thus to 
> prevent padding oracle attacks.
> The ECB (Electronic Codebook) mode doesn’t provide serious message 
> confidentiality: under a given key any given plaintext block always gets 
> encrypted to the same ciphertext block. This mode should not be used.
> For RSA encryption algorithm, the recommended padding scheme is OAEP.
> [OWASP Top 10 2021|https://owasp.org/Top10/A02_2021-Cryptographic_Failures/] 
> Category A2 - Cryptographic Failures
> [OWASP Top 10 
> 2017|https://owasp.org/www-project-top-ten/2017/A6_2017-Security_Misconfiguration.html]
>  Category A6 - Security Misconfiguration
> [Mobile 
> AppSec|https://mobile-security.gitbook.io/masvs/security-requirements/0x08-v3-cryptography_verification_requirements]
>  Verification Standard - Cryptography Requirements
> [OWASP Mobile Top 10 
> 2016|https://owasp.org/www-project-mobile-top-10/2016-risks/m5-insufficient-cryptography]
>  Category M5 - Insufficient Cryptography
> [MITRE, CWE-327|https://cwe.mitre.org/data/definitions/327.html]  - Use of a 
> Broken or Risky Cryptographic Algorithm
> [CERT, 
> MSC61-J.|https://wiki.sei.cmu.edu/confluence/display/java/MSC61-J.+Do+not+use+insecure+or+weak+cryptographic+algorithms]
>  - Do not use insecure or weak cryptographic algorithms
> [SANS Top 25|https://www.sans.org/top25-software-errors/#cat3] - Porous 
> Defenses
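
For reference, a minimal JCE sketch of the recommended GCM mode (illustrative only, not the Spark patch; key management and decryption are out of scope):

{code:scala}
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator}
import javax.crypto.spec.GCMParameterSpec

val keyGen = KeyGenerator.getInstance("AES")
keyGen.init(256)
val key = keyGen.generateKey()

val iv = new Array[Byte](12)            // 96-bit IV, must be unique per message
new SecureRandom().nextBytes(iv)

val cipher = Cipher.getInstance("AES/GCM/NoPadding")
cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))  // 128-bit tag
val ciphertext = cipher.doFinal("secret".getBytes("UTF-8"))
{code}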



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38801) ISO-8859-1 encoding doesn't work for text format

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38801.
--
Resolution: Won't Fix

> ISO-8859-1 encoding doesn't work for text format
> 
>
> Key: SPARK-38801
> URL: https://issues.apache.org/jira/browse/SPARK-38801
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
> Environment: Databricks runtime 10.3 (spark 3.2.1, scala 2.12)
>Reporter: Himanshu Arora
>Priority: Major
> Attachments: Screenshot 2022-04-06 at 09.29.24.png, Screenshot 
> 2022-04-06 at 09.30.02.png
>
>
> When reading text files with Spark that are not in the UTF-8 charset, foreign 
> language characters (for example, French characters like è and 
> é) are not handled well. They are all replaced by �. In my case the text files 
> were in ISO-8859-1 encoding.
> After digging into docs, it seems that spark still uses Hadoop's 
> LineRecordReader class for text format which only supports UTF-8. Here's the 
> source code of that class: 
> [LineRecordReader.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java#L154]
>  
> You can see this issue in the screenshot below:
> !Screenshot 2022-04-06 at 09.29.24.png!
> As you can see, the French word *données* is read as {*}donn�es{*}. The word 
> *Clôturé* is read as *Cl�tur�*.
>  
> I also read the same text file as CSV format while providing the correct 
> charset value and it works fine in this case as you can see the screenshot 
> below:
> !Screenshot 2022-04-06 at 09.30.02.png!
>  
> So this issue is specifically for text format. Therefore reporting this 
> issue. 
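
The CSV workaround shown in the second screenshot looks roughly like this (path is illustrative; the CSV source accepts an explicit charset, while the text source does not):

{code:scala}
// Reading through the CSV source honors the declared encoding, unlike the
// text source, which goes through Hadoop's UTF-8-only LineRecordReader.
val df = spark.read
  .option("encoding", "ISO-8859-1")   // "charset" is an accepted alias
  .csv("/path/to/file.txt")           // hypothetical path
{code}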



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39461) Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553821#comment-17553821
 ] 

Apache Spark commented on SPARK-39461:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36862

> Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`
> --
>
> Key: SPARK-39461
> URL: https://issues.apache.org/jira/browse/SPARK-39461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39461) Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39461:


Assignee: Apache Spark

> Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`
> --
>
> Key: SPARK-39461
> URL: https://issues.apache.org/jira/browse/SPARK-39461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38858) PythonException - socke.timeout: timed out - socket.py line 707

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38858.
--
Resolution: Not A Problem

> PythonException - socke.timeout: timed out - socket.py line 707
> ---
>
> Key: SPARK-38858
> URL: https://issues.apache.org/jira/browse/SPARK-38858
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.2.1
> Environment: Intel i7 core
> 64Gb ram ( 30Gb assigned to spark executor memory)
> 4 cores
> Windows 11
> Working in jupyter notebook
> Python version - 3.9.7
> Apache Spark version - 3.2.1
>Reporter: Alex Veale
>Priority: Major
>  Labels: test
> Attachments: socketError - timed out.png, socketpy.png
>
>
> I have a database of about 8 million residential addresses. I perform 
> 3 separate cleaning operations on the data using UDFs and regular 
> expressions (Python re package). Then I create an additional column by 
> splitting the 'cleaned' address on commas and taking the element at the 
> last index as the suburb, and use this column as a key for joining the original 
> data frame to a supplementary one which contains suburb and country pairs, 
> joining on suburb. Finally, I create another column containing the final 
> address, with the 'unsplit clean' address column concatenated with the country 
> column pulled in by the join. 
> When I try to display the result by calling show, I get the desired result if 
> I show only the first 1000 records or fewer; however, if I try to show more 
> records or add an additional filter to only display records that have been 
> modified, I get a socket timeout error.
> I have tried increasing the socket's send and receive buffer sizes to the 
> maximum of 1048576 bytes, as well as increasing the spark executor heartbeat 
> interval (7200s) and the spark network timeout (3600s); and I have 
> tried repartitioning the data to 16 and 32 partitions, all of which has had 
> no impact on the result.
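
A hedged sketch of the pipeline described above, in Scala (data frame and column names are hypothetical stand-ins):

{code:scala}
import org.apache.spark.sql.functions.{col, concat_ws, element_at, split, trim}
import spark.implicits._   // `spark` is an active SparkSession

// Hypothetical stand-ins for the real tables described above.
val cleaned = Seq("12 Foo St, Springfield").toDF("clean_address")
val suburbCountry = Seq(("Springfield", "Australia")).toDF("suburb", "country")

// Take the last comma-separated token of the cleaned address as the suburb,
// join to the (suburb, country) lookup, then build the final address string.
val result = cleaned
  .withColumn("suburb", trim(element_at(split(col("clean_address"), ","), -1)))
  .join(suburbCountry, Seq("suburb"), "left")
  .withColumn("final_address", concat_ws(", ", col("clean_address"), col("country")))
{code}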



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38884) java.util.NoSuchElementException: key not found: numPartitions

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38884.
--
Resolution: Won't Fix

> java.util.NoSuchElementException: key not found: numPartitions
> --
>
> Key: SPARK-38884
> URL: https://issues.apache.org/jira/browse/SPARK-38884
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 3.1.1
> spark 3.0.1
>Reporter: chopperChen
>Priority: Major
>
> When running spark.sql("sql").isEmpty, the logs print 
> {*}_java.util.NoSuchElementException: key not found: numPartitions_{*}.
> My SQL is like:
>  
> {code:java}
> // hr is a partition column
> select * from (select col1, '24' as hr from table1
>union all select col1, '2' as hr from table2
>union all select col1, hr from table3) df1
> inner join (select col1, '24' as hr from table4
> union all select col1, '2' as hr from table5
> union all select col1, hr from table6) df2
> on df1.col1=df2.col1
> {code}
>  
> *exception:*
> Caused by: java.util.NoSuchElementException: key not found: numPartitions
>     at scala.collection.MapLike.default(MapLike.scala:235)
>     at scala.collection.MapLike.default$(MapLike.scala:234)
>     at scala.collection.AbstractMap.default(Map.scala:63)
>     at scala.collection.MapLike.apply(MapLike.scala:144)
>     at scala.collection.MapLike.apply$(MapLike.scala:143)
>     at scala.collection.AbstractMap.apply(Map.scala:63)
>     at 
> org.apache.spark.sql.execution.FileSourceScanExec.$anonfun$sendDriverMetrics$1(DataSourceScanExec.scala:197)
>     at 
> org.apache.spark.sql.execution.FileSourceScanExec.$anonfun$sendDriverMetrics$1$adapted(DataSourceScanExec.scala:197)
>     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>     at 
> org.apache.spark.sql.execution.FileSourceScanExec.sendDriverMetrics(DataSourceScanExec.scala:197)
>     at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:407)
>     at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:390)
>     at 
> org.apache.spark.sql.execution.FileSourceScanExec.doExecuteColumnar(DataSourceScanExec.scala:485)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
>     at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:519)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
>     at 
> org.apache.spark.sql.execution.ColumnarToRowExec.inputRDDs(Columnar.scala:196)
>     at 
> org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:133)
>     at 
> org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:47)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
>     at 
> org.apache.spark.sql.execution.UnionExec.$anonfun$doExecute$5(basicPhysicalOperators.scala:644)
>     at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at 
> 

[jira] [Resolved] (SPARK-38265) Update comments of ExecutorAllocationClient

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38265.
--
Resolution: Won't Fix

> Update comments of ExecutorAllocationClient
> ---
>
> Key: SPARK-38265
> URL: https://issues.apache.org/jira/browse/SPARK-38265
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Shockang
>Priority: Trivial
>
> The class comment of ExecutorAllocationClient is out of date.
> {code:java}
> This is currently supported only in YARN mode. {code}
> Nowadays, this is supported in the following modes: Spark's Standalone, 
> YARN-Client, YARN-Cluster, Mesos, Kubernetes.
>  
> In my opinion, this comment should be updated.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38919) Filter stale event log before parse logs

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38919.
--
Resolution: Invalid

> Filter stale event log before parse logs
> 
>
> Key: SPARK-38919
> URL: https://issues.apache.org/jira/browse/SPARK-38919
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: shuyouZZ
>Priority: Major
>
> We should filter out stale event logs before parsing and replaying logs, if 
> log cleaning is enabled.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38539) Add document for ToNumber for DecimalFormat's compatibility

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38539.
--
Resolution: Not A Problem

> Add document for ToNumber for DecimalFormat's compatibility
> ---
>
> Key: SPARK-38539
> URL: https://issues.apache.org/jira/browse/SPARK-38539
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38539) Add document for ToNumber for DecimalFormat's compatibility

2022-06-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-38539:
-
Priority: Trivial  (was: Major)

> Add document for ToNumber for DecimalFormat's compatibility
> ---
>
> Key: SPARK-38539
> URL: https://issues.apache.org/jira/browse/SPARK-38539
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39461) Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`

2022-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39461:
-

 Summary: Print `SPARK_LOCAL_(HOSTNAME|IP)` in `build/{mvn|sbt}`
 Key: SPARK-39461
 URL: https://issues.apache.org/jira/browse/SPARK-39461
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39460) Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553817#comment-17553817
 ] 

Apache Spark commented on SPARK-39460:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36860

> Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations
> -
>
> Key: SPARK-39460
> URL: https://issues.apache.org/jira/browse/SPARK-39460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39460) Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39460:


Assignee: (was: Apache Spark)

> Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations
> -
>
> Key: SPARK-39460
> URL: https://issues.apache.org/jira/browse/SPARK-39460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39460) Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39460:


Assignee: Apache Spark

> Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations
> -
>
> Key: SPARK-39460
> URL: https://issues.apache.org/jira/browse/SPARK-39460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38796) Implement the to_number and try_to_number SQL functions according to a new specification

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553816#comment-17553816
 ] 

Apache Spark commented on SPARK-38796:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/36861

> Implement the to_number and try_to_number SQL functions according to a new 
> specification
> 
>
> Key: SPARK-38796
> URL: https://issues.apache.org/jira/browse/SPARK-38796
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 3.3.0
>
>
> This tracks implementing the 'to_number' and 'try_to_number' SQL function 
> expressions according to new semantics described below. The former is 
> equivalent to the latter except that it throws an exception instead of 
> returning NULL for cases where the input string does not match the format 
> string.
>  
> ---
>  
> *try_to_number function (expr, fmt):*
> Returns 'expr' cast to DECIMAL using formatting 'fmt', or 'NULL' if 'expr' is 
> not a valid match for the given format.
>  
> Syntax: 
> { ' [ S ] [ L | $ ]
> [ 0 | 9 | G | , ] [...]
> [ . | D ] 
> [ 0 | 9 ] [...]
> [ L | $ ] [ PR | MI | S ] ' }
>  
> *Arguments:*
> 'expr': A STRING expression representing a number. 'expr' may include leading 
> or trailing spaces.
> 'fmt': A STRING literal, specifying the expected format of 'expr'.
>  
> *Returns:*
> A DECIMAL(p, s) where 'p' is the total number of digits ('0' or '9') and 's' 
> is the number of digits after the decimal point, or 0 if there is none.
>  
> *Format elements allowed (case insensitive):*
>  * 0 or 9
>   Specifies an expected digit between '0' and '9'. 
>   A '0' to the left of the decimal point indicates that 'expr' must have at 
> least as many digits. A leading '9' indicates that 'expr' may omit these 
> digits.
>   'expr' must not have more digits to the left of the decimal point than the 
> format string allows.
>   Digits to the right of the decimal point in the format string indicate the 
> most digits that 'expr' may have to the right of the decimal point.
>  * . or D
>   Specifies the position of the decimal point.
>   'expr' does not need to include a decimal point.
>  * , or G
>   Specifies the position of the ',' grouping (thousands) separator.
>   There must be a '0' or '9' to the left of the rightmost grouping separator. 
>   'expr' must match the grouping separator relevant for the size of the 
> number. 
>  * $
>   Specifies the location of the '$' currency sign. This character may only be 
> specified once.
>  * S 
>   Specifies the position of an optional '+' or '-' sign. This character may 
> only be specified once.
>  * MI
>   Specifies that 'expr' has an optional '-' sign at the end, but no '+'.
>  * PR
>   Specifies that 'expr' indicates a negative number with wrapping angled 
> brackets ('<1>'). If 'expr' contains any characters other than '0' through 
> '9' and those permitted in 'fmt' a 'NULL' is returned.
>  
> *Examples:*
> {{-- The format expects:}}
> {{--  * an optional sign at the beginning,}}
> {{--  * followed by a dollar sign,}}
> {{--  * followed by a number between 3 and 6 digits long,}}
> {{--  * thousands separators,}}
> {{--  * up to two digits beyond the decimal point.}}
> {{> SELECT try_to_number('-$12,345.67', 'S$999,099.99');}}
> {{ -12345.67}}
> {{-- The plus sign is optional, and so are fractional digits.}}
> {{> SELECT try_to_number('$345', 'S$999,099.99');}}
> {{ 345.00}}
> {{-- The format requires at least three digits.}}
> {{> SELECT try_to_number('$45', 'S$999,099.99');}}
> {{ NULL}}
> {{-- A leading zero satisfies the three-digit requirement.}}
> {{> SELECT try_to_number('$045', 'S$999,099.99');}}
> {{ 45.00}}
> {{-- Using brackets to denote negative values.}}
> {{> SELECT try_to_number('<1234>', '99PR');}}
> {{ -1234}}
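
To make the matching rules above concrete, here is a minimal Python sketch of a 
try_to_number-style parser. It models only the S, $, 0/9, ','/G and '.'/D 
elements (not L, MI or PR), does not reproduce the DECIMAL(p, s) result scale, 
and illustrates the specification rather than Spark's actual implementation:

{code:python}
from decimal import Decimal
from typing import Optional

def try_to_number(expr: str, fmt: str) -> Optional[Decimal]:
    # Illustrative subset: S, $, 0/9, ','/G and '.'/D only.
    expr = expr.strip()
    fmt = fmt.upper().replace("G", ",").replace("D", ".")
    sign = ""
    if fmt.startswith("S"):                  # optional '+' or '-' sign
        fmt = fmt[1:]
        if expr[:1] in "+-":
            sign, expr = expr[:1], expr[1:]
    if fmt.startswith("$"):                  # currency symbol position
        if not expr.startswith("$"):
            return None
        fmt, expr = fmt[1:], expr[1:]
    fmt_int, _, fmt_frac = fmt.partition(".")
    expr_int, _, expr_frac = expr.partition(".")
    # Digits align to the right: every slot from the leftmost '0' onwards
    # must be filled, while '9' slots to its left may be omitted.
    digit_slots = sum(c in "09" for c in fmt_int)
    min_digits = (len(fmt_int[fmt_int.index("0"):].replace(",", ""))
                  if "0" in fmt_int else 0)
    digits = expr_int.replace(",", "")
    if not digits.isdigit() or not (min_digits <= len(digits) <= digit_slots):
        return None
    # Grouping separators must sit exactly where the right-aligned format
    # places them.
    tail = fmt_int[len(fmt_int) - len(expr_int):]
    if len(tail) != len(expr_int):
        return None
    if any((f == ",") != (c == ",") for f, c in zip(tail, expr_int)):
        return None
    if expr_frac and (not expr_frac.isdigit() or len(expr_frac) > len(fmt_frac)):
        return None
    return Decimal(sign + digits + ("." + expr_frac if expr_frac else ""))

assert try_to_number("-$12,345.67", "S$999,099.99") == Decimal("-12345.67")
assert try_to_number("$45", "S$999,099.99") is None    # too few digits for '0'
assert try_to_number("$045", "S$999,099.99") == Decimal("45")
{code}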



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39460) Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553815#comment-17553815
 ] 

Apache Spark commented on SPARK-39460:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36860

> Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations
> -
>
> Key: SPARK-39460
> URL: https://issues.apache.org/jira/browse/SPARK-39460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39460) Fix CoarseGrainedSchedulerBackendSuite to handle fast allocations

2022-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39460:
-

 Summary: Fix CoarseGrainedSchedulerBackendSuite to handle fast 
allocations
 Key: SPARK-39460
 URL: https://issues.apache.org/jira/browse/SPARK-39460
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39442) Update PlanStabilitySuite comment with SPARK_ANSI_SQL_MODE

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39442:
--
Fix Version/s: 3.3.1
   (was: 3.3.0)

> Update PlanStabilitySuite comment with SPARK_ANSI_SQL_MODE
> --
>
> Key: SPARK-39442
> URL: https://issues.apache.org/jira/browse/SPARK-39442
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.3.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39458:
--
Fix Version/s: 3.3.1
   (was: 3.3.0)

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39458:
-

Assignee: Dongjoon Hyun

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39458.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 36858
[https://github.com/apache/spark/pull/36858]

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39439) Suppress error log for in-progress event log not found

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39439:
-

Assignee: Cheng Pan

> Suppress error log for in-progress event log not found
> --
>
> Key: SPARK-39439
> URL: https://issues.apache.org/jira/browse/SPARK-39439
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39439) Suppress error log for in-progress event log not found

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39439.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36832
[https://github.com/apache/spark/pull/36832]

> Suppress error log for in-progress event log not found
> --
>
> Key: SPARK-39439
> URL: https://issues.apache.org/jira/browse/SPARK-39439
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553754#comment-17553754
 ] 

Apache Spark commented on SPARK-39458:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36858

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39458:


Assignee: (was: Apache Spark)

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39458:


Assignee: Apache Spark

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553753#comment-17553753
 ] 

Apache Spark commented on SPARK-39458:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36858

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39458:
--
Parent: SPARK-39457
Issue Type: Sub-task  (was: Bug)

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39457) Support pure IPV6 environment without IPV4

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39457:
--
Affects Version/s: 3.4.0
   (was: 3.2.1)

> Support pure IPV6 environment without IPV4
> --
>
> Key: SPARK-39457
> URL: https://issues.apache.org/jira/browse/SPARK-39457
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: DB Tsai
>Priority: Major
>
> Spark doesn't fully work in a pure IPv6 environment that has no IPv4 at all. 
> This is an umbrella Jira tracking support for pure IPv6 deployment. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39458:
--
Affects Version/s: 3.2.1
   3.1.2
   3.3.0

> Fix UISuite for IPv6
> 
>
> Key: SPARK-39458
> URL: https://issues.apache.org/jira/browse/SPARK-39458
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39459) LocalSchedulerBackend doesn't support IPV6

2022-06-13 Thread DB Tsai (Jira)
DB Tsai created SPARK-39459:
---

 Summary: LocalSchedulerBackend doesn't support IPV6
 Key: SPARK-39459
 URL: https://issues.apache.org/jira/browse/SPARK-39459
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: DB Tsai



{code:java}
➜  ./bin/spark-shell
22/06/09 14:52:35 WARN Utils: Your hostname, DBs-Mac-mini-2.local resolves to a 
loopback address: 127.0.0.1; using 2600:1700:1151:11ef:0:0:0:2000 instead (on 
interface en1)
22/06/09 14:52:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
22/06/09 14:52:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
22/06/09 14:52:44 ERROR SparkContext: Error initializing SparkContext.
java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP 
enclosed in [] but got 2600:1700:1151:11ef:0:0:0:2000
at scala.Predef$.assert(Predef.scala:223) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.util.Utils$.checkHost(Utils.scala:1110) 
~[spark-core_2.12-3.2.0.jar:3.2.0.37]
at org.apache.spark.executor.Executor.<init>(Executor.scala:89) 
~[spark-core_2.12-3.2.0.jar:3.2.0.37]
at 
org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
 ~[spark-core_2.12-3.2.0.jar:3.2.0]
at 
org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
 ~[spark-core_2.12-3.2.0.jar:3.2.0]
{code}
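
The assertion above comes from a checkHost-style validation that accepts only 
hostnames or bracket-enclosed IPv6 literals. A rough Python illustration of the 
normalization involved (the helper name is mine, not Spark's):

{code:python}
import ipaddress

def normalize_host(host: str) -> str:
    # Wrap a bare IPv6 literal in brackets so it satisfies a
    # "hostname or [IPv6]" style assertion; hostnames and IPv4 pass through.
    if host.startswith("["):
        return host
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return host              # not an IP literal, treat as hostname
    return f"[{host}]" if ip.version == 6 else host

assert normalize_host("2600:1700:1151:11ef:0:0:0:2000") == "[2600:1700:1151:11ef:0:0:0:2000]"
assert normalize_host("127.0.0.1") == "127.0.0.1"
{code}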




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39458) Fix UISuite for IPv6

2022-06-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39458:
-

 Summary: Fix UISuite for IPv6
 Key: SPARK-39458
 URL: https://issues.apache.org/jira/browse/SPARK-39458
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39457) Support pure IPV6 environment without IPV4

2022-06-13 Thread DB Tsai (Jira)
DB Tsai created SPARK-39457:
---

 Summary: Support pure IPV6 environment without IPV4
 Key: SPARK-39457
 URL: https://issues.apache.org/jira/browse/SPARK-39457
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: DB Tsai


Spark doesn't fully work in a pure IPv6 environment that has no IPv4 at all. 
This is an umbrella Jira tracking support for pure IPv6 deployment. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39434) Provide runtime error query context when array index is out of bound

2022-06-13 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39434:
--
Fix Version/s: 3.4.0

> Provide runtime error query context when array index is out of bound
> 
>
> Key: SPARK-39434
> URL: https://issues.apache.org/jira/browse/SPARK-39434
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29250) Upgrade to Hadoop 3.3.1

2022-06-13 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553744#comment-17553744
 ] 

Steve Loughran commented on SPARK-29250:


use whatever version the Spark release was built with if you want the least stress. 

> Upgrade to Hadoop 3.3.1
> ---
>
> Key: SPARK-29250
> URL: https://issues.apache.org/jira/browse/SPARK-29250
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Chao Sun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39437) normalize plan id separately in PlanStabilitySuite

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39437:
--
Fix Version/s: 3.3.0
   (was: 3.4.0)

> normalize plan id separately in PlanStabilitySuite
> --
>
> Key: SPARK-39437
> URL: https://issues.apache.org/jira/browse/SPARK-39437
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39437) normalize plan id separately in PlanStabilitySuite

2022-06-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39437:
--
Fix Version/s: 3.4.0
   3.3.1
   (was: 3.3.0)

> normalize plan id separately in PlanStabilitySuite
> --
>
> Key: SPARK-39437
> URL: https://issues.apache.org/jira/browse/SPARK-39437
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39456) Fix broken function links in the auto-generated pandas API support list documentation.

2022-06-13 Thread Hyunwoo Park (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunwoo Park updated SPARK-39456:
-
Summary: Fix broken function links in the auto-generated pandas API support 
list documentation.  (was: Fix broken function links in the auto-generated 
pandas API support list documetation.)

> Fix broken function links in the auto-generated pandas API support list 
> documentation.
> --
>
> Key: SPARK-39456
> URL: https://issues.apache.org/jira/browse/SPARK-39456
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> In the auto-generated pandas API support list documentation, some of the 
> function links are broken and need to be corrected.
> The current 'supported API generation' logic dynamically compares the 
> {{pyspark.pandas}} and {{pandas}} modules to find the differences. Inherited 
> members are aggregated as well, so links are generated incorrectly for 
> members such as {{CategoricalIndex.all()}}, which is inherited from 
> {{Index.all()}}, because they do not match the URL pattern of each API 
> document page.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39456) Fix broken function links in the auto-generated pandas API support list documetation.

2022-06-13 Thread Hyunwoo Park (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunwoo Park updated SPARK-39456:
-
Summary: Fix broken function links in the auto-generated pandas API support 
list documetation.  (was: Fix broken function links in the auto-generated 
documetation on pandas API support list.)

> Fix broken function links in the auto-generated pandas API support list 
> documetation.
> -
>
> Key: SPARK-39456
> URL: https://issues.apache.org/jira/browse/SPARK-39456
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> In the auto-generated pandas API support list documentation, some of the 
> function links are broken and need to be corrected.
> The current 'supported API generation' logic dynamically compares the 
> {{pyspark.pandas}} and {{pandas}} modules to find the differences. Inherited 
> members are aggregated as well, so links are generated incorrectly for 
> members such as {{CategoricalIndex.all()}}, which is inherited from 
> {{Index.all()}}, because they do not match the URL pattern of each API 
> document page.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39456) Fix broken function links in the auto-generated documetation on pandas API support list.

2022-06-13 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553638#comment-17553638
 ] 

Hyunwoo Park commented on SPARK-39456:
--

Related: [https://github.com/apache/spark/pull/36729#issuecomment-1141632078]

> Fix broken function links in the auto-generated documetation on pandas API 
> support list.
> 
>
> Key: SPARK-39456
> URL: https://issues.apache.org/jira/browse/SPARK-39456
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> In the auto-generated pandas API support list documentation, some of the 
> function links are broken and need to be corrected.
> The current 'supported API generation' logic dynamically compares the 
> {{pyspark.pandas}} and {{pandas}} modules to find the differences. Inherited 
> members are aggregated as well, so links are generated incorrectly for 
> members such as {{CategoricalIndex.all()}}, which is inherited from 
> {{Index.all()}}, because they do not match the URL pattern of each API 
> document page.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39456) Fix broken function links in the auto-generated documetation on pandas API support list.

2022-06-13 Thread Hyunwoo Park (Jira)
Hyunwoo Park created SPARK-39456:


 Summary: Fix broken function links in the auto-generated 
documetation on pandas API support list.
 Key: SPARK-39456
 URL: https://issues.apache.org/jira/browse/SPARK-39456
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Hyunwoo Park


In the auto-generated pandas API support list documentation, some of the 
function links are broken and need to be corrected.

The current 'supported API generation' logic dynamically compares the 
{{pyspark.pandas}} and {{pandas}} modules to find the differences. Inherited 
members are aggregated as well, so links are generated incorrectly for members 
such as {{CategoricalIndex.all()}}, which is inherited from {{Index.all()}}, 
because they do not match the URL pattern of each API document page.
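
A minimal sketch of the direction the description suggests: resolve the class 
that actually defines a member before generating its link, so an inherited 
member points at the page of its defining class. The classes and URL pattern 
below are stand-ins, not the actual generator code:

{code:python}
import inspect

class Index:
    def all(self):                   # defined on the base class
        ...

class CategoricalIndex(Index):       # inherits all() from Index
    pass

def defining_class(cls: type, attr: str) -> type:
    # Walk the MRO to find the class that actually defines `attr`.
    for base in inspect.getmro(cls):
        if attr in vars(base):
            return base
    raise AttributeError(f"{cls.__name__} has no attribute {attr!r}")

def doc_link(cls: type, attr: str) -> str:
    # Hypothetical URL pattern for per-class API pages.
    owner = defining_class(cls, attr)
    return f"reference/api/pandas.{owner.__name__}.{attr}.html"

# The link for CategoricalIndex.all() must target Index.all()'s page:
assert doc_link(CategoricalIndex, "all") == "reference/api/pandas.Index.all.html"
{code}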



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39455) Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39455:


Assignee: (was: Apache Spark)

> Improve expression non-codegen code path performance by cache data type 
> matching
> 
>
> Key: SPARK-39455
> URL: https://issues.apache.org/jira/browse/SPARK-39455
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Some expressions do data type matching inside `eval`, which is not friendly 
> for performance: it adds overhead to every execution, per row.
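
The idea, sketched in Python for brevity (Spark's expressions are Scala): 
select the type-specific evaluator once when the expression is constructed and 
cache it, so per-row eval calls skip the data type match. The toy expression 
below is illustrative only:

{code:python}
from typing import Any, Callable

class Add:
    # Toy expression: picks the type-specific evaluator once, up front,
    # rather than pattern-matching on the data type for every input row.
    def __init__(self, data_type: str):
        # Done once per expression instance, not once per row.
        self._do_eval: Callable[[Any, Any], Any] = {
            "int": lambda a, b: a + b,
            "decimal": lambda a, b: a + b,          # stand-in branches
            "interval": lambda a, b: (a[0] + b[0], a[1] + b[1]),
        }[data_type]

    def eval(self, left: Any, right: Any) -> Any:
        return self._do_eval(left, right)           # no per-row type match

expr = Add("int")
assert expr.eval(1, 2) == 3
{code}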



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39455) Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39455:


Assignee: Apache Spark

> Improve expression non-codegen code path performance by cache data type 
> matching
> 
>
> Key: SPARK-39455
> URL: https://issues.apache.org/jira/browse/SPARK-39455
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> Some expressions do data type matching inside `eval`, which is not friendly 
> for performance: it adds overhead to every execution, per row.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39455) Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553556#comment-17553556
 ] 

Apache Spark commented on SPARK-39455:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/36856

> Improve expression non-codegen code path performance by cache data type 
> matching
> 
>
> Key: SPARK-39455
> URL: https://issues.apache.org/jira/browse/SPARK-39455
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Some expressions do data type matching inside `eval`, which is not friendly 
> for performance: it adds overhead to every execution, per row.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39455) Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread XiDuo You (Jira)
XiDuo You created SPARK-39455:
-

 Summary: Improve expression non-codegen code path performance by 
cache data type matching
 Key: SPARK-39455
 URL: https://issues.apache.org/jira/browse/SPARK-39455
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


Some expressions do data type matching inside `eval`, which is not friendly for 
performance: it adds overhead to every execution, per row.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39432) element_at(*, 0) does not return INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553537#comment-17553537
 ] 

Apache Spark commented on SPARK-39432:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36855

> element_at(*, 0) does not return INVALID_ARRAY_INDEX_IN_ELEMENT_AT
> --
>
> Key: SPARK-39432
> URL: https://issues.apache.org/jira/browse/SPARK-39432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Serge Rielau
>Priority: Major
>
> spark-sql> SELECT element_at(array('a', 'b', 'c'), index) FROM VALUES(0), (2) 
> AS T(index);
> 22/06/09 16:23:07 ERROR SparkSQLDriver: Failed in [SELECT 
> element_at(array('a', 'b', 'c'), index) FROM VALUES(0), (2) AS T(index)]
> java.lang.ArrayIndexOutOfBoundsException: SQL array indices start at 1
>  at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.sqlArrayIndexNotStartAtOneError(QueryExecutionErrors.scala:1206)
>  
> This should roll into INVALID_ARRAY_INDEX_IN_ELEMENT_AT. It makes no sense to 
> create a new error class. 
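
A hedged PySpark repro of the behavior described above (local session assumed; 
the exact error class and message text depend on the Spark version):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Index 0 is always rejected: SQL array indices start at 1.
try:
    spark.sql("SELECT element_at(array('a', 'b', 'c'), 0)").collect()
except Exception as e:          # the JVM-side error surfaces through Py4J
    print(type(e).__name__, e)

# Valid indices are 1-based, so index 2 yields the second element.
assert spark.sql("SELECT element_at(array('a', 'b', 'c'), 2)").head()[0] == "b"
{code}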



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39432) element_at(*, 0) does not return INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-06-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553534#comment-17553534
 ] 

Apache Spark commented on SPARK-39432:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36855

> element_at(*, 0) does not return INVALID_ARRAY_INDEX_IN_ELEMENT_AT
> --
>
> Key: SPARK-39432
> URL: https://issues.apache.org/jira/browse/SPARK-39432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Serge Rielau
>Priority: Major
>
> spark-sql> SELECT element_at(array('a', 'b', 'c'), index) FROM VALUES(0), (2) 
> AS T(index);
> 22/06/09 16:23:07 ERROR SparkSQLDriver: Failed in [SELECT 
> element_at(array('a', 'b', 'c'), index) FROM VALUES(0), (2) AS T(index)]
> java.lang.ArrayIndexOutOfBoundsException: SQL array indices start at 1
>  at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.sqlArrayIndexNotStartAtOneError(QueryExecutionErrors.scala:1206)
>  
> This should roll into INVALID_ARRAY_INDEX_IN_ELEMENT_AT. It makes no sense to 
> create a new error class. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


