[jira] [Updated] (SPARK-40999) Hints on subqueries are not properly propagated

2022-11-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-40999:

Fix Version/s: (was: 3.4.0)

> Hints on subqueries are not properly propagated
> ---
>
> Key: SPARK-40999
> URL: https://issues.apache.org/jira/browse/SPARK-40999
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1
>Reporter: Fredrik Klauß
>Priority: Major
>
> Currently, if a user tries to specify a query like the following, the hints 
> on the subquery will be lost. 
> {code:java}
> SELECT * FROM target t WHERE EXISTS
> (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code}
> This happens because hints are removed from the plan and pulled into joins at 
> the beginning of the optimization stage, while subqueries are only turned into 
> joins later during optimization. Since any hints that are not directly below a 
> join are removed, hints below a subquery end up being lost. 
>  
> To resolve this, we add a hint field to SubqueryExpression so that any hints 
> inside a subquery's plan can be pulled into it during EliminateResolvedHint, 
> and then passed on when the subquery is turned into a join.
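
The proposed mechanics can be sketched with a toy model (plain-Python stand-ins for the Catalyst nodes; the class and function names here are illustrative assumptions, not Spark's actual API):

```python
# Toy model of the fix: the subquery expression carries a `hint` field that
# EliminateResolvedHint fills in, and the subquery-to-join rewrite copies
# that hint onto the resulting join instead of dropping it.

class SubqueryExpression:
    def __init__(self, plan, hint=None):
        self.plan = plan   # the subquery's logical plan
        self.hint = hint   # hint pulled up by EliminateResolvedHint

class Join:
    def __init__(self, left, right, hint=None):
        self.left, self.right, self.hint = left, right, hint

def eliminate_resolved_hint(subquery, hint_in_plan):
    # Pull any hint found inside the subquery's plan into the expression
    # itself, so it survives until the subquery is rewritten.
    return SubqueryExpression(subquery.plan, hint=hint_in_plan)

def rewrite_exists_as_join(outer_plan, subquery):
    # When the EXISTS subquery becomes a join, pass the stored hint on
    # to the join node rather than losing it.
    return Join(outer_plan, subquery.plan, hint=subquery.hint)

sub = eliminate_resolved_hint(SubqueryExpression("source-scan"), "BROADCAST")
join = rewrite_exists_as_join("target-scan", sub)
```

In this model the `BROADCAST` hint from the `EXISTS` subquery in the example above survives the rewrite and ends up on the join node.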



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040

2022-11-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40998.
--
Resolution: Fixed

Issue resolved by pull request 38484
[https://github.com/apache/spark/pull/38484]

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
> ---
>
> Key: SPARK-40998
> URL: https://issues.apache.org/jira/browse/SPARK-40998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-40977) Complete Support for Union in Python client

2022-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40977.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38453
[https://github.com/apache/spark/pull/38453]

> Complete Support for Union in Python client
> ---
>
> Key: SPARK-40977
> URL: https://issues.apache.org/jira/browse/SPARK-40977
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40977) Complete Support for Union in Python client

2022-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40977:


Assignee: Rui Wang

> Complete Support for Union in Python client
> ---
>
> Key: SPARK-40977
> URL: https://issues.apache.org/jira/browse/SPARK-40977
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628113#comment-17628113
 ] 

Apache Spark commented on SPARK-41003:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/38489

> BHJ LeftAnti does not update numOutputRows when codegen is disabled
> ---
>
> Key: SPARK-41003
> URL: https://issues.apache.org/jira/browse/SPARK-41003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: dzcxzl
>Priority: Minor
>







[jira] [Assigned] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41003:


Assignee: (was: Apache Spark)

> BHJ LeftAnti does not update numOutputRows when codegen is disabled
> ---
>
> Key: SPARK-41003
> URL: https://issues.apache.org/jira/browse/SPARK-41003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: dzcxzl
>Priority: Minor
>







[jira] [Assigned] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41003:


Assignee: Apache Spark

> BHJ LeftAnti does not update numOutputRows when codegen is disabled
> ---
>
> Key: SPARK-41003
> URL: https://issues.apache.org/jira/browse/SPARK-41003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-02 Thread dzcxzl (Jira)
dzcxzl created SPARK-41003:
--

 Summary: BHJ LeftAnti does not update numOutputRows when codegen 
is disabled
 Key: SPARK-41003
 URL: https://issues.apache.org/jira/browse/SPARK-41003
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: dzcxzl









[jira] [Commented] (SPARK-41002) Compatible `take` and `head` API in Python client

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628104#comment-17628104
 ] 

Apache Spark commented on SPARK-41002:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38488

> Compatible `take` and `head` API in Python client 
> --
>
> Key: SPARK-41002
> URL: https://issues.apache.org/jira/browse/SPARK-41002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-41002) Compatible `take` and `head` API in Python client

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628103#comment-17628103
 ] 

Apache Spark commented on SPARK-41002:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38488

> Compatible `take` and `head` API in Python client 
> --
>
> Key: SPARK-41002
> URL: https://issues.apache.org/jira/browse/SPARK-41002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41002) Compatible `take` and `head` API in Python client

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41002:


Assignee: (was: Apache Spark)

> Compatible `take` and `head` API in Python client 
> --
>
> Key: SPARK-41002
> URL: https://issues.apache.org/jira/browse/SPARK-41002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41002) Compatible `take` and `head` API in Python client

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41002:


Assignee: Apache Spark

> Compatible `take` and `head` API in Python client 
> --
>
> Key: SPARK-41002
> URL: https://issues.apache.org/jira/browse/SPARK-41002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-41002) Compatible `take` and `head` API in Python client

2022-11-02 Thread Rui Wang (Jira)
Rui Wang created SPARK-41002:


 Summary: Compatible `take` and `head` API in Python client 
 Key: SPARK-41002
 URL: https://issues.apache.org/jira/browse/SPARK-41002
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang









[jira] [Commented] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2022-11-02 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628091#comment-17628091
 ] 

Yang Jie commented on SPARK-34007:
--

[~hyukjin.kwon] Since SPARK-40651 dropped the Hadoop 2 binary distribution from 
the release process, does this issue still exist?

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Fix For: 3.1.0
>
>
> After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker 
> release script fails as below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}






[jira] [Commented] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628072#comment-17628072
 ] 

Apache Spark commented on SPARK-40995:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38487

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> Move the existing minimal doc into the right top level connect readme and add 
> new docs folder.






[jira] [Resolved] (SPARK-40989) Improve `session.sql` testing coverage in Python client

2022-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40989.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38472
[https://github.com/apache/spark/pull/38472]

> Improve `session.sql` testing coverage in Python client
> ---
>
> Key: SPARK-40989
> URL: https://issues.apache.org/jira/browse/SPARK-40989
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40989) Improve `session.sql` testing coverage in Python client

2022-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40989:


Assignee: Rui Wang

> Improve `session.sql` testing coverage in Python client
> ---
>
> Key: SPARK-40989
> URL: https://issues.apache.org/jira/browse/SPARK-40989
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40995.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38470
[https://github.com/apache/spark/pull/38470]

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> Move the existing minimal doc into the right top level connect readme and add 
> new docs folder.






[jira] [Assigned] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40995:


Assignee: Martin Grund

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>
> Move the existing minimal doc into the right top level connect readme and add 
> new docs folder.






[jira] [Updated] (SPARK-41000) Make CommandResult extend Command

2022-11-02 Thread Kelvin Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kelvin Jiang updated SPARK-41000:
-
Description: CommandResult is the logical plan node that stores the results 
from a command. We want this to still be considered a command, rather than e.g. 
a query, so we should extend the trait Command which would allow it to pass 
various checks for commands (such as [this 
one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).
  (was: CommandResult is the logical plan node that stores the results from a 
command. We want this to still be considered a command, rather than e.g. a 
query, so extending the trait Command would allow it to pass various checks for 
commands (such as [this 
one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).)

> Make CommandResult extend Command
> -
>
> Key: SPARK-41000
> URL: https://issues.apache.org/jira/browse/SPARK-41000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kelvin Jiang
>Priority: Major
>
> CommandResult is the logical plan node that stores the results from a 
> command. We want this to still be considered a command, rather than e.g. a 
> query, so we should extend the trait Command which would allow it to pass 
> various checks for commands (such as [this 
> one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).
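
The effect of the change can be shown with a minimal model (plain Python, not Catalyst; the names below are illustrative stand-ins for the Scala trait and node):

```python
# Toy model: making the result node extend the Command marker lets generic
# "is this plan a command?" checks accept it.

class Command:
    """Marker base class, standing in for Catalyst's Command trait."""
    pass

class CommandResult(Command):
    """After the change: the node storing a command's results is itself
    considered a command."""
    def __init__(self, rows):
        self.rows = rows

def is_command(plan) -> bool:
    # Stand-in for checks such as the CTESubstitution guard linked in the
    # description, which dispatch on whether the plan is a Command.
    return isinstance(plan, Command)

assert is_command(CommandResult(rows=[]))
```

Before the change, the `isinstance` check (pattern match in Scala) would fail for `CommandResult`, so it would be treated as a query.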






[jira] [Comment Edited] (SPARK-39405) NumPy input support in PySpark SQL

2022-11-02 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627984#comment-17627984
 ] 

Xinrong Meng edited comment on SPARK-39405 at 11/2/22 9:10 PM:
---

Hi [~douglas.mo...@databricks.com] the fix is in.


was (Author: xinrongm):
Hi [~douglas.mo...@databricks.com] the commit is in.

> NumPy input support in PySpark SQL
> --
>
> Key: SPARK-39405
> URL: https://issues.apache.org/jira/browse/SPARK-39405
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> NumPy is the fundamental package for scientific computing with Python. It is 
> very commonly used, especially in the data science world. For example, Pandas 
> is backed by NumPy, and tensors also support interchangeable conversion 
> to and from NumPy arrays. 
>  
> However, PySpark only supports Python built-in types with the exception of 
> “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”. 
>  
> This issue has been raised multiple times internally and externally, see also 
> SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.
>  
> With NumPy support in SQL, we expect broader adoption from data 
> scientists and newcomers leveraging their existing background and codebases 
> with NumPy.
>  
> See more 
> [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#]
> .






[jira] [Commented] (SPARK-37697) Make it easier to convert numpy arrays to Spark Dataframes

2022-11-02 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627985#comment-17627985
 ] 

Xinrong Meng commented on SPARK-37697:
--

The commit is in.

> Make it easier to convert numpy arrays to Spark Dataframes
> --
>
> Key: SPARK-37697
> URL: https://issues.apache.org/jira/browse/SPARK-37697
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.2
>Reporter: Douglas Moore
>Priority: Major
> Attachments: image-2022-10-31-22-49-37-356.png
>
>
> Make it easier to convert numpy arrays to dataframes.
> Often we receive errors:
>  
> {code:java}
> df = spark.createDataFrame(numpy.arange(10))
> Can not infer schema for type: 
> {code}
>  
> OR
> {code:java}
> df = spark.createDataFrame(numpy.arange(10.))
> Can not infer schema for type: 
> {code}
>  
> Today (Spark 3.x) we have to:
> {code:java}
> spark.createDataFrame(pd.DataFrame(numpy.arange(10.))) {code}
> Make this easier with a direct conversion from Numpy arrays to Spark 
> Dataframes.
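
The workaround above can be generalized without pandas by flattening the array into rows that `spark.createDataFrame` already accepts. This is a hedged sketch, not a Spark API: `to_rows` is a hypothetical helper, written duck-typed (anything exposing `.tolist()`, as NumPy arrays do) so it stays stdlib-only:

```python
# Hypothetical helper (not part of Spark): turn an ndarray-like object into
# a list of rows suitable for spark.createDataFrame.

def to_rows(arr):
    # NumPy arrays expose .tolist(); fall back to list() for plain sequences.
    data = arr.tolist() if hasattr(arr, "tolist") else list(arr)
    # A 1-d input yields scalars; wrap each one in a single-column row.
    return [row if isinstance(row, list) else [row] for row in data]

# Intended usage (assuming a SparkSession named `spark`):
#   spark.createDataFrame(to_rows(numpy.arange(10.)), ["value"])
```

SPARK-39405 later added direct ndarray support to `createDataFrame` itself, making helpers like this unnecessary on new versions.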






[jira] [Commented] (SPARK-40990) DataFrame creation from 2d NumPy array with arbitrary columns

2022-11-02 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627983#comment-17627983
 ] 

Xinrong Meng commented on SPARK-40990:
--

Hi [~douglas.mo...@databricks.com] Any size of 2d ndarray works, as long as it 
fits into memory, since an ndarray is not distributed.

 

> DataFrame creation from 2d NumPy array with arbitrary columns
> -
>
> Key: SPARK-40990
> URL: https://issues.apache.org/jira/browse/SPARK-40990
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, DataFrame creation from 2d ndarray works only with 2 columns. We 
> should provide complete support for DataFrame creation with 2d ndarray.
> For example, the test case below should work as shown below.
>  
> {code:java}
> >>> spark.createDataFrame(np.arange(100).reshape([10,10])).show()
> +---+---+---+---+---+---+---+---+---+---+                                     
>   
> | _1| _2| _3| _4| _5| _6| _7| _8| _9|_10|
> +---+---+---+---+---+---+---+---+---+---+
> |  0|  1|  2|  3|  4|  5|  6|  7|  8|  9|
> | 10| 11| 12| 13| 14| 15| 16| 17| 18| 19|
> | 20| 21| 22| 23| 24| 25| 26| 27| 28| 29|
> | 30| 31| 32| 33| 34| 35| 36| 37| 38| 39|
> | 40| 41| 42| 43| 44| 45| 46| 47| 48| 49|
> | 50| 51| 52| 53| 54| 55| 56| 57| 58| 59|
> | 60| 61| 62| 63| 64| 65| 66| 67| 68| 69|
> | 70| 71| 72| 73| 74| 75| 76| 77| 78| 79|
> | 80| 81| 82| 83| 84| 85| 86| 87| 88| 89|
> | 90| 91| 92| 93| 94| 95| 96| 97| 98| 99|
> +---+---+---+---+---+---+---+---+---+---+
>   {code}
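
The default column names in the expected output above (`_1` through `_10`) follow Spark's convention of 1-based positional names derived from the array's second dimension. A small sketch of that naming scheme (the helper name is illustrative, not a Spark function):

```python
# Generate Spark-style default column names _1.._n for an n-column input,
# e.g. for a (10, 10) ndarray this yields _1 through _10.

def default_column_names(ncols: int) -> list:
    return [f"_{i}" for i in range(1, ncols + 1)]

print(default_column_names(10))
# → ['_1', '_2', '_3', '_4', '_5', '_6', '_7', '_8', '_9', '_10']
```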






[jira] [Commented] (SPARK-39405) NumPy input support in PySpark SQL

2022-11-02 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627984#comment-17627984
 ] 

Xinrong Meng commented on SPARK-39405:
--

Hi [~douglas.mo...@databricks.com] the commit is in.

> NumPy input support in PySpark SQL
> --
>
> Key: SPARK-39405
> URL: https://issues.apache.org/jira/browse/SPARK-39405
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> NumPy is the fundamental package for scientific computing with Python. It is 
> very commonly used, especially in the data science world. For example, Pandas 
> is backed by NumPy, and tensors also support interchangeable conversion 
> to and from NumPy arrays. 
>  
> However, PySpark only supports Python built-in types with the exception of 
> “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”. 
>  
> This issue has been raised multiple times internally and externally, see also 
> SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.
>  
> With NumPy support in SQL, we expect broader adoption from data 
> scientists and newcomers leveraging their existing background and codebases 
> with NumPy.
>  
> See more 
> [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#]
> .






[jira] [Commented] (SPARK-41000) Make CommandResult extend Command

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627971#comment-17627971
 ] 

Apache Spark commented on SPARK-41000:
--

User 'kelvinjian-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/38486

> Make CommandResult extend Command
> -
>
> Key: SPARK-41000
> URL: https://issues.apache.org/jira/browse/SPARK-41000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kelvin Jiang
>Priority: Major
>
> CommandResult is the logical plan node that stores the results from a 
> command. We want this to still be considered a command, rather than e.g. a 
> query, so extending the trait Command would allow it to pass various checks 
> for commands (such as [this 
> one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).






[jira] [Assigned] (SPARK-41000) Make CommandResult extend Command

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41000:


Assignee: Apache Spark

> Make CommandResult extend Command
> -
>
> Key: SPARK-41000
> URL: https://issues.apache.org/jira/browse/SPARK-41000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kelvin Jiang
>Assignee: Apache Spark
>Priority: Major
>
> CommandResult is the logical plan node that stores the results from a 
> command. We want this to still be considered a command, rather than e.g. a 
> query, so extending the trait Command would allow it to pass various checks 
> for commands (such as [this 
> one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).






[jira] [Assigned] (SPARK-41000) Make CommandResult extend Command

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41000:


Assignee: (was: Apache Spark)

> Make CommandResult extend Command
> -
>
> Key: SPARK-41000
> URL: https://issues.apache.org/jira/browse/SPARK-41000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kelvin Jiang
>Priority: Major
>
> CommandResult is the logical plan node that stores the results from a 
> command. We want this to still be considered a command, rather than e.g. a 
> query, so extending the trait Command would allow it to pass various checks 
> for commands (such as [this 
> one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).






[jira] [Commented] (SPARK-41001) Connection string support for Python client

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627936#comment-17627936
 ] 

Apache Spark commented on SPARK-41001:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38485

> Connection string support for Python client
> ---
>
> Key: SPARK-41001
> URL: https://issues.apache.org/jira/browse/SPARK-41001
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41001) Connection string support for Python client

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41001:


Assignee: (was: Apache Spark)

> Connection string support for Python client
> ---
>
> Key: SPARK-41001
> URL: https://issues.apache.org/jira/browse/SPARK-41001
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41001) Connection string support for Python client

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41001:


Assignee: Apache Spark

> Connection string support for Python client
> ---
>
> Key: SPARK-41001
> URL: https://issues.apache.org/jira/browse/SPARK-41001
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41001) Connection string support for Python client

2022-11-02 Thread Martin Grund (Jira)
Martin Grund created SPARK-41001:


 Summary: Connection string support for Python client
 Key: SPARK-41001
 URL: https://issues.apache.org/jira/browse/SPARK-41001
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41000) Make CommandResult extend Command

2022-11-02 Thread Kelvin Jiang (Jira)
Kelvin Jiang created SPARK-41000:


 Summary: Make CommandResult extend Command
 Key: SPARK-41000
 URL: https://issues.apache.org/jira/browse/SPARK-41000
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Kelvin Jiang


CommandResult is the logical plan node that stores the results from a command. 
We want this to still be considered a command, rather than e.g. a query, so 
extending the trait Command would allow it to pass various checks for commands 
(such as [this 
one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).
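The trait-based gating described above can be illustrated with a minimal sketch. These are hypothetical Python classes standing in for Catalyst's Scala traits (the real code is Scala and the names below are illustrative, not Spark's actual API): a check that tests for the Command marker trait accepts CommandResult once it extends that trait.

```python
# Toy model: marker-trait check for command plans (hypothetical, not Spark's API).

class LogicalPlan: ...
class Command(LogicalPlan): ...       # marker trait identifying commands

class CommandResult(Command):         # after the change: extends Command
    def __init__(self, rows):
        self.rows = rows              # results captured from the executed command

def passes_command_checks(plan):
    # Stand-in for checks like the one in CTESubstitution linked above,
    # which treat command plans differently from ordinary queries.
    return isinstance(plan, Command)

assert passes_command_checks(CommandResult(rows=[]))
```

Because the check is a plain subtype test, no call sites need to change; extending the trait is enough for CommandResult to be routed down the command path.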



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40999) Hints on subqueries are not properly propagated

2022-11-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredrik Klauß updated SPARK-40999:
--
Affects Version/s: 3.4.0

> Hints on subqueries are not properly propagated
> ---
>
> Key: SPARK-40999
> URL: https://issues.apache.org/jira/browse/SPARK-40999
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1
>Reporter: Fredrik Klauß
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, if a user tries to specify a query like the following, the hints 
> on the subquery will be lost. 
> {code:java}
> SELECT * FROM target t WHERE EXISTS
> (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code}
> This happens because hints are removed from the plan and pulled into joins at 
> the beginning of the optimization stage, but subqueries are only turned into 
> joins during optimization. As we remove any hints that are not below a join, 
> we end up removing hints that are below a subquery.
>  
> To resolve this, we add a hint field to SubqueryExpression into which any 
> hints inside a subquery's plan can be pulled during EliminateResolvedHint, 
> and then pass this hint on when the subquery is turned into a join.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40999) Hints on subqueries are not properly propagated

2022-11-02 Thread Jira
Fredrik Klauß created SPARK-40999:
-

 Summary: Hints on subqueries are not properly propagated
 Key: SPARK-40999
 URL: https://issues.apache.org/jira/browse/SPARK-40999
 Project: Spark
  Issue Type: Bug
  Components: Optimizer, Spark Core
Affects Versions: 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.1.3, 3.2.0, 3.1.2, 3.1.1, 
3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0
Reporter: Fredrik Klauß
 Fix For: 3.4.0


Currently, if a user tries to specify a query like the following, the hints on 
the subquery will be lost. 
{code:java}
SELECT * FROM target t WHERE EXISTS
(SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code}
This happens because hints are removed from the plan and pulled into joins at 
the beginning of the optimization stage, but subqueries are only turned into 
joins during optimization. As we remove any hints that are not below a join, we 
end up removing hints that are below a subquery.

To resolve this, we add a hint field to SubqueryExpression into which any hints 
inside a subquery's plan can be pulled during EliminateResolvedHint, and then 
pass this hint on when the subquery is turned into a join.
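The lifecycle described above can be sketched as a toy model. These are hypothetical Python dataclasses, not Spark's Catalyst classes (the real fix is in Scala): the subquery expression carries the hint pulled up by EliminateResolvedHint until the subquery is rewritten into a join, at which point the hint is handed to the join.

```python
# Toy model of the fix (hypothetical classes, not Spark's Catalyst API):
# a Subquery expression stores the hint so it survives until join rewriting.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Relation:
    name: str
    hint: Optional[str] = None   # e.g. "BROADCAST", attached by the parser

@dataclass
class Subquery:
    plan: Relation
    hint: Optional[str] = None   # hint pulled up during EliminateResolvedHint

@dataclass
class Join:
    left: Relation
    right: Relation
    hint: Optional[str] = None

def eliminate_resolved_hint(sub: Subquery) -> Subquery:
    # Pull the hint out of the subquery's plan into the expression itself,
    # instead of dropping it because no join exists yet (the bug).
    return Subquery(Relation(sub.plan.name), hint=sub.plan.hint)

def rewrite_exists_to_join(target: Relation, sub: Subquery) -> Join:
    # When the subquery becomes a (semi) join, pass the stored hint on.
    return Join(target, sub.plan, hint=sub.hint)

sub = eliminate_resolved_hint(Subquery(Relation("source", hint="BROADCAST")))
join = rewrite_exists_to_join(Relation("target"), sub)
assert join.hint == "BROADCAST"   # the hint reaches the join
```

Without the stored field, eliminate_resolved_hint would have nowhere to put the hint and it would be discarded, which is the behavior this issue reports.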



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40985) Upgrade RoaringBitmap to 0.9.35

2022-11-02 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40985:


Assignee: Yang Jie

> Upgrade RoaringBitmap to 0.9.35
> ---
>
> Key: SPARK-40985
> URL: https://issues.apache.org/jira/browse/SPARK-40985
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40985) Upgrade RoaringBitmap to 0.9.35

2022-11-02 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40985.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38465
[https://github.com/apache/spark/pull/38465]

> Upgrade RoaringBitmap to 0.9.35
> ---
>
> Key: SPARK-40985
> URL: https://issues.apache.org/jira/browse/SPARK-40985
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627783#comment-17627783
 ] 

Apache Spark commented on SPARK-40998:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38484

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
> ---
>
> Key: SPARK-40998
> URL: https://issues.apache.org/jira/browse/SPARK-40998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40998:


Assignee: Apache Spark  (was: Max Gekk)

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
> ---
>
> Key: SPARK-40998
> URL: https://issues.apache.org/jira/browse/SPARK-40998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40998:


Assignee: Max Gekk  (was: Apache Spark)

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
> ---
>
> Key: SPARK-40998
> URL: https://issues.apache.org/jira/browse/SPARK-40998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627785#comment-17627785
 ] 

Apache Spark commented on SPARK-40998:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38484

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
> ---
>
> Key: SPARK-40998
> URL: https://issues.apache.org/jira/browse/SPARK-40998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040

2022-11-02 Thread Max Gekk (Jira)
Max Gekk created SPARK-40998:


 Summary: Assign a name to the legacy error class 
_LEGACY_ERROR_TEMP_0040
 Key: SPARK-40998
 URL: https://issues.apache.org/jira/browse/SPARK-40998
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 3.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2022-11-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627743#comment-17627743
 ] 

Daniel Glöckner edited comment on SPARK-33782 at 11/2/22 2:34 PM:
--

Will this fix repair the {{--jars}} flag and will JARs be added automatically 
to the driver and executor class path when using 
{{spark.kubernetes.file.upload.path}} / {{file://}} URIs?

https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management

https://spark.apache.org/docs/3.2.0/submitting-applications.html
{quote}
When using spark-submit, the application jar along with any jars included with 
the --jars option will be automatically transferred to the cluster. URLs 
supplied after --jars must be separated by commas. That list is included in the 
driver and executor classpaths. 
{quote}


was (Author: JIRAUSER288949):
Will this fix repair the {{--jars}} flag and will JARs be added automatically 
to the driver and executor class path when using 
{{spark.kubernetes.file.upload.path}} / {{file://}} URIs?

https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management

https://spark.apache.org/docs/3.2.0/submitting-applications.html
??
When using spark-submit, the application jar along with any jars included with 
the --jars option will be automatically transferred to the cluster. URLs 
supplied after --jars must be separated by commas. That list is included in the 
driver and executor classpaths. ??

> Place spark.files, spark.jars and spark.files under the current working 
> directory on the driver in K8S cluster mode
> ---
>
> Key: SPARK-33782
> URL: https://issues.apache.org/jira/browse/SPARK-33782
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In YARN cluster mode, the passed files can be accessed in the 
> current working directory. This does not appear to be the case in Kubernetes 
> cluster mode.
> By doing this, users can, for example, leverage PEX to manage Python 
> dependencies in Apache Spark:
> {code}
> pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
> PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
> {code}
> See also https://github.com/apache/spark/pull/30735/files#r540935585.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2022-11-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627743#comment-17627743
 ] 

Daniel Glöckner edited comment on SPARK-33782 at 11/2/22 2:33 PM:
--

Will this fix repair the {{--jars}} flag and will JARs be added automatically 
to the driver and executor class path when using 
{{spark.kubernetes.file.upload.path}} / {{file://}} URIs?

https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management

https://spark.apache.org/docs/3.2.0/submitting-applications.html
??
When using spark-submit, the application jar along with any jars included with 
the --jars option will be automatically transferred to the cluster. URLs 
supplied after --jars must be separated by commas. That list is included in the 
driver and executor classpaths. ??


was (Author: JIRAUSER288949):
The this fix repair the {{--jars}} flag and will JARs be added automatically to 
the driver and executor class path when using 
{{spark.kubernetes.file.upload.path}} / {{file://}} URIs?

https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management

https://spark.apache.org/docs/3.2.0/submitting-applications.html
??
When using spark-submit, the application jar along with any jars included with 
the --jars option will be automatically transferred to the cluster. URLs 
supplied after --jars must be separated by commas. That list is included in the 
driver and executor classpaths. ??

> Place spark.files, spark.jars and spark.files under the current working 
> directory on the driver in K8S cluster mode
> ---
>
> Key: SPARK-33782
> URL: https://issues.apache.org/jira/browse/SPARK-33782
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In YARN cluster mode, the passed files can be accessed in the 
> current working directory. This does not appear to be the case in Kubernetes 
> cluster mode.
> By doing this, users can, for example, leverage PEX to manage Python 
> dependencies in Apache Spark:
> {code}
> pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
> PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
> {code}
> See also https://github.com/apache/spark/pull/30735/files#r540935585.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2022-11-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627743#comment-17627743
 ] 

Daniel Glöckner commented on SPARK-33782:
-

Will this fix repair the {{--jars}} flag, and will JARs be added automatically 
to the driver and executor class path when using 
{{spark.kubernetes.file.upload.path}} / {{file://}} URIs?

https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management

https://spark.apache.org/docs/3.2.0/submitting-applications.html
??
When using spark-submit, the application jar along with any jars included with 
the --jars option will be automatically transferred to the cluster. URLs 
supplied after --jars must be separated by commas. That list is included in the 
driver and executor classpaths. ??

> Place spark.files, spark.jars and spark.files under the current working 
> directory on the driver in K8S cluster mode
> ---
>
> Key: SPARK-33782
> URL: https://issues.apache.org/jira/browse/SPARK-33782
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In YARN cluster mode, the passed files can be accessed in the 
> current working directory. This does not appear to be the case in Kubernetes 
> cluster mode.
> By doing this, users can, for example, leverage PEX to manage Python 
> dependencies in Apache Spark:
> {code}
> pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
> PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
> {code}
> See also https://github.com/apache/spark/pull/30735/files#r540935585.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32380) sparksql cannot access hive table while data in hbase

2022-11-02 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627704#comment-17627704
 ] 

Ranga Reddy commented on SPARK-32380:
-

The pull request below should solve the issue, but we need to check whether 
there are any other issues.

[https://github.com/apache/spark/pull/29178]

 

> sparksql cannot access hive table while data in hbase
> -
>
> Key: SPARK-32380
> URL: https://issues.apache.org/jira/browse/SPARK-32380
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: ||component||version||
> |hadoop|2.8.5|
> |hive|2.3.7|
> |spark|3.0.0|
> |hbase|1.4.9|
>Reporter: deyzhong
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> * step1: create hbase table
> {code:java}
>  hbase(main):001:0> create 'hbase_test', 'cf1'
>  hbase(main):002:0> put 'hbase_test', 'r1', 'cf1:v1', '123'
> {code}
>  * step2: create hive table related to hbase table
>  
> {code:java}
> hive> 
> CREATE EXTERNAL TABLE `hivetest.hbase_test`(
>   `key` string COMMENT '', 
>   `value` string COMMENT '')
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ( 
>   'hbase.columns.mapping'=':key,cf1:v1', 
>   'serialization.format'='1')
> TBLPROPERTIES (
>   'hbase.table.name'='hbase_test')
>  {code}
>  * step3: sparksql query hive table while data in hbase
> {code:java}
> spark-sql --master yarn -e "select * from hivetest.hbase_test"
> {code}
>  
> The error log is as follows:
> java.io.IOException: Cannot create a record reader because of a previous 
> error. Please look at the previous logs lines from the task's full log for 
> more details.
>  at 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
>  at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131)
>  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>  at scala.Option.getOrElse(Option.scala:189)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>  at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>  at scala.Option.getOrElse(Option.scala:189)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>  at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>  at scala.Option.getOrElse(Option.scala:189)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>  at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>  at scala.Option.getOrElse(Option.scala:189)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>  at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>  at scala.Option.getOrElse(Option.scala:189)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
>  at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
>  at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412)
>  at 
> org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:58)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
>  at scala.collection.Iterator.foreach(Iterato

[jira] [Assigned] (SPARK-40957) Add in memory cache in HDFSMetadataLog

2022-11-02 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-40957:


Assignee: Boyang Jerry Peng

> Add in memory cache in HDFSMetadataLog
> --
>
> Key: SPARK-40957
> URL: https://issues.apache.org/jira/browse/SPARK-40957
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>Priority: Major
>
> Every time an entry in the offset log or commit log needs to be accessed, we 
> read it from disk, which is slow. A cache of recent entries would speed up reads.
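The idea quoted above can be sketched as a small bounded cache in front of the slow disk read. This is a hypothetical Python sketch, not Spark's HDFSMetadataLog API (the class and parameter names below are illustrative):

```python
# Toy sketch: LRU cache of recent metadata-log entries (hypothetical API).

from collections import OrderedDict

class CachedMetadataLog:
    def __init__(self, read_from_disk, capacity=10):
        self._read = read_from_disk      # slow path: parse the log file
        self._cache = OrderedDict()      # batchId -> entry, in LRU order
        self._capacity = capacity

    def get(self, batch_id):
        if batch_id in self._cache:
            self._cache.move_to_end(batch_id)   # refresh LRU position
            return self._cache[batch_id]
        entry = self._read(batch_id)            # cache miss: hit the disk
        self._cache[batch_id] = entry
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)     # evict least recently used
        return entry

disk_reads = []
log = CachedMetadataLog(lambda b: disk_reads.append(b) or f"offsets-{b}",
                        capacity=2)
log.get(1); log.get(1); log.get(2)
assert disk_reads == [1, 2]   # the second get(1) was served from memory
```

Keeping the cache small and bounded matters here, since streaming queries only re-read the most recent batches but the log itself grows without bound.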



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40957) Add in memory cache in HDFSMetadataLog

2022-11-02 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-40957.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38430
[https://github.com/apache/spark/pull/38430]

> Add in memory cache in HDFSMetadataLog
> --
>
> Key: SPARK-40957
> URL: https://issues.apache.org/jira/browse/SPARK-40957
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>Priority: Major
> Fix For: 3.4.0
>
>
> Every time an entry in the offset log or commit log needs to be accessed, we 
> read it from disk, which is slow. A cache of recent entries would speed up reads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40997:


Assignee: Apache Spark

> K8s resource name prefix should start w/ alphanumeric
> -
>
> Key: SPARK-40997
> URL: https://issues.apache.org/jira/browse/SPARK-40997
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627695#comment-17627695
 ] 

Apache Spark commented on SPARK-40997:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/38483

> K8s resource name prefix should start w/ alphanumeric
> -
>
> Key: SPARK-40997
> URL: https://issues.apache.org/jira/browse/SPARK-40997
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40997:


Assignee: (was: Apache Spark)

> K8s resource name prefix should start w/ alphanumeric
> -
>
> Key: SPARK-40997
> URL: https://issues.apache.org/jira/browse/SPARK-40997
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric

2022-11-02 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-40997:
-

 Summary: K8s resource name prefix should start w/ alphanumeric
 Key: SPARK-40997
 URL: https://issues.apache.org/jira/browse/SPARK-40997
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.3.1
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40749) Migrate type check failures of generators onto error classes

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40749:


Assignee: Apache Spark

> Migrate type check failures of generators onto error classes
> 
>
> Key: SPARK-40749
> URL: https://issues.apache.org/jira/browse/SPARK-40749
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in the generator 
> expressions:
> 1. Stack (3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L163-L170
> 2. ExplodeBase (1): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L299
> 3. Inline (1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L441
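The migration described above can be sketched as follows. This is a hypothetical Python toy model, not Spark's internal Scala API: the legacy TypeCheckFailure carries only a free-form string, while DataTypeMismatch carries a structured error sub-class plus message parameters; the sub-class name and parameter keys below are assumptions for illustration.

```python
# Toy model of migrating TypeCheckFailure -> DataTypeMismatch (hypothetical).

from dataclasses import dataclass

@dataclass
class TypeCheckFailure:
    message: str                  # legacy: only a human-readable string

@dataclass
class DataTypeMismatch:
    error_sub_class: str          # machine-readable error class name
    message_parameters: dict      # values to interpolate into the template

def check_stack_args(num_args):
    # stack(n, expr1, ...) needs the row count plus at least one column.
    if num_args < 2:
        return DataTypeMismatch(
            error_sub_class="WRONG_NUM_ARGS",          # assumed name
            message_parameters={"functionName": "stack",
                                "actualNum": str(num_args)},
        )
    return None   # type check passed

err = check_stack_args(1)
assert err is not None and err.error_sub_class == "WRONG_NUM_ARGS"
```

The structured form lets error messages be templated, tested, and translated centrally instead of being assembled ad hoc at each check site.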



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40749) Migrate type check failures of generators onto error classes

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627668#comment-17627668
 ] 

Apache Spark commented on SPARK-40749:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38482

> Migrate type check failures of generators onto error classes
> 
>
> Key: SPARK-40749
> URL: https://issues.apache.org/jira/browse/SPARK-40749
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure with DataTypeMismatch in the type checks of the 
> generator expressions:
> 1. Stack (3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L163-L170
> 2. ExplodeBase (1): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L299
> 3. Inline (1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L441



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40996) Upgrade `sbt-checkstyle-plugin` to 4.0.0

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40996:


Assignee: Apache Spark

> Upgrade `sbt-checkstyle-plugin` to 4.0.0
> 
>
> Key: SPARK-40996
> URL: https://issues.apache.org/jira/browse/SPARK-40996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> This is a precondition for upgrading to sbt 1.7.3.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40996) Upgrade `sbt-checkstyle-plugin` to 4.0.0

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40996:


Assignee: (was: Apache Spark)

> Upgrade `sbt-checkstyle-plugin` to 4.0.0
> 
>
> Key: SPARK-40996
> URL: https://issues.apache.org/jira/browse/SPARK-40996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> This is a precondition for upgrading to sbt 1.7.3.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40749) Migrate type check failures of generators onto error classes

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40749:


Assignee: (was: Apache Spark)

> Migrate type check failures of generators onto error classes
> 
>
> Key: SPARK-40749
> URL: https://issues.apache.org/jira/browse/SPARK-40749
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure with DataTypeMismatch in the type checks of the 
> generator expressions:
> 1. Stack (3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L163-L170
> 2. ExplodeBase (1): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L299
> 3. Inline (1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L441



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40996) Upgrade `sbt-checkstyle-plugin` to 4.0.0

2022-11-02 Thread Yang Jie (Jira)
Yang Jie created SPARK-40996:


 Summary: Upgrade `sbt-checkstyle-plugin` to 4.0.0
 Key: SPARK-40996
 URL: https://issues.apache.org/jira/browse/SPARK-40996
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


This is a precondition for upgrading to sbt 1.7.3.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40374) Migrate type check failures of type creators onto error classes

2022-11-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40374.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38463
[https://github.com/apache/spark/pull/38463]

> Migrate type check failures of type creators onto error classes
> ---
>
> Key: SPARK-40374
> URL: https://issues.apache.org/jira/browse/SPARK-40374
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: BingKun Pan
>Priority: Major
> Fix For: 3.4.0
>
>
> Replace TypeCheckFailure with DataTypeMismatch in the type checks of the 
> complex type creator expressions:
> 1. CreateMap(3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214
> 2. CreateNamedStruct(3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457
> 3. UpdateFields(2): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40374) Migrate type check failures of type creators onto error classes

2022-11-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40374:


Assignee: BingKun Pan

> Migrate type check failures of type creators onto error classes
> ---
>
> Key: SPARK-40374
> URL: https://issues.apache.org/jira/browse/SPARK-40374
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: BingKun Pan
>Priority: Major
>
> Replace TypeCheckFailure with DataTypeMismatch in the type checks of the 
> complex type creator expressions:
> 1. CreateMap(3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214
> 2. CreateNamedStruct(3): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457
> 3. UpdateFields(2): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40995:


Assignee: (was: Apache Spark)

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Move the existing minimal doc into the right top-level Connect README and add 
> a new docs folder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40248) Use larger number of bits to build bloom filter

2022-11-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40248.
-
Fix Version/s: 3.4.0
 Assignee: Yuming Wang
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/37697

> Use larger number of bits to build bloom filter 
> 
>
> Key: SPARK-40248
> URL: https://issues.apache.org/jira/browse/SPARK-40248
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
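
For context on the sizing involved, Bloom filters follow a standard formula: for n expected items and a target false-positive probability p, the optimal bit count is m = -n * ln(p) / (ln 2)^2, with k = (m/n) * ln 2 hash functions. The sketch below is a hedged illustration of that math, not Spark's exact BloomFilter implementation:

```python
import math

def optimal_num_of_bits(expected_items: int, fpp: float) -> int:
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    return int(math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2)))

def optimal_num_of_hashes(expected_items: int, num_bits: int) -> int:
    # k = (m / n) * ln 2, at least one hash function
    return max(1, round(num_bits / expected_items * math.log(2)))

# Tightening the target FPP from 3% to 1% costs only ~2 extra bits per item,
# which is why building the filter with a larger number of bits is cheap
# relative to the accuracy it buys.
bits_3pct = optimal_num_of_bits(1_000_000, 0.03)
bits_1pct = optimal_num_of_bits(1_000_000, 0.01)
```

At one million items, the bit count grows from roughly 7.3 to roughly 9.6 bits per item when the target FPP drops from 3% to 1%.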




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627594#comment-17627594
 ] 

Apache Spark commented on SPARK-40995:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38470

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Move the existing minimal doc into the right top-level Connect README and add 
> a new docs folder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627593#comment-17627593
 ] 

Apache Spark commented on SPARK-40995:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38470

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Move the existing minimal doc into the right top-level Connect README and add 
> a new docs folder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40995:


Assignee: Apache Spark

> Developer Documentation for Spark Connect
> -
>
> Key: SPARK-40995
> URL: https://issues.apache.org/jira/browse/SPARK-40995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>
> Move the existing minimal doc into the right top-level Connect README and add 
> a new docs folder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40995) Developer Documentation for Spark Connect

2022-11-02 Thread Martin Grund (Jira)
Martin Grund created SPARK-40995:


 Summary: Developer Documentation for Spark Connect
 Key: SPARK-40995
 URL: https://issues.apache.org/jira/browse/SPARK-40995
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Move the existing minimal doc into the right top-level Connect README and add 
a new docs folder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40994) Add code example for JDBC data source with partitionColumn

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40994:


Assignee: Apache Spark

> Add code example for JDBC data source with partitionColumn
> --
>
> Key: SPARK-40994
> URL: https://issues.apache.org/jira/browse/SPARK-40994
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> We should add a code example for the JDBC data source with partitionColumn to 
> our documentation 
> ([https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]) to better 
> guide users.
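
For reference, when partitionColumn, lowerBound, upperBound, and numPartitions are supplied, Spark splits the JDBC read into per-partition WHERE clauses over stride-sized slices of the value range. The sketch below approximates that partitioning logic in plain Python; it is simplified (the authoritative version lives in Spark's JDBCRelation, and the exact edge-case handling here is an assumption):

```python
def column_partition(column: str, lower: int, upper: int, num_partitions: int):
    """Generate one WHERE-clause predicate per partition over [lower, upper)."""
    stride = (upper - lower) // num_partitions
    predicates = []
    current = lower
    for i in range(num_partitions):
        if i == 0:
            # First partition also catches values below lowerBound and NULLs.
            predicates.append(f"{column} < {current + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition catches values at/above the previous bound,
            # including anything above upperBound.
            predicates.append(f"{column} >= {current}")
        else:
            predicates.append(f"{column} >= {current} AND {column} < {current + stride}")
        current += stride
    return predicates
```

In user code this corresponds to a read like `spark.read.jdbc(url, table, column="id", lowerBound=0, upperBound=100, numPartitions=4, properties=props)`, which would issue four concurrent queries, one per generated predicate.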



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40994) Add code example for JDBC data source with partitionColumn

2022-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40994:


Assignee: (was: Apache Spark)

> Add code example for JDBC data source with partitionColumn
> --
>
> Key: SPARK-40994
> URL: https://issues.apache.org/jira/browse/SPARK-40994
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Cheng Su
>Priority: Minor
>
> We should add a code example for the JDBC data source with partitionColumn to 
> our documentation 
> ([https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]) to better 
> guide users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40994) Add code example for JDBC data source with partitionColumn

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627560#comment-17627560
 ] 

Apache Spark commented on SPARK-40994:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/38480

> Add code example for JDBC data source with partitionColumn
> --
>
> Key: SPARK-40994
> URL: https://issues.apache.org/jira/browse/SPARK-40994
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Cheng Su
>Priority: Minor
>
> We should add a code example for the JDBC data source with partitionColumn to 
> our documentation 
> ([https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]) to better 
> guide users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40994) Add code example for JDBC data source with partitionColumn

2022-11-02 Thread Cheng Su (Jira)
Cheng Su created SPARK-40994:


 Summary: Add code example for JDBC data source with partitionColumn
 Key: SPARK-40994
 URL: https://issues.apache.org/jira/browse/SPARK-40994
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SQL
Affects Versions: 3.4.0
Reporter: Cheng Su


We should add a code example for the JDBC data source with partitionColumn to 
our documentation 
([https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]) to better 
guide users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39399) proxy-user not working for Spark on k8s in cluster deploy mode

2022-11-02 Thread JiangHua Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627528#comment-17627528
 ] 

JiangHua Zhu commented on SPARK-39399:
--

It looks like HIVE_DELEGATION_TOKEN is not loaded and populated into 
Token#tokenKindMap.
Here is some reference material:
 !screenshot-1.png! 

We should first check the Hive-related dependencies. [~unamesk15]

> proxy-user not working for Spark on k8s in cluster deploy mode
> --
>
> Key: SPARK-39399
> URL: https://issues.apache.org/jira/browse/SPARK-39399
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.2.0
>Reporter: Shrikant Prasad
>Priority: Major
> Attachments: screenshot-1.png
>
>
> As part of https://issues.apache.org/jira/browse/SPARK-25355, proxy-user 
> support was added for Spark on K8s. However, that PR only added the proxy-user 
> argument to the spark-submit command; the actual authentication through the 
> proxy user does not work in cluster deploy mode.
> We get an AccessControlException when trying to access kerberized HDFS 
> through a proxy user. 
> Spark-Submit:
> $SPARK_HOME/bin/spark-submit \
> --master  \
> --deploy-mode cluster \
> --name with_proxy_user_di \
> --proxy-user  \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.kubernetes.container.image= \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.executor.instances=1 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.namespace= \
> --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
> --conf spark.eventLog.enabled=true \
> --conf spark.eventLog.dir=hdfs:///scaas/shs_logs \
> --conf spark.kubernetes.file.upload.path=hdfs:///tmp \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0-1.jar 
> Driver Logs:
> {code:java}
> ++ id -u
> + myuid=185
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 185
> + uidentry=
> + set -e
> + '[' -z '' ']'
> + '[' -w /etc/passwd ']'
> + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sort -t_ -k4 -n
> + sed 's/[^=]*=\(.*\)/\1/g'
> + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -z ']'
> + '[' -z ']'
> + '[' -n '' ']'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
> + case "$1" in
> + shift 1
> + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
> + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress= --deploy-mode client --proxy-user proxy_user 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.SparkPi spark-internal
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
> (file:/opt/spark/jars/spark-unsafe_2.12-3.2.0-1.jar) to constructor 
> java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"Rate of successful 
> kerberos logins and latency (milliseconds)"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"Rate of failed kerberos 
> logins and latency (milliseconds)"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"GetGroups"}, 
> valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field private 
> org.apache.hadoop.metrics2.lib.MutableGaugeLong 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal
>  with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"Renewal failures since 
> startup"}, valueName="Time")

[jira] [Updated] (SPARK-39399) proxy-user not working for Spark on k8s in cluster deploy mode

2022-11-02 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JiangHua Zhu updated SPARK-39399:
-
Attachment: screenshot-1.png

> proxy-user not working for Spark on k8s in cluster deploy mode
> --
>
> Key: SPARK-39399
> URL: https://issues.apache.org/jira/browse/SPARK-39399
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.2.0
>Reporter: Shrikant Prasad
>Priority: Major
> Attachments: screenshot-1.png
>
>
> As part of https://issues.apache.org/jira/browse/SPARK-25355, proxy-user 
> support was added for Spark on K8s. However, that PR only added the proxy-user 
> argument to the spark-submit command; the actual authentication through the 
> proxy user does not work in cluster deploy mode.
> We get an AccessControlException when trying to access kerberized HDFS 
> through a proxy user. 
> Spark-Submit:
> $SPARK_HOME/bin/spark-submit \
> --master  \
> --deploy-mode cluster \
> --name with_proxy_user_di \
> --proxy-user  \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.kubernetes.container.image= \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.executor.instances=1 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.namespace= \
> --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
> --conf spark.eventLog.enabled=true \
> --conf spark.eventLog.dir=hdfs:///scaas/shs_logs \
> --conf spark.kubernetes.file.upload.path=hdfs:///tmp \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0-1.jar 
> Driver Logs:
> {code:java}
> ++ id -u
> + myuid=185
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 185
> + uidentry=
> + set -e
> + '[' -z '' ']'
> + '[' -w /etc/passwd ']'
> + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sort -t_ -k4 -n
> + sed 's/[^=]*=\(.*\)/\1/g'
> + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -z ']'
> + '[' -z ']'
> + '[' -n '' ']'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
> + case "$1" in
> + shift 1
> + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
> + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress= --deploy-mode client --proxy-user proxy_user 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.SparkPi spark-internal
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
> (file:/opt/spark/jars/spark-unsafe_2.12-3.2.0-1.jar) to constructor 
> java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"Rate of successful 
> kerberos logins and latency (milliseconds)"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"Rate of failed kerberos 
> logins and latency (milliseconds)"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"GetGroups"}, 
> valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field private 
> org.apache.hadoop.metrics2.lib.MutableGaugeLong 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal
>  with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", 
> sampleName="Ops", always=false, type=DEFAULT, value={"Renewal failures since 
> startup"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG Mutable

[jira] [Commented] (SPARK-40697) Add read-side char/varchar handling to cover external data files

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627526#comment-17627526
 ] 

Apache Spark commented on SPARK-40697:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/38479

> Add read-side char/varchar handling to cover external data files
> 
>
> Key: SPARK-40697
> URL: https://issues.apache.org/jira/browse/SPARK-40697
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40697) Add read-side char/varchar handling to cover external data files

2022-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627527#comment-17627527
 ] 

Apache Spark commented on SPARK-40697:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/38479

> Add read-side char/varchar handling to cover external data files
> 
>
> Key: SPARK-40697
> URL: https://issues.apache.org/jira/browse/SPARK-40697
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40993) Migrate markdown style README to python/docs/development/testing.rst

2022-11-02 Thread Vivek Garg (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627502#comment-17627502
 ] 

Vivek Garg commented on SPARK-40993:



> Migrate markdown style README to python/docs/development/testing.rst
> 
>
> Key: SPARK-40993
> URL: https://issues.apache.org/jira/browse/SPARK-40993
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org