[jira] [Updated] (SPARK-40999) Hints on subqueries are not properly propagated
[ https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40999: Fix Version/s: (was: 3.4.0) > Hints on subqueries are not properly propagated > --- > > Key: SPARK-40999 > URL: https://issues.apache.org/jira/browse/SPARK-40999 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1 >Reporter: Fredrik Klauß >Priority: Major > > Currently, if a user tries to specify a query like the following, the hints > on the subquery will be lost. > {code:java} > SELECT * FROM target t WHERE EXISTS > (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code} > This happens as hints are removed from the plan and pulled into joins in the > beginning of the optimization stage, but subqueries are only turned into > joins during optimization. As we remove any hints that are not below a join, > we end up removing hints that are below a subquery. > > To resolve this, we add a hint field to SubqueryExpression that any hints > inside a subquery's plan can be pulled into during EliminateResolvedHint, and > then pass this hint on when the subquery is turned into a join. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
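The mechanism described above can be sketched in plain Python. This is a schematic model only, not Spark's actual `SubqueryExpression`/`EliminateResolvedHint` classes: a hint node sitting inside a subquery's plan used to be stripped (because no join existed yet to attach it to); the fix stashes the hint on the subquery expression so it can be handed to the join later. All class and function names here are illustrative.

```python
# Toy model of SPARK-40999 (illustrative names, not Spark internals):
# hints are pulled up into joins early in optimization, but a subquery
# only becomes a join later, so a hint under a subquery used to be lost.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hint:
    name: str


@dataclass
class Scan:
    table: str


@dataclass
class HintedPlan:
    hint: Hint
    child: object


@dataclass
class Subquery:
    plan: object
    hint: Optional[Hint] = None  # the field the fix adds


def eliminate_hints(plan):
    """Strip hint nodes from a plan. A bare hint with no enclosing join is
    dropped (the old, lossy behavior); a hint inside a subquery's plan is
    moved onto the subquery itself (the fixed behavior modeled here)."""
    if isinstance(plan, Subquery):
        inner = plan.plan
        if isinstance(inner, HintedPlan):
            return Subquery(plan=inner.child, hint=inner.hint)
        return plan
    if isinstance(plan, HintedPlan):
        return plan.child  # hint discarded: nothing to attach it to
    return plan


# A subquery whose inner plan carries a /*+ BROADCAST */ hint:
sub = Subquery(plan=HintedPlan(Hint("BROADCAST"), Scan("source")))
rewritten = eliminate_hints(sub)
# The hint now survives on the subquery expression and can be passed on
# when the EXISTS subquery is rewritten into a join:
print(rewritten.hint.name)  # BROADCAST
```

The point of the design is the same as in the ticket: the hint's lifetime is tied to the subquery expression rather than to a join node that does not exist yet.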
[jira] [Resolved] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
[ https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40998. -- Resolution: Fixed Issue resolved by pull request 38484 [https://github.com/apache/spark/pull/38484] > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040 > --- > > Key: SPARK-40998 > URL: https://issues.apache.org/jira/browse/SPARK-40998 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40977) Complete Support for Union in Python client
[ https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40977. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38453 [https://github.com/apache/spark/pull/38453] > Complete Support for Union in Python client > --- > > Key: SPARK-40977 > URL: https://issues.apache.org/jira/browse/SPARK-40977 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40977) Complete Support for Union in Python client
[ https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40977: Assignee: Rui Wang > Complete Support for Union in Python client > --- > > Key: SPARK-40977 > URL: https://issues.apache.org/jira/browse/SPARK-40977 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled
[ https://issues.apache.org/jira/browse/SPARK-41003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628113#comment-17628113 ] Apache Spark commented on SPARK-41003: -- User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/38489 > BHJ LeftAnti does not update numOutputRows when codegen is disabled > --- > > Key: SPARK-41003 > URL: https://issues.apache.org/jira/browse/SPARK-41003 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: dzcxzl >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled
[ https://issues.apache.org/jira/browse/SPARK-41003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41003: Assignee: (was: Apache Spark) > BHJ LeftAnti does not update numOutputRows when codegen is disabled > --- > > Key: SPARK-41003 > URL: https://issues.apache.org/jira/browse/SPARK-41003 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: dzcxzl >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled
[ https://issues.apache.org/jira/browse/SPARK-41003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41003: Assignee: Apache Spark > BHJ LeftAnti does not update numOutputRows when codegen is disabled > --- > > Key: SPARK-41003 > URL: https://issues.apache.org/jira/browse/SPARK-41003 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: dzcxzl >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41003) BHJ LeftAnti does not update numOutputRows when codegen is disabled
dzcxzl created SPARK-41003: -- Summary: BHJ LeftAnti does not update numOutputRows when codegen is disabled Key: SPARK-41003 URL: https://issues.apache.org/jira/browse/SPARK-41003 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: dzcxzl -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
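For context on what the metric in this ticket should count: a broadcast hash join with LeftAnti semantics emits exactly the streamed-side rows that have no match on the broadcast side, and `numOutputRows` must be incremented once per emitted row. A minimal stand-in (plain Python, not Spark's BroadcastHashJoinExec; the names are illustrative):

```python
# Illustrative model of the LeftAnti output metric: emit streamed rows
# whose key is absent from the broadcast side, counting each emitted row.
def left_anti_join(streamed, broadcast_keys):
    num_output_rows = 0  # the SQL metric the non-codegen path failed to bump
    out = []
    for key, value in streamed:
        if key not in broadcast_keys:
            out.append((key, value))
            num_output_rows += 1  # must track every row actually emitted
    return out, num_output_rows


rows, metric = left_anti_join([(1, "a"), (2, "b"), (3, "c")], {2})
print(rows)    # [(1, 'a'), (3, 'c')]
print(metric)  # 2
```

The bug is purely observational (the join result is correct); only the reported row count in the UI/metrics is wrong when whole-stage codegen is disabled.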
[jira] [Commented] (SPARK-41002) Compatible `take` and `head` API in Python client
[ https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628104#comment-17628104 ] Apache Spark commented on SPARK-41002: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38488 > Compatible `take` and `head` API in Python client > -- > > Key: SPARK-41002 > URL: https://issues.apache.org/jira/browse/SPARK-41002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41002) Compatible `take` and `head` API in Python client
[ https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41002: Assignee: (was: Apache Spark) > Compatible `take` and `head` API in Python client > -- > > Key: SPARK-41002 > URL: https://issues.apache.org/jira/browse/SPARK-41002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41002) Compatible `take` and `head` API in Python client
[ https://issues.apache.org/jira/browse/SPARK-41002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41002: Assignee: Apache Spark > Compatible `take` and `head` API in Python client > -- > > Key: SPARK-41002 > URL: https://issues.apache.org/jira/browse/SPARK-41002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41002) Compatible `take` and `head` API in Python client
Rui Wang created SPARK-41002: Summary: Compatible `take` and `head` API in Python client Key: SPARK-41002 URL: https://issues.apache.org/jira/browse/SPARK-41002 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
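The compatibility target for this ticket is the existing PySpark contract: `take(n)` returns a list of the first `n` rows, `head()` with no argument returns the first row (or `None` for an empty frame), and `head(n)` behaves like `take(n)`. A sketch over a plain list (a stand-in class, not the Connect client implementation):

```python
# Stand-in illustrating the take/head contract the Python client should
# match (FakeFrame is a toy; real code goes through the Connect protocol).
class FakeFrame:
    def __init__(self, rows):
        self._rows = list(rows)

    def take(self, n):
        # First n rows as a list (fewer if the frame is shorter).
        return self._rows[:n]

    def head(self, n=None):
        # No argument: single first row or None; with n: same as take(n).
        if n is None:
            return self._rows[0] if self._rows else None
        return self.take(n)


df = FakeFrame([("a", 1), ("b", 2), ("c", 3)])
print(df.take(2))  # [('a', 1), ('b', 2)]
print(df.head())   # ('a', 1)
print(df.head(2))  # [('a', 1), ('b', 2)]
```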
[jira] [Commented] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628091#comment-17628091 ] Yang Jie commented on SPARK-34007: -- [~hyukjin.kwon] Since SPARK-40651 dropped Hadoop2 binary distribution from release process, will this issue still exist? > Downgrade scala-maven-plugin to 4.3.0 > - > > Key: SPARK-34007 > URL: https://issues.apache.org/jira/browse/SPARK-34007 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Blocker > Fix For: 3.1.0 > > > After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker > release script fails as below: > {code} > [INFO] Compiling 21 Scala sources and 3 Java sources to > /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes > ... > [ERROR] ## Exception when compiling 24 sources to > /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes > java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s > signer information does not match signer information of other classes in the > same package > java.lang.ClassLoader.checkCerts(ClassLoader.java:891) > java.lang.ClassLoader.preDefineClass(ClassLoader.java:661) > java.lang.ClassLoader.defineClass(ClassLoader.java:754) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:418) > java.lang.ClassLoader.loadClass(ClassLoader.java:351) > java.lang.Class.getDeclaredMethods0(Native 
Method) > java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > java.lang.Class.privateGetPublicMethods(Class.java:2902) > java.lang.Class.getMethods(Class.java:1615) > sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170) > sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123) > scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86) > sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123) > sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33) > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
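The resolution the ticket title describes is pinning the plugin back to 4.3.0. A minimal fragment of what that looks like in a parent `pom.xml`, assuming the standard `net.alchim31.maven` coordinates for the plugin (a sketch, not the exact diff that landed in Spark's build):

```xml
<!-- Pin scala-maven-plugin back to 4.3.0 to avoid the
     javax.servlet signer-information SecurityException seen with 4.4.0. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>4.3.0</version>
</plugin>
```

Yang Jie's question above is whether the downgrade is still needed at all now that SPARK-40651 dropped the Hadoop 2 binary distribution, since the failure was only observed in the hadoop2.7 release build.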
[jira] [Commented] (SPARK-40995) Developer Documentation for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628072#comment-17628072 ] Apache Spark commented on SPARK-40995: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38487 > Developer Documentation for Spark Connect > - > > Key: SPARK-40995 > URL: https://issues.apache.org/jira/browse/SPARK-40995 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.4.0 > > > Move the existing minimal doc into the right top level connect readme and add > new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40989) Improve `session.sql` testing coverage in Python client
[ https://issues.apache.org/jira/browse/SPARK-40989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40989. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38472 [https://github.com/apache/spark/pull/38472] > Improve `session.sql` testing coverage in Python client > --- > > Key: SPARK-40989 > URL: https://issues.apache.org/jira/browse/SPARK-40989 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40989) Improve `session.sql` testing coverage in Python client
[ https://issues.apache.org/jira/browse/SPARK-40989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40989: Assignee: Rui Wang > Improve `session.sql` testing coverage in Python client > --- > > Key: SPARK-40989 > URL: https://issues.apache.org/jira/browse/SPARK-40989 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40995) Developer Documentation for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40995. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38470 [https://github.com/apache/spark/pull/38470] > Developer Documentation for Spark Connect > - > > Key: SPARK-40995 > URL: https://issues.apache.org/jira/browse/SPARK-40995 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.4.0 > > > Move the existing minimal doc into the right top level connect readme and add > new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40995) Developer Documentation for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40995: Assignee: Martin Grund > Developer Documentation for Spark Connect > - > > Key: SPARK-40995 > URL: https://issues.apache.org/jira/browse/SPARK-40995 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > > Move the existing minimal doc into the right top level connect readme and add > new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41000) Make CommandResult extend Command
[ https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kelvin Jiang updated SPARK-41000: - Description: CommandResult is the logical plan node that stores the results from a command. We want this to still be considered a command, rather than e.g. a query, so we should extend the trait Command which would allow it to pass various checks for commands (such as [this one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]). (was: CommandResult is the logical plan node that stores the results from a command. We want this to still be considered a command, rather than e.g. a query, so extending the trait Command would allow it to pass various checks for commands (such as [this one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]).) > Make CommandResult extend Command > - > > Key: SPARK-41000 > URL: https://issues.apache.org/jira/browse/SPARK-41000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kelvin Jiang >Priority: Major > > CommandResult is the logical plan node that stores the results from a > command. We want this to still be considered a command, rather than e.g. a > query, so we should extend the trait Command which would allow it to pass > various checks for commands (such as [this > one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-39405) NumPy input support in PySpark SQL
[ https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627984#comment-17627984 ] Xinrong Meng edited comment on SPARK-39405 at 11/2/22 9:10 PM: --- Hi [~douglas.mo...@databricks.com] the fix is in. was (Author: xinrongm): Hi [~douglas.mo...@databricks.com] the commit is in. > NumPy input support in PySpark SQL > -- > > Key: SPARK-39405 > URL: https://issues.apache.org/jira/browse/SPARK-39405 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > NumPy is the fundamental package for scientific computing with Python. It is > very commonly used, especially in the data science world. For example, Pandas > is backed by NumPy, and Tensors also support interchangeable conversion > from/to NumPy arrays. > > However, PySpark only supports Python built-in types, with the exception of > “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”. > > This issue has been raised multiple times internally and externally, see also > SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857. > > With NumPy support in SQL, we expect broader adoption by data > scientists and newcomers leveraging their existing background and codebase > with NumPy. > > See more > [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#] > . -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37697) Make it easier to convert numpy arrays to Spark Dataframes
[ https://issues.apache.org/jira/browse/SPARK-37697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627985#comment-17627985 ] Xinrong Meng commented on SPARK-37697: -- The commit is in. > Make it easier to convert numpy arrays to Spark Dataframes > -- > > Key: SPARK-37697 > URL: https://issues.apache.org/jira/browse/SPARK-37697 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.2 >Reporter: Douglas Moore >Priority: Major > Attachments: image-2022-10-31-22-49-37-356.png > > > Make it easier to convert numpy arrays to dataframes. > Often we receive errors: > > {code:java} > df = spark.createDataFrame(numpy.arange(10)) > Can not infer schema for type: > {code} > > OR > {code:java} > df = spark.createDataFrame(numpy.arange(10.)) > Can not infer schema for type: > {code} > > Today (Spark 3.x) we have to: > {code:java} > spark.createDataFrame(pd.DataFrame(numpy.arange(10.))) {code} > Make this easier with a direct conversion from Numpy arrays to Spark > Dataframes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
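Conceptually, the improvement asked for above amounts to doing internally what the `pd.DataFrame(...)` workaround does: wrap a flat 1-d sequence into single-column rows before schema inference. A plain-Python sketch of that wrapping step (the `value` column name and helper are illustrative, not PySpark's actual conversion code):

```python
# Sketch: turn a flat numeric sequence into (schema, rows) so a
# createDataFrame-style API could accept it without the pandas detour.
def rows_from_1d(values, col_name="value"):
    """Wrap a flat sequence into one-tuple rows under a single column."""
    return [col_name], [(v,) for v in values]


schema, rows = rows_from_1d(range(3))
print(schema)  # ['value']
print(rows)    # [(0,), (1,), (2,)]
```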
[jira] [Commented] (SPARK-40990) DataFrame creation from 2d NumPy array with arbitrary columns
[ https://issues.apache.org/jira/browse/SPARK-40990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627983#comment-17627983 ] Xinrong Meng commented on SPARK-40990: -- Hi [~douglas.mo...@databricks.com] Any size of the 2d ndarray works, as long as it fits into the memory since ndarray is not distributed. > DataFrame creation from 2d NumPy array with arbitrary columns > - > > Key: SPARK-40990 > URL: https://issues.apache.org/jira/browse/SPARK-40990 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Currently, DataFrame creation from 2d ndarray works only with 2 columns. We > should provide complete support for DataFrame creation with 2d ndarray. > For example, the test case below should work as shown below. > > {code:java} > >>> spark.createDataFrame(np.arange(100).reshape([10,10])).show() > +---+---+---+---+---+---+---+---+---+---+ > > | _1| _2| _3| _4| _5| _6| _7| _8| _9|_10| > +---+---+---+---+---+---+---+---+---+---+ > | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9| > | 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| > | 20| 21| 22| 23| 24| 25| 26| 27| 28| 29| > | 30| 31| 32| 33| 34| 35| 36| 37| 38| 39| > | 40| 41| 42| 43| 44| 45| 46| 47| 48| 49| > | 50| 51| 52| 53| 54| 55| 56| 57| 58| 59| > | 60| 61| 62| 63| 64| 65| 66| 67| 68| 69| > | 70| 71| 72| 73| 74| 75| 76| 77| 78| 79| > | 80| 81| 82| 83| 84| 85| 86| 87| 88| 89| > | 90| 91| 92| 93| 94| 95| 96| 97| 98| 99| > +---+---+---+---+---+---+---+---+---+---+ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
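The expected output in the ticket shows the default naming scheme: an n-column 2-d input gets columns `_1` through `_n`, regardless of width. A plain-Python stand-in for that part of the conversion (nested lists instead of an ndarray; not PySpark's actual code path):

```python
# Sketch of default column naming for a 2-d input: _1 .. _n for n columns,
# one tuple per row, matching the expected show() output in the ticket.
def columns_and_rows(matrix):
    width = len(matrix[0])
    names = [f"_{i}" for i in range(1, width + 1)]  # _1, _2, ..., _n
    rows = [tuple(r) for r in matrix]
    return names, rows


# Same shape as np.arange(100).reshape([10, 10]), built without numpy:
matrix = [[c + 10 * r for c in range(10)] for r in range(10)]
names, rows = columns_and_rows(matrix)
print(names[:3], names[-1])  # ['_1', '_2', '_3'] _10
print(rows[0][:5])           # (0, 1, 2, 3, 4)
```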
[jira] [Commented] (SPARK-41000) Make CommandResult extend Command
[ https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627971#comment-17627971 ] Apache Spark commented on SPARK-41000: -- User 'kelvinjian-db' has created a pull request for this issue: https://github.com/apache/spark/pull/38486 > Make CommandResult extend Command > - > > Key: SPARK-41000 > URL: https://issues.apache.org/jira/browse/SPARK-41000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kelvin Jiang >Priority: Major > > CommandResult is the logical plan node that stores the results from a > command. We want this to still be considered a command, rather than e.g. a > query, so extending the trait Command would allow it to pass various checks > for commands (such as [this > one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41000) Make CommandResult extend Command
[ https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41000: Assignee: Apache Spark > Make CommandResult extend Command > - > > Key: SPARK-41000 > URL: https://issues.apache.org/jira/browse/SPARK-41000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kelvin Jiang >Assignee: Apache Spark >Priority: Major > > CommandResult is the logical plan node that stores the results from a > command. We want this to still be considered a command, rather than e.g. a > query, so extending the trait Command would allow it to pass various checks > for commands (such as [this > one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41000) Make CommandResult extend Command
[ https://issues.apache.org/jira/browse/SPARK-41000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41000: Assignee: (was: Apache Spark) > Make CommandResult extend Command > - > > Key: SPARK-41000 > URL: https://issues.apache.org/jira/browse/SPARK-41000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kelvin Jiang >Priority: Major > > CommandResult is the logical plan node that stores the results from a > command. We want this to still be considered a command, rather than e.g. a > query, so extending the trait Command would allow it to pass various checks > for commands (such as [this > one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41001) Connection string support for Python client
[ https://issues.apache.org/jira/browse/SPARK-41001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627936#comment-17627936 ] Apache Spark commented on SPARK-41001: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/38485 > Connection string support for Python client > --- > > Key: SPARK-41001 > URL: https://issues.apache.org/jira/browse/SPARK-41001 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41001) Connection string support for Python client
[ https://issues.apache.org/jira/browse/SPARK-41001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41001: Assignee: (was: Apache Spark) > Connection string support for Python client > --- > > Key: SPARK-41001 > URL: https://issues.apache.org/jira/browse/SPARK-41001 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41001) Connection string support for Python client
[ https://issues.apache.org/jira/browse/SPARK-41001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41001: Assignee: Apache Spark > Connection string support for Python client > --- > > Key: SPARK-41001 > URL: https://issues.apache.org/jira/browse/SPARK-41001 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41001) Connection string support for Python client
Martin Grund created SPARK-41001: Summary: Connection string support for Python client Key: SPARK-41001 URL: https://issues.apache.org/jira/browse/SPARK-41001 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41000) Make CommandResult extend Command
Kelvin Jiang created SPARK-41000: Summary: Make CommandResult extend Command Key: SPARK-41000 URL: https://issues.apache.org/jira/browse/SPARK-41000 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Kelvin Jiang CommandResult is the logical plan node that stores the results from a command. We want this to still be considered a command, rather than e.g. a query, so extending the trait Command would allow it to pass various checks for commands (such as [this one|https://github.com/apache/spark/blob/f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L54-L57]). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
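The proposal above is a Scala trait change; a rough Python analogue (class names here just mirror Spark's, they are not its real API) shows why making CommandResult a subtype of the Command marker lets every "is this a command?" check accept it automatically:

```python
# Toy model of SPARK-41000: CommandResult extending the Command marker.
class LogicalPlan: ...
class Command(LogicalPlan): ...          # marker trait for command plans

class CommandResult(Command):            # after the change: extends Command
    def __init__(self, rows):
        self.rows = rows                 # stored results of the executed command

def is_plain_query(plan):                # a gate like the CTESubstitution check
    return not isinstance(plan, Command)

result = CommandResult(rows=[("ok",)])
print(isinstance(result, Command))  # True: passes checks that gate on Command
```

With the marker in the inheritance chain, no individual check needs a special case for CommandResult.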
[jira] [Updated] (SPARK-40999) Hints on subqueries are not properly propagated
[ https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredrik Klauß updated SPARK-40999: -- Affects Version/s: 3.4.0 > Hints on subqueries are not properly propagated > --- > > Key: SPARK-40999 > URL: https://issues.apache.org/jira/browse/SPARK-40999 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1 >Reporter: Fredrik Klauß >Priority: Major > Fix For: 3.4.0 > > > Currently, if a user tries to specify a query like the following, the hints > on the subquery will be lost. > {code:java} > SELECT * FROM target t WHERE EXISTS > (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code} > This happens as hints are removed from the plan and pulled into joins in the > beginning of the optimization stage, but subqueries are only turned into > joins during optimization. As we remove any hints that are not below a join, > we end up removing hints that are below a subquery. > > To resolve this, we add a hint field to SubqueryExpression that any hints > inside a subquery's plan can be pulled into during EliminateResolvedHint, and > then pass this hint on when the subquery is turned into a join. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40999) Hints on subqueries are not properly propagated
Fredrik Klauß created SPARK-40999: - Summary: Hints on subqueries are not properly propagated Key: SPARK-40999 URL: https://issues.apache.org/jira/browse/SPARK-40999 Project: Spark Issue Type: Bug Components: Optimizer, Spark Core Affects Versions: 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.1.3, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: Fredrik Klauß Fix For: 3.4.0 Currently, if a user tries to specify a query like the following, the hints on the subquery will be lost. {code:java} SELECT * FROM target t WHERE EXISTS (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code} This happens as hints are removed from the plan and pulled into joins in the beginning of the optimization stage, but subqueries are only turned into joins during optimization. As we remove any hints that are not below a join, we end up removing hints that are below a subquery. To resolve this, we add a hint field to SubqueryExpression that any hints inside a subquery's plan can be pulled into during EliminateResolvedHint, and then pass this hint on when the subquery is turned into a join. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
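The mechanism described above can be sketched with a toy plan model (these are illustrative Python classes, not Spark's actual Catalyst nodes): hints are pulled up and removed early, so a hint inside a subquery is dropped unless the subquery expression itself keeps a hint slot to hand to the eventual join.

```python
# Toy model of SPARK-40999: without a hint field on SubqueryExpression,
# EliminateResolvedHint discards hints that sit below a subquery.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scan:
    table: str

@dataclass
class Hint:
    name: str
    child: object

@dataclass
class SubqueryExpr:
    plan: object
    hint: Optional[str] = None  # the proposed fix: a slot to preserve pulled-up hints

def eliminate_hints(node, keep_on_subquery):
    """Strip Hint nodes; return (rewritten node, hint pulled up from below)."""
    if isinstance(node, Hint):
        child, _ = eliminate_hints(node.child, keep_on_subquery)
        return child, node.name
    if isinstance(node, SubqueryExpr):
        plan, h = eliminate_hints(node.plan, keep_on_subquery)
        kept = h if keep_on_subquery else None  # without the fix, the hint is lost here
        return SubqueryExpr(plan, hint=kept), None
    return node, None

# EXISTS (SELECT /*+ BROADCAST */ * FROM source ...) modeled as:
subq = SubqueryExpr(Hint("BROADCAST", Scan("source")))
with_fix, _ = eliminate_hints(subq, keep_on_subquery=True)
without_fix, _ = eliminate_hints(subq, keep_on_subquery=False)
print(with_fix.hint)     # BROADCAST: survives to be passed on when the subquery becomes a join
print(without_fix.hint)  # None: silently dropped
```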
[jira] [Assigned] (SPARK-40985) Upgrade RoaringBitmap to 0.9.35
[ https://issues.apache.org/jira/browse/SPARK-40985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40985: Assignee: Yang Jie > Upgrade RoaringBitmap to 0.9.35 > --- > > Key: SPARK-40985 > URL: https://issues.apache.org/jira/browse/SPARK-40985 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40985) Upgrade RoaringBitmap to 0.9.35
[ https://issues.apache.org/jira/browse/SPARK-40985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40985. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38465 [https://github.com/apache/spark/pull/38465] > Upgrade RoaringBitmap to 0.9.35 > --- > > Key: SPARK-40985 > URL: https://issues.apache.org/jira/browse/SPARK-40985 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
[ https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627783#comment-17627783 ] Apache Spark commented on SPARK-40998: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38484 > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040 > --- > > Key: SPARK-40998 > URL: https://issues.apache.org/jira/browse/SPARK-40998 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
[ https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40998: Assignee: Apache Spark (was: Max Gekk) > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040 > --- > > Key: SPARK-40998 > URL: https://issues.apache.org/jira/browse/SPARK-40998 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
[ https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40998: Assignee: Max Gekk (was: Apache Spark) > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040 > --- > > Key: SPARK-40998 > URL: https://issues.apache.org/jira/browse/SPARK-40998 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
[ https://issues.apache.org/jira/browse/SPARK-40998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627785#comment-17627785 ] Apache Spark commented on SPARK-40998: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38484 > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040 > --- > > Key: SPARK-40998 > URL: https://issues.apache.org/jira/browse/SPARK-40998 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40998) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040
Max Gekk created SPARK-40998: Summary: Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0040 Key: SPARK-40998 URL: https://issues.apache.org/jira/browse/SPARK-40998 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627743#comment-17627743 ] Daniel Glöckner edited comment on SPARK-33782 at 11/2/22 2:34 PM: -- Will this fix repair the {{--jars}} flag and will JARs be added automatically to the driver and executor class path when using {{spark.kubernetes.file.upload.path}} / {{file://}} URIs? https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management https://spark.apache.org/docs/3.2.0/submitting-applications.html {quote} When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. {quote} was (Author: JIRAUSER288949): Will this fix repair the {{--jars}} flag and will JARs be added automatically to the driver and executor class path when using {{spark.kubernetes.file.upload.path}} / {{file://}} URIs? https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management https://spark.apache.org/docs/3.2.0/submitting-applications.html ?? When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. ?? > Place spark.files, spark.jars and spark.files under the current working > directory on the driver in K8S cluster mode > --- > > Key: SPARK-33782 > URL: https://issues.apache.org/jira/browse/SPARK-33782 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > In Yarn cluster modes, the passed files are able to be accessed in the > current working directory. Looks like this is not the case in Kubernates > cluset mode. 
> By doing this, users can, for example, leverage PEX to manage Python > dependences in Apache Spark: > {code} > pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex > PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex > {code} > See also https://github.com/apache/spark/pull/30735/files#r540935585. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627743#comment-17627743 ] Daniel Glöckner edited comment on SPARK-33782 at 11/2/22 2:33 PM: -- Will this fix repair the {{--jars}} flag and will JARs be added automatically to the driver and executor class path when using {{spark.kubernetes.file.upload.path}} / {{file://}} URIs? https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management https://spark.apache.org/docs/3.2.0/submitting-applications.html ?? When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. ?? was (Author: JIRAUSER288949): The this fix repair the {{--jars}} flag and will JARs be added automatically to the driver and executor class path when using {{spark.kubernetes.file.upload.path}} / {{file://}} URIs? https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management https://spark.apache.org/docs/3.2.0/submitting-applications.html ?? When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. ?? > Place spark.files, spark.jars and spark.files under the current working > directory on the driver in K8S cluster mode > --- > > Key: SPARK-33782 > URL: https://issues.apache.org/jira/browse/SPARK-33782 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > In Yarn cluster modes, the passed files are able to be accessed in the > current working directory. Looks like this is not the case in Kubernates > cluset mode. 
> By doing this, users can, for example, leverage PEX to manage Python > dependences in Apache Spark: > {code} > pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex > PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex > {code} > See also https://github.com/apache/spark/pull/30735/files#r540935585. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627743#comment-17627743 ] Daniel Glöckner commented on SPARK-33782: - Will this fix repair the {{--jars}} flag, and will JARs be added automatically to the driver and executor class path when using {{spark.kubernetes.file.upload.path}} / {{file://}} URIs? https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management https://spark.apache.org/docs/3.2.0/submitting-applications.html ?? When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. ?? > Place spark.files, spark.jars and spark.files under the current working > directory on the driver in K8S cluster mode > --- > > Key: SPARK-33782 > URL: https://issues.apache.org/jira/browse/SPARK-33782 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > In Yarn cluster modes, the passed files are able to be accessed in the > current working directory. Looks like this is not the case in Kubernates > cluset mode. > By doing this, users can, for example, leverage PEX to manage Python > dependences in Apache Spark: > {code} > pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex > PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex > {code} > See also https://github.com/apache/spark/pull/30735/files#r540935585. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32380) sparksql cannot access hive table while data in hbase
[ https://issues.apache.org/jira/browse/SPARK-32380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627704#comment-17627704 ] Ranga Reddy commented on SPARK-32380: - The pull request below should solve the issue, but we need to check whether there are any other issues. [https://github.com/apache/spark/pull/29178] > sparksql cannot access hive table while data in hbase > - > > Key: SPARK-32380 > URL: https://issues.apache.org/jira/browse/SPARK-32380 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: ||component||version|| > |hadoop|2.8.5| > |hive|2.3.7| > |spark|3.0.0| > |hbase|1.4.9| >Reporter: deyzhong >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > * step1: create hbase table > {code:java} > hbase(main):001:0> create 'hbase_test', 'cf1' > hbase(main):001:0> put 'hbase_test', 'r1', 'cf1:v1', '123' > {code} > * step2: create hive table related to hbase table > > {code:java} > hive> > CREATE EXTERNAL TABLE `hivetest.hbase_test`( > `key` string COMMENT '', > `value` string COMMENT '') > ROW FORMAT SERDE > 'org.apache.hadoop.hive.hbase.HBaseSerDe' > STORED BY > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ( > 'hbase.columns.mapping'=':key,cf1:v1', > 'serialization.format'='1') > TBLPROPERTIES ( > 'hbase.table.name'='hbase_test') > {code} > * step3: sparksql query hive table while data in hbase > {code:java} > spark-sql --master yarn -e "select * from hivetest.hbase_test" > {code} > > The error log is as follows: > java.io.IOException: Cannot create a record reader because of a previous > error. Please look at the previous logs lines from the task's full log for > more details.
> at > org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270) > at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:272) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:272) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:272) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:272) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:272) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158) > at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) > at org.apache.spark.rdd.RDD.collect(RDD.scala:1003) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412) > at > 
org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:58) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496) > at scala.collection.Iterator.foreach(Iterato
[jira] [Assigned] (SPARK-40957) Add in memory cache in HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-40957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-40957: Assignee: Boyang Jerry Peng > Add in memory cache in HDFSMetadataLog > -- > > Key: SPARK-40957 > URL: https://issues.apache.org/jira/browse/SPARK-40957 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Assignee: Boyang Jerry Peng >Priority: Major > > Every time an entry in the offset log or commit log needs to be accessed, we > read from disk, which is slow. A cache of recent entries can speed up reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40957) Add in memory cache in HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-40957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-40957. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38430 [https://github.com/apache/spark/pull/38430] > Add in memory cache in HDFSMetadataLog > -- > > Key: SPARK-40957 > URL: https://issues.apache.org/jira/browse/SPARK-40957 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Assignee: Boyang Jerry Peng >Priority: Major > Fix For: 3.4.0 > > > Every time an entry in the offset log or commit log needs to be accessed, we > read from disk, which is slow. A cache of recent entries can speed up reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
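The idea behind SPARK-40957 can be sketched as a small LRU cache in front of the slow on-disk read path. This is a hypothetical illustration (class and method names are invented for the example, not Spark's HDFSMetadataLog API):

```python
# Sketch: cache recent offset/commit log entries so repeated reads skip disk.
from collections import OrderedDict

class CachedMetadataLog:
    def __init__(self, read_from_disk, capacity=10):
        self._read = read_from_disk      # slow path: load batch metadata from storage
        self._cache = OrderedDict()      # batchId -> entry, in LRU order
        self._capacity = capacity
        self.disk_reads = 0              # instrumentation for the example

    def get(self, batch_id):
        if batch_id in self._cache:
            self._cache.move_to_end(batch_id)   # refresh LRU position on a hit
            return self._cache[batch_id]
        self.disk_reads += 1
        entry = self._read(batch_id)
        self._cache[batch_id] = entry
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)     # evict the least-recently-used entry
        return entry

log = CachedMetadataLog(lambda b: {"batchId": b, "offsets": [b * 100]}, capacity=2)
log.get(1); log.get(1); log.get(2); log.get(1)
print(log.disk_reads)  # 2: batches 1 and 2 each hit disk exactly once
```

The capacity bound matters because the log grows with every micro-batch; only recent entries are worth keeping resident.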
[jira] [Assigned] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric
[ https://issues.apache.org/jira/browse/SPARK-40997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40997: Assignee: Apache Spark > K8s resource name prefix should start w/ alphanumeric > - > > Key: SPARK-40997 > URL: https://issues.apache.org/jira/browse/SPARK-40997 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.1 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric
[ https://issues.apache.org/jira/browse/SPARK-40997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627695#comment-17627695 ] Apache Spark commented on SPARK-40997: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/38483 > K8s resource name prefix should start w/ alphanumeric > - > > Key: SPARK-40997 > URL: https://issues.apache.org/jira/browse/SPARK-40997 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.1 >Reporter: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric
[ https://issues.apache.org/jira/browse/SPARK-40997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40997: Assignee: (was: Apache Spark) > K8s resource name prefix should start w/ alphanumeric > - > > Key: SPARK-40997 > URL: https://issues.apache.org/jira/browse/SPARK-40997 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.1 >Reporter: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40997) K8s resource name prefix should start w/ alphanumeric
Cheng Pan created SPARK-40997: - Summary: K8s resource name prefix should start w/ alphanumeric Key: SPARK-40997 URL: https://issues.apache.org/jira/browse/SPARK-40997 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.3.1 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
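The constraint behind SPARK-40997 is that Kubernetes resource names must be valid DNS-1123 labels, which must start and end with a lowercase alphanumeric character. A sanitizer for a generated prefix might look like the following sketch (the function is illustrative, not Spark's actual helper):

```python
# Sketch: normalize an app name into a valid K8s resource name prefix.
import re

def sanitize_resource_name_prefix(app_name: str) -> str:
    s = app_name.lower()
    s = re.sub(r"[^a-z0-9-]", "-", s)   # map disallowed characters to '-'
    s = re.sub(r"-+", "-", s)           # collapse runs of '-'
    s = s.strip("-")                    # must start and end with an alphanumeric
    return s or "spark"                 # fall back if nothing survives

print(sanitize_resource_name_prefix("_My App_"))  # my-app
print(sanitize_resource_name_prefix("spark-pi"))  # spark-pi
```

The final `strip("-")` is the step this issue is about: without it, an app name beginning with an underscore or dot yields a prefix starting with `-`, which the API server rejects.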
[jira] [Assigned] (SPARK-40749) Migrate type check failures of generators onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40749: Assignee: Apache Spark > Migrate type check failures of generators onto error classes > > > Key: SPARK-40749 > URL: https://issues.apache.org/jira/browse/SPARK-40749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the generator > expressions: > 1. Stack (3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L163-L170 > 2. ExplodeBase (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L299 > 3. Inline (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L441 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40749) Migrate type check failures of generators onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627668#comment-17627668 ] Apache Spark commented on SPARK-40749: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38482 > Migrate type check failures of generators onto error classes > > > Key: SPARK-40749 > URL: https://issues.apache.org/jira/browse/SPARK-40749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the generator > expressions: > 1. Stack (3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L163-L170 > 2. ExplodeBase (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L299 > 3. Inline (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L441 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40996) Upgrade `sbt-checkstyle-plugin` to 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-40996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40996: Assignee: Apache Spark > Upgrade `sbt-checkstyle-plugin` to 4.0.0 > > > Key: SPARK-40996 > URL: https://issues.apache.org/jira/browse/SPARK-40996 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > This is a precondition for upgrading sbt 1.7.3 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40996) Upgrade `sbt-checkstyle-plugin` to 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-40996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40996: Assignee: (was: Apache Spark) > Upgrade `sbt-checkstyle-plugin` to 4.0.0 > > > Key: SPARK-40996 > URL: https://issues.apache.org/jira/browse/SPARK-40996 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > This is a precondition for upgrading sbt 1.7.3 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40749) Migrate type check failures of generators onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40749: Assignee: (was: Apache Spark) > Migrate type check failures of generators onto error classes > > > Key: SPARK-40749 > URL: https://issues.apache.org/jira/browse/SPARK-40749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the generator > expressions: > 1. Stack (3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L163-L170 > 2. ExplodeBase (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L299 > 3. Inline (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L441 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40996) Upgrade `sbt-checkstyle-plugin` to 4.0.0
Yang Jie created SPARK-40996: Summary: Upgrade `sbt-checkstyle-plugin` to 4.0.0 Key: SPARK-40996 URL: https://issues.apache.org/jira/browse/SPARK-40996 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie This is a precondition for upgrading to sbt 1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40374) Migrate type check failures of type creators onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40374. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38463 [https://github.com/apache/spark/pull/38463] > Migrate type check failures of type creators onto error classes > --- > > Key: SPARK-40374 > URL: https://issues.apache.org/jira/browse/SPARK-40374 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: BingKun Pan >Priority: Major > Fix For: 3.4.0 > > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the complex > type creator expressions: > 1. CreateMap(3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214 > 2. CreateNamedStruct(3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457 > 3. UpdateFields(2): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40374) Migrate type check failures of type creators onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40374: Assignee: BingKun Pan > Migrate type check failures of type creators onto error classes > --- > > Key: SPARK-40374 > URL: https://issues.apache.org/jira/browse/SPARK-40374 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: BingKun Pan >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the complex > type creator expressions: > 1. CreateMap(3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214 > 2. CreateNamedStruct(3): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457 > 3. UpdateFields(2): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40995) Developer Documentation for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40995: Assignee: (was: Apache Spark) > Developer Documentation for Spark Connect > - > > Key: SPARK-40995 > URL: https://issues.apache.org/jira/browse/SPARK-40995 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > Move the existing minimal doc into the right top level connect readme and add > new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40248) Use larger number of bits to build bloom filter
[ https://issues.apache.org/jira/browse/SPARK-40248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-40248. - Fix Version/s: 3.4.0 Assignee: Yuming Wang Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/37697 > Use larger number of bits to build bloom filter > > > Key: SPARK-40248 > URL: https://issues.apache.org/jira/browse/SPARK-40248 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
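For background on the sizing tradeoff in SPARK-40248: the standard bloom filter formulas give an optimal bit count of m = -n·ln(p) / (ln 2)² for n expected items at target false-positive rate p, with k = (m/n)·ln 2 hash functions — so using more bits directly buys a lower false-positive rate for the same item count. A small illustrative sketch in plain Python (this is the textbook formula, not Spark's code; Spark's sizing lives in its `util.sketch` BloomFilter, which is assumed here to follow the same math):

```python
import math

def optimal_num_of_bits(n, p):
    """Bits m minimizing false positives for n items at target rate p:
    m = -n * ln(p) / (ln 2)^2."""
    return int(math.ceil(-n * math.log(p) / (math.log(2) ** 2)))

def optimal_num_of_hashes(n, m):
    """Hash function count k = (m / n) * ln 2, at least 1."""
    return max(1, round(m / n * math.log(2)))

# Doubling the expected item count roughly doubles the bits needed
# to hold the same false-positive rate.
bits_1m = optimal_num_of_bits(1_000_000, 0.03)
bits_2m = optimal_num_of_bits(2_000_000, 0.03)
```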
[jira] [Commented] (SPARK-40995) Developer Documentation for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627594#comment-17627594 ] Apache Spark commented on SPARK-40995: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/38470 > Developer Documentation for Spark Connect > - > > Key: SPARK-40995 > URL: https://issues.apache.org/jira/browse/SPARK-40995 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > Move the existing minimal doc into the right top level connect readme and add > new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40995) Developer Documentation for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40995: Assignee: Apache Spark > Developer Documentation for Spark Connect > - > > Key: SPARK-40995 > URL: https://issues.apache.org/jira/browse/SPARK-40995 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Major > > Move the existing minimal doc into the right top level connect readme and add > new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40995) Developer Documentation for Spark Connect
Martin Grund created SPARK-40995: Summary: Developer Documentation for Spark Connect Key: SPARK-40995 URL: https://issues.apache.org/jira/browse/SPARK-40995 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Move the existing minimal doc into the right top level connect readme and add new docs folder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40994) Add code example for JDBC data source with partitionColumn
[ https://issues.apache.org/jira/browse/SPARK-40994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40994: Assignee: Apache Spark > Add code example for JDBC data source with partitionColumn > -- > > Key: SPARK-40994 > URL: https://issues.apache.org/jira/browse/SPARK-40994 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Cheng Su >Assignee: Apache Spark >Priority: Minor > > We should add code example for JDBC data source with partitionColumn in our > documentation - > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html,] to better > guide users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40994) Add code example for JDBC data source with partitionColumn
[ https://issues.apache.org/jira/browse/SPARK-40994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40994: Assignee: (was: Apache Spark) > Add code example for JDBC data source with partitionColumn > -- > > Key: SPARK-40994 > URL: https://issues.apache.org/jira/browse/SPARK-40994 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Cheng Su >Priority: Minor > > We should add code example for JDBC data source with partitionColumn in our > documentation - > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html,] to better > guide users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40994) Add code example for JDBC data source with partitionColumn
[ https://issues.apache.org/jira/browse/SPARK-40994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627560#comment-17627560 ] Apache Spark commented on SPARK-40994: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/38480 > Add code example for JDBC data source with partitionColumn > -- > > Key: SPARK-40994 > URL: https://issues.apache.org/jira/browse/SPARK-40994 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Cheng Su >Priority: Minor > > We should add code example for JDBC data source with partitionColumn in our > documentation - > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html,] to better > guide users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40994) Add code example for JDBC data source with partitionColumn
Cheng Su created SPARK-40994: Summary: Add code example for JDBC data source with partitionColumn Key: SPARK-40994 URL: https://issues.apache.org/jira/browse/SPARK-40994 Project: Spark Issue Type: Documentation Components: Documentation, SQL Affects Versions: 3.4.0 Reporter: Cheng Su We should add code example for JDBC data source with partitionColumn in our documentation - [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html,] to better guide users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
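An example along the lines SPARK-40994 asks for: given `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`, the JDBC source splits one table read into parallel reads by generating a WHERE predicate per partition. The helper below is a plain-Python sketch of that stride logic, mirroring the spirit of Spark's `JDBCRelation.columnPartition` — the function name and exact predicate text are illustrative, not Spark's verbatim output (Spark's first partition also sweeps in NULL column values, omitted here for brevity):

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Return one WHERE clause per partition, together covering all rows."""
    if num_partitions <= 1:
        return []  # single partition: read everything, no predicate needed
    stride = (upper - lower) // num_partitions
    predicates = []
    current = lower
    for i in range(num_partitions):
        if i == 0:
            # First partition is open on the low side so rows below
            # lowerBound are not lost (bounds only shape the split).
            pred = f"{column} < {current + stride}"
        elif i == num_partitions - 1:
            # Last partition is open on the high side for the same reason.
            pred = f"{column} >= {current}"
        else:
            pred = f"{column} >= {current} AND {column} < {current + stride}"
        predicates.append(pred)
        current += stride
    return predicates

print(jdbc_partition_predicates("id", 0, 100, 4))
# → ['id < 25', 'id >= 25 AND id < 50', 'id >= 50 AND id < 75', 'id >= 75']
```

In a real job these options are passed to `spark.read.jdbc` (or as `option(...)` calls), and each predicate becomes one task's query against the database.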
[jira] [Commented] (SPARK-39399) proxy-user not working for Spark on k8s in cluster deploy mode
[ https://issues.apache.org/jira/browse/SPARK-39399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627528#comment-17627528 ] JiangHua Zhu commented on SPARK-39399: -- It looks like HIVE_DELEGATION_TOKEN is not loaded and populated to Token#tokenKindMap. Here are some sources of reference: !screenshot-1.png! We should first check the dependencies related to hive. [~unamesk15] > proxy-user not working for Spark on k8s in cluster deploy mode > -- > > Key: SPARK-39399 > URL: https://issues.apache.org/jira/browse/SPARK-39399 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Priority: Major > Attachments: screenshot-1.png > > > As part of https://issues.apache.org/jira/browse/SPARK-25355 Proxy user > support was added for Spark on K8s. But the PR only added proxy user argument > on the spark-submit command. The actual functionality of authentication using > the proxy user is not working in case of cluster deploy mode. > We get AccessControlException when trying to access the kerberized HDFS > through a proxy user. 
> Spark-Submit: > $SPARK_HOME/bin/spark-submit \ > --master \ > --deploy-mode cluster \ > --name with_proxy_user_di \ > --proxy-user \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.limit.cores=1 \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --conf spark.kubernetes.namespace= \ > --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \ > --conf spark.eventLog.enabled=true \ > --conf spark.eventLog.dir=hdfs:///scaas/shs_logs \ > --conf spark.kubernetes.file.upload.path=hdfs:///tmp \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0-1.jar > Driver Logs: > {code:java} > ++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' -z ']' > + '[' -z ']' > + '[' -n '' ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress= --deploy-mode client --proxy-user proxy_user > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.SparkPi spark-internal > WARNING: An illegal reflective access operation has occurred > WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform > (file:/opt/spark/jars/spark-unsafe_2.12-3.2.0-1.jar) to 
constructor > java.nio.DirectByteBuffer(long,int) > WARNING: Please consider reporting this to the maintainers of > org.apache.spark.unsafe.Platform > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Rate of successful > kerberos logins and latency (milliseconds)"}, valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Rate of failed kerberos > logins and latency (milliseconds)"}, valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"GetGroups"}, > valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field private > org.apache.hadoop.metrics2.lib.MutableGaugeLong > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.r
[jira] [Updated] (SPARK-39399) proxy-user not working for Spark on k8s in cluster deploy mode
[ https://issues.apache.org/jira/browse/SPARK-39399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated SPARK-39399: - Attachment: screenshot-1.png > proxy-user not working for Spark on k8s in cluster deploy mode > -- > > Key: SPARK-39399 > URL: https://issues.apache.org/jira/browse/SPARK-39399 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Priority: Major > Attachments: screenshot-1.png > > > As part of https://issues.apache.org/jira/browse/SPARK-25355 Proxy user > support was added for Spark on K8s. But the PR only added proxy user argument > on the spark-submit command. The actual functionality of authentication using > the proxy user is not working in case of cluster deploy mode. > We get AccessControlException when trying to access the kerberized HDFS > through a proxy user. > Spark-Submit: > $SPARK_HOME/bin/spark-submit \ > --master \ > --deploy-mode cluster \ > --name with_proxy_user_di \ > --proxy-user \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.limit.cores=1 \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --conf spark.kubernetes.namespace= \ > --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \ > --conf spark.eventLog.enabled=true \ > --conf spark.eventLog.dir=hdfs:///scaas/shs_logs \ > --conf spark.kubernetes.file.upload.path=hdfs:///tmp \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0-1.jar > Driver Logs: > {code:java} > ++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > 
+ sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' -z ']' > + '[' -z ']' > + '[' -n '' ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress= --deploy-mode client --proxy-user proxy_user > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.SparkPi spark-internal > WARNING: An illegal reflective access operation has occurred > WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform > (file:/opt/spark/jars/spark-unsafe_2.12-3.2.0-1.jar) to constructor > java.nio.DirectByteBuffer(long,int) > WARNING: Please consider reporting this to the maintainers of > org.apache.spark.unsafe.Platform > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Rate of successful > kerberos logins and latency (milliseconds)"}, valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Rate of failed kerberos > logins and latency (milliseconds)"}, 
valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field > org.apache.hadoop.metrics2.lib.MutableRate > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with > annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"GetGroups"}, > valueName="Time") > 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field private > org.apache.hadoop.metrics2.lib.MutableGaugeLong > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal > with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", > sampleName="Ops", always=false, type=DEFAULT, value={"Renewal failures since > startup"}, valueName="Time") > 22/04/26 08:54:38 DEBUG Mutable
[jira] [Commented] (SPARK-40697) Add read-side char/varchar handling to cover external data files
[ https://issues.apache.org/jira/browse/SPARK-40697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627526#comment-17627526 ] Apache Spark commented on SPARK-40697: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/38479 > Add read-side char/varchar handling to cover external data files > > > Key: SPARK-40697 > URL: https://issues.apache.org/jira/browse/SPARK-40697 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40993) Migrate markdown style README to python/docs/development/testing.rst
[ https://issues.apache.org/jira/browse/SPARK-40993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627502#comment-17627502 ] Vivek Garg commented on SPARK-40993: > Migrate markdown style README to python/docs/development/testing.rst > > > Key: SPARK-40993 > URL: https://issues.apache.org/jira/browse/SPARK-40993 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org