[jira] [Commented] (SPARK-41835) Implement `transform_keys` function
[ https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653731#comment-17653731 ]

Sandeep Singh commented on SPARK-41835:
---------------------------------------

My bad, error is about expected input types.

> Implement `transform_keys` function
> -----------------------------------
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
> Fix For: 3.4.0
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df.select(transform_keys(
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
>         ).toPandas()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data type mismatch: Parameter 1 requires the "MAP" type, however "data" has the type "STRUCT".
> Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
> +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>    +- LocalRelation [0#4488L, 1#4489]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
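The analysis error above is purely about input types: `transform_keys` requires its first argument to be a MAP column, and the test data was a STRUCT. As a plain-Python illustration of the intended semantics only (this is not PySpark code — the real API is `pyspark.sql.functions.transform_keys` operating on Spark columns), transforming a map's keys just rebuilds the mapping with the function applied to each key:

```python
# Pure-Python sketch of transform_keys semantics: apply a two-argument
# function to every (key, value) pair to produce a new key; values are
# kept unchanged. Illustrative only — not Spark Connect internals.

def transform_keys(data: dict, f):
    """Return a new dict whose keys are f(key, value) for each entry."""
    return {f(k, v): v for k, v in data.items()}

row = {"foo": 1.0, "bar": 2.0}
upper_keys = transform_keys(row, lambda k, _: k.upper())
print(upper_keys)  # {'FOO': 1.0, 'BAR': 2.0}
```

A struct (a fixed set of named fields) has no key/value pairs to iterate, which is why the analyzer rejects it with UNEXPECTED_INPUT_TYPE rather than failing at runtime.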
[jira] [Updated] (SPARK-41843) Implement SparkSession.udf
[ https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41843:
----------------------------------
Description:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2331, in pyspark.sql.connect.functions.call_udf
Failed example:
    _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
    AttributeError: 'SparkSession' object has no attribute 'udf'{code}

was:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1966, in pyspark.sql.connect.functions.hour
Failed example:
    df.select(hour('ts').alias('hour')).collect()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.select(hour('ts').alias('hour')).collect()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1017, in collect
        pdf = self.toPandas()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
        return self._execute_and_fetch(req)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 623, in _handle_error
        raise SparkConnectException(status.message, info.reason) from None
    pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: Timestamp(NANOSECOND, null){code}

> Implement SparkSession.udf
> --------------------------
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
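For context on what the missing attribute would provide: in regular PySpark, `spark.udf` returns a `UDFRegistration` whose `register()` binds a name to a (function, return type) pair so the function can be called from SQL or via `call_udf`. A minimal pure-Python sketch of that shape (class and attribute names here are illustrative, not Spark Connect internals):

```python
# Hypothetical minimal registry sketching what SparkSession.udf exposes:
# register() records a function under a name and returns the callable,
# mirroring the PySpark API where the returned UDF is also usable in
# DataFrame expressions. Illustrative only.

class UDFRegistration:
    def __init__(self):
        self._registry = {}

    def register(self, name, func, return_type=None):
        # Record the (function, declared return type) pair by name.
        self._registry[name] = (func, return_type)
        return func

udf = UDFRegistration()
int_x2 = udf.register("intX2", lambda i: i * 2, "integer")
print(int_x2(21))                 # 42
print("intX2" in udf._registry)   # True
```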
[jira] [Created] (SPARK-41843) Implement SparkSession.udf
Sandeep Singh created SPARK-41843:
-------------------------------------

Summary: Implement SparkSession.udf
Key: SPARK-41843
URL: https://issues.apache.org/jira/browse/SPARK-41843
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh

{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1966, in pyspark.sql.connect.functions.hour
Failed example:
    df.select(hour('ts').alias('hour')).collect()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.select(hour('ts').alias('hour')).collect()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1017, in collect
        pdf = self.toPandas()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
        return self._execute_and_fetch(req)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 623, in _handle_error
        raise SparkConnectException(status.message, info.reason) from None
    pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: Timestamp(NANOSECOND, null){code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41842) Support data type Timestamp(NANOSECOND, null)
[ https://issues.apache.org/jira/browse/SPARK-41842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653728#comment-17653728 ]

Sandeep Singh commented on SPARK-41842:
---------------------------------------

Not sure about the EPIC for this one.

> Support data type Timestamp(NANOSECOND, null)
> ---------------------------------------------
>
> Key: SPARK-41842
> URL: https://issues.apache.org/jira/browse/SPARK-41842
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1966, in pyspark.sql.connect.functions.hour
> Failed example:
>     df.select(hour('ts').alias('hour')).collect()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df.select(hour('ts').alias('hour')).collect()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1017, in collect
>         pdf = self.toPandas()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: Timestamp(NANOSECOND, null){code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41842) Support data type Timestamp(NANOSECOND, null)
[ https://issues.apache.org/jira/browse/SPARK-41842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41842:
----------------------------------
Description:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1966, in pyspark.sql.connect.functions.hour
Failed example:
    df.select(hour('ts').alias('hour')).collect()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.select(hour('ts').alias('hour')).collect()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1017, in collect
        pdf = self.toPandas()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
        return self._execute_and_fetch(req)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 623, in _handle_error
        raise SparkConnectException(status.message, info.reason) from None
    pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: Timestamp(NANOSECOND, null){code}

was:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df_empty = spark.createDataFrame([], 'a STRING')
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}

> Support data type Timestamp(NANOSECOND, null)
> ---------------------------------------------
>
> Key: SPARK-41842
> URL: https://issues.apache.org/jira/browse/SPARK-41842
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1966, in pyspark.sql.connect.functions.hour
> Failed example:
>     df.select(hour('ts').alias('hour')).collect()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df.select(hour('ts').alias('hour')).collect()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1017, in collect
>         pdf = self.toPandas()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: Timestamp(NANOSECOND, null){code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
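The Timestamp(NANOSECOND, null) failure above is a precision mismatch: Spark's TimestampType stores microseconds, while the Arrow data produced on the client side carries nanoseconds. A sketch of the arithmetic involved — truncating nanosecond epoch values to microsecond precision, presented as an assumed client-side workaround rather than Spark's actual fix:

```python
# Sketch of the unit gap behind the error: a nanosecond epoch value
# must lose its last three decimal digits to fit microsecond-based
# TimestampType. Truncation (floor division) is used here; rounding
# is an alternative policy.

NANOS_PER_MICRO = 1_000

def nanos_to_micros(ts_ns: int) -> int:
    """Truncate a nanosecond epoch timestamp to microsecond precision."""
    return ts_ns // NANOS_PER_MICRO

ts_ns = 1_672_531_200_123_456_789  # 2023-01-01T00:00:00.123456789Z
print(nanos_to_micros(ts_ns))      # 1672531200123456 (789 ns dropped)
```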
[jira] [Created] (SPARK-41842) Support data type Timestamp(NANOSECOND, null)
Sandeep Singh created SPARK-41842:
-------------------------------------

Summary: Support data type Timestamp(NANOSECOND, null)
Key: SPARK-41842
URL: https://issues.apache.org/jira/browse/SPARK-41842
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh

{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df_empty = spark.createDataFrame([], 'a STRING')
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
[ https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41656. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39346 [https://github.com/apache/spark/pull/39346] > Enable doctests in pyspark.sql.connect.dataframe > > > Key: SPARK-41656 > URL: https://issues.apache.org/jira/browse/SPARK-41656 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
[ https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41656: Assignee: Sandeep Singh > Enable doctests in pyspark.sql.connect.dataframe > > > Key: SPARK-41656 > URL: https://issues.apache.org/jira/browse/SPARK-41656 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41835) Implement `transform_keys` function
[ https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653722#comment-17653722 ] Ruifeng Zheng commented on SPARK-41835: --- this function was added > Implement `transform_keys` function > --- > > Key: SPARK-41835 > URL: https://issues.apache.org/jira/browse/SPARK-41835 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41841) Support PyPI packaging without JVM
Hyukjin Kwon created SPARK-41841:
------------------------------------

Summary: Support PyPI packaging without JVM
Key: SPARK-41841
URL: https://issues.apache.org/jira/browse/SPARK-41841
Project: Spark
Issue Type: Sub-task
Components: Build, Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon

We should support {{pip install pyspark}} without the JVM so that Spark Connect can be a truly lightweight library.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-41804.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 39349
[https://github.com/apache/spark/pull/39349]

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> --------------------------------------------------------------------
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Bruce Robbins
> Assignee: Bruce Robbins
> Priority: Major
> Fix For: 3.4.0
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
>
> // this works
> spark.read.format("parquet").load("vector_data").collect
>
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
>
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension of the values.
> You provided 2 indices and 6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318     n 0  sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation     [0x00011efa89b8,0x00011efa89f8] = 64
>  main code      [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318     n 0  sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation     [0x00011efa89b8,0x00011efa89f8] = 64
>  main code      [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
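The wildly varying failures quoted in the report (absurd element counts, NegativeArraySizeException, OOM, SIGBUS) are characteristic of reading a binary row at the wrong offset: whatever bytes happen to sit there get reinterpreted as a length or pointer. A small pure-Python sketch of that failure class (illustrative only — this is not Spark's UnsafeRow/UnsafeArrayData layout):

```python
# Sketch: a buffer with an 8-byte length prefix followed by two doubles.
# Reading the length at the correct offset works; reading it at a wrong
# offset reinterprets payload bytes as the count, yielding garbage
# values like the "6619240 values" in the bug report.
import struct

# Well-formed buffer: count (2) then two float64 payload values.
buf = struct.pack("<q2d", 2, 0.1, 0.2)

count, = struct.unpack_from("<q", buf, 0)   # correct offset
print(count)  # 2

garbage, = struct.unpack_from("<q", buf, 8)  # wrong offset: reads 0.1's bits
print(garbage != 2)  # True: a huge, meaningless "element count"
```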
[jira] [Assigned] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-41804:
------------------------------------
Assignee: Bruce Robbins

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> --------------------------------------------------------------------
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Bruce Robbins
> Assignee: Bruce Robbins
> Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
>
> // this works
> spark.read.format("parquet").load("vector_data").collect
>
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
>
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension of the values.
> You provided 2 indices and 6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318     n 0  sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation     [0x00011efa89b8,0x00011efa89f8] = 64
>  main code      [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318     n 0  sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation     [0x00011efa89b8,0x00011efa89f8] = 64
>  main code      [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41653) Test parity: enable doctests in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41653: Assignee: Sandeep Singh (was: Hyukjin Kwon) > Test parity: enable doctests in Spark Connect > - > > Key: SPARK-41653 > URL: https://issues.apache.org/jira/browse/SPARK-41653 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Sandeep Singh >Priority: Major > > We should actually run the doctests of Spark Connect. > We should add something like > https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247 > to Spark Connect modules, and add the module into > https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41654) Enable doctests in pyspark.sql.connect.window
[ https://issues.apache.org/jira/browse/SPARK-41654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41654: Assignee: Sandeep Singh (was: Hyukjin Kwon) > Enable doctests in pyspark.sql.connect.window > - > > Key: SPARK-41654 > URL: https://issues.apache.org/jira/browse/SPARK-41654 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41655) Enable doctests in pyspark.sql.connect.column
[ https://issues.apache.org/jira/browse/SPARK-41655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41655: Assignee: Sandeep Singh (was: Hyukjin Kwon) > Enable doctests in pyspark.sql.connect.column > - > > Key: SPARK-41655 > URL: https://issues.apache.org/jira/browse/SPARK-41655 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter
[ https://issues.apache.org/jira/browse/SPARK-41659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41659: Assignee: Sandeep Singh (was: Hyukjin Kwon) > Enable doctests in pyspark.sql.connect.readwriter > - > > Key: SPARK-41659 > URL: https://issues.apache.org/jira/browse/SPARK-41659 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41803) log() function variations are missing
[ https://issues.apache.org/jira/browse/SPARK-41803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41803. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39339 [https://github.com/apache/spark/pull/39339] > log() function variations are missing > - > > Key: SPARK-41803 > URL: https://issues.apache.org/jira/browse/SPARK-41803 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41803) log() function variations are missing
[ https://issues.apache.org/jira/browse/SPARK-41803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41803: Assignee: Ruifeng Zheng (was: Martin Grund) > log() function variations are missing > - > > Key: SPARK-41803 > URL: https://issues.apache.org/jira/browse/SPARK-41803 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41803) log() function variations are missing
[ https://issues.apache.org/jira/browse/SPARK-41803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41803: Assignee: Martin Grund > log() function variations are missing > - > > Key: SPARK-41803 > URL: https://issues.apache.org/jira/browse/SPARK-41803 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41836) Implement `transform_values` function
[ https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653714#comment-17653714 ] Hyukjin Kwon commented on SPARK-41836: -- test output? > Implement `transform_values` function > - > > Key: SPARK-41836 > URL: https://issues.apache.org/jira/browse/SPARK-41836 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41839) Implement SparkSession.sparkContext
[ https://issues.apache.org/jira/browse/SPARK-41839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653715#comment-17653715 ] Hyukjin Kwon commented on SPARK-41839: -- test output? > Implement SparkSession.sparkContext > --- > > Key: SPARK-41839 > URL: https://issues.apache.org/jira/browse/SPARK-41839 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41835) Implement `transform_keys` function
[ https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653713#comment-17653713 ] Hyukjin Kwon commented on SPARK-41835: -- test output? > Implement `transform_keys` function > --- > > Key: SPARK-41835 > URL: https://issues.apache.org/jira/browse/SPARK-41835 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-41835) Implement `transform_keys` function
[ https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41835: Assignee: (was: Ruifeng Zheng) > Implement `transform_keys` function > --- > > Key: SPARK-41835 > URL: https://issues.apache.org/jira/browse/SPARK-41835 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > >
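The failing doctest behind SPARK-41835 maps `upper` over each key of a map column. A plain-Python sketch of the semantics `transform_keys` is expected to implement (this mimics the per-value behavior only; the real function evaluates the lambda as a Catalyst lambda expression over a MapType column):

```python
# Illustrative sketch of transform_keys semantics, not the Spark Connect
# implementation: apply f(key, value) to every key, leaving values intact.
def transform_keys(mapping, f):
    """Return a new dict with f applied to every key (values unchanged)."""
    return {f(k, v): v for k, v in mapping.items()}

data = {"foo": 1.0, "bar": 2.0}
upper_keys = transform_keys(data, lambda k, _: k.upper())
# upper_keys == {"FOO": 1.0, "BAR": 2.0}
```

The doctest in the report, `transform_keys("data", lambda k, _: upper(k))`, does the same thing column-wise.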
[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe
[ https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41828: - Epic Link: (was: SPARK-39375) > Implement creating empty Dataframe > -- > > Key: SPARK-41828 > URL: https://issues.apache.org/jira/browse/SPARK-41828 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty > Failed example: > df_empty = spark.createDataFrame([], 'a STRING') > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df_empty = spark.createDataFrame([], 'a STRING') > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 186, in createDataFrame > raise ValueError("Input data cannot be empty") > ValueError: Input data cannot be empty{code}
[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe
[ https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41828: - Parent: SPARK-41281 Issue Type: Sub-task (was: Bug) > Implement creating empty Dataframe > -- > > Key: SPARK-41828 > URL: https://issues.apache.org/jira/browse/SPARK-41828 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty > Failed example: > df_empty = spark.createDataFrame([], 'a STRING') > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df_empty = spark.createDataFrame([], 'a STRING') > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 186, in createDataFrame > raise ValueError("Input data cannot be empty") > ValueError: Input data cannot be empty{code}
[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe
[ https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41828: - Parent: (was: SPARK-41279) Issue Type: Bug (was: Sub-task) > Implement creating empty Dataframe > -- > > Key: SPARK-41828 > URL: https://issues.apache.org/jira/browse/SPARK-41828 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty > Failed example: > df_empty = spark.createDataFrame([], 'a STRING') > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df_empty = spark.createDataFrame([], 'a STRING') > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 186, in createDataFrame > raise ValueError("Input data cannot be empty") > ValueError: Input data cannot be empty{code}
[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe
[ https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41828: - Epic Link: SPARK-39375 > Implement creating empty Dataframe > -- > > Key: SPARK-41828 > URL: https://issues.apache.org/jira/browse/SPARK-41828 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty > Failed example: > df_empty = spark.createDataFrame([], 'a STRING') > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df_empty = spark.createDataFrame([], 'a STRING') > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 186, in createDataFrame > raise ValueError("Input data cannot be empty") > ValueError: Input data cannot be empty{code}
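The SPARK-41828 report shows Spark Connect's `createDataFrame` rejecting empty input even though an explicit schema (`'a STRING'`) is supplied, so the schema never needs to be inferred from rows. A minimal sketch of the validation the ticket implies; the function name here is hypothetical (per the traceback, the actual check is in `pyspark/sql/connect/session.py`):

```python
# Hypothetical validation sketch: empty input is only an error when no
# explicit schema is given, because zero rows leave nothing to infer from.
def validate_create_dataframe(data, schema=None):
    """Return (rows, schema), raising only when empty data has no schema."""
    rows = list(data)
    if not rows and schema is None:
        raise ValueError("Input data cannot be empty when no schema is given")
    return rows, schema

# An empty DataFrame with an explicit DDL schema should be accepted:
rows, schema = validate_create_dataframe([], "a STRING")
```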
[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions
[ https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41819: - Epic Link: (was: SPARK-39375) > Implement Dataframe.rdd getNumPartitions > > > Key: SPARK-41819 > URL: https://issues.apache.org/jira/browse/SPARK-41819 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce > Failed example: > df.coalesce(1).rdd.getNumPartitions() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df.coalesce(1).rdd.getNumPartitions() > AttributeError: 'function' object has no attribute > 'getNumPartitions'{code}
[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions
[ https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41819: - Parent: SPARK-41279 Issue Type: Sub-task (was: Bug) > Implement Dataframe.rdd getNumPartitions > > > Key: SPARK-41819 > URL: https://issues.apache.org/jira/browse/SPARK-41819 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce > Failed example: > df.coalesce(1).rdd.getNumPartitions() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df.coalesce(1).rdd.getNumPartitions() > AttributeError: 'function' object has no attribute > 'getNumPartitions'{code}
[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions
[ https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41819: - Parent: (was: SPARK-41281) Issue Type: Bug (was: Sub-task) > Implement Dataframe.rdd getNumPartitions > > > Key: SPARK-41819 > URL: https://issues.apache.org/jira/browse/SPARK-41819 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce > Failed example: > df.coalesce(1).rdd.getNumPartitions() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df.coalesce(1).rdd.getNumPartitions() > AttributeError: 'function' object has no attribute > 'getNumPartitions'{code}
[jira] [Assigned] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter
[ https://issues.apache.org/jira/browse/SPARK-41659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41659: Assignee: Hyukjin Kwon > Enable doctests in pyspark.sql.connect.readwriter > - > > Key: SPARK-41659 > URL: https://issues.apache.org/jira/browse/SPARK-41659 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major >
[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions
[ https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41819: - Epic Link: SPARK-39375 > Implement Dataframe.rdd getNumPartitions > > > Key: SPARK-41819 > URL: https://issues.apache.org/jira/browse/SPARK-41819 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce > Failed example: > df.coalesce(1).rdd.getNumPartitions() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in <module> > df.coalesce(1).rdd.getNumPartitions() > AttributeError: 'function' object has no attribute > 'getNumPartitions'{code}
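For context on the SPARK-41819 doctest: `coalesce(1)` merges the existing partitions down to one without a shuffle, so `rdd.getNumPartitions()` should return 1. A plain-Python sketch of that merge, illustrative only (Spark tracks partitions inside the RDD, not as Python lists):

```python
# Illustrative sketch of coalesce semantics: reduce the number of
# partitions by merging existing ones; never increase the count.
def coalesce(partitions, num):
    """Merge a list of partitions down to at most `num` partitions."""
    num = max(1, min(num, len(partitions)))
    merged = [[] for _ in range(num)]
    for i, part in enumerate(partitions):
        merged[i % num].extend(part)
    return merged

parts = coalesce([[1], [2], [3], [4]], 1)
# len(parts) mirrors df.coalesce(1).rdd.getNumPartitions() == 1
```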
[jira] [Resolved] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter
[ https://issues.apache.org/jira/browse/SPARK-41659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41659. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39331 [https://github.com/apache/spark/pull/39331] > Enable doctests in pyspark.sql.connect.readwriter > - > > Key: SPARK-41659 > URL: https://issues.apache.org/jira/browse/SPARK-41659 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > >
[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema
[ https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41817: - Parent: SPARK-41284 Issue Type: Sub-task (was: Bug) > SparkSession.read support reading with schema > - > > Key: SPARK-41817 > URL: https://issues.apache.org/jira/browse/SPARK-41817 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with a header > df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}]) > df.write.option("header", > True).mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame with 'nullValue' option set to > 'Hyukjin Kwon', > # and 'header' option set to `True`. > df = spark.read.load( > d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", > header=True) > df.printSchema() > df.show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module> > df.printSchema() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1039, in printSchema > print(self._tree_string()) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1035, in _tree_string > query = self._plan.to_proto(self._session.client) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 92, in to_proto > plan.root.CopyFrom(self.plan(session)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 245, in plan > plan.read.data_source.schema = self.schema > TypeError: bad argument type for built-in operation {code}
[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema
[ https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41817: - Epic Link: (was: SPARK-39375) > SparkSession.read support reading with schema > - > > Key: SPARK-41817 > URL: https://issues.apache.org/jira/browse/SPARK-41817 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with a header > df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}]) > df.write.option("header", > True).mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame with 'nullValue' option set to > 'Hyukjin Kwon', > # and 'header' option set to `True`. > df = spark.read.load( > d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", > header=True) > df.printSchema() > df.show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module> > df.printSchema() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1039, in printSchema > print(self._tree_string()) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1035, in _tree_string > query = self._plan.to_proto(self._session.client) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 92, in to_proto > plan.root.CopyFrom(self.plan(session)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 245, in plan > plan.read.data_source.schema = self.schema > TypeError: bad argument type for built-in operation {code}
[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41818: - Parent: SPARK-41284 Issue Type: Sub-task (was: Bug) > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module> > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code}
[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41818: - Epic Link: (was: SPARK-39375) > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module> > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code}
[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema
[ https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41817: - Epic Link: SPARK-39375 > SparkSession.read support reading with schema > - > > Key: SPARK-41817 > URL: https://issues.apache.org/jira/browse/SPARK-41817 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with a header > df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}]) > df.write.option("header", > True).mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame with 'nullValue' option set to > 'Hyukjin Kwon', > # and 'header' option set to `True`. > df = spark.read.load( > d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", > header=True) > df.printSchema() > df.show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module> > df.printSchema() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1039, in printSchema > print(self._tree_string()) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1035, in _tree_string > query = self._plan.to_proto(self._session.client) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 92, in to_proto > plan.root.CopyFrom(self.plan(session)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 245, in plan > plan.read.data_source.schema = self.schema > TypeError: bad argument type for built-in operation {code}
[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema
[ https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41817: - Parent: (was: SPARK-41281) Issue Type: Bug (was: Sub-task) > SparkSession.read support reading with schema > - > > Key: SPARK-41817 > URL: https://issues.apache.org/jira/browse/SPARK-41817 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with a header > df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}]) > df.write.option("header", > True).mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame with 'nullValue' option set to > 'Hyukjin Kwon', > # and 'header' option set to `True`. > df = spark.read.load( > d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", > header=True) > df.printSchema() > df.show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module> > df.printSchema() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1039, in printSchema > print(self._tree_string()) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1035, in _tree_string > query = self._plan.to_proto(self._session.client) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 92, in to_proto > plan.root.CopyFrom(self.plan(session)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line > 245, in plan > plan.read.data_source.schema = self.schema > TypeError: bad argument type for built-in operation {code}
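The `TypeError: bad argument type for built-in operation` raised at `plan.read.data_source.schema = self.schema` in the SPARK-41817 tracebacks is characteristic of assigning a non-string object (here a `StructType`) to a protobuf string field. A hedged sketch of the conversion the reader plan would need; `toDDL` is a hypothetical method name used purely for illustration (the DDL form of a schema, e.g. `'a STRING'`, is what the proto field carries):

```python
# Hypothetical sketch: normalize a schema to a DDL string before it is
# assigned to a protobuf string field, instead of assigning the object.
def schema_to_ddl(schema):
    """Return a DDL string for a schema given as str or a DDL-capable object."""
    if isinstance(schema, str):
        return schema
    if hasattr(schema, "toDDL"):  # illustrative stand-in for a StructType
        return schema.toDDL()
    raise TypeError(f"unsupported schema type: {type(schema).__name__}")

# plan.read.data_source.schema = schema_to_ddl(self.schema)  # sketch
```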
[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41818: - Parent: (was: SPARK-41281) Issue Type: Bug (was: Sub-task) > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module> > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code}
[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41818: - Epic Link: SPARK-39375 > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module> > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code}
[jira] [Resolved] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41818. -- Resolution: Fixed > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in > > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-41818: -- > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in > > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41804: Assignee: Apache Spark > InterpretedUnsafeProjection doesn't properly handle an array of UDTs > > > Key: SPARK-41804 > URL: https://issues.apache.org/jira/browse/SPARK-41804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Bruce Robbins >Assignee: Apache Spark >Priority: Major > > Reproduction steps: > {noformat} > // create a file of vector data > import org.apache.spark.ml.linalg.{DenseVector, Vector} > case class TestRow(varr: Array[Vector]) > val values = Array(0.1d, 0.2d, 0.3d) > val dv = new DenseVector(values).asInstanceOf[Vector] > val ds = Seq(TestRow(Array(dv, dv))).toDS > ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data") > // this works > spark.read.format("parquet").load("vector_data").collect > sql("set spark.sql.codegen.wholeStage=false") > sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > // this will get an error > spark.read.format("parquet").load("vector_data").collect > {noformat} > The error varies each time you run it, e.g.: > {noformat} > Sparse vectors require that the dimension of the indices match the dimension > of the values. > You provided 2 indices and 6619240 values. 
> {noformat} > or > {noformat} > org.apache.spark.SparkRuntimeException: Error while decoding: > java.lang.NegativeArraySizeException > {noformat} > or > {noformat} > java.lang.OutOfMemoryError: Java heap space > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414) > {noformat} > or > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build > 1.8.0_311-b11) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 > compressed oops) > # Problematic frame: > # V [libjvm.dylib+0xc9d30] acl_CopyRight+0x29 > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # //hs_err_pid64213.log > Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x00011efa8890,0x00011efa8be8] = 856 > relocation [0x00011efa89b8,0x00011efa89f8] = 64 > main code [0x00011efa8a00,0x00011efa8be8] = 488 > Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x00011efa8890,0x00011efa8be8] = 856 > relocation [0x00011efa89b8,0x00011efa89f8] = 64 > main code [0x00011efa8a00,0x00011efa8be8] = 488 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41804: Assignee: (was: Apache Spark) > InterpretedUnsafeProjection doesn't properly handle an array of UDTs > > > Key: SPARK-41804 > URL: https://issues.apache.org/jira/browse/SPARK-41804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Bruce Robbins >Priority: Major > > Reproduction steps: > {noformat} > // create a file of vector data > import org.apache.spark.ml.linalg.{DenseVector, Vector} > case class TestRow(varr: Array[Vector]) > val values = Array(0.1d, 0.2d, 0.3d) > val dv = new DenseVector(values).asInstanceOf[Vector] > val ds = Seq(TestRow(Array(dv, dv))).toDS > ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data") > // this works > spark.read.format("parquet").load("vector_data").collect > sql("set spark.sql.codegen.wholeStage=false") > sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > // this will get an error > spark.read.format("parquet").load("vector_data").collect > {noformat} > The error varies each time you run it, e.g.: > {noformat} > Sparse vectors require that the dimension of the indices match the dimension > of the values. > You provided 2 indices and 6619240 values. 
> {noformat} > or > {noformat} > org.apache.spark.SparkRuntimeException: Error while decoding: > java.lang.NegativeArraySizeException > {noformat} > or > {noformat} > java.lang.OutOfMemoryError: Java heap space > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414) > {noformat} > or > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build > 1.8.0_311-b11) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 > compressed oops) > # Problematic frame: > # V [libjvm.dylib+0xc9d30] acl_CopyRight+0x29 > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # //hs_err_pid64213.log > Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x00011efa8890,0x00011efa8be8] = 856 > relocation [0x00011efa89b8,0x00011efa89f8] = 64 > main code [0x00011efa8a00,0x00011efa8be8] = 488 > Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x00011efa8890,0x00011efa8be8] = 856 > relocation [0x00011efa89b8,0x00011efa89f8] = 64 > main code [0x00011efa8a00,0x00011efa8be8] = 488 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs
[ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653709#comment-17653709 ] Apache Spark commented on SPARK-41804: -- User 'bersprockets' has created a pull request for this issue: https://github.com/apache/spark/pull/39349 > InterpretedUnsafeProjection doesn't properly handle an array of UDTs > > > Key: SPARK-41804 > URL: https://issues.apache.org/jira/browse/SPARK-41804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Bruce Robbins >Priority: Major > > Reproduction steps: > {noformat} > // create a file of vector data > import org.apache.spark.ml.linalg.{DenseVector, Vector} > case class TestRow(varr: Array[Vector]) > val values = Array(0.1d, 0.2d, 0.3d) > val dv = new DenseVector(values).asInstanceOf[Vector] > val ds = Seq(TestRow(Array(dv, dv))).toDS > ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data") > // this works > spark.read.format("parquet").load("vector_data").collect > sql("set spark.sql.codegen.wholeStage=false") > sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") > // this will get an error > spark.read.format("parquet").load("vector_data").collect > {noformat} > The error varies each time you run it, e.g.: > {noformat} > Sparse vectors require that the dimension of the indices match the dimension > of the values. > You provided 2 indices and 6619240 values. 
> {noformat} > or > {noformat} > org.apache.spark.SparkRuntimeException: Error while decoding: > java.lang.NegativeArraySizeException > {noformat} > or > {noformat} > java.lang.OutOfMemoryError: Java heap space > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414) > {noformat} > or > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build > 1.8.0_311-b11) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 > compressed oops) > # Problematic frame: > # V [libjvm.dylib+0xc9d30] acl_CopyRight+0x29 > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # //hs_err_pid64213.log > Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x00011efa8890,0x00011efa8be8] = 856 > relocation [0x00011efa89b8,0x00011efa89f8] = 64 > main code [0x00011efa8a00,0x00011efa8be8] = 488 > Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x00011efa8890,0x00011efa8be8] = 856 > relocation [0x00011efa89b8,0x00011efa89f8] = 64 > main code [0x00011efa8a00,0x00011efa8be8] = 488 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41840) DataFrame.show(): 'Column' object is not callable
[ https://issues.apache.org/jira/browse/SPARK-41840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41840: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 855, in pyspark.sql.connect.functions.first Failed example: df.groupby("name").agg(first("age", ignorenulls=True)).orderBy("name").show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.groupby("name").agg(first("age", ignorenulls=True)).orderBy("name").show() TypeError: 'Column' object is not callable{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1472, in pyspark.sql.connect.functions.posexplode_outer Failed example: df.select("id", "a_map", posexplode_outer("an_array")).show() Expected: +---+--+++ | id| a_map| pos| col| +---+--+++ | 1|{x -> 1.0}| 0| foo| | 1|{x -> 1.0}| 1| bar| | 2| {}|null|null| | 3| null|null|null| +---+--+++ Got: +---+--+++ | id| a_map| pos| col| +---+--+++ | 1| {1.0}| 0| foo| | 1| {1.0}| 1| bar| | 2|{null}|null|null| | 3| null|null|null| +---+--+++ {code} > DataFrame.show(): 'Column' object is not callable > - > > Key: SPARK-41840 > URL: https://issues.apache.org/jira/browse/SPARK-41840 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 855, in pyspark.sql.connect.functions.first > Failed example: > df.groupby("name").agg(first("age", > ignorenulls=True)).orderBy("name").show() > Exception raised: > Traceback (most recent call last): > File > 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.groupby("name").agg(first("age", > ignorenulls=True)).orderBy("name").show() > TypeError: 'Column' object is not callable{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
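The failure above is in the Connect client (the `first` name resolves to a Column rather than a callable), not in the aggregate's semantics. For reference, the behavior the doctest exercises — return the first value, or the first non-null value when `ignorenulls=True` — can be sketched in plain Python; this is an illustration of the expected semantics, not Spark code:

```python
def first(values, ignorenulls=False):
    """Mimic the semantics of Spark's first() aggregate: return the first
    value in iteration order, or the first non-null value when
    ignorenulls=True. Returns None for an empty input."""
    for v in values:
        if v is not None or not ignorenulls:
            return v
    return None
```

So `first([None, 2, 3])` is `None`, while `first([None, 2, 3], ignorenulls=True)` is `2`, matching what the grouped aggregate in the doctest should show per name.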
[jira] [Created] (SPARK-41840) DataFrame.show(): 'Column' object is not callable
Sandeep Singh created SPARK-41840:
-------------------------------------

             Summary: DataFrame.show(): 'Column' object is not callable
                 Key: SPARK-41840
                 URL: https://issues.apache.org/jira/browse/SPARK-41840
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Sandeep Singh


{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1472, in pyspark.sql.connect.functions.posexplode_outer
Failed example:
    df.select("id", "a_map", posexplode_outer("an_array")).show()
Expected:
    +---+----------+----+----+
    | id|     a_map| pos| col|
    +---+----------+----+----+
    |  1|{x -> 1.0}|   0| foo|
    |  1|{x -> 1.0}|   1| bar|
    |  2|        {}|null|null|
    |  3|      null|null|null|
    +---+----------+----+----+
Got:
    +---+------+----+----+
    | id| a_map| pos| col|
    +---+------+----+----+
    |  1| {1.0}|   0| foo|
    |  1| {1.0}|   1| bar|
    |  2|{null}|null|null|
    |  3|  null|null|null|
    +---+------+----+----+
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41839) Implement SparkSession.sparkContext
Sandeep Singh created SPARK-41839: - Summary: Implement SparkSession.sparkContext Key: SPARK-41839 URL: https://issues.apache.org/jira/browse/SPARK-41839 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp Failed example: spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") AttributeError: 'SparkSession' object has no attribute 'conf'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
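The doctest fails because the Connect `SparkSession` has no `conf` attribute yet. The surface being asked for is small; a dict-backed stand-in sketches it (hypothetical — the real Connect implementation would forward these calls over RPC to the server's session state):

```python
class RuntimeConf:
    """Minimal dict-backed sketch of the SparkSession.conf surface
    (set/get/unset). Illustrative only; not the Connect implementation."""

    def __init__(self):
        self._entries = {}

    def set(self, key, value):
        self._entries[key] = value

    def get(self, key, default=None):
        return self._entries.get(key, default)

    def unset(self, key):
        self._entries.pop(key, None)
```

With this shape, `spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")` from the failing doctest would at least resolve; the open work is wiring the calls through the Connect protocol.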
[jira] [Updated] (SPARK-41839) Implement SparkSession.sparkContext
[ https://issues.apache.org/jira/browse/SPARK-41839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41839: -- Description: (was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp Failed example: spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") AttributeError: 'SparkSession' object has no attribute 'conf'{code}) > Implement SparkSession.sparkContext > --- > > Key: SPARK-41839 > URL: https://issues.apache.org/jira/browse/SPARK-41839 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41838) DataFrame.show() fix map printing
[ https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41838: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1472, in pyspark.sql.connect.functions.posexplode_outer Failed example: df.select("id", "a_map", posexplode_outer("an_array")).show() Expected: +---+--+++ | id| a_map| pos| col| +---+--+++ | 1|{x -> 1.0}| 0| foo| | 1|{x -> 1.0}| 1| bar| | 2| {}|null|null| | 3| null|null|null| +---+--+++ Got: +---+--+++ | id| a_map| pos| col| +---+--+++ | 1| {1.0}| 0| foo| | 1| {1.0}| 1| bar| | 2|{null}|null|null| | 3| null|null|null| +---+--+++ {code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1594, in pyspark.sql.connect.functions.to_json Failed example: df = spark.createDataFrame(data, ("key", "value")) Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df = spark.createDataFrame(data, ("key", "value")) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 252, in createDataFrame table = pa.Table.from_pandas(pdf) File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays arrays = [convert_column(c, f) File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in arrays = [convert_column(c, f) File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column raise e File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column result = pa.array(col, type=type_, from_pandas=True, safe=safe) File "pyarrow/array.pxi", 
line 316, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: tried to convert to int64", 'Conversion failed for column 1 with type object'){code} > DataFrame.show() fix map printing > - > > Key: SPARK-41838 > URL: https://issues.apache.org/jira/browse/SPARK-41838 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1472, in pyspark.sql.connect.functions.posexplode_outer > Failed example: > df.select("id", "a_map", posexplode_outer("an_array")).show() > Expected: > +---+--+++ > | id| a_map| pos| col| > +---+--+++ > | 1|{x -> 1.0}| 0| foo| > | 1|{x -> 1.0}| 1| bar| > | 2| {}|null|null| > | 3| null|null|null| > +---+--+++ > Got: > +---+--+++ > | id| a_map| pos| col| > +---+--+++ > | 1| {1.0}| 0| foo| > | 1| {1.0}| 1| bar| > | 2|{null}|null|null| > | 3| null|null|null| > +---+--+++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
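The diff in the doctest is purely a rendering problem: the map cells print `{1.0}` and `{null}` instead of `{x -> 1.0}` and `{}`, i.e. the keys are dropped. The expected cell format can be pinned down in plain Python (a sketch of the `show()` rendering convention, not Spark's formatter):

```python
def format_map_cell(m):
    """Render a map value the way DataFrame.show() is expected to:
    '{k1 -> v1, k2 -> v2}', with None rendered as 'null'. The reported
    bug drops the keys, printing '{1.0}' instead of '{x -> 1.0}'."""
    if m is None:
        return "null"
    return "{" + ", ".join(f"{k} -> {v}" for k, v in m.items()) + "}"
```

For example `format_map_cell({"x": 1.0})` gives `{x -> 1.0}` and an empty map gives `{}`, matching the "Expected" table above.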
[jira] [Created] (SPARK-41838) DataFrame.show() fix map printing
Sandeep Singh created SPARK-41838: - Summary: DataFrame.show() fix map printing Key: SPARK-41838 URL: https://issues.apache.org/jira/browse/SPARK-41838 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1594, in pyspark.sql.connect.functions.to_json Failed example: df = spark.createDataFrame(data, ("key", "value")) Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df = spark.createDataFrame(data, ("key", "value")) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 252, in createDataFrame table = pa.Table.from_pandas(pdf) File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays arrays = [convert_column(c, f) File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in arrays = [convert_column(c, f) File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column raise e File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column result = pa.array(col, type=type_, from_pandas=True, safe=safe) File "pyarrow/array.pxi", line 316, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: tried to convert to int64", 'Conversion failed for column 1 with type object'){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41837) DataFrame.createDataFrame datatype conversion error
Sandeep Singh created SPARK-41837: - Summary: DataFrame.createDataFrame datatype conversion error Key: SPARK-41837 URL: https://issues.apache.org/jira/browse/SPARK-41837 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1117, in pyspark.sql.connect.functions.array Failed example: df.select(array('age', 'age').alias("arr")).collect() Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1119, in pyspark.sql.connect.functions.array Failed example: df.select(array([df.age, df.age]).alias("arr")).collect() Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1124, in pyspark.sql.connect.functions.array_distinct Failed example: df.select(array_distinct(df.data)).collect() Expected: [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])] Got: [Row(array_distinct(data)=array([1, 2, 3])), Row(array_distinct(data)=array([4, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1135, in pyspark.sql.connect.functions.array_except Failed example: df.select(array_except(df.c1, df.c2)).collect() Expected: [Row(array_except(c1, c2)=['b'])] Got: [Row(array_except(c1, c2)=array(['b'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1142, in pyspark.sql.connect.functions.array_intersect Failed example: df.select(array_intersect(df.c1, df.c2)).collect() Expected: [Row(array_intersect(c1, c2)=['a', 'c'])] Got: [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))] ** File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1180, in pyspark.sql.connect.functions.array_remove Failed example: df.select(array_remove(df.data, 1)).collect() Expected: [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])] Got: [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, 1)=array([], dtype=int64))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1187, in pyspark.sql.connect.functions.array_repeat Failed example: df.select(array_repeat(df.data, 3).alias('r')).collect() Expected: [Row(r=['ab', 'ab', 'ab'])] Got: [Row(r=array(['ab', 'ab', 'ab'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1204, in pyspark.sql.connect.functions.array_sort Failed example: df.select(array_sort(df.data).alias('r')).collect() Expected: [Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])] Got: [Row(r=array([ 1., 2., 3., nan])), Row(r=array([1])), Row(r=array([], dtype=int64))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1207, in pyspark.sql.connect.functions.array_sort Failed example: df.select(array_sort( "data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)) ).alias("r")).collect() Expected: [Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])] Got: [Row(r=array(['foobar', 'foo', None, 'bar'], dtype=object)), Row(r=array(['foo'], dtype=object)), Row(r=array([], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1209, in pyspark.sql.connect.functions.array_union Failed example: df.select(array_union(df.c1, df.c2)).collect() Expected: [Row(array_union(c1, c2)=['b', 'a', 'c', 'd', 'f'])] Got: [Row(array_union(c1, c2)=array(['b', 'a', 'c', 'd', 'f'], dtype=object))]{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
[jira] [Updated] (SPARK-41837) DataFrame.createDataFrame datatype conversion error
[ https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41837: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1594, in pyspark.sql.connect.functions.to_json Failed example: df = spark.createDataFrame(data, ("key", "value")) Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df = spark.createDataFrame(data, ("key", "value")) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 252, in createDataFrame table = pa.Table.from_pandas(pdf) File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays arrays = [convert_column(c, f) File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in arrays = [convert_column(c, f) File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column raise e File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column result = pa.array(col, type=type_, from_pandas=True, safe=safe) File "pyarrow/array.pxi", line 316, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: tried to convert to int64", 'Conversion failed for column 1 with type object'){code} was: {code:java} ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1117, in pyspark.sql.connect.functions.array Failed example: df.select(array('age', 'age').alias("arr")).collect() 
Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1119, in pyspark.sql.connect.functions.array Failed example: df.select(array([df.age, df.age]).alias("arr")).collect() Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1124, in pyspark.sql.connect.functions.array_distinct Failed example: df.select(array_distinct(df.data)).collect() Expected: [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])] Got: [Row(array_distinct(data)=array([1, 2, 3])), Row(array_distinct(data)=array([4, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1135, in pyspark.sql.connect.functions.array_except Failed example: df.select(array_except(df.c1, df.c2)).collect() Expected: [Row(array_except(c1, c2)=['b'])] Got: [Row(array_except(c1, c2)=array(['b'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1142, in pyspark.sql.connect.functions.array_intersect Failed example: df.select(array_intersect(df.c1, df.c2)).collect() Expected: [Row(array_intersect(c1, c2)=['a', 'c'])] Got: [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1180, in pyspark.sql.connect.functions.array_remove Failed example: df.select(array_remove(df.data, 1)).collect() Expected: [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])] Got: [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, 1)=array([], dtype=int64))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1187, in pyspark.sql.connect.functions.array_repeat Failed example: df.select(array_repeat(df.data, 
3).alias('r')).collect() Expected: [Row(r=['ab', 'ab', 'ab'])] Got: [Row(r=array(['ab', 'ab', 'ab'], dtype=object))] ** File
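The `ArrowInvalid` failure in the SPARK-41837 description above is a schema/data mismatch: `pa.Table.from_pandas` aborts an entire column when one cell cannot be coerced to the declared Arrow type. A minimal pure-Python sketch of that failure mode (not the Spark Connect code path; `check_column_types` is a hypothetical helper name) surfaces the offending cell before conversion:

```python
# Hedged sketch: why Arrow-backed createDataFrame can fail with
# "Could not convert 'Alice' with type str: tried to convert to int64".
# A column declared as int that holds a Python str aborts conversion
# of the whole column.

def check_column_types(rows, schema):
    """Return (row_index, column_name, value) for each cell whose value
    does not match the Python type declared for its column.
    `schema` maps column name -> expected Python type, in column order."""
    violations = []
    columns = list(schema)
    for i, row in enumerate(rows):
        for name, value in zip(columns, row):
            if not isinstance(value, schema[name]):
                violations.append((i, name, value))
    return violations

# The second row carries a str in an int column, mirroring the
# ArrowInvalid message in the traceback above.
rows = [(1, 25), (2, "Alice")]
schema = {"key": int, "value": int}
print(check_column_types(rows, schema))  # [(1, 'value', 'Alice')]
```

Running such a check before handing rows to Arrow turns a batch conversion error into a pinpointed bad cell.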
[jira] [Updated] (SPARK-41836) Implement `transform_values` function
[ https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41836: -- Summary: Implement `transform_values` function (was: CLONE - Implement `transform_values` function) > Implement `transform_values` function > - > > Key: SPARK-41836 > URL: https://issues.apache.org/jira/browse/SPARK-41836 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41836) CLONE - Implement `transform_values` function
Sandeep Singh created SPARK-41836: - Summary: CLONE - Implement `transform_values` function Key: SPARK-41836 URL: https://issues.apache.org/jira/browse/SPARK-41836 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Sandeep Singh Assignee: Ruifeng Zheng Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41835) Implement `transform_keys` function
Sandeep Singh created SPARK-41835: - Summary: Implement `transform_keys` function Key: SPARK-41835 URL: https://issues.apache.org/jira/browse/SPARK-41835 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Sandeep Singh Assignee: Ruifeng Zheng Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
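For reference, the semantics `transform_keys` must provide on a Spark `MapType` column can be sketched on a plain Python dict: apply a binary function `f(key, value)` to produce the new keys, keeping values unchanged. This is an illustrative analogue only, not the Connect implementation:

```python
# Plain-Python sketch of transform_keys semantics: new_key = f(key, value).

def transform_keys(mapping, f):
    """Return a new dict whose keys are f(k, v) for each entry."""
    return {f(k, v): v for k, v in mapping.items()}

data = {"foo": 1, "bar": 2}
upper = transform_keys(data, lambda k, _: k.upper())
print(upper)  # {'FOO': 1, 'BAR': 2}
```

This mirrors the failing doctest in the issue head, which uppercases every key of the map column.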
[jira] [Commented] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
[ https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653702#comment-17653702 ] Apache Spark commented on SPARK-41311: -- User 'ibuder' has created a pull request for this issue: https://github.com/apache/spark/pull/39348 > Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space > --- > > Key: SPARK-41311 > URL: https://issues.apache.org/jira/browse/SPARK-41311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Immanuel Buder >Priority: Minor > > Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in > [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be] > to trigger the error from user space. The current test uses non-user-facing > class FileSystemBasedCheckpointFileManager directly to trigger the error. > (see > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680] > ) > Done when: the test uses user-facing APIs as much as possible. > Proposed solution: rewrite the test following the example of > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641] > See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for > more context -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
[ https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41311: Assignee: Apache Spark > Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space > --- > > Key: SPARK-41311 > URL: https://issues.apache.org/jira/browse/SPARK-41311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Immanuel Buder >Assignee: Apache Spark >Priority: Minor > > Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in > [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be] > to trigger the error from user space. The current test uses non-user-facing > class FileSystemBasedCheckpointFileManager directly to trigger the error. > (see > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680] > ) > Done when: the test uses user-facing APIs as much as possible. > Proposed solution: rewrite the test following the example of > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641] > See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for > more context -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
[ https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41311: Assignee: (was: Apache Spark) > Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space > --- > > Key: SPARK-41311 > URL: https://issues.apache.org/jira/browse/SPARK-41311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Immanuel Buder >Priority: Minor > > Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in > [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be] > to trigger the error from user space. The current test uses non-user-facing > class FileSystemBasedCheckpointFileManager directly to trigger the error. > (see > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680] > ) > Done when: the test uses user-facing APIs as much as possible. > Proposed solution: rewrite the test following the example of > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641] > See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for > more context -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
[ https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653701#comment-17653701 ] Apache Spark commented on SPARK-41311: -- User 'ibuder' has created a pull request for this issue: https://github.com/apache/spark/pull/39348 > Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space > --- > > Key: SPARK-41311 > URL: https://issues.apache.org/jira/browse/SPARK-41311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Immanuel Buder >Priority: Minor > > Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in > [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be] > to trigger the error from user space. The current test uses non-user-facing > class FileSystemBasedCheckpointFileManager directly to trigger the error. > (see > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680] > ) > Done when: the test uses user-facing APIs as much as possible. > Proposed solution: rewrite the test following the example of > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641] > See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for > more context -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
[ https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653682#comment-17653682 ] Immanuel Buder commented on SPARK-41311: working on this > Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space > --- > > Key: SPARK-41311 > URL: https://issues.apache.org/jira/browse/SPARK-41311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Immanuel Buder >Priority: Minor > > Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in > [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be] > to trigger the error from user space. The current test uses non-user-facing > class FileSystemBasedCheckpointFileManager directly to trigger the error. > (see > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680] > ) > Done when: the test uses user-facing APIs as much as possible. > Proposed solution: rewrite the test following the example of > [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641] > See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for > more context -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41834) Implement SparkSession.conf
[ https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41834: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp Failed example: spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") AttributeError: 'SparkSession' object has no attribute 'conf'{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming Failed example: df = spark.readStream.format("rate").load() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df = spark.readStream.format("rate").load() AttributeError: 'SparkSession' object has no attribute 'readStream'{code} > Implement SparkSession.conf > --- > > Key: SPARK-41834 > URL: https://issues.apache.org/jira/browse/SPARK-41834 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 2119, in pyspark.sql.connect.functions.unix_timestamp > Failed example: > spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > Exception raised: > Traceback (most recent call last): > File > 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > AttributeError: 'SparkSession' object has no attribute 'conf'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
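The `AttributeError: 'SparkSession' object has no attribute 'conf'` above means the Connect session lacks the runtime-config surface the doctest expects. A hedged sketch of that surface (shaped after pyspark's `RuntimeConfig`, with hypothetical class names, not the actual Spark Connect implementation) shows what the doctest exercises:

```python
# Sketch of the session.conf surface the failing doctest needs:
# a property returning an object with set/get/unset over session config.

class RuntimeConf:
    def __init__(self):
        self._entries = {}

    def set(self, key, value):
        self._entries[key] = value

    def get(self, key, default=None):
        return self._entries.get(key, default)

    def unset(self, key):
        self._entries.pop(key, None)

class Session:  # stand-in for the Connect SparkSession
    def __init__(self):
        self._conf = RuntimeConf()

    @property
    def conf(self):
        return self._conf

spark = Session()
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
print(spark.conf.get("spark.sql.session.timeZone"))  # America/Los_Angeles
```

In the real implementation the set/get calls would round-trip through the Connect client rather than a local dict.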
[jira] [Created] (SPARK-41834) Implement SparkSession.conf
Sandeep Singh created SPARK-41834: - Summary: Implement SparkSession.conf Key: SPARK-41834 URL: https://issues.apache.org/jira/browse/SPARK-41834 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming Failed example: df = spark.readStream.format("rate").load() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df = spark.readStream.format("rate").load() AttributeError: 'SparkSession' object has no attribute 'readStream'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions
[ https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653679#comment-17653679 ] Apache Spark commented on SPARK-41658: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39347 > Enable doctests in pyspark.sql.connect.functions > > > Key: SPARK-41658 > URL: https://issues.apache.org/jira/browse/SPARK-41658 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions
[ https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41658: Assignee: (was: Apache Spark) > Enable doctests in pyspark.sql.connect.functions > > > Key: SPARK-41658 > URL: https://issues.apache.org/jira/browse/SPARK-41658 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions
[ https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41658: Assignee: Apache Spark > Enable doctests in pyspark.sql.connect.functions > > > Key: SPARK-41658 > URL: https://issues.apache.org/jira/browse/SPARK-41658 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions
[ https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653678#comment-17653678 ] Apache Spark commented on SPARK-41658: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39347 > Enable doctests in pyspark.sql.connect.functions > > > Key: SPARK-41658 > URL: https://issues.apache.org/jira/browse/SPARK-41658 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41833) DataFrame.collect() output parity with pyspark
[ https://issues.apache.org/jira/browse/SPARK-41833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41833: -- Description: {code:java} ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1117, in pyspark.sql.connect.functions.array Failed example: df.select(array('age', 'age').alias("arr")).collect() Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1119, in pyspark.sql.connect.functions.array Failed example: df.select(array([df.age, df.age]).alias("arr")).collect() Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1124, in pyspark.sql.connect.functions.array_distinct Failed example: df.select(array_distinct(df.data)).collect() Expected: [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])] Got: [Row(array_distinct(data)=array([1, 2, 3])), Row(array_distinct(data)=array([4, 5]))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1135, in pyspark.sql.connect.functions.array_except Failed example: df.select(array_except(df.c1, df.c2)).collect() Expected: [Row(array_except(c1, c2)=['b'])] Got: [Row(array_except(c1, c2)=array(['b'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1142, in pyspark.sql.connect.functions.array_intersect Failed example: df.select(array_intersect(df.c1, df.c2)).collect() Expected: [Row(array_intersect(c1, c2)=['a', 'c'])] Got: [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1180, in pyspark.sql.connect.functions.array_remove Failed example: 
df.select(array_remove(df.data, 1)).collect() Expected: [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])] Got: [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, 1)=array([], dtype=int64))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1187, in pyspark.sql.connect.functions.array_repeat Failed example: df.select(array_repeat(df.data, 3).alias('r')).collect() Expected: [Row(r=['ab', 'ab', 'ab'])] Got: [Row(r=array(['ab', 'ab', 'ab'], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1204, in pyspark.sql.connect.functions.array_sort Failed example: df.select(array_sort(df.data).alias('r')).collect() Expected: [Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])] Got: [Row(r=array([ 1., 2., 3., nan])), Row(r=array([1])), Row(r=array([], dtype=int64))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1207, in pyspark.sql.connect.functions.array_sort Failed example: df.select(array_sort( "data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)) ).alias("r")).collect() Expected: [Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])] Got: [Row(r=array(['foobar', 'foo', None, 'bar'], dtype=object)), Row(r=array(['foo'], dtype=object)), Row(r=array([], dtype=object))] ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1209, in pyspark.sql.connect.functions.array_union Failed example: df.select(array_union(df.c1, df.c2)).collect() Expected: [Row(array_union(c1, c2)=['b', 'a', 'c', 'd', 'f'])] Got: [Row(array_union(c1, c2)=array(['b', 'a', 'c', 'd', 'f'], dtype=object))]{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1117, in pyspark.sql.connect.functions.array Failed example: df.select(array('age', 'age').alias("arr")).collect() Expected: [Row(arr=[2, 2]), 
Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]{code} > DataFrame.collect() output
[jira] [Updated] (SPARK-41833) DataFrame.collect() output parity with pyspark
[ https://issues.apache.org/jira/browse/SPARK-41833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41833: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 1117, in pyspark.sql.connect.functions.array Failed example: df.select(array('age', 'age').alias("arr")).collect() Expected: [Row(arr=[2, 2]), Row(arr=[5, 5])] Got: [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName Failed example: df1.unionByName(df2).show() Expected: ++++ |col0|col1|col2| ++++ | 1| 2| 3| | 6| 4| 5| ++++ Got: ++++ |col0|col1|col2| ++++ | 1| 2| 3| | 4| 5| 6| ++++ {code} > DataFrame.collect() output parity with pyspark > -- > > Key: SPARK-41833 > URL: https://issues.apache.org/jira/browse/SPARK-41833 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1117, in pyspark.sql.connect.functions.array > Failed example: > df.select(array('age', 'age').alias("arr")).collect() > Expected: > [Row(arr=[2, 2]), Row(arr=[5, 5])] > Got: > [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
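The parity gap catalogued above is uniform: the Connect client materializes array columns as numpy arrays (`array([2, 2])`) where classic pyspark returns plain lists (`[2, 2]`). A normalizer that calls `.tolist()` on anything exposing it restores the documented output; this is a sketch of the needed conversion, using a stand-in object so it runs without numpy:

```python
# Sketch: converting numpy-like array values back to plain Python lists
# so collect() output matches the pyspark doctests.

def normalize(value):
    if hasattr(value, "tolist"):  # numpy arrays expose .tolist()
        return value.tolist()
    return value

class FakeNdarray:  # minimal stand-in for numpy.ndarray
    def __init__(self, items):
        self._items = list(items)

    def tolist(self):
        return list(self._items)

row = {"arr": FakeNdarray([2, 2])}
print({k: normalize(v) for k, v in row.items()})  # {'arr': [2, 2]}
```

Applying this per-cell during Row construction would make `Row(arr=array([2, 2]))` render as the expected `Row(arr=[2, 2])`.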
[jira] [Created] (SPARK-41833) DataFrame.collect() output parity with pyspark
Sandeep Singh created SPARK-41833: - Summary: DataFrame.collect() output parity with pyspark Key: SPARK-41833 URL: https://issues.apache.org/jira/browse/SPARK-41833 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName Failed example: df1.unionByName(df2).show() Expected: ++++ |col0|col1|col2| ++++ | 1| 2| 3| | 6| 4| 5| ++++ Got: ++++ |col0|col1|col2| ++++ | 1| 2| 3| | 4| 5| 6| ++++ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
[ https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41656: Assignee: (was: Apache Spark) > Enable doctests in pyspark.sql.connect.dataframe > > > Key: SPARK-41656 > URL: https://issues.apache.org/jira/browse/SPARK-41656 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
[ https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653674#comment-17653674 ] Apache Spark commented on SPARK-41656: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39346 > Enable doctests in pyspark.sql.connect.dataframe > > > Key: SPARK-41656 > URL: https://issues.apache.org/jira/browse/SPARK-41656 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
[ https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653673#comment-17653673 ] Apache Spark commented on SPARK-41656: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39346 > Enable doctests in pyspark.sql.connect.dataframe > > > Key: SPARK-41656 > URL: https://issues.apache.org/jira/browse/SPARK-41656 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
[ https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41656: Assignee: Apache Spark > Enable doctests in pyspark.sql.connect.dataframe > > > Key: SPARK-41656 > URL: https://issues.apache.org/jira/browse/SPARK-41656 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41832) DataFrame.unionByName output is wrong
[ https://issues.apache.org/jira/browse/SPARK-41832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41832: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName Failed example: df1.unionByName(df2).show() Expected: ++++ |col0|col1|col2| ++++ | 1| 2| 3| | 6| 4| 5| ++++ Got: ++++ |col0|col1|col2| ++++ | 1| 2| 3| | 4| 5| 6| ++++ {code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1168, in pyspark.sql.connect.dataframe.DataFrame.transform Failed example: df.transform(cast_all_to_int).transform(sort_columns_asc).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.transform(cast_all_to_int).transform(sort_columns_asc).show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1102, in transform result = func(self, *args, **kwargs) File "", line 2, in cast_all_to_int return input_df.select([col(col_name).cast("int") for col_name in input_df.columns]) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 86, in select return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 344, in __init__ self._verify_expressions() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 350, in _verify_expressions raise InputValidationError( pyspark.sql.connect.plan.InputValidationError: Only Column or String can be used for projections: '[Column<'(ColumnReference(int) (int))'>, Column<'(ColumnReference(float) (int))'>]'. 
** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1179, in pyspark.sql.connect.dataframe.DataFrame.transform Failed example: df.transform(add_n, 1).transform(add_n, n=10).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.transform(add_n, 1).transform(add_n, n=10).show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1102, in transform result = func(self, *args, **kwargs) File "", line 2, in add_n return input_df.select([(col(col_name) + n).alias(col_name) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 86, in select return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 344, in __init__ self._verify_expressions() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 350, in _verify_expressions raise InputValidationError( pyspark.sql.connect.plan.InputValidationError: Only Column or String can be used for projections: '[Column<'Alias(+(ColumnReference(int), Literal(1)), (int))'>, Column<'Alias(+(ColumnReference(float), Literal(1)), (float))'>]'.{code} > DataFrame.unionByName output is wrong > - > > Key: SPARK-41832 > URL: https://issues.apache.org/jira/browse/SPARK-41832 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName > Failed example: > df1.unionByName(df2).show() > Expected: > ++++ > |col0|col1|col2| > ++++ > | 1| 2| 3| > | 6| 4| 5| > ++++ > Got: > ++++ > 
|col0|col1|col2|
> +----+----+----+
> |   1|   2|   3|
> |   4|   5|   6|
> +----+----+----+
> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
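The wrong `unionByName` output above is explained by the column orders in the doctest: `df1` has `(col0, col1, col2)` and `df2` has `(col1, col2, col0)`, so the second frame's rows must be reordered by name before the union; the "Got" table shows a positional append instead. A pure-Python sketch of the required alignment (hypothetical helper, not the Connect code):

```python
# Sketch: unionByName must reorder the second frame's row values to the
# first frame's column names, not append them positionally.

def union_by_name(cols1, rows1, cols2, rows2):
    index = [cols2.index(c) for c in cols1]  # position of each df1 name in df2
    aligned = [tuple(row[i] for i in index) for row in rows2]
    return cols1, rows1 + aligned

cols, rows = union_by_name(
    ["col0", "col1", "col2"], [(1, 2, 3)],
    ["col1", "col2", "col0"], [(4, 5, 6)],
)
print(rows)  # [(1, 2, 3), (6, 4, 5)]
```

With alignment, `col0` picks up `6` from `df2`, reproducing the expected `| 6| 4| 5|` row; without it, `(4, 5, 6)` lands under the wrong headers, which is exactly the "Got" table.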
[jira] [Created] (SPARK-41832) DataFrame.unionByName output is wrong
Sandeep Singh created SPARK-41832: - Summary: DataFrame.unionByName output is wrong Key: SPARK-41832 URL: https://issues.apache.org/jira/browse/SPARK-41832 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1168, in pyspark.sql.connect.dataframe.DataFrame.transform Failed example: df.transform(cast_all_to_int).transform(sort_columns_asc).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.transform(cast_all_to_int).transform(sort_columns_asc).show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1102, in transform result = func(self, *args, **kwargs) File "", line 2, in cast_all_to_int return input_df.select([col(col_name).cast("int") for col_name in input_df.columns]) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 86, in select return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 344, in __init__ self._verify_expressions() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 350, in _verify_expressions raise InputValidationError( pyspark.sql.connect.plan.InputValidationError: Only Column or String can be used for projections: '[Column<'(ColumnReference(int) (int))'>, Column<'(ColumnReference(float) (int))'>]'. 
** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1179, in pyspark.sql.connect.dataframe.DataFrame.transform Failed example: df.transform(add_n, 1).transform(add_n, n=10).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.transform(add_n, 1).transform(add_n, n=10).show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1102, in transform result = func(self, *args, **kwargs) File "", line 2, in add_n return input_df.select([(col(col_name) + n).alias(col_name) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 86, in select return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 344, in __init__ self._verify_expressions() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 350, in _verify_expressions raise InputValidationError( pyspark.sql.connect.plan.InputValidationError: Only Column or String can be used for projections: '[Column<'Alias(+(ColumnReference(int), Literal(1)), (int))'>, Column<'Alias(+(ColumnReference(float), Literal(1)), (float))'>]'.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41831) DataFrame.transform: Only Column or String can be used for projections
[ https://issues.apache.org/jira/browse/SPARK-41831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41831: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1168, in pyspark.sql.connect.dataframe.DataFrame.transform Failed example: df.transform(cast_all_to_int).transform(sort_columns_asc).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.transform(cast_all_to_int).transform(sort_columns_asc).show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1102, in transform result = func(self, *args, **kwargs) File "", line 2, in cast_all_to_int return input_df.select([col(col_name).cast("int") for col_name in input_df.columns]) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 86, in select return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 344, in __init__ self._verify_expressions() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 350, in _verify_expressions raise InputValidationError( pyspark.sql.connect.plan.InputValidationError: Only Column or String can be used for projections: '[Column<'(ColumnReference(int) (int))'>, Column<'(ColumnReference(float) (int))'>]'. 
** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1179, in pyspark.sql.connect.dataframe.DataFrame.transform Failed example: df.transform(add_n, 1).transform(add_n, n=10).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.transform(add_n, 1).transform(add_n, n=10).show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1102, in transform result = func(self, *args, **kwargs) File "", line 2, in add_n return input_df.select([(col(col_name) + n).alias(col_name) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 86, in select return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 344, in __init__ self._verify_expressions() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 350, in _verify_expressions raise InputValidationError( pyspark.sql.connect.plan.InputValidationError: Only Column or String can be used for projections: '[Column<'Alias(+(ColumnReference(int), Literal(1)), (int))'>, Column<'Alias(+(ColumnReference(float), Literal(1)), (float))'>]'.{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 401, in pyspark.sql.connect.dataframe.DataFrame.sample Failed example: df.sample(0.5, 3).count() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sample(0.5, 3).count() TypeError: DataFrame.sample() takes 2 positional arguments but 3 were given ** 
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 411, in pyspark.sql.connect.dataframe.DataFrame.sample Failed example: df.sample(False, fraction=1.0).count() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sample(False, fraction=1.0).count() TypeError: DataFrame.sample() got multiple values for argument 'fraction'{code} > DataFrame.transform: Only Column or String can be used for projections > -- > > Key: SPARK-41831 > URL: https://issues.apache.org/jira/browse/SPARK-41831 >
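In both failing examples above, `select` receives a single *list* of `Column` expressions (a list comprehension), and the Connect-side check rejects the container itself rather than inspecting its elements. A minimal sketch of the expected flattening behavior, using a stand-in `Column` class and a hypothetical `verify_projections` helper rather than the real Connect code:

```python
class Column:
    """Stand-in for a pyspark Column expression, for illustration only."""
    def __init__(self, name):
        self.name = name

def verify_projections(*cols):
    # Accept select([c1, c2]) as well as select(c1, c2): flatten a single
    # list/tuple argument before type-checking the individual elements.
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = tuple(cols[0])
    for c in cols:
        if not isinstance(c, (Column, str)):
            raise TypeError(
                "Only Column or String can be used for projections: %r" % (c,))
    return cols
```

With this flattening in place, `input_df.select([col(c).cast("int") for c in input_df.columns])` validates element by element instead of failing on the enclosing list.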
[jira] [Created] (SPARK-41831) Fix DataFrame.transform: Only Column or String can be used for projections
Sandeep Singh created SPARK-41831: - Summary: Fix DataFrame.transform: Only Column or String can be used for projections Key: SPARK-41831 URL: https://issues.apache.org/jira/browse/SPARK-41831 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 401, in pyspark.sql.connect.dataframe.DataFrame.sample Failed example: df.sample(0.5, 3).count() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sample(0.5, 3).count() TypeError: DataFrame.sample() takes 2 positional arguments but 3 were given ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 411, in pyspark.sql.connect.dataframe.DataFrame.sample Failed example: df.sample(False, fraction=1.0).count() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sample(False, fraction=1.0).count() TypeError: DataFrame.sample() got multiple values for argument 'fraction'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41831) DataFrame.transform: Only Column or String can be used for projections
[ https://issues.apache.org/jira/browse/SPARK-41831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41831: -- Summary: DataFrame.transform: Only Column or String can be used for projections (was: Fix DataFrame.transform: Only Column or String can be used for projections) > DataFrame.transform: Only Column or String can be used for projections > -- > > Key: SPARK-41831 > URL: https://issues.apache.org/jira/browse/SPARK-41831 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 401, in pyspark.sql.connect.dataframe.DataFrame.sample > Failed example: > df.sample(0.5, 3).count() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.sample(0.5, 3).count() > TypeError: DataFrame.sample() takes 2 positional arguments but 3 were > given > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 411, in pyspark.sql.connect.dataframe.DataFrame.sample > Failed example: > df.sample(False, fraction=1.0).count() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.sample(False, fraction=1.0).count() > TypeError: DataFrame.sample() got multiple values for argument > 'fraction'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41830) Fix DataFrame.sample parameters
Sandeep Singh created SPARK-41830: - Summary: Fix DataFrame.sample parameters Key: SPARK-41830 URL: https://issues.apache.org/jira/browse/SPARK-41830 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 422, in pyspark.sql.connect.dataframe.DataFrame.sort Failed example: df.orderBy(["age", "name"], ascending=[False, False]).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.orderBy(["age", "name"], ascending=[False, False]).show() TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions Failed example: df.sortWithinPartitions("age", ascending=False) Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sortWithinPartitions("age", ascending=False) TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword argument 'ascending'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41830) Fix DataFrame.sample parameters
[ https://issues.apache.org/jira/browse/SPARK-41830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41830: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 401, in pyspark.sql.connect.dataframe.DataFrame.sample Failed example: df.sample(0.5, 3).count() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sample(0.5, 3).count() TypeError: DataFrame.sample() takes 2 positional arguments but 3 were given ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 411, in pyspark.sql.connect.dataframe.DataFrame.sample Failed example: df.sample(False, fraction=1.0).count() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sample(False, fraction=1.0).count() TypeError: DataFrame.sample() got multiple values for argument 'fraction'{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 422, in pyspark.sql.connect.dataframe.DataFrame.sort Failed example: df.orderBy(["age", "name"], ascending=[False, False]).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.orderBy(["age", "name"], ascending=[False, False]).show() TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' ** File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions Failed example: df.sortWithinPartitions("age", ascending=False) Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sortWithinPartitions("age", ascending=False) TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword argument 'ascending'{code} > Fix DataFrame.sample parameters > --- > > Key: SPARK-41830 > URL: https://issues.apache.org/jira/browse/SPARK-41830 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 401, in pyspark.sql.connect.dataframe.DataFrame.sample > Failed example: > df.sample(0.5, 3).count() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.sample(0.5, 3).count() > TypeError: DataFrame.sample() takes 2 positional arguments but 3 were > given > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 411, in pyspark.sql.connect.dataframe.DataFrame.sample > Failed example: > df.sample(False, fraction=1.0).count() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.sample(False, fraction=1.0).count() > TypeError: 
DataFrame.sample() got multiple values for argument > 'fraction'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
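Both `sample` failures stem from the classic API's overloaded signature: the first positional argument may be either `withReplacement` (bool) or `fraction` (float), and the implementation shifts arguments accordingly. A self-contained sketch of that dispatch, as a hypothetical helper rather than the actual PySpark source:

```python
def resolve_sample_args(withReplacement=None, fraction=None, seed=None):
    """Resolve the overloaded call forms sample() has always accepted:
    sample(fraction), sample(fraction, seed), and
    sample(withReplacement, fraction[, seed])."""
    if isinstance(withReplacement, float):
        # First positional was actually the fraction; shift everything right.
        if fraction is not None and seed is None:
            # sample(fraction, seed) called positionally, e.g. sample(0.5, 3)
            withReplacement, fraction, seed = None, withReplacement, fraction
        else:
            withReplacement, fraction = None, withReplacement
    if not isinstance(fraction, float):
        raise TypeError("fraction must be a float, got %r" % (fraction,))
    return bool(withReplacement), fraction, seed
```

Under this dispatch, `df.sample(0.5, 3)` resolves to `(withReplacement=False, fraction=0.5, seed=3)` and `df.sample(False, fraction=1.0)` no longer collides on the `fraction` keyword, which is precisely what the two doctests exercise.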
[jira] [Updated] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41829: -- Summary: Implement Dataframe.sort,sortWithinPartitions Ordering (was: Implement Dataframe.sort ordering) > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty > Failed example: > df_empty = spark.createDataFrame([], 'a STRING') > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df_empty = spark.createDataFrame([], 'a STRING') > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 186, in createDataFrame > raise ValueError("Input data cannot be empty") > ValueError: Input data cannot be empty{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41829: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 422, in pyspark.sql.connect.dataframe.DataFrame.sort Failed example: df.orderBy(["age", "name"], ascending=[False, False]).show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.orderBy(["age", "name"], ascending=[False, False]).show() TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' ** File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions Failed example: df.sortWithinPartitions("age", ascending=False) Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.sortWithinPartitions("age", ascending=False) TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword argument 'ascending'{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty Failed example: df_empty = spark.createDataFrame([], 'a STRING') Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df_empty = spark.createDataFrame([], 'a STRING') File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 186, in createDataFrame raise ValueError("Input data cannot be empty") ValueError: Input data cannot be empty{code} > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 186, in createDataFrame raise ValueError("Input data cannot be empty") ValueError: Input data cannot be empty{code} > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "<doctest pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in <module> > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
issues-h...@spark.apache.org
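Implementing the `ascending` keyword is mostly about pairing each sort column with a direction flag, including the broadcast (single bool) and per-column (list of bools) forms shown in the failing doctests. An illustrative sketch with a hypothetical `build_sort_order` helper, not the real PySpark code:

```python
def build_sort_order(*cols, ascending=True):
    """Pair each sort column with an ascending flag, mirroring how the
    classic DataFrame.sort/orderBy interprets its `ascending` keyword."""
    if len(cols) == 1 and isinstance(cols[0], list):
        cols = tuple(cols[0])  # the sort(["age", "name"], ...) call form
    if isinstance(ascending, bool):
        flags = [ascending] * len(cols)      # one bool applies to all columns
    elif isinstance(ascending, (list, tuple)):
        if len(ascending) != len(cols):
            raise ValueError("ascending must have one entry per column")
        flags = list(ascending)
    else:
        raise TypeError("ascending must be a bool or a list of bools")
    return list(zip(cols, flags))
```

So `df.orderBy(["age", "name"], ascending=[False, False])` corresponds to the sort order `[("age", False), ("name", False)]`, i.e. both columns descending.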
[jira] [Created] (SPARK-41829) Implement Dataframe.sort ordering
Sandeep Singh created SPARK-41829: - Summary: Implement Dataframe.sort ordering Key: SPARK-41829 URL: https://issues.apache.org/jira/browse/SPARK-41829 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty Failed example: df_empty = spark.createDataFrame([], 'a STRING') Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df_empty = spark.createDataFrame([], 'a STRING') File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 186, in createDataFrame raise ValueError("Input data cannot be empty") ValueError: Input data cannot be empty{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe
[ https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41828: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty Failed example: df_empty = spark.createDataFrame([], 'a STRING') Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df_empty = spark.createDataFrame([], 'a STRING') File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 186, in createDataFrame raise ValueError("Input data cannot be empty") ValueError: Input data cannot be empty{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy Failed example: df.groupBy(["name", df.age]).count().sort("name", "age").show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.groupBy(["name", df.age]).count().sort("name", "age").show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 251, in groupBy raise TypeError( TypeError: groupBy requires all cols be Column or str, but got list ['name', Column<'ColumnReference(age)'>]{code} > Implement creating empty Dataframe > -- > > Key: SPARK-41828 > URL: https://issues.apache.org/jira/browse/SPARK-41828 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty > Failed example: > df_empty = spark.createDataFrame([], 'a STRING') > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df_empty = spark.createDataFrame([], 'a STRING') > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 186, in createDataFrame > raise ValueError("Input data cannot be empty") > ValueError: Input data cannot be empty{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
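The `ValueError` above comes from rejecting empty input unconditionally. When an explicit schema string such as `'a STRING'` is supplied, a zero-row frame is fully typed and perfectly valid. A minimal sketch of the intended guard, using a hypothetical `create_dataframe` function rather than the real session API:

```python
def create_dataframe(data, schema=None):
    """Reject empty input only when no schema is given; with an explicit
    schema (e.g. 'a STRING') the frame's column types are fully determined
    even with zero rows, so an empty DataFrame is legitimate."""
    if not data and schema is None:
        # Without rows or a schema there is nothing to infer types from.
        raise ValueError("Input data cannot be empty when schema is missing")
    return {"schema": schema, "rows": list(data)}
```

Under this rule, `spark.createDataFrame([], 'a STRING')` from the `isEmpty` doctest succeeds, while a schema-less empty call still fails fast.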
[jira] [Created] (SPARK-41828) Implement creating empty Dataframe
Sandeep Singh created SPARK-41828: - Summary: Implement creating empty Dataframe Key: SPARK-41828 URL: https://issues.apache.org/jira/browse/SPARK-41828 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy Failed example: df.groupBy(["name", df.age]).count().sort("name", "age").show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.groupBy(["name", df.age]).count().sort("name", "age").show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 251, in groupBy raise TypeError( TypeError: groupBy requires all cols be Column or str, but got list ['name', Column<'ColumnReference(age)'>]{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41827) DataFrame.groupBy requires all cols be Column or str
[ https://issues.apache.org/jira/browse/SPARK-41827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41827: -- Description: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy Failed example: df.groupBy(["name", df.age]).count().sort("name", "age").show() Exception raised: Traceback (most recent call last): File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in df.groupBy(["name", df.age]).count().sort("name", "age").show() File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 251, in groupBy raise TypeError( TypeError: groupBy requires all cols be Column or str, but got list ['name', Column<'ColumnReference(age)'>]{code} was: {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna Failed example: df.na.fill(50).show() Expected: +---+--+-++ |age|height| name|bool| +---+--+-++ | 10| 80.5|Alice|null| | 5| 50.0| Bob|null| | 50| 50.0| Tom|null| | 50| 50.0| null|true| +---+--+-++ Got: ++--+-++ | age|height| name|bool| ++--+-++ |10.0| 80.5|Alice|null| | 5.0| 50.0| Bob|null| |50.0| 50.0| Tom|null| |50.0| 50.0| null|true| ++--+-++ {code} > DataFrame.groupBy requires all cols be Column or str > > > Key: SPARK-41827 > URL: https://issues.apache.org/jira/browse/SPARK-41827 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy > Failed example: > df.groupBy(["name", df.age]).count().sort("name", "age").show() > Exception 
raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.groupBy(["name", df.age]).count().sort("name", "age").show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 251, in groupBy > raise TypeError( > TypeError: groupBy requires all cols be Column or str, but got list > ['name', Column<'ColumnReference(age)'>]{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
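As with `select`, the failure here is that a single *list* argument to `groupBy` is type-checked as a list instead of being flattened into its elements first. A sketch of the expected normalization, where any non-string object carrying a `name` attribute stands in for a real `Column` (both the helper name and the duck-typing check are illustrative assumptions):

```python
def normalize_group_cols(*cols):
    """Flatten a single-list argument so groupBy(["name", df.age]) behaves
    like groupBy("name", df.age), then validate each element."""
    if len(cols) == 1 and isinstance(cols[0], list):
        cols = tuple(cols[0])
    for c in cols:
        # Stand-in check: the real method accepts str or Column objects.
        if not (isinstance(c, str) or hasattr(c, "name")):
            raise TypeError(
                "groupBy requires all cols be Column or str, got %r" % (c,))
    return list(cols)
```

After flattening, the mixed call `df.groupBy(["name", df.age])` validates `"name"` and the `Column` individually instead of rejecting the enclosing list.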
[jira] [Created] (SPARK-41827) DataFrame.groupBy requires all cols be Column or str
Sandeep Singh created SPARK-41827: - Summary: DataFrame.groupBy requires all cols be Column or str Key: SPARK-41827 URL: https://issues.apache.org/jira/browse/SPARK-41827 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna Failed example: df.na.fill(50).show() Expected: +---+--+-++ |age|height| name|bool| +---+--+-++ | 10| 80.5|Alice|null| | 5| 50.0| Bob|null| | 50| 50.0| Tom|null| | 50| 50.0| null|true| +---+--+-++ Got: ++--+-++ | age|height| name|bool| ++--+-++ |10.0| 80.5|Alice|null| | 5.0| 50.0| Bob|null| |50.0| 50.0| Tom|null| |50.0| 50.0| null|true| ++--+-++ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41826) Implement Dataframe.readStream
[ https://issues.apache.org/jira/browse/SPARK-41826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41826:
----------------------------------
    Description: 
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
Failed example:
    df = spark.readStream.format("rate").load()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest pyspark.sql.connect.dataframe.DataFrame.isStreaming[0]>", line 1, in <module>
        df = spark.readStream.format("rate").load()
    AttributeError: 'SparkSession' object has no attribute 'readStream'{code}

  was:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
Failed example:
    df.coalesce(1).rdd.getNumPartitions()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.coalesce(1).rdd.getNumPartitions()
    AttributeError: 'function' object has no attribute 'getNumPartitions'{code}

> Implement Dataframe.readStream
> ------------------------------
>
>                 Key: SPARK-41826
>                 URL: https://issues.apache.org/jira/browse/SPARK-41826
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
> Failed example:
>     df = spark.readStream.format("rate").load()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.isStreaming[0]>", line 1, in <module>
>         df = spark.readStream.format("rate").load()
>     AttributeError: 'SparkSession' object has no attribute 'readStream'{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41826) Implement Dataframe.readStream
Sandeep Singh created SPARK-41826:
-------------------------------------

             Summary: Implement Dataframe.readStream
                 Key: SPARK-41826
                 URL: https://issues.apache.org/jira/browse/SPARK-41826
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Sandeep Singh


{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
Failed example:
    df.coalesce(1).rdd.getNumPartitions()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.coalesce(1).rdd.getNumPartitions()
    AttributeError: 'function' object has no attribute 'getNumPartitions'{code}
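The `AttributeError: 'function' object has no attribute 'getNumPartitions'` above is the failure mode where an attribute that classic PySpark exposes as a property resolves on the Connect client to the function object itself, so chaining another attribute access onto it fails. A minimal pure-Python sketch of the two styles (the class and attribute names here are illustrative, not the actual Spark Connect code):

```python
# Toy illustration: when `rdd` is a plain callable attribute, `df.rdd`
# yields the callable itself, and `.getNumPartitions` on it raises the
# same kind of AttributeError as the doctest. Defining it as a property
# makes `df.rdd` evaluate to the underlying value.
class FakeRDD:
    def getNumPartitions(self):
        return 1

class MethodStyleFrame:
    def rdd(self):           # plain method: `df.rdd` is a callable, not an RDD
        return FakeRDD()

class PropertyStyleFrame:
    @property
    def rdd(self):           # property: `df.rdd` is the FakeRDD itself
        return FakeRDD()

try:
    MethodStyleFrame().rdd.getNumPartitions()
except AttributeError as e:
    print(e)                 # attribute lookup fails on the callable

print(PropertyStyleFrame().rdd.getNumPartitions())  # 1
```

This is why the fix is usually to implement the attribute as a `@property` (or, as here, to implement `rdd` support at all) rather than anything in `getNumPartitions`.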
[jira] [Updated] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41825:
----------------------------------
    Summary: DataFrame.show formatting int as double  (was: DataFrame.show formating int as double)

> DataFrame.show formatting int as double
> ---------------------------------------
>
>                 Key: SPARK-41825
>                 URL: https://issues.apache.org/jira/browse/SPARK-41825
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
> Failed example:
>     df.na.fill(50).show()
> Expected:
>     +---+------+-----+----+
>     |age|height| name|bool|
>     +---+------+-----+----+
>     | 10|  80.5|Alice|null|
>     |  5|  50.0|  Bob|null|
>     | 50|  50.0|  Tom|null|
>     | 50|  50.0| null|true|
>     +---+------+-----+----+
> Got:
>     +----+------+-----+----+
>     | age|height| name|bool|
>     +----+------+-----+----+
>     |10.0|  80.5|Alice|null|
>     | 5.0|  50.0|  Bob|null|
>     |50.0|  50.0|  Tom|null|
>     |50.0|  50.0| null|true|
>     +----+------+-----+----+
> {code}
[jira] [Updated] (SPARK-41825) DataFrame.show formating int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41825:
----------------------------------
    Description: 
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
Failed example:
    df.na.fill(50).show()
Expected:
    +---+------+-----+----+
    |age|height| name|bool|
    +---+------+-----+----+
    | 10|  80.5|Alice|null|
    |  5|  50.0|  Bob|null|
    | 50|  50.0|  Tom|null|
    | 50|  50.0| null|true|
    +---+------+-----+----+
Got:
    +----+------+-----+----+
    | age|height| name|bool|
    +----+------+-----+----+
    |10.0|  80.5|Alice|null|
    | 5.0|  50.0|  Bob|null|
    |50.0|  50.0|  Tom|null|
    |50.0|  50.0| null|true|
    +----+------+-----+----+
{code}

  was:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain()
Expected:
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
Got:
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
**
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain(mode="formatted")
Expected:
    == Physical Plan ==
    * Scan ExistingRDD (...)
    (1) Scan ExistingRDD [codegen id : ...]
    Output [2]: [age..., name...]
    ...
Got:
    == Physical Plan ==
    LocalTableScan (1)

    (1) LocalTableScan
    Output [2]: [age#1170L, name#1171]
    Arguments: [age#1170L, name#1171]
{code}

> DataFrame.show formating int as double
> --------------------------------------
>
>                 Key: SPARK-41825
>                 URL: https://issues.apache.org/jira/browse/SPARK-41825
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
> Failed example:
>     df.na.fill(50).show()
> Expected:
>     +---+------+-----+----+
>     |age|height| name|bool|
>     +---+------+-----+----+
>     | 10|  80.5|Alice|null|
>     |  5|  50.0|  Bob|null|
>     | 50|  50.0|  Tom|null|
>     | 50|  50.0| null|true|
>     +---+------+-----+----+
> Got:
>     +----+------+-----+----+
>     | age|height| name|bool|
>     +----+------+-----+----+
>     |10.0|  80.5|Alice|null|
>     | 5.0|  50.0|  Bob|null|
>     |50.0|  50.0|  Tom|null|
>     |50.0|  50.0| null|true|
>     +----+------+-----+----+
> {code}
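A plausible root cause for the int-shown-as-double symptom (an assumption on the editor's part, not confirmed by the ticket): the Connect client's `show()` path collects rows via `toPandas()`, and pandas' default inference stores an integer column that ever contained a missing value as `float64`, so `10` renders as `10.0`. The sketch below demonstrates the upcast and the nullable `Int64` extension dtype that avoids it:

```python
import pandas as pd

# Default inference: None becomes NaN, which is a float, so the whole
# column is upcast to float64 and integers print with a trailing ".0".
s = pd.Series([10, 5, 50, None])
print(s.dtype)      # float64
print(s.tolist())   # [10.0, 5.0, 50.0, nan]

# pandas' nullable integer dtype keeps the values as integers.
s2 = pd.Series([10, 5, 50, None], dtype="Int64")
print(s2.dtype)     # Int64
print(s2.tolist())  # [10, 5, 50, <NA>]
```

If this is indeed the mechanism, the fix belongs in the Arrow/pandas conversion (preserving the server-side schema's integer type), not in `fillna` itself.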
[jira] [Created] (SPARK-41825) DataFrame.show formating int as double
Sandeep Singh created SPARK-41825:
-------------------------------------

             Summary: DataFrame.show formating int as double
                 Key: SPARK-41825
                 URL: https://issues.apache.org/jira/browse/SPARK-41825
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Sandeep Singh


{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain()
Expected:
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
Got:
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
**
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain(mode="formatted")
Expected:
    == Physical Plan ==
    * Scan ExistingRDD (...)
    (1) Scan ExistingRDD [codegen id : ...]
    Output [2]: [age..., name...]
    ...
Got:
    == Physical Plan ==
    LocalTableScan (1)

    (1) LocalTableScan
    Output [2]: [age#1170L, name#1171]
    Arguments: [age#1170L, name#1171]
{code}
[jira] [Created] (SPARK-41824) Implement DataFrame.explain format to be similar to PySpark
Sandeep Singh created SPARK-41824:
-------------------------------------

             Summary: Implement DataFrame.explain format to be similar to PySpark
                 Key: SPARK-41824
                 URL: https://issues.apache.org/jira/browse/SPARK-41824
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Sandeep Singh


{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
Failed example:
    df.join(df2, df.name == df2.name, 'inner').drop('name').show()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.join(df2, df.name == df2.name, 'inner').drop('name').show()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
        print(self._show_string(n, truncate, vertical))
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
        ).toPandas()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
        return self._execute_and_fetch(req)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
Plan: {code}
[jira] [Updated] (SPARK-41824) Implement DataFrame.explain format to be similar to PySpark
[ https://issues.apache.org/jira/browse/SPARK-41824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41824:
----------------------------------
    Description: 
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain()
Expected:
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
Got:
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
**
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain(mode="formatted")
Expected:
    == Physical Plan ==
    * Scan ExistingRDD (...)
    (1) Scan ExistingRDD [codegen id : ...]
    Output [2]: [age..., name...]
    ...
Got:
    == Physical Plan ==
    LocalTableScan (1)

    (1) LocalTableScan
    Output [2]: [age#1170L, name#1171]
    Arguments: [age#1170L, name#1171]
{code}

  was:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
Failed example:
    df.join(df2, df.name == df2.name, 'inner').drop('name').show()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.join(df2, df.name == df2.name, 'inner').drop('name').show()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
        print(self._show_string(n, truncate, vertical))
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
        ).toPandas()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
        return self._execute_and_fetch(req)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
Plan: {code}

> Implement DataFrame.explain format to be similar to PySpark
> -----------------------------------------------------------
>
>                 Key: SPARK-41824
>                 URL: https://issues.apache.org/jira/browse/SPARK-41824
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
> Failed example:
>     df.explain()
> Expected:
>     == Physical Plan ==
>     *(1) Scan ExistingRDD[age...,name...]
> Got:
>     == Physical Plan ==
>     LocalTableScan [age#1148L, name#1149]
> **
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
> Failed example:
>     df.explain(mode="formatted")
> Expected:
>     == Physical Plan ==
>     * Scan ExistingRDD (...)
>     (1) Scan ExistingRDD [codegen id : ...]
>     Output [2]: [age..., name...]
>     ...
> Got:
>     == Physical Plan ==
>     LocalTableScan (1)
>
>     (1) LocalTableScan
>     Output [2]: [age#1170L, name#1171]
>     Arguments: [age#1170L, name#1171]
> {code}
[jira] [Updated] (SPARK-41822) Setup Scala/JVM Client Connection
[ https://issues.apache.org/jira/browse/SPARK-41822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkata Sai Akhil Gudesa updated SPARK-41822:
---------------------------------------------
    Summary: Setup Scala/JVM Client Connection  (was: Setup Scala Client Connection)

> Setup Scala/JVM Client Connection
> ---------------------------------
>
>                 Key: SPARK-41822
>                 URL: https://issues.apache.org/jira/browse/SPARK-41822
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Venkata Sai Akhil Gudesa
>            Priority: Major
>
> Set up the gRPC connection for the Scala/JVM client to enable communication with the Spark Connect server.
[jira] [Updated] (SPARK-41823) DataFrame.join creating ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41823:
----------------------------------
    Description: 
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
Failed example:
    df.join(df2, df.name == df2.name, 'inner').drop('name').show()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.join(df2, df.name == df2.name, 'inner').drop('name').show()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
        print(self._show_string(n, truncate, vertical))
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
        ).toPandas()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
        return self._execute_and_fetch(req)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
Plan: {code}

  was:
{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 898, in pyspark.sql.connect.dataframe.DataFrame.describe
Failed example:
    df.describe(['age']).show()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.describe(['age']).show()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 832, in describe
        raise TypeError(f"'cols' must be list[str], but got {type(s).__name__}")
    TypeError: 'cols' must be list[str], but got list {code}

> DataFrame.join creating ambiguous column names
> ----------------------------------------------
>
>                 Key: SPARK-41823
>                 URL: https://issues.apache.org/jira/browse/SPARK-41823
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
>         ).toPandas()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
> Plan: {code}
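The `[AMBIGUOUS_REFERENCE]` error arises because the join of `df` and `df2` produces a relation with two columns both named `name`, and the bare string in `drop('name')` cannot pick one. A toy resolver (purely illustrative, not Catalyst's actual resolution code) shows why string lookup fails while a qualified reference succeeds:

```python
# Model columns of the joined relation as (qualifier, name) pairs: the
# join of df and df2 on `name` carries two columns named "name".
columns = [("df", "name"), ("df", "age"), ("df2", "name"), ("df2", "height")]

def resolve(ref):
    """Resolve 'name' or 'qualifier.name' against the joined schema."""
    parts = ref.split(".")
    if len(parts) == 2:
        matches = [c for c in columns if c == (parts[0], parts[1])]
    else:
        matches = [c for c in columns if c[1] == ref]
    if len(matches) > 1:
        names = ", ".join(f"`{q}.{n}`" for q, n in matches)
        raise ValueError(f"[AMBIGUOUS_REFERENCE] Reference `{ref}` is ambiguous, could be: [{names}].")
    if not matches:
        raise ValueError(f"[UNRESOLVED_COLUMN] `{ref}`")
    return matches[0]

try:
    resolve("name")            # both sides of the join have `name`
except ValueError as e:
    print(e)

print(resolve("df2.name"))     # a qualified reference resolves cleanly
```

In classic PySpark the usual workaround is to pass a `Column` object, e.g. `drop(df.name)`, since a `Column` carries its originating DataFrame and is not resolved by bare string matching.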
[jira] [Updated] (SPARK-41823) DataFrame.join creating ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Singh updated SPARK-41823:
----------------------------------
    Summary: DataFrame.join creating ambiguous column names  (was: Fix DataFrame.join creating ambiguous column names)

> DataFrame.join creating ambiguous column names
> ----------------------------------------------
>
>                 Key: SPARK-41823
>                 URL: https://issues.apache.org/jira/browse/SPARK-41823
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 898, in pyspark.sql.connect.dataframe.DataFrame.describe
> Failed example:
>     df.describe(['age']).show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df.describe(['age']).show()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 832, in describe
>         raise TypeError(f"'cols' must be list[str], but got {type(s).__name__}")
>     TypeError: 'cols' must be list[str], but got list {code}
[jira] [Created] (SPARK-41823) Fix DataFrame.join creating ambiguous column names
Sandeep Singh created SPARK-41823:
-------------------------------------

             Summary: Fix DataFrame.join creating ambiguous column names
                 Key: SPARK-41823
                 URL: https://issues.apache.org/jira/browse/SPARK-41823
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Sandeep Singh


{code:java}
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 898, in pyspark.sql.connect.dataframe.DataFrame.describe
Failed example:
    df.describe(['age']).show()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in
        df.describe(['age']).show()
      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 832, in describe
        raise TypeError(f"'cols' must be list[str], but got {type(s).__name__}")
    TypeError: 'cols' must be list[str], but got list {code}
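The `TypeError: 'cols' must be list[str], but got list` in this traceback indicates the Connect `describe` validated the single list argument itself instead of unpacking it first: classic PySpark's `describe(*cols)` accepts both `describe('age')` and `describe(['age'])` by flattening a lone list argument before checking element types. A sketch of that flatten-then-validate pattern (an illustrative standalone function, not the actual Connect implementation):

```python
def describe_cols(*cols):
    """Validate describe()'s column arguments the way classic PySpark
    does: accept describe('age', 'name') as well as describe(['age']),
    flattening a single list argument before type-checking elements."""
    if len(cols) == 1 and isinstance(cols[0], list):
        cols = cols[0]   # unpack describe(['age']) into its elements
    for s in cols:
        if not isinstance(s, str):
            raise TypeError(f"'cols' must be list[str], but got {type(s).__name__}")
    return list(cols)

print(describe_cols(["age"]))        # ['age']
print(describe_cols("age", "name"))  # ['age', 'name']
```

Without the flattening step, the loop sees the list object itself as `s`, producing exactly the contradictory-sounding message in the report.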