[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-02 Thread Sandeep Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653731#comment-17653731
 ] 

Sandeep Singh commented on SPARK-41835:
---

My bad, the error is about the expected input types.
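
For context: transform_keys requires a MAP input, and the plan shows "data" arriving as a STRUCT. Presumably the Connect createDataFrame path turns the doctest's Python dict into a struct where classic PySpark infers a map. A minimal sketch of the intended usage (assuming a running session named spark):

{code:java}
from pyspark.sql.functions import transform_keys, upper

# "data" must be a MAP column, e.g. MAP<STRING, DOUBLE>
df = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})], ("id", "data"))
df.select(
    transform_keys("data", lambda k, _: upper(k)).alias("data_upper")
).show(truncate=False)
# keys come back uppercased, e.g. {BAR -> 2.0, FOO -> -2.0}
{code}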

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.transform_keys[...]>", line 1, in <module>
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}






[jira] [Updated] (SPARK-41843) Implement SparkSession.udf

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41843:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2331, in pyspark.sql.connect.functions.call_udf
Failed example:
    _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
    AttributeError: 'SparkSession' object has no attribute 'udf'{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1966, in pyspark.sql.connect.functions.hour
Failed example:
    df.select(hour('ts').alias('hour')).collect()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(hour('ts').alias('hour')).collect()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1017, in collect
        pdf = self.toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
623, in _handle_error
        raise SparkConnectException(status.message, info.reason) from None
    pyspark.sql.connect.client.SparkConnectException: 
(org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: 
Timestamp(NANOSECOND, null){code}


> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.call_udf[...]>", line 1, in <module>
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}
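
For reference, a sketch of the classic PySpark behavior the call_udf doctest depends on (spark.udf is a UDFRegistration there; this shows the expected surface, not the Connect implementation):

{code:java}
from pyspark.sql.functions import call_udf, col
from pyspark.sql.types import IntegerType

# classic PySpark exposes spark.udf; the Connect SparkSession does not yet
_ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
df = spark.createDataFrame([(1,), (2,)], ["a"])
df.select(call_udf("intX2", col("a")).alias("doubled")).show()
{code}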






[jira] [Created] (SPARK-41843) Implement SparkSession.udf

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41843:
-

 Summary: Implement SparkSession.udf
 Key: SPARK-41843
 URL: https://issues.apache.org/jira/browse/SPARK-41843
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1966, in pyspark.sql.connect.functions.hour
Failed example:
    df.select(hour('ts').alias('hour')).collect()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(hour('ts').alias('hour')).collect()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1017, in collect
        pdf = self.toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
623, in _handle_error
        raise SparkConnectException(status.message, info.reason) from None
    pyspark.sql.connect.client.SparkConnectException: 
(org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: 
Timestamp(NANOSECOND, null){code}






[jira] [Commented] (SPARK-41842) Support data type Timestamp(NANOSECOND, null)

2023-01-02 Thread Sandeep Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653728#comment-17653728
 ] 

Sandeep Singh commented on SPARK-41842:
---

Not sure about the EPIC for this one.
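
The nanosecond type presumably comes from the pandas/Arrow conversion inside the Connect createDataFrame path; a small sketch of that path (assuming pandas and pyarrow are installed):

{code:java}
import datetime
import pandas as pd
import pyarrow as pa

# pandas stores datetimes as datetime64[ns], so Arrow infers timestamp[ns],
# which reaches the server as Timestamp(NANOSECOND, null)
pdf = pd.DataFrame({"ts": [datetime.datetime(2015, 4, 8, 13, 8, 15)]})
print(pa.Table.from_pandas(pdf).schema)  # ts: timestamp[ns]
{code}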

> Support data type Timestamp(NANOSECOND, null)
> -
>
> Key: SPARK-41842
> URL: https://issues.apache.org/jira/browse/SPARK-41842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1966, in pyspark.sql.connect.functions.hour
> Failed example:
>     df.select(hour('ts').alias('hour')).collect()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.hour[...]>", line 1, in <module>
>         df.select(hour('ts').alias('hour')).collect()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1017, in collect
>         pdf = self.toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: 
> Timestamp(NANOSECOND, null){code}






[jira] [Updated] (SPARK-41842) Support data type Timestamp(NANOSECOND, null)

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41842:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1966, in pyspark.sql.connect.functions.hour
Failed example:
    df.select(hour('ts').alias('hour')).collect()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(hour('ts').alias('hour')).collect()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1017, in collect
        pdf = self.toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
623, in _handle_error
        raise SparkConnectException(status.message, info.reason) from None
    pyspark.sql.connect.client.SparkConnectException: 
(org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: 
Timestamp(NANOSECOND, null){code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df_empty = spark.createDataFrame([], 'a STRING')
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}


> Support data type Timestamp(NANOSECOND, null)
> -
>
> Key: SPARK-41842
> URL: https://issues.apache.org/jira/browse/SPARK-41842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1966, in pyspark.sql.connect.functions.hour
> Failed example:
>     df.select(hour('ts').alias('hour')).collect()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.hour[...]>", line 1, in <module>
>         df.select(hour('ts').alias('hour')).collect()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1017, in collect
>         pdf = self.toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: 
> Timestamp(NANOSECOND, null){code}






[jira] [Created] (SPARK-41842) Support data type Timestamp(NANOSECOND, null)

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41842:
-

 Summary: Support data type Timestamp(NANOSECOND, null)
 Key: SPARK-41842
 URL: https://issues.apache.org/jira/browse/SPARK-41842
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df_empty = spark.createDataFrame([], 'a STRING')
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}






[jira] [Resolved] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41656.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39346
[https://github.com/apache/spark/pull/39346]

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41656:


Assignee: Sandeep Singh

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653722#comment-17653722
 ] 

Ruifeng Zheng commented on SPARK-41835:
---

This function was already added.

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Created] (SPARK-41841) Support PyPI packaging without JVM

2023-01-02 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41841:


 Summary: Support PyPI packaging without JVM
 Key: SPARK-41841
 URL: https://issues.apache.org/jira/browse/SPARK-41841
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


We should support pip install pyspark without the JVM so Spark Connect can be a truly 
lightweight library.






[jira] [Resolved] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41804.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39349
[https://github.com/apache/spark/pull/39349]

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> 
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 3.4.0
>
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension 
> of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 
> 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}






[jira] [Assigned] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41804:


Assignee: Bruce Robbins

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> 
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension 
> of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 
> 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}






[jira] [Assigned] (SPARK-41653) Test parity: enable doctests in Spark Connect

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41653:


Assignee: Sandeep Singh  (was: Hyukjin Kwon)

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507
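
A rough sketch of what that could look like, modeled on the linked column.py block (module names and doctest flags here are assumptions, not the final patch):

{code:java}
def _test() -> None:
    import doctest
    import sys
    import pyspark.sql.connect.dataframe

    # run the module's doctests against its own globals
    globs = pyspark.sql.connect.dataframe.__dict__.copy()
    (failure_count, test_count) = doctest.testmod(
        pyspark.sql.connect.dataframe,
        globs=globs,
        optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE,
    )
    if failure_count:
        sys.exit(-1)


if __name__ == "__main__":
    _test()
{code}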






[jira] [Assigned] (SPARK-41654) Enable doctests in pyspark.sql.connect.window

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41654:


Assignee: Sandeep Singh  (was: Hyukjin Kwon)

> Enable doctests in pyspark.sql.connect.window
> -
>
> Key: SPARK-41654
> URL: https://issues.apache.org/jira/browse/SPARK-41654
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41655) Enable doctests in pyspark.sql.connect.column

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41655:


Assignee: Sandeep Singh  (was: Hyukjin Kwon)

> Enable doctests in pyspark.sql.connect.column
> -
>
> Key: SPARK-41655
> URL: https://issues.apache.org/jira/browse/SPARK-41655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41659:


Assignee: Sandeep Singh  (was: Hyukjin Kwon)

> Enable doctests in pyspark.sql.connect.readwriter
> -
>
> Key: SPARK-41659
> URL: https://issues.apache.org/jira/browse/SPARK-41659
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-41803) log() function variations are missing

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41803.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39339
[https://github.com/apache/spark/pull/39339]

> log() function variations are missing
> -
>
> Key: SPARK-41803
> URL: https://issues.apache.org/jira/browse/SPARK-41803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41803) log() function variations are missing

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41803:


Assignee: Ruifeng Zheng  (was: Martin Grund)

> log() function variations are missing
> -
>
> Key: SPARK-41803
> URL: https://issues.apache.org/jira/browse/SPARK-41803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41803) log() function variations are missing

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41803:


Assignee: Martin Grund

> log() function variations are missing
> -
>
> Key: SPARK-41803
> URL: https://issues.apache.org/jira/browse/SPARK-41803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>







[jira] [Commented] (SPARK-41836) Implement `transform_values` function

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653714#comment-17653714
 ] 

Hyukjin Kwon commented on SPARK-41836:
--

test output?

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41839) Implement SparkSession.sparkContext

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653715#comment-17653715
 ] 

Hyukjin Kwon commented on SPARK-41839:
--

test output?

> Implement SparkSession.sparkContext
> ---
>
> Key: SPARK-41839
> URL: https://issues.apache.org/jira/browse/SPARK-41839
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>







[jira] [Commented] (SPARK-41835) Implement `transform_keys` function

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653713#comment-17653713
 ] 

Hyukjin Kwon commented on SPARK-41835:
--

test output?

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41835) Implement `transform_keys` function

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41835:


Assignee: (was: Ruifeng Zheng)

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41828:
-
Epic Link: (was: SPARK-39375)

> Implement creating empty Dataframe
> --
>
> Key: SPARK-41828
> URL: https://issues.apache.org/jira/browse/SPARK-41828
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
> Failed example:
>     df_empty = spark.createDataFrame([], 'a STRING')
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.isEmpty[...]>", line 1, in <module>
>         df_empty = spark.createDataFrame([], 'a STRING')
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 186, in createDataFrame
>         raise ValueError("Input data cannot be empty")
>     ValueError: Input data cannot be empty{code}
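
For reference, classic PySpark accepts an empty list when an explicit schema is given; a sketch of the behavior the doctest expects:

{code:java}
# classic PySpark: an empty list plus a DDL schema yields an empty DataFrame
df_empty = spark.createDataFrame([], 'a STRING')
print(df_empty.isEmpty())  # True
df_empty.printSchema()
# root
#  |-- a: string (nullable = true)
{code}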






[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41828:
-
Parent: SPARK-41281
Issue Type: Sub-task  (was: Bug)

> Implement creating empty Dataframe
> --
>
> Key: SPARK-41828
> URL: https://issues.apache.org/jira/browse/SPARK-41828
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
> Failed example:
>     df_empty = spark.createDataFrame([], 'a STRING')
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.isEmpty[...]>", line 1, in <module>
>         df_empty = spark.createDataFrame([], 'a STRING')
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 186, in createDataFrame
>         raise ValueError("Input data cannot be empty")
>     ValueError: Input data cannot be empty{code}






[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41828:
-
Parent: (was: SPARK-41279)
Issue Type: Bug  (was: Sub-task)

> Implement creating empty Dataframe
> --
>
> Key: SPARK-41828
> URL: https://issues.apache.org/jira/browse/SPARK-41828
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
> Failed example:
>     df_empty = spark.createDataFrame([], 'a STRING')
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.isEmpty[...]>", line 1, in <module>
>         df_empty = spark.createDataFrame([], 'a STRING')
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 186, in createDataFrame
>         raise ValueError("Input data cannot be empty")
>     ValueError: Input data cannot be empty{code}






[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41828:
-
Epic Link: SPARK-39375

> Implement creating empty Dataframe
> --
>
> Key: SPARK-41828
> URL: https://issues.apache.org/jira/browse/SPARK-41828
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
> Failed example:
>     df_empty = spark.createDataFrame([], 'a STRING')
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.isEmpty[...]>", line 1, in <module>
>         df_empty = spark.createDataFrame([], 'a STRING')
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 186, in createDataFrame
>         raise ValueError("Input data cannot be empty")
>     ValueError: Input data cannot be empty{code}






[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41819:
-
Epic Link: (was: SPARK-39375)

> Implement Dataframe.rdd getNumPartitions
> 
>
> Key: SPARK-41819
> URL: https://issues.apache.org/jira/browse/SPARK-41819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
> Failed example:
>     df.coalesce(1).rdd.getNumPartitions()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.coalesce[...]>", line 1, in <module>
>         df.coalesce(1).rdd.getNumPartitions()
>     AttributeError: 'function' object has no attribute 
> 'getNumPartitions'{code}
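
For reference, a sketch of the classic PySpark behavior the doctest expects (DataFrame.rdd is a property returning an RDD, so getNumPartitions() is available on it):

{code:java}
df = spark.range(10)
# classic PySpark: .rdd is a property, not a method, hence no AttributeError
print(df.coalesce(1).rdd.getNumPartitions())  # 1
{code}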






[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41819:
-
Parent: SPARK-41279
Issue Type: Sub-task  (was: Bug)

> Implement Dataframe.rdd getNumPartitions
> 
>
> Key: SPARK-41819
> URL: https://issues.apache.org/jira/browse/SPARK-41819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
> Failed example:
>     df.coalesce(1).rdd.getNumPartitions()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.coalesce[...]>", line 1, in <module>
>         df.coalesce(1).rdd.getNumPartitions()
>     AttributeError: 'function' object has no attribute 
> 'getNumPartitions'{code}






[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41819:
-
Parent: (was: SPARK-41281)
Issue Type: Bug  (was: Sub-task)

> Implement Dataframe.rdd getNumPartitions
> 
>
> Key: SPARK-41819
> URL: https://issues.apache.org/jira/browse/SPARK-41819
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
> Failed example:
>     df.coalesce(1).rdd.getNumPartitions()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.coalesce[...]>", line 1, in <module>
>         df.coalesce(1).rdd.getNumPartitions()
>     AttributeError: 'function' object has no attribute 
> 'getNumPartitions'{code}






[jira] [Assigned] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41659:


Assignee: Hyukjin Kwon

> Enable doctests in pyspark.sql.connect.readwriter
> -
>
> Key: SPARK-41659
> URL: https://issues.apache.org/jira/browse/SPARK-41659
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Updated] (SPARK-41819) Implement Dataframe.rdd getNumPartitions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41819:
-
Epic Link: SPARK-39375

> Implement Dataframe.rdd getNumPartitions
> 
>
> Key: SPARK-41819
> URL: https://issues.apache.org/jira/browse/SPARK-41819
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
> Failed example:
>     df.coalesce(1).rdd.getNumPartitions()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.coalesce[...]>", line 1, in <module>
>         df.coalesce(1).rdd.getNumPartitions()
>     AttributeError: 'function' object has no attribute 
> 'getNumPartitions'{code}






[jira] [Resolved] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41659.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39331
[https://github.com/apache/spark/pull/39331]

> Enable doctests in pyspark.sql.connect.readwriter
> -
>
> Key: SPARK-41659
> URL: https://issues.apache.org/jira/browse/SPARK-41659
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41817:
-
Parent: SPARK-41284
Issue Type: Sub-task  (was: Bug)

> SparkSession.read support reading with schema
> -
>
> Key: SPARK-41817
> URL: https://issues.apache.org/jira/browse/SPARK-41817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load
> Failed example:
>     with tempfile.TemporaryDirectory() as d:
>         # Write a DataFrame into a CSV file with a header
>         df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
>         df.write.option("header", True).mode("overwrite").format("csv").save(d)
>         # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
>         # and 'header' option set to `True`.
>         df = spark.read.load(
>             d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
>         df.printSchema()
>         df.show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module>
>         df.printSchema()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1039, in printSchema
>         print(self._tree_string())
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1035, in _tree_string
>         query = self._plan.to_proto(self._session.client)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 92, in to_proto
>         plan.root.CopyFrom(self.plan(session))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 245, in plan
>         plan.read.data_source.schema = self.schema
>     TypeError: bad argument type for built-in operation {code}
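
The failure is presumably the last assignment: the data_source.schema proto field is a string, while self.schema here is a StructType. A hypothetical sketch of the kind of conversion the plan could apply (helper name and DDL rendering are assumptions, not the actual fix):

{code:java}
from pyspark.sql.types import StructType

def schema_as_string(schema):
    # protobuf string fields reject non-str values
    # ("bad argument type for built-in operation")
    if isinstance(schema, StructType):
        # hypothetical DDL-style rendering, e.g. "age bigint, name string"
        return ", ".join(f"{f.name} {f.dataType.simpleString()}" for f in schema.fields)
    return schema
{code}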






[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41817:
-
Epic Link: (was: SPARK-39375)

> SparkSession.read support reading with schema
> -
>
> Key: SPARK-41817
> URL: https://issues.apache.org/jira/browse/SPARK-41817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load
> Failed example:
>     with tempfile.TemporaryDirectory() as d:
>         # Write a DataFrame into a CSV file with a header
>         df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
>         df.write.option("header", True).mode("overwrite").format("csv").save(d)
>         # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
>         # and 'header' option set to `True`.
>         df = spark.read.load(
>             d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
>         df.printSchema()
>         df.show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module>
>         df.printSchema()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1039, in printSchema
>         print(self._tree_string())
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1035, in _tree_string
>         query = self._plan.to_proto(self._session.client)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 92, in to_proto
>         plan.root.CopyFrom(self.plan(session))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 245, in plan
>         plan.read.data_source.schema = self.schema
>     TypeError: bad argument type for built-in operation {code}






[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41818:
-
Parent: SPARK-41284
Issue Type: Sub-task  (was: Bug)

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}
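
The "(java.lang.ClassNotFoundException) .DefaultSource" message suggests the write command reaches the server with an empty data source name, so the server tries to load a class literally named ".DefaultSource". A minimal workaround sketch, assuming that is the root cause, is to name the format explicitly:
{code:python}
# Hedged workaround: pass the source explicitly so the server never has to
# resolve a default (empty) data source name.
df = spark.range(10)
df.write.format("parquet").mode("overwrite").saveAsTable("tblA")
{code}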



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41818:
-
Epic Link: (was: SPARK-39375)

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41817:
-
Epic Link: SPARK-39375

> SparkSession.read support reading with schema
> -
>
> Key: SPARK-41817
> URL: https://issues.apache.org/jira/browse/SPARK-41817
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load
> Failed example:
> with tempfile.TemporaryDirectory() as d:
>     # Write a DataFrame into a CSV file with a header
>     df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
>     df.write.option("header", True).mode("overwrite").format("csv").save(d)
>     # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
>     # and 'header' option set to `True`.
>     df = spark.read.load(
>         d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
>     df.printSchema()
>     df.show()
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
> exec(compile(example.source, filename, "single",
>   File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module>
> df.printSchema()
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1039, in printSchema
> print(self._tree_string())
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1035, in _tree_string
> query = self._plan.to_proto(self._session.client)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 92, in to_proto
> plan.root.CopyFrom(self.plan(session))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 245, in plan
> plan.read.data_source.schema = self.schema
> TypeError: bad argument type for built-in operation {code}
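
The traceback ends at `plan.read.data_source.schema = self.schema`, and protobuf string fields raise exactly this TypeError when handed a non-string. A minimal sketch of the likely shape of the fix, assuming the proto field expects a serialized schema string while `self.schema` can arrive as a StructType:
{code:python}
from pyspark.sql.types import StructType

def schema_to_proto_string(schema):
    # Hypothetical helper: protobuf string fields reject StructType objects,
    # so serialize the schema before assigning it to the plan.
    if isinstance(schema, StructType):
        return schema.json()  # the server can parse this back into a schema
    return schema             # already a DDL/JSON string
{code}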



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41817) SparkSession.read support reading with schema

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41817:
-
Parent: (was: SPARK-41281)
Issue Type: Bug  (was: Sub-task)

> SparkSession.read support reading with schema
> -
>
> Key: SPARK-41817
> URL: https://issues.apache.org/jira/browse/SPARK-41817
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load
> Failed example:
> with tempfile.TemporaryDirectory() as d:
>     # Write a DataFrame into a CSV file with a header
>     df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
>     df.write.option("header", True).mode("overwrite").format("csv").save(d)
>     # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
>     # and 'header' option set to `True`.
>     df = spark.read.load(
>         d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
>     df.printSchema()
>     df.show()
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
> exec(compile(example.source, filename, "single",
>   File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module>
> df.printSchema()
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1039, in printSchema
> print(self._tree_string())
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1035, in _tree_string
> query = self._plan.to_proto(self._session.client)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 92, in to_proto
> plan.root.CopyFrom(self.plan(session))
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
> 245, in plan
> plan.read.data_source.schema = self.schema
> TypeError: bad argument type for built-in operation {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41818:
-
Parent: (was: SPARK-41281)
Issue Type: Bug  (was: Sub-task)

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41818:
-
Epic Link: SPARK-39375

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41818.
--
Resolution: Fixed

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-41818:
--

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41804:


Assignee: Apache Spark

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> 
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Assignee: Apache Spark
>Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension 
> of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 
> 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}
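
For anyone reproducing from Python, a rough PySpark equivalent of the Scala steps above (an untested sketch; the failure is nondeterministic, so several runs may be needed):
{code:python}
from pyspark.ml.linalg import Vectors

dv = Vectors.dense([0.1, 0.2, 0.3])
df = spark.createDataFrame([([dv, dv],)], ["varr"])
df.coalesce(1).write.mode("overwrite").parquet("vector_data")

# works: whole-stage codegen path
spark.read.parquet("vector_data").collect()

spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")

# may fail: interpreted path (InterpretedUnsafeProjection)
spark.read.parquet("vector_data").collect()
{code}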



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41804:


Assignee: (was: Apache Spark)

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> 
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension 
> of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 
> 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653709#comment-17653709
 ] 

Apache Spark commented on SPARK-41804:
--

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/39349

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> 
>
> Key: SPARK-41804
> URL: https://issues.apache.org/jira/browse/SPARK-41804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension 
> of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x0001120c9d30, pid=64213, tid=0x1003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 
> 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid64213.log
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> Compiled method (nm)  582142 11318 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x00011efa8890,0x00011efa8be8] = 856
>  relocation [0x00011efa89b8,0x00011efa89f8] = 64
>  main code  [0x00011efa8a00,0x00011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41840) DataFrame.show(): 'Column' object is not callable

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41840:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 855, in pyspark.sql.connect.functions.first
Failed example:
    df.groupby("name").agg(first("age", 
ignorenulls=True)).orderBy("name").show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.groupby("name").agg(first("age", 
ignorenulls=True)).orderBy("name").show()
    TypeError: 'Column' object is not callable{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1472, in pyspark.sql.connect.functions.posexplode_outer
Failed example:
    df.select("id", "a_map", posexplode_outer("an_array")).show()
Expected:
    +---+----------+----+----+
    | id|     a_map| pos| col|
    +---+----------+----+----+
    |  1|{x -> 1.0}|   0| foo|
    |  1|{x -> 1.0}|   1| bar|
    |  2|        {}|null|null|
    |  3|      null|null|null|
    +---+----------+----+----+
Got:
    +---+------+----+----+
    | id| a_map| pos| col|
    +---+------+----+----+
    |  1| {1.0}|   0| foo|
    |  1| {1.0}|   1| bar|
    |  2|{null}|null|null|
    |  3|  null|null|null|
    +---+------+----+----+
    {code}


> DataFrame.show(): 'Column' object is not callable
> -
>
> Key: SPARK-41840
> URL: https://issues.apache.org/jira/browse/SPARK-41840
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 855, in pyspark.sql.connect.functions.first
> Failed example:
>     df.groupby("name").agg(first("age", ignorenulls=True)).orderBy("name").show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.groupby("name").agg(first("age", ignorenulls=True)).orderBy("name").show()
>     TypeError: 'Column' object is not callable{code}
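
A hedged guess at the failure mode: `TypeError: 'Column' object is not callable` usually means the name `first` is no longer bound to the function (for example, an earlier example rebound it to a Column). Referencing the function through a module alias sidesteps any shadowing:
{code:python}
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [("Alice", 2), ("Bob", 5), ("Alice", None)], ["name", "age"])
df.groupby("name").agg(F.first("age", ignorenulls=True)).orderBy("name").show()
{code}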



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41840) DataFrame.show(): 'Column' object is not callable

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41840:
-

 Summary: DataFrame.show(): 'Column' object is not callable
 Key: SPARK-41840
 URL: https://issues.apache.org/jira/browse/SPARK-41840
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1472, in pyspark.sql.connect.functions.posexplode_outer
Failed example:
    df.select("id", "a_map", posexplode_outer("an_array")).show()
Expected:
    +---+----------+----+----+
    | id|     a_map| pos| col|
    +---+----------+----+----+
    |  1|{x -> 1.0}|   0| foo|
    |  1|{x -> 1.0}|   1| bar|
    |  2|        {}|null|null|
    |  3|      null|null|null|
    +---+----------+----+----+
Got:
    +---+------+----+----+
    | id| a_map| pos| col|
    +---+------+----+----+
    |  1| {1.0}|   0| foo|
    |  1| {1.0}|   1| bar|
    |  2|{null}|null|null|
    |  3|  null|null|null|
    +---+------+----+----+
    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41839) Implement SparkSession.sparkContext

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41839:
-

 Summary: Implement SparkSession.sparkContext
 Key: SPARK-41839
 URL: https://issues.apache.org/jira/browse/SPARK-41839
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2119, in pyspark.sql.connect.functions.unix_timestamp
Failed example:
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    AttributeError: 'SparkSession' object has no attribute 'conf'{code}
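
For reference, the behavior the Connect session needs to mirror from classic PySpark (a sketch of the expected surface exercised by the failing example, not the Connect implementation):
{code:python}
# Runtime configuration round-trip.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
assert spark.conf.get("spark.sql.session.timeZone") == "America/Los_Angeles"
spark.conf.unset("spark.sql.session.timeZone")
{code}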



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41839) Implement SparkSession.sparkContext

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41839:
--
Description: (was: {code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2119, in pyspark.sql.connect.functions.unix_timestamp
Failed example:
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    AttributeError: 'SparkSession' object has no attribute 'conf'{code})

> Implement SparkSession.sparkContext
> ---
>
> Key: SPARK-41839
> URL: https://issues.apache.org/jira/browse/SPARK-41839
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41838) DataFrame.show() fix map printing

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41838:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1472, in pyspark.sql.connect.functions.posexplode_outer
Failed example:
    df.select("id", "a_map", posexplode_outer("an_array")).show()
Expected:
    +---+----------+----+----+
    | id|     a_map| pos| col|
    +---+----------+----+----+
    |  1|{x -> 1.0}|   0| foo|
    |  1|{x -> 1.0}|   1| bar|
    |  2|        {}|null|null|
    |  3|      null|null|null|
    +---+----------+----+----+
Got:
    +---+------+----+----+
    | id| a_map| pos| col|
    +---+------+----+----+
    |  1| {1.0}|   0| foo|
    |  1| {1.0}|   1| bar|
    |  2|{null}|null|null|
    |  3|  null|null|null|
    +---+------+----+----+
    {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1594, in pyspark.sql.connect.functions.to_json
Failed example:
    df = spark.createDataFrame(data, ("key", "value"))
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df = spark.createDataFrame(data, ("key", "value"))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
252, in createDataFrame
        table = pa.Table.from_pandas(pdf)
      File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 611, in dataframe_to_arrays
        arrays = [convert_column(c, f)
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 611, in 
        arrays = [convert_column(c, f)
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 598, in convert_column
        raise e
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 592, in convert_column
        result = pa.array(col, type=type_, from_pandas=True, safe=safe)
      File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
      File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
      File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
    pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: tried 
to convert to int64", 'Conversion failed for column 1 with type object'){code}


> DataFrame.show() fix map printing
> -
>
> Key: SPARK-41838
> URL: https://issues.apache.org/jira/browse/SPARK-41838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1472, in pyspark.sql.connect.functions.posexplode_outer
> Failed example:
>     df.select("id", "a_map", posexplode_outer("an_array")).show()
> Expected:
>     +---+----------+----+----+
>     | id|     a_map| pos| col|
>     +---+----------+----+----+
>     |  1|{x -> 1.0}|   0| foo|
>     |  1|{x -> 1.0}|   1| bar|
>     |  2|        {}|null|null|
>     |  3|      null|null|null|
>     +---+----------+----+----+
> Got:
>     +---+------+----+----+
>     | id| a_map| pos| col|
>     +---+------+----+----+
>     |  1| {1.0}|   0| foo|
>     |  1| {1.0}|   1| bar|
>     |  2|{null}|null|null|
>     |  3|  null|null|null|
>     +---+------+----+----+
>     {code}
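
The Got output drops the map keys entirely ({x -> 1.0} becomes {1.0}), which points at the Connect show() path losing MapType keys in its pandas/NumPy round-trip. A minimal check of just the rendering gap, assuming a Connect session `spark`:
{code:python}
df = spark.createDataFrame([(1, {"x": 1.0})], ["id", "a_map"])
df.show()
# classic PySpark renders:          |  1|{x -> 1.0}|
# Spark Connect currently renders:  |  1|     {1.0}|
{code}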



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41838) DataFrame.show() fix map printing

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41838:
-

 Summary: DataFrame.show() fix map printing
 Key: SPARK-41838
 URL: https://issues.apache.org/jira/browse/SPARK-41838
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1594, in pyspark.sql.connect.functions.to_json
Failed example:
    df = spark.createDataFrame(data, ("key", "value"))
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df = spark.createDataFrame(data, ("key", "value"))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
252, in createDataFrame
        table = pa.Table.from_pandas(pdf)
      File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 611, in dataframe_to_arrays
        arrays = [convert_column(c, f)
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 611, in 
        arrays = [convert_column(c, f)
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 598, in convert_column
        raise e
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 592, in convert_column
        result = pa.array(col, type=type_, from_pandas=True, safe=safe)
      File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
      File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
      File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
    pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: tried 
to convert to int64", 'Conversion failed for column 1 with type object'){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41837:
-

 Summary: DataFrame.createDataFrame datatype conversion error
 Key: SPARK-41837
 URL: https://issues.apache.org/jira/browse/SPARK-41837
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
**          
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1117, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array('age', 'age').alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1119, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array([df.age, df.age]).alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1124, in pyspark.sql.connect.functions.array_distinct
Failed example:
    df.select(array_distinct(df.data)).collect()
Expected:
    [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])]
Got:
    [Row(array_distinct(data)=array([1, 2, 3])), 
Row(array_distinct(data)=array([4, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1135, in pyspark.sql.connect.functions.array_except
Failed example:
    df.select(array_except(df.c1, df.c2)).collect()
Expected:
    [Row(array_except(c1, c2)=['b'])]
Got:
    [Row(array_except(c1, c2)=array(['b'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1142, in pyspark.sql.connect.functions.array_intersect
Failed example:
    df.select(array_intersect(df.c1, df.c2)).collect()
Expected:
    [Row(array_intersect(c1, c2)=['a', 'c'])]
Got:
    [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1180, in pyspark.sql.connect.functions.array_remove
Failed example:
    df.select(array_remove(df.data, 1)).collect()
Expected:
    [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])]
Got:
    [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, 
1)=array([], dtype=int64))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1187, in pyspark.sql.connect.functions.array_repeat
Failed example:
    df.select(array_repeat(df.data, 3).alias('r')).collect()
Expected:
    [Row(r=['ab', 'ab', 'ab'])]
Got:
    [Row(r=array(['ab', 'ab', 'ab'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1204, in pyspark.sql.connect.functions.array_sort
Failed example:
    df.select(array_sort(df.data).alias('r')).collect()
Expected:
    [Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])]
Got:
    [Row(r=array([ 1.,  2.,  3., nan])), Row(r=array([1])), Row(r=array([], 
dtype=int64))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1207, in pyspark.sql.connect.functions.array_sort
Failed example:
    df.select(array_sort(
        "data",
        lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x))
    ).alias("r")).collect()
Expected:
    [Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])]
Got:
    [Row(r=array(['foobar', 'foo', None, 'bar'], dtype=object)), 
Row(r=array(['foo'], dtype=object)), Row(r=array([], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1209, in pyspark.sql.connect.functions.array_union
Failed example:
    df.select(array_union(df.c1, df.c2)).collect()
Expected:
    [Row(array_union(c1, c2)=['b', 'a', 'c', 'd', 'f'])]
Got:
    [Row(array_union(c1, c2)=array(['b', 'a', 'c', 'd', 'f'], 
dtype=object))]{code}
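
Every mismatch above has the same shape: collect() returns numpy arrays where plain Python lists are expected, consistent with results coming back through an Arrow/pandas conversion. An illustrative post-processing sketch, given a DataFrame `df` with an array column `data` (a hypothetical helper, not the actual fix):
{code:python}
import numpy as np

def as_python(value):
    """Recursively convert numpy arrays and scalars to plain Python values."""
    if isinstance(value, np.ndarray):
        return [as_python(v) for v in value.tolist()]
    if isinstance(value, np.generic):
        return value.item()
    return value

rows = [[as_python(v) for v in row] for row in df.select("data").collect()]
{code}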



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-41837) DataFrame.createDataFrame datatype conversion error

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41837:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1594, in pyspark.sql.connect.functions.to_json
Failed example:
    df = spark.createDataFrame(data, ("key", "value"))
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df = spark.createDataFrame(data, ("key", "value"))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
252, in createDataFrame
        table = pa.Table.from_pandas(pdf)
      File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 611, in dataframe_to_arrays
        arrays = [convert_column(c, f)
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 611, in 
        arrays = [convert_column(c, f)
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 598, in convert_column
        raise e
      File "/usr/local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", 
line 592, in convert_column
        result = pa.array(col, type=type_, from_pandas=True, safe=safe)
      File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
      File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
      File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
    pyarrow.lib.ArrowInvalid: ("Could not convert 'Alice' with type str: tried 
to convert to int64", 'Conversion failed for column 1 with type object'){code}

  was:
{code:java}
**          
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1117, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array('age', 'age').alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1119, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array([df.age, df.age]).alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1124, in pyspark.sql.connect.functions.array_distinct
Failed example:
    df.select(array_distinct(df.data)).collect()
Expected:
    [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])]
Got:
    [Row(array_distinct(data)=array([1, 2, 3])), 
Row(array_distinct(data)=array([4, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1135, in pyspark.sql.connect.functions.array_except
Failed example:
    df.select(array_except(df.c1, df.c2)).collect()
Expected:
    [Row(array_except(c1, c2)=['b'])]
Got:
    [Row(array_except(c1, c2)=array(['b'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1142, in pyspark.sql.connect.functions.array_intersect
Failed example:
    df.select(array_intersect(df.c1, df.c2)).collect()
Expected:
    [Row(array_intersect(c1, c2)=['a', 'c'])]
Got:
    [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1180, in pyspark.sql.connect.functions.array_remove
Failed example:
    df.select(array_remove(df.data, 1)).collect()
Expected:
    [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])]
Got:
    [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, 
1)=array([], dtype=int64))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1187, in pyspark.sql.connect.functions.array_repeat
Failed example:
    df.select(array_repeat(df.data, 3).alias('r')).collect()
Expected:
    [Row(r=['ab', 'ab', 'ab'])]
Got:
    [Row(r=array(['ab', 'ab', 'ab'], dtype=object))]
**
File 

[jira] [Updated] (SPARK-41836) Implement `transform_values` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41836:
--
Summary: Implement `transform_values` function  (was: CLONE - Implement 
`transform_values` function)

> Implement `transform_values` function
> -
>
> Key: SPARK-41836
> URL: https://issues.apache.org/jira/browse/SPARK-41836
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
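
Since the description is empty after the clone cleanup, the target behavior for reference, i.e. standard PySpark usage the Connect implementation should match (example values are illustrative):
{code:python}
from pyspark.sql.functions import transform_values

df = spark.createDataFrame([(1, {"IT": 10.0, "SALES": 2.0})], ("id", "data"))
df.select(
    transform_values("data", lambda k, v: v + 10.0).alias("new_data")
).show(truncate=False)
# expected: {IT -> 20.0, SALES -> 12.0} (map entry order may vary)
{code}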




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41836) CLONE - Implement `transform_values` function

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41836:
-

 Summary: CLONE - Implement `transform_values` function
 Key: SPARK-41836
 URL: https://issues.apache.org/jira/browse/SPARK-41836
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Sandeep Singh
Assignee: Ruifeng Zheng
 Fix For: 3.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41835) Implement `transform_keys` function

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41835:
-

 Summary: Implement `transform_keys` function
 Key: SPARK-41835
 URL: https://issues.apache.org/jira/browse/SPARK-41835
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Sandeep Singh
Assignee: Ruifeng Zheng
 Fix For: 3.4.0
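
For reference, the target behavior in standard PySpark that the Connect version should match (example values are illustrative):

{code:python}
from pyspark.sql.functions import concat, lit, transform_keys

df = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})], ("id", "data"))
df.select(
    transform_keys("data", lambda k, _: concat(k, lit("_v2"))).alias("renamed")
).show(truncate=False)
# expected keys: foo_v2, bar_v2
{code}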






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653702#comment-17653702
 ] 

Apache Spark commented on SPARK-41311:
--

User 'ibuder' has created a pull request for this issue:
https://github.com/apache/spark/pull/39348

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Priority: Minor
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses non-user-facing 
> class FileSystemBasedCheckpointFileManager directly to trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41311:


Assignee: Apache Spark

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Assignee: Apache Spark
>Priority: Minor
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses non-user-facing 
> class FileSystemBasedCheckpointFileManager directly to trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41311:


Assignee: (was: Apache Spark)

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Priority: Minor
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses non-user-facing 
> class FileSystemBasedCheckpointFileManager directly to trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653701#comment-17653701
 ] 

Apache Spark commented on SPARK-41311:
--

User 'ibuder' has created a pull request for this issue:
https://github.com/apache/spark/pull/39348

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Priority: Minor
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses non-user-facing 
> class FileSystemBasedCheckpointFileManager directly to trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Immanuel Buder (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653682#comment-17653682
 ] 

Immanuel Buder commented on SPARK-41311:


Working on this.

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Priority: Minor
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses non-user-facing 
> class FileSystemBasedCheckpointFileManager directly to trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41834) Implement SparkSession.conf

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41834:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2119, in pyspark.sql.connect.functions.unix_timestamp
Failed example:
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    AttributeError: 'SparkSession' object has no attribute 'conf'{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
Failed example:
    df = spark.readStream.format("rate").load()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df = spark.readStream.format("rate").load()
    AttributeError: 'SparkSession' object has no attribute 'readStream'{code}


> Implement SparkSession.conf
> ---
>
> Key: SPARK-41834
> URL: https://issues.apache.org/jira/browse/SPARK-41834
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'{code}
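
A minimal sketch of the runtime-config API this doctest exercises, as it works in classic PySpark today (assuming an active SparkSession; the connect client needs an equivalent conf property):

{code:java}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Set, read back, and reset a session-scoped SQL conf.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
assert spark.conf.get("spark.sql.session.timeZone") == "America/Los_Angeles"
spark.conf.unset("spark.sql.session.timeZone")  # restore the default
{code}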



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41834) Implement SparkSession.conf

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41834:
-

 Summary: Implement SparkSession.conf
 Key: SPARK-41834
 URL: https://issues.apache.org/jira/browse/SPARK-41834
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
Failed example:
    df = spark.readStream.format("rate").load()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df = spark.readStream.format("rate").load()
    AttributeError: 'SparkSession' object has no attribute 'readStream'{code}
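
For context, a hedged sketch of the streaming entry point this doctest expects, as it behaves in classic PySpark (the rate source ships with Spark):

{code:java}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The rate source continuously generates (timestamp, value) rows,
# so the resulting DataFrame must report isStreaming == True.
df = spark.readStream.format("rate").load()
assert df.isStreaming
{code}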



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653679#comment-17653679
 ] 

Apache Spark commented on SPARK-41658:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39347

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41658:


Assignee: (was: Apache Spark)

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41658:


Assignee: Apache Spark

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653678#comment-17653678
 ] 

Apache Spark commented on SPARK-41658:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39347

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41833) DataFrame.collect() output parity with pyspark

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41833:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1117, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array('age', 'age').alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1119, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array([df.age, df.age]).alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1124, in pyspark.sql.connect.functions.array_distinct
Failed example:
    df.select(array_distinct(df.data)).collect()
Expected:
    [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])]
Got:
    [Row(array_distinct(data)=array([1, 2, 3])), 
Row(array_distinct(data)=array([4, 5]))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1135, in pyspark.sql.connect.functions.array_except
Failed example:
    df.select(array_except(df.c1, df.c2)).collect()
Expected:
    [Row(array_except(c1, c2)=['b'])]
Got:
    [Row(array_except(c1, c2)=array(['b'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1142, in pyspark.sql.connect.functions.array_intersect
Failed example:
    df.select(array_intersect(df.c1, df.c2)).collect()
Expected:
    [Row(array_intersect(c1, c2)=['a', 'c'])]
Got:
    [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1180, in pyspark.sql.connect.functions.array_remove
Failed example:
    df.select(array_remove(df.data, 1)).collect()
Expected:
    [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])]
Got:
    [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, 
1)=array([], dtype=int64))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1187, in pyspark.sql.connect.functions.array_repeat
Failed example:
    df.select(array_repeat(df.data, 3).alias('r')).collect()
Expected:
    [Row(r=['ab', 'ab', 'ab'])]
Got:
    [Row(r=array(['ab', 'ab', 'ab'], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1204, in pyspark.sql.connect.functions.array_sort
Failed example:
    df.select(array_sort(df.data).alias('r')).collect()
Expected:
    [Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])]
Got:
    [Row(r=array([ 1.,  2.,  3., nan])), Row(r=array([1])), Row(r=array([], 
dtype=int64))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1207, in pyspark.sql.connect.functions.array_sort
Failed example:
    df.select(array_sort(
        "data",
        lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) 
- length(x))
    ).alias("r")).collect()
Expected:
    [Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])]
Got:
    [Row(r=array(['foobar', 'foo', None, 'bar'], dtype=object)), 
Row(r=array(['foo'], dtype=object)), Row(r=array([], dtype=object))]
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1209, in pyspark.sql.connect.functions.array_union
Failed example:
    df.select(array_union(df.c1, df.c2)).collect()
Expected:
    [Row(array_union(c1, c2)=['b', 'a', 'c', 'd', 'f'])]
Got:
    [Row(array_union(c1, c2)=array(['b', 'a', 'c', 'd', 'f'], 
dtype=object))]{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1117, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array('age', 'age').alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]{code}


> DataFrame.collect() output parity with pyspark

[jira] [Updated] (SPARK-41833) DataFrame.collect() output parity with pyspark

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41833:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1117, in pyspark.sql.connect.functions.array
Failed example:
    df.select(array('age', 'age').alias("arr")).collect()
Expected:
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
Got:
    [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName
Failed example:
    df1.unionByName(df2).show()
Expected:
    +----+----+----+
    |col0|col1|col2|
    +----+----+----+
    |   1|   2|   3|
    |   6|   4|   5|
    +----+----+----+
Got:
    +----+----+----+
    |col0|col1|col2|
    +----+----+----+
    |   1|   2|   3|
    |   4|   5|   6|
    +----+----+----+
    {code}


> DataFrame.collect() output parity with pyspark
> --
>
> Key: SPARK-41833
> URL: https://issues.apache.org/jira/browse/SPARK-41833
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1117, in pyspark.sql.connect.functions.array
> Failed example:
>     df.select(array('age', 'age').alias("arr")).collect()
> Expected:
>     [Row(arr=[2, 2]), Row(arr=[5, 5])]
> Got:
>     [Row(arr=array([2, 2])), Row(arr=array([5, 5]))]{code}
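
All of these failures share one cause: the connect client materializes results through Arrow/pandas, so array columns come back as numpy.ndarray instead of plain Python lists. A minimal sketch of the normalization needed for parity (the numpy usage here is illustrative, not part of the doctest):

{code:java}
import numpy as np

got = np.array([2, 2])   # what connect currently places inside Row
expected = [2, 2]        # what classic PySpark returns

# Converting ndarray values back to Python lists restores output parity.
assert got.tolist() == expected
{code}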



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41833) DataFrame.collect() output parity with pyspark

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41833:
-

 Summary: DataFrame.collect() output parity with pyspark
 Key: SPARK-41833
 URL: https://issues.apache.org/jira/browse/SPARK-41833
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName
Failed example:
    df1.unionByName(df2).show()
Expected:
    +----+----+----+
    |col0|col1|col2|
    +----+----+----+
    |   1|   2|   3|
    |   6|   4|   5|
    +----+----+----+
Got:
    +----+----+----+
    |col0|col1|col2|
    +----+----+----+
    |   1|   2|   3|
    |   4|   5|   6|
    +----+----+----+
    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41656:


Assignee: (was: Apache Spark)

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653674#comment-17653674
 ] 

Apache Spark commented on SPARK-41656:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39346

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653673#comment-17653673
 ] 

Apache Spark commented on SPARK-41656:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39346

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41656:


Assignee: Apache Spark

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41832) DataFrame.unionByName output is wrong

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41832:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName
Failed example:
    df1.unionByName(df2).show()
Expected:
    +----+----+----+
    |col0|col1|col2|
    +----+----+----+
    |   1|   2|   3|
    |   6|   4|   5|
    +----+----+----+
Got:
    +----+----+----+
    |col0|col1|col2|
    +----+----+----+
    |   1|   2|   3|
    |   4|   5|   6|
    +----+----+----+
    {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1168, in pyspark.sql.connect.dataframe.DataFrame.transform
Failed example:
    df.transform(cast_all_to_int).transform(sort_columns_asc).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.transform(cast_all_to_int).transform(sort_columns_asc).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1102, in transform
        result = func(self, *args, **kwargs)
      File "", 
line 2, in cast_all_to_int
        return input_df.select([col(col_name).cast("int") for col_name in 
input_df.columns])
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 86, in select
        return DataFrame.withPlan(plan.Project(self._plan, *cols), 
session=self._session)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
344, in __init__
        self._verify_expressions()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
350, in _verify_expressions
        raise InputValidationError(
    pyspark.sql.connect.plan.InputValidationError: Only Column or String can be 
used for projections: '[Column<'(ColumnReference(int) (int))'>, 
Column<'(ColumnReference(float) (int))'>]'.
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1179, in pyspark.sql.connect.dataframe.DataFrame.transform
Failed example:
    df.transform(add_n, 1).transform(add_n, n=10).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.transform(add_n, 1).transform(add_n, n=10).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1102, in transform
        result = func(self, *args, **kwargs)
      File "", 
line 2, in add_n
        return input_df.select([(col(col_name) + n).alias(col_name)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 86, in select
        return DataFrame.withPlan(plan.Project(self._plan, *cols), 
session=self._session)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
344, in __init__
        self._verify_expressions()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
350, in _verify_expressions
        raise InputValidationError(
    pyspark.sql.connect.plan.InputValidationError: Only Column or String can be 
used for projections: '[Column<'Alias(+(ColumnReference(int), Literal(1)), 
(int))'>, Column<'Alias(+(ColumnReference(float), Literal(1)), 
(float))'>]'.{code}


> DataFrame.unionByName output is wrong
> -
>
> Key: SPARK-41832
> URL: https://issues.apache.org/jira/browse/SPARK-41832
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 584, in pyspark.sql.connect.dataframe.DataFrame.unionByName
> Failed example:
>     df1.unionByName(df2).show()
> Expected:
>     +----+----+----+
>     |col0|col1|col2|
>     +----+----+----+
>     |   1|   2|   3|
>     |   6|   4|   5|
>     +----+----+----+
> Got:
>     +----+----+----+
>     |col0|col1|col2|
>     +----+----+----+
>     |   1|   2|   3|
>     |   4|   5|   6|
>     +----+----+----+
>     {code}
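
For reference, the semantics the doctest checks, runnable against classic PySpark: unionByName matches columns by name rather than by position, so df2's row must be reordered into df1's column order:

{code:java}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])

# Name matching maps df2's (col1=4, col2=5, col0=6) onto (col0, col1, col2),
# so the second output row is (6, 4, 5), not the positional (4, 5, 6).
df1.unionByName(df2).show()
{code}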



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-41832) DataFrame.unionByName output is wrong

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41832:
-

 Summary: DataFrame.unionByName output is wrong
 Key: SPARK-41832
 URL: https://issues.apache.org/jira/browse/SPARK-41832
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1168, in pyspark.sql.connect.dataframe.DataFrame.transform
Failed example:
    df.transform(cast_all_to_int).transform(sort_columns_asc).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.transform(cast_all_to_int).transform(sort_columns_asc).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1102, in transform
        result = func(self, *args, **kwargs)
      File "", 
line 2, in cast_all_to_int
        return input_df.select([col(col_name).cast("int") for col_name in 
input_df.columns])
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 86, in select
        return DataFrame.withPlan(plan.Project(self._plan, *cols), 
session=self._session)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
344, in __init__
        self._verify_expressions()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
350, in _verify_expressions
        raise InputValidationError(
    pyspark.sql.connect.plan.InputValidationError: Only Column or String can be 
used for projections: '[Column<'(ColumnReference(int) (int))'>, 
Column<'(ColumnReference(float) (int))'>]'.
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1179, in pyspark.sql.connect.dataframe.DataFrame.transform
Failed example:
    df.transform(add_n, 1).transform(add_n, n=10).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.transform(add_n, 1).transform(add_n, n=10).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1102, in transform
        result = func(self, *args, **kwargs)
      File "", 
line 2, in add_n
        return input_df.select([(col(col_name) + n).alias(col_name)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 86, in select
        return DataFrame.withPlan(plan.Project(self._plan, *cols), 
session=self._session)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
344, in __init__
        self._verify_expressions()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
350, in _verify_expressions
        raise InputValidationError(
    pyspark.sql.connect.plan.InputValidationError: Only Column or String can be 
used for projections: '[Column<'Alias(+(ColumnReference(int), Literal(1)), 
(int))'>, Column<'Alias(+(ColumnReference(float), Literal(1)), 
(float))'>]'.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41831) DataFrame.transform: Only Column or String can be used for projections

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41831:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1168, in pyspark.sql.connect.dataframe.DataFrame.transform
Failed example:
    df.transform(cast_all_to_int).transform(sort_columns_asc).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.transform(cast_all_to_int).transform(sort_columns_asc).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1102, in transform
        result = func(self, *args, **kwargs)
      File "", 
line 2, in cast_all_to_int
        return input_df.select([col(col_name).cast("int") for col_name in 
input_df.columns])
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 86, in select
        return DataFrame.withPlan(plan.Project(self._plan, *cols), 
session=self._session)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
344, in __init__
        self._verify_expressions()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
350, in _verify_expressions
        raise InputValidationError(
    pyspark.sql.connect.plan.InputValidationError: Only Column or String can be 
used for projections: '[Column<'(ColumnReference(int) (int))'>, 
Column<'(ColumnReference(float) (int))'>]'.
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1179, in pyspark.sql.connect.dataframe.DataFrame.transform
Failed example:
    df.transform(add_n, 1).transform(add_n, n=10).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.transform(add_n, 1).transform(add_n, n=10).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1102, in transform
        result = func(self, *args, **kwargs)
      File "", 
line 2, in add_n
        return input_df.select([(col(col_name) + n).alias(col_name)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 86, in select
        return DataFrame.withPlan(plan.Project(self._plan, *cols), 
session=self._session)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
344, in __init__
        self._verify_expressions()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 
350, in _verify_expressions
        raise InputValidationError(
    pyspark.sql.connect.plan.InputValidationError: Only Column or String can be 
used for projections: '[Column<'Alias(+(ColumnReference(int), Literal(1)), 
(int))'>, Column<'Alias(+(ColumnReference(float), Literal(1)), 
(float))'>]'.{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 401, in pyspark.sql.connect.dataframe.DataFrame.sample
Failed example:
    df.sample(0.5, 3).count()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.sample(0.5, 3).count()
    TypeError: DataFrame.sample() takes 2 positional arguments but 3 were given
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 411, in pyspark.sql.connect.dataframe.DataFrame.sample
Failed example:
    df.sample(False, fraction=1.0).count()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.sample(False, fraction=1.0).count()
    TypeError: DataFrame.sample() got multiple values for argument 
'fraction'{code}


> DataFrame.transform: Only Column or String can be used for projections
> --
>
> Key: SPARK-41831
> URL: https://issues.apache.org/jira/browse/SPARK-41831
>  
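
The transform failures quoted above all reduce to one gap: classic DataFrame.select accepts either varargs or a single list of Columns/strings, while the connect client at this point only unpacked varargs, so list-valued projections raised InputValidationError. A hedged sketch of the two spellings that should be equivalent:

{code:java}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 1.0)], ["int", "float"])

# List form (used by the transform doctests) and varargs form
# should produce the same projection.
df.select([col(c).cast("int") for c in df.columns]).show()
df.select(*[col(c).cast("int") for c in df.columns]).show()
{code}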

[jira] [Created] (SPARK-41831) Fix DataFrame.transform: Only Column or String can be used for projections

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41831:
-

 Summary: Fix DataFrame.transform: Only Column or String can be 
used for projections
 Key: SPARK-41831
 URL: https://issues.apache.org/jira/browse/SPARK-41831
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 401, in pyspark.sql.connect.dataframe.DataFrame.sample
Failed example:
    df.sample(0.5, 3).count()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.sample(0.5, 3).count()
    TypeError: DataFrame.sample() takes 2 positional arguments but 3 were given
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 411, in pyspark.sql.connect.dataframe.DataFrame.sample
Failed example:
    df.sample(False, fraction=1.0).count()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.sample(False, fraction=1.0).count()
    TypeError: DataFrame.sample() got multiple values for argument 
'fraction'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41831) DataFrame.transform: Only Column or String can be used for projections

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41831:
--
Summary: DataFrame.transform: Only Column or String can be used for 
projections  (was: Fix DataFrame.transform: Only Column or String can be used 
for projections)

> DataFrame.transform: Only Column or String can be used for projections
> --
>
> Key: SPARK-41831
> URL: https://issues.apache.org/jira/browse/SPARK-41831
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 401, in pyspark.sql.connect.dataframe.DataFrame.sample
> Failed example:
>     df.sample(0.5, 3).count()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df.sample(0.5, 3).count()
>     TypeError: DataFrame.sample() takes 2 positional arguments but 3 were 
> given
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 411, in pyspark.sql.connect.dataframe.DataFrame.sample
> Failed example:
>     df.sample(False, fraction=1.0).count()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df.sample(False, fraction=1.0).count()
>     TypeError: DataFrame.sample() got multiple values for argument 
> 'fraction'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41830) Fix DataFrame.sample parameters

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41830:
-

 Summary: Fix DataFrame.sample parameters
 Key: SPARK-41830
 URL: https://issues.apache.org/jira/browse/SPARK-41830
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 422, in pyspark.sql.connect.dataframe.DataFrame.sort
Failed example:
    df.orderBy(["age", "name"], ascending=[False, False]).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.orderBy(["age", "name"], ascending=[False, False]).show()
    TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending'
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions
Failed example:
    df.sortWithinPartitions("age", ascending=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.sortWithinPartitions("age", ascending=False)
    TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword 
argument 'ascending'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41830) Fix DataFrame.sample parameters

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41830:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 401, in pyspark.sql.connect.dataframe.DataFrame.sample
Failed example:
    df.sample(0.5, 3).count()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.sample(0.5, 3).count()
    TypeError: DataFrame.sample() takes 2 positional arguments but 3 were given
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 411, in pyspark.sql.connect.dataframe.DataFrame.sample
Failed example:
    df.sample(False, fraction=1.0).count()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.sample(False, fraction=1.0).count()
    TypeError: DataFrame.sample() got multiple values for argument 
'fraction'{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 422, in pyspark.sql.connect.dataframe.DataFrame.sort
Failed example:
    df.orderBy(["age", "name"], ascending=[False, False]).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.orderBy(["age", "name"], ascending=[False, False]).show()
    TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending'
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions
Failed example:
    df.sortWithinPartitions("age", ascending=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.sortWithinPartitions("age", ascending=False)
    TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword 
argument 'ascending'{code}


> Fix DataFrame.sample parameters
> ---
>
> Key: SPARK-41830
> URL: https://issues.apache.org/jira/browse/SPARK-41830
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 401, in pyspark.sql.connect.dataframe.DataFrame.sample
> Failed example:
>     df.sample(0.5, 3).count()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df.sample(0.5, 3).count()
>     TypeError: DataFrame.sample() takes 2 positional arguments but 3 were 
> given
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 411, in pyspark.sql.connect.dataframe.DataFrame.sample
> Failed example:
>     df.sample(False, fraction=1.0).count()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df.sample(False, fraction=1.0).count()
>     TypeError: DataFrame.sample() got multiple values for argument 
> 'fraction'{code}
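
For reference, the call shapes classic DataFrame.sample supports and the doctests rely on; the connect signature needs the same withReplacement/fraction/seed handling (a sketch over a throwaway range DataFrame):

{code:java}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# Classic sample() rearranges its arguments, so all of these are valid:
df.sample(0.5).count()                   # fraction only
df.sample(0.5, 3).count()                # fraction + seed
df.sample(False, fraction=1.0).count()   # withReplacement + fraction
df.sample(withReplacement=True, fraction=0.5, seed=42).count()
{code}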



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41829:
--
Summary: Implement Dataframe.sort,sortWithinPartitions Ordering  (was: 
Implement Dataframe.sort ordering)

> Implement Dataframe.sort,sortWithinPartitions Ordering
> --
>
> Key: SPARK-41829
> URL: https://issues.apache.org/jira/browse/SPARK-41829
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
> Failed example:
>     df_empty = spark.createDataFrame([], 'a STRING')
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df_empty = spark.createDataFrame([], 'a STRING')
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 186, in createDataFrame
>         raise ValueError("Input data cannot be empty")
>     ValueError: Input data cannot be empty{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41829:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 422, in pyspark.sql.connect.dataframe.DataFrame.sort
Failed example:
    df.orderBy(["age", "name"], ascending=[False, False]).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.orderBy(["age", "name"], ascending=[False, False]).show()
    TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending'
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions
Failed example:
    df.sortWithinPartitions("age", ascending=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.sortWithinPartitions("age", ascending=False)
    TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword 
argument 'ascending'{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df_empty = spark.createDataFrame([], 'a STRING')
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}


> Implement Dataframe.sort,sortWithinPartitions Ordering
> --
>
> Key: SPARK-41829
> URL: https://issues.apache.org/jira/browse/SPARK-41829
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 422, in pyspark.sql.connect.dataframe.DataFrame.sort
> Failed example:
>     df.orderBy(["age", "name"], ascending=[False, False]).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.orderBy(["age", "name"], ascending=[False, False]).show()
>     TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending'
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions
> Failed example:
>     df.sortWithinPartitions("age", ascending=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in 
> 
>         df.sortWithinPartitions("age", ascending=False)
>     TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword 
> argument 'ascending'{code}
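
For reference, the ascending keyword is part of the classic sort/orderBy/sortWithinPartitions API; the Column-based descending form is the equivalent spelling (a sketch):

{code:java}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

# Keyword form used by the doctests, plus the Column-based equivalent:
df.orderBy(["age", "name"], ascending=[False, False]).show()
df.orderBy(df.age.desc(), df.name.desc()).show()
df.sortWithinPartitions("age", ascending=False).show()
{code}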



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41829) Implement Dataframe.sort ordering

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41829:
-

 Summary: Implement Dataframe.sort ordering
 Key: SPARK-41829
 URL: https://issues.apache.org/jira/browse/SPARK-41829
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df_empty = spark.createDataFrame([], 'a STRING')
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41828) Implement creating empty Dataframe

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41828:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
Failed example:
    df_empty = spark.createDataFrame([], 'a STRING')
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df_empty = spark.createDataFrame([], 'a STRING')
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", line 
186, in createDataFrame
        raise ValueError("Input data cannot be empty")
    ValueError: Input data cannot be empty{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy
Failed example:
    df.groupBy(["name", df.age]).count().sort("name", "age").show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.groupBy(["name", df.age]).count().sort("name", "age").show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 251, in groupBy
        raise TypeError(
    TypeError: groupBy requires all cols be Column or str, but got list 
['name', Column<'ColumnReference(age)'>]{code}


> Implement creating empty Dataframe
> --
>
> Key: SPARK-41828
> URL: https://issues.apache.org/jira/browse/SPARK-41828
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 99, in pyspark.sql.connect.dataframe.DataFrame.isEmpty
> Failed example:
>     df_empty = spark.createDataFrame([], 'a STRING')
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df_empty = spark.createDataFrame([], 'a STRING')
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 186, in createDataFrame
>         raise ValueError("Input data cannot be empty")
>     ValueError: Input data cannot be empty{code}
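
For reference, classic PySpark accepts an empty row list whenever an explicit schema is supplied, yielding an empty DataFrame rather than a client-side error (a sketch):

{code:java}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# An empty list plus a DDL schema is a valid, empty DataFrame.
df_empty = spark.createDataFrame([], "a STRING")
assert df_empty.isEmpty()
assert df_empty.schema.simpleString() == "struct<a:string>"
{code}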



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41828) Implement creating empty Dataframe

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41828:
-

 Summary: Implement creating empty Dataframe
 Key: SPARK-41828
 URL: https://issues.apache.org/jira/browse/SPARK-41828
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy
Failed example:
    df.groupBy(["name", df.age]).count().sort("name", "age").show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.groupBy(["name", df.age]).count().sort("name", "age").show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 251, in groupBy
        raise TypeError(
    TypeError: groupBy requires all cols be Column or str, but got list 
['name', Column<'ColumnReference(age)'>]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41827) DataFrame.groupBy requires all cols be Column or str

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41827:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy
Failed example:
    df.groupBy(["name", df.age]).count().sort("name", "age").show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df.groupBy(["name", df.age]).count().sort("name", "age").show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 251, in groupBy
        raise TypeError(
    TypeError: groupBy requires all cols be Column or str, but got list 
['name', Column<'ColumnReference(age)'>]{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
Failed example:
    df.na.fill(50).show()
Expected:
    +---+------+-----+----+
    |age|height| name|bool|
    +---+------+-----+----+
    | 10|  80.5|Alice|null|
    |  5|  50.0|  Bob|null|
    | 50|  50.0|  Tom|null|
    | 50|  50.0| null|true|
    +---+------+-----+----+
Got:
    +----+------+-----+----+
    | age|height| name|bool|
    +----+------+-----+----+
    |10.0|  80.5|Alice|null|
    | 5.0|  50.0|  Bob|null|
    |50.0|  50.0|  Tom|null|
    |50.0|  50.0| null|true|
    +----+------+-----+----+
    {code}


> DataFrame.groupBy requires all cols be Column or str
> 
>
> Key: SPARK-41827
> URL: https://issues.apache.org/jira/browse/SPARK-41827
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 302, in pyspark.sql.connect.dataframe.DataFrame.groupBy
> Failed example:
>     df.groupBy(["name", df.age]).count().sort("name", "age").show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df.groupBy(["name", df.age]).count().sort("name", "age").show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 251, in groupBy
>         raise TypeError(
>     TypeError: groupBy requires all cols be Column or str, but got list 
> ['name', Column<'ColumnReference(age)'>]{code}
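
For comparison, classic PySpark accepts a single list that mixes column names
and Column objects. A minimal sketch, assuming an active spark session:
{code:python}
# Classic behavior: the list form is accepted and unpacked internally.
df = spark.createDataFrame(
    [(2, "Alice"), (2, "Bob"), (5, "Bob")], schema=["age", "name"])
df.groupBy(["name", df.age]).count().sort("name", "age").show()

# Possible workaround against the Connect client (untested here): unpack the
# list so each element arrives as a separate Column-or-str argument.
df.groupBy(*["name", df.age]).count().sort("name", "age").show()
{code}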



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41827) DataFrame.groupBy requires all cols be Column or str

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41827:
-

 Summary: DataFrame.groupBy requires all cols be Column or str
 Key: SPARK-41827
 URL: https://issues.apache.org/jira/browse/SPARK-41827
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
Failed example:
    df.na.fill(50).show()
Expected:
    +---+------+-----+----+
    |age|height| name|bool|
    +---+------+-----+----+
    | 10|  80.5|Alice|null|
    |  5|  50.0|  Bob|null|
    | 50|  50.0|  Tom|null|
    | 50|  50.0| null|true|
    +---+------+-----+----+
Got:
    +----+------+-----+----+
    | age|height| name|bool|
    +----+------+-----+----+
    |10.0|  80.5|Alice|null|
    | 5.0|  50.0|  Bob|null|
    |50.0|  50.0|  Tom|null|
    |50.0|  50.0| null|true|
    +----+------+-----+----+
    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41826) Implement Dataframe.readStream

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41826:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
Failed example:
    df = spark.readStream.format("rate").load()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df = spark.readStream.format("rate").load()
    AttributeError: 'SparkSession' object has no attribute 'readStream'{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
Failed example:
    df.coalesce(1).rdd.getNumPartitions()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.coalesce(1).rdd.getNumPartitions()
    AttributeError: 'function' object has no attribute 'getNumPartitions'{code}


> Implement Dataframe.readStream
> --
>
> Key: SPARK-41826
> URL: https://issues.apache.org/jira/browse/SPARK-41826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
> Failed example:
>     df = spark.readStream.format("rate").load()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File " pyspark.sql.connect.dataframe.DataFrame.isStreaming[0]>", line 1, in 
>         df = spark.readStream.format("rate").load()
>     AttributeError: 'SparkSession' object has no attribute 'readStream'{code}
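
For comparison, classic PySpark exposes readStream on SparkSession; the rate
source produces a streaming DataFrame. A minimal sketch, assuming an active
spark session:
{code:python}
# Expected classic behavior: SparkSession.readStream returns a
# DataStreamReader, and the loaded DataFrame reports isStreaming == True.
df = spark.readStream.format("rate").load()
print(df.isStreaming)  # True
{code}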



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41826) Implement Dataframe.readStream

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41826:
-

 Summary: Implement Dataframe.readStream
 Key: SPARK-41826
 URL: https://issues.apache.org/jira/browse/SPARK-41826
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
Failed example:
    df.coalesce(1).rdd.getNumPartitions()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.coalesce(1).rdd.getNumPartitions()
    AttributeError: 'function' object has no attribute 'getNumPartitions'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41825) DataFrame.show formatting int as double

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41825:
--
Summary: DataFrame.show formatting int as double  (was: DataFrame.show 
formating int as double)

> DataFrame.show formatting int as double
> ---
>
> Key: SPARK-41825
> URL: https://issues.apache.org/jira/browse/SPARK-41825
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
> Failed example:
>     df.na.fill(50).show()
> Expected:
>     +---+------+-----+----+
>     |age|height| name|bool|
>     +---+------+-----+----+
>     | 10|  80.5|Alice|null|
>     |  5|  50.0|  Bob|null|
>     | 50|  50.0|  Tom|null|
>     | 50|  50.0| null|true|
>     +---+------+-----+----+
> Got:
>     +----+------+-----+----+
>     | age|height| name|bool|
>     +----+------+-----+----+
>     |10.0|  80.5|Alice|null|
>     | 5.0|  50.0|  Bob|null|
>     |50.0|  50.0|  Tom|null|
>     |50.0|  50.0| null|true|
>     +----+------+-----+----+
>     {code}
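
A plausible cause, going by the toPandas() call in the show() path (a
hypothesis, not confirmed in this ticket): converting a nullable integer
column to pandas promotes it to float64, so 10 renders as 10.0:
{code:python}
import pandas as pd

# NaN cannot live in an int64 column, so pandas promotes the whole
# column to float64 on conversion.
pdf = pd.DataFrame({"age": [10, 5, None, None]})
print(pdf["age"].dtype)    # float64
print(pdf["age"].iloc[0])  # 10.0
{code}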



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41825) DataFrame.show formating int as double

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41825:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
Failed example:
    df.na.fill(50).show()
Expected:
    +---+------+-----+----+
    |age|height| name|bool|
    +---+------+-----+----+
    | 10|  80.5|Alice|null|
    |  5|  50.0|  Bob|null|
    | 50|  50.0|  Tom|null|
    | 50|  50.0| null|true|
    +---+------+-----+----+
Got:
    +----+------+-----+----+
    | age|height| name|bool|
    +----+------+-----+----+
    |10.0|  80.5|Alice|null|
    | 5.0|  50.0|  Bob|null|
    |50.0|  50.0|  Tom|null|
    |50.0|  50.0| null|true|
    +----+------+-----+----+
    {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain()
Expected:
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
Got:
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
    
    
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain(mode="formatted")
Expected:
    == Physical Plan ==
    * Scan ExistingRDD (...)
    (1) Scan ExistingRDD [codegen id : ...]
    Output [2]: [age..., name...]
    ...
Got:
    == Physical Plan ==
    LocalTableScan (1)
    
    
    (1) LocalTableScan
    Output [2]: [age#1170L, name#1171]
    Arguments: [age#1170L, name#1171]
    
    {code}


> DataFrame.show formating int as double
> --
>
> Key: SPARK-41825
> URL: https://issues.apache.org/jira/browse/SPARK-41825
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna
> Failed example:
>     df.na.fill(50).show()
> Expected:
>     +---+------+-----+----+
>     |age|height| name|bool|
>     +---+------+-----+----+
>     | 10|  80.5|Alice|null|
>     |  5|  50.0|  Bob|null|
>     | 50|  50.0|  Tom|null|
>     | 50|  50.0| null|true|
>     +---+------+-----+----+
> Got:
>     +----+------+-----+----+
>     | age|height| name|bool|
>     +----+------+-----+----+
>     |10.0|  80.5|Alice|null|
>     | 5.0|  50.0|  Bob|null|
>     |50.0|  50.0|  Tom|null|
>     |50.0|  50.0| null|true|
>     +----+------+-----+----+
>     {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41825) DataFrame.show formating int as double

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41825:
-

 Summary: DataFrame.show formating int as double
 Key: SPARK-41825
 URL: https://issues.apache.org/jira/browse/SPARK-41825
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain()
Expected:
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
Got:
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
    
    
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain(mode="formatted")
Expected:
    == Physical Plan ==
    * Scan ExistingRDD (...)
    (1) Scan ExistingRDD [codegen id : ...]
    Output [2]: [age..., name...]
    ...
Got:
    == Physical Plan ==
    LocalTableScan (1)
    
    
    (1) LocalTableScan
    Output [2]: [age#1170L, name#1171]
    Arguments: [age#1170L, name#1171]
    
    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41824) Implement DataFrame.explain format to be similar to PySpark

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41824:
-

 Summary: Implement DataFrame.explain format to be similar to 
PySpark
 Key: SPARK-41824
 URL: https://issues.apache.org/jira/browse/SPARK-41824
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
Failed example:
    df.join(df2, df.name == df2.name, 'inner').drop('name').show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.join(df2, df.name == df2.name, 'inner').drop('name').show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
    Plan: {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41824) Implement DataFrame.explain format to be similar to PySpark

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41824:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain()
Expected:
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
Got:
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
    
    
**
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
Failed example:
    df.explain(mode="formatted")
Expected:
    == Physical Plan ==
    * Scan ExistingRDD (...)
    (1) Scan ExistingRDD [codegen id : ...]
    Output [2]: [age..., name...]
    ...
Got:
    == Physical Plan ==
    LocalTableScan (1)
    
    
    (1) LocalTableScan
    Output [2]: [age#1170L, name#1171]
    Arguments: [age#1170L, name#1171]
    
    {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
Failed example:
    df.join(df2, df.name == df2.name, 'inner').drop('name').show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.join(df2, df.name == df2.name, 'inner').drop('name').show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
    Plan: {code}


> Implement DataFrame.explain format to be similar to PySpark
> ---
>
> Key: SPARK-41824
> URL: https://issues.apache.org/jira/browse/SPARK-41824
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1296, in pyspark.sql.connect.dataframe.DataFrame.explain
> Failed example:
>     df.explain()
> Expected:
>     == Physical Plan ==
>     *(1) Scan ExistingRDD[age...,name...]
> Got:
>     == Physical Plan ==
>     LocalTableScan [age#1148L, name#1149]
>     
>     
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1314, in pyspark.sql.connect.dataframe.DataFrame.explain
> Failed example:
>     df.explain(mode="formatted")
> Expected:
>     == Physical Plan ==
>     * Scan ExistingRDD (...)
>     (1) Scan ExistingRDD [codegen id : ...]
>     Output [2]: [age..., name...]
>     ...
> Got:
>     == Physical Plan ==
>     LocalTableScan (1)
>     
>     
>     (1) LocalTableScan
>     Output [2]: [age#1170L, name#1171]
>     Arguments: [age#1170L, name#1171]
>     
>     {code}
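
For reference, the two explain variants the failing doctests exercise; a
sketch assuming an active spark session:
{code:python}
df = spark.createDataFrame([(1, "Alice")], ["age", "name"])
df.explain()                  # short physical plan summary
df.explain(mode="formatted")  # numbered operators with Output/Arguments detail
{code}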



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41822) Setup Scala/JVM Client Connection

2023-01-02 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-41822:
-
Summary: Setup Scala/JVM Client Connection  (was: Setup Scala Client 
Connection)

> Setup Scala/JVM Client Connection
> -
>
> Key: SPARK-41822
> URL: https://issues.apache.org/jira/browse/SPARK-41822
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Set up the gRPC connection for the Scala/JVM client to enable communication 
> with the Spark Connect server. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41823:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
Failed example:
    df.join(df2, df.name == df2.name, 'inner').drop('name').show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.join(df2, df.name == df2.name, 'inner').drop('name').show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
    Plan: {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 898, in pyspark.sql.connect.dataframe.DataFrame.describe
Failed example:
    df.describe(['age']).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.describe(['age']).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 832, in describe
        raise TypeError(f"'cols' must be list[str], but got {type(s).__name__}")
    TypeError: 'cols' must be list[str], but got list {code}


> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}
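
For comparison, classic PySpark resolves drop("name") after a join by
removing every column with that name rather than raising
AMBIGUOUS_REFERENCE. A minimal sketch, assuming an active spark session:
{code:python}
df = spark.createDataFrame([(14, "Tom"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([(80, "Tom"), (85, "Bob")], ["height", "name"])

# Classic behavior: the string form drops all matching columns.
df.join(df2, df.name == df2.name, "inner").drop("name").show()

# Possible workaround (untested here): drop via each side's own Column
# reference, which remains unambiguous after the join.
df.join(df2, df.name == df2.name, "inner").drop(df.name).drop(df2.name).show()
{code}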



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41823:
--
Summary: DataFrame.join creating ambiguous column names  (was: Fix 
DataFrame.join creating ambiguous column names)

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 898, in pyspark.sql.connect.dataframe.DataFrame.describe
> Failed example:
>     df.describe(['age']).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", 
> line 1, in 
>         df.describe(['age']).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 832, in describe
>         raise TypeError(f"'cols' must be list[str], but got 
> {type(s).__name__}")
>     TypeError: 'cols' must be list[str], but got list {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41823) Fix DataFrame.join creating ambiguous column names

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41823:
-

 Summary: Fix DataFrame.join creating ambiguous column names
 Key: SPARK-41823
 URL: https://issues.apache.org/jira/browse/SPARK-41823
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 898, in pyspark.sql.connect.dataframe.DataFrame.describe
Failed example:
    df.describe(['age']).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df.describe(['age']).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 832, in describe
        raise TypeError(f"'cols' must be list[str], but got {type(s).__name__}")
    TypeError: 'cols' must be list[str], but got list {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


