[jira] [Assigned] (SPARK-41817) SparkSession.read support reading with schema

2023-02-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41817:


Assignee: Sandeep Singh

> SparkSession.read support reading with schema
> -
>
>                 Key: SPARK-41817
>                 URL: https://issues.apache.org/jira/browse/SPARK-41817
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Assignee: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load
> Failed example:
>     with tempfile.TemporaryDirectory() as d:
>         # Write a DataFrame into a CSV file with a header
>         df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
>         df.write.option("header", True).mode("overwrite").format("csv").save(d)
>         # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
>         # and 'header' option set to `True`.
>         df = spark.read.load(
>             d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
>         df.printSchema()
>         df.show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module>
>         df.printSchema()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1039, in printSchema
>         print(self._tree_string())
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1035, in _tree_string
>         query = self._plan.to_proto(self._session.client)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 92, in to_proto
>         plan.root.CopyFrom(self.plan(session))
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 245, in plan
>         plan.read.data_source.schema = self.schema
>     TypeError: bad argument type for built-in operation
> {code}
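
The traceback ends at an assignment of `self.schema` to a field of the generated plan message. One plausible reading, which the report itself does not confirm, is that the field only accepts strings while `df.schema` is a schema object. The sketch below reproduces that failure mode without PySpark or protobuf. `DataSourceStub`, `StructTypeStub`, and `schema_to_string` are hypothetical names made up for illustration, not part of the Spark code base, and serializing the schema to a string first is an assumed remedy, not necessarily the fix that was merged.

```python
import json
from typing import Any


class DataSourceStub:
    """Hypothetical stand-in for a generated message with a string `schema` field."""

    def __init__(self) -> None:
        self._schema = ""

    @property
    def schema(self) -> str:
        return self._schema

    @schema.setter
    def schema(self, value: Any) -> None:
        # Assumption: the real field rejects anything that is not a str.
        if not isinstance(value, str):
            raise TypeError("bad argument type for built-in operation")
        self._schema = value


class StructTypeStub:
    """Hypothetical stand-in for a schema object that can serialize itself."""

    def __init__(self, fields: dict) -> None:
        self.fields = fields

    def json(self) -> str:
        return json.dumps({"type": "struct", "fields": self.fields}, sort_keys=True)


def schema_to_string(schema: Any) -> str:
    """Return a string form of the schema suitable for a string-typed field."""
    if isinstance(schema, str):
        return schema  # already a string, e.g. "age BIGINT, name STRING"
    if hasattr(schema, "json"):
        return schema.json()  # serialize a schema object
    raise TypeError(f"unsupported schema type: {type(schema).__name__}")


source = DataSourceStub()
struct = StructTypeStub({"age": "long", "name": "string"})

try:
    source.schema = struct  # same shape of failure as in the traceback
    failed = False
except TypeError:
    failed = True

source.schema = schema_to_string(struct)  # accepted once serialized
```

Under these assumptions the direct assignment raises `TypeError`, while the serialized form is accepted. A string schema passes through `schema_to_string` unchanged.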



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41817) SparkSession.read support reading with schema

2023-02-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41817:


Assignee: (was: Apache Spark)

> SparkSession.read support reading with schema
> -
>
>                 Key: SPARK-41817
>                 URL: https://issues.apache.org/jira/browse/SPARK-41817
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>






[jira] [Assigned] (SPARK-41817) SparkSession.read support reading with schema

2023-02-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41817:


Assignee: Apache Spark

> SparkSession.read support reading with schema
> -
>
>                 Key: SPARK-41817
>                 URL: https://issues.apache.org/jira/browse/SPARK-41817
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Assignee: Apache Spark
>            Priority: Major
>


