HyukjinKwon commented on a change in pull request #23534: [SPARK-26610][PYTHON]
Fix inconsistency between toJSON Method in Python and Scala.
URL: https://github.com/apache/spark/pull/23534#discussion_r247393303
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -109,15 +109,18 @@ def stat(self):
     @ignore_unicode_prefix
     @since(1.3)
     def toJSON(self, use_unicode=True):
-        """Converts a :class:`DataFrame` into a :class:`RDD` of string.
+        """Converts a :class:`DataFrame` into a :class:`DataFrame` of JSON string.
-        Each row is turned into a JSON document as one element in the returned RDD.
+        Each row is turned into a JSON document as one element in the returned DataFrame.
         >>> df.toJSON().first()
-        u'{"age":2,"name":"Alice"}'
+        Row(value=u'{"age":2,"name":"Alice"}')
         """
-        rdd = self._jdf.toJSON()
-        return RDD(rdd.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
+        jdf = self._jdf.toJSON()
+        if self.sql_ctx._conf.pysparkDataFrameToJSONShouldReturnDataFrame():
+            return DataFrame(jdf, self.sql_ctx)
+        else:
+            return RDD(jdf.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
Review comment:
@ueshin, I think the Scala side returns `Dataset[String]`, not `DataFrame`
(`Dataset[Row]`). The change is debatable because the two types are used
differently by callers. For instance,
```scala
scala> val df: DataFrame = Seq("a").toDF
df: org.apache.spark.sql.DataFrame = [value: string]
scala> df.foreach(println(_))
[a]
scala> val ds: Dataset[String] = Seq("a").toDS
ds: org.apache.spark.sql.Dataset[String] = [value: string]
scala> ds.foreach(println(_))
a
```
PySpark has no concept of `Dataset`, so if the only motivation is consistency
with Scala, I doubt we should make this change.
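
To make the caller-side impact concrete, here is a minimal pure-Python sketch. It does not use PySpark: `Row` is a hypothetical `namedtuple` stand-in for `pyspark.sql.Row`, and `to_json_strings` / `to_json_rows` are illustrative helpers that only mimic the shape of the current (bare strings) and proposed (`Row(value=...)`) results.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row with a single "value" field.
Row = namedtuple("Row", ["value"])

records = [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}]


def to_json_strings(rows):
    """Mimic the current result shape: a collection of bare JSON strings."""
    return [json.dumps(r, separators=(",", ":"), sort_keys=True) for r in rows]


def to_json_rows(rows):
    """Mimic the proposed result shape: each JSON string wrapped in a Row."""
    return [Row(value=s) for s in to_json_strings(rows)]


as_strings = to_json_strings(records)
as_rows = to_json_rows(records)

# With bare strings, an element can be parsed directly.
first_from_strings = json.loads(as_strings[0])

# With Row-wrapped values, callers must unwrap `.value` first,
# so existing code that passes elements straight to json.loads would break.
first_from_rows = json.loads(as_rows[0].value)
```

Both forms carry the same JSON payload; the difference is only that every existing caller would need to add the `.value` unwrap.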