cloud-fan commented on a change in pull request #23534: [SPARK-26610][PYTHON] Fix inconsistency between toJSON Method in Python and Scala.
URL: https://github.com/apache/spark/pull/23534#discussion_r247466299
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -109,15 +109,18 @@ def stat(self):
     @ignore_unicode_prefix
     @since(1.3)
     def toJSON(self, use_unicode=True):
-        """Converts a :class:`DataFrame` into a :class:`RDD` of string.
+        """Converts a :class:`DataFrame` into a :class:`DataFrame` of JSON string.
-        Each row is turned into a JSON document as one element in the returned RDD.
+        Each row is turned into a JSON document as one element in the returned DataFrame.
         >>> df.toJSON().first()
-        u'{"age":2,"name":"Alice"}'
+        Row(value=u'{"age":2,"name":"Alice"}')
         """
-        rdd = self._jdf.toJSON()
-        return RDD(rdd.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
+        jdf = self._jdf.toJSON()
+        if self.sql_ctx._conf.pysparkDataFrameToJSONShouldReturnDataFrame():
+            return DataFrame(jdf, self.sql_ctx)
+        else:
+            return RDD(jdf.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
Review comment:
So which way is more consistent?
If we return a DataFrame here, the Scala side can do `df.toJSON().map(value => value.xxx)` while the Python side needs to do `df.toJSON().map(lambda row: row.value.xxx)`.
If we return an RDD here, the Scala side can do `df.toJSON().select(...)` while the Python side needs to do `df.toJSON().toDF().select(...)`.
I agree there is no perfect option since PySpark has no Dataset; we should pick the better of the two imperfect ways.