[GitHub] [spark] HyukjinKwon commented on a change in pull request #35296: [SPARK-37981][PYTHON] Add note for deleting Null and NaN

GitBox Tue, 25 Jan 2022 08:40:42 -0800


HyukjinKwon commented on a change in pull request #35296:
URL: https://github.com/apache/spark/pull/35296#discussion_r791258906




##########
File path: python/pyspark/pandas/generic.py
##########
@@ -889,6 +889,7 @@ def to_json(
         lines: bool = True,
         partition_cols: Optional[Union[str, List[str]]] = None,
         index_col: Optional[Union[str, List[str]]] = None,
+        ignoreNullFields: bool = False,

Review comment:
       Maybe let's not add it as a parameter for now. I prefer the second 
approach in https://github.com/apache/spark/pull/35296#discussion_r790685881 
because technically `ignoreNullFields` is a PySpark option that is passed 
through, so it makes less sense to document it in the pandas API doc surface. 
We can just say that these `options` are PySpark I/O options.
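
A minimal sketch of the pass-through idea described above, in plain Python. `to_json_sketch` is a hypothetical stand-in, not the real `to_json` in `python/pyspark/pandas/generic.py`; it only illustrates why a writer option such as `ignoreNullFields` does not need its own named parameter when extra keyword arguments are forwarded as `options`.

```python
from typing import Any, Dict, Optional


def to_json_sketch(
    path: Optional[str] = None,
    lines: bool = True,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical wrapper: returns the options that would reach the writer.

    Nothing is written here. The return value only makes the pass-through
    visible so it can be inspected.
    """
    if path is None:
        # Assumption taken from the docstring under review: the extra
        # options only take effect when `path` is specified.
        return {}
    # Any keyword the wrapper does not name itself is forwarded untouched.
    return dict(options)


forwarded = to_json_sketch(path="/tmp/out", ignoreNullFields=True)
print(forwarded)
```

With this shape, the signature stays pandas-like and the docs only need a note saying that the remaining keyword arguments are PySpark I/O options.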

##########
File path: python/pyspark/pandas/generic.py
##########
@@ -889,6 +889,7 @@ def to_json(
         lines: bool = True,
         partition_cols: Optional[Union[str, List[str]]] = None,
         index_col: Optional[Union[str, List[str]]] = None,
+        ignoreNullFields: bool = False,

Review comment:
       Shall we remove this?

##########
File path: python/pyspark/pandas/generic.py
##########
@@ -946,6 +947,7 @@ def to_json(
             the options in PySpark's API documentation for `spark.write.json(...)`.
             It has a higher priority and overwrites all other options.
             This parameter only works when `path` is specified.
+        ignoreNullFields: if set to True and path is provided, writer omits columns with all NaN or Null values.

Review comment:
       I think it would be even better if we could just convert this to 
something like:
   
   
   ```
   .. note:: Set ignoreNullFields keyword argument to `True` to blah blah .. 
   ```

##########
File path: python/pyspark/pandas/generic.py
##########
@@ -904,6 +905,9 @@ def to_json(
 
         .. note:: output JSON format is different from pandas'. It always use `orient='records'`
             for its output. This behaviour might have to change in the near future.
+         
+        .. note:: Set ignoreNullFields keyword argument to `True` and path is provided, 
+            writer omits columns with all NaN or Null values. 

Review comment:
       ```suggestion
           .. note:: Set `ignoreNullFields` keyword argument to `True` to omit `None` or `NaN` values
               when writing JSON objects. It works only when `path` is provided.
   ```
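
To make the wording of the suggested note concrete, here is a plain-Python sketch of what "omit `None` or `NaN` values when writing JSON objects" means for records-oriented, one-object-per-line output. It is an illustration under stated assumptions, not Spark's writer: `records_to_json_lines` and `_is_missing` are hypothetical helpers, and treating `NaN` the same as `None` follows the note's wording rather than any verified Spark behaviour.

```python
import json
import math
from typing import Any, Dict, Iterable


def _is_missing(value: Any) -> bool:
    # Treat both None and float NaN as "null" for this illustration.
    return value is None or (isinstance(value, float) and math.isnan(value))


def records_to_json_lines(
    records: Iterable[Dict[str, Any]],
    ignore_null_fields: bool = False,
) -> str:
    """Serialize records as JSON lines, optionally dropping missing fields."""
    out = []
    for record in records:
        if ignore_null_fields:
            # Drop the key entirely when its value is missing.
            cleaned = {k: v for k, v in record.items() if not _is_missing(v)}
        else:
            # Keep the key and write an explicit JSON null.
            cleaned = {k: (None if _is_missing(v) else v) for k, v in record.items()}
        out.append(json.dumps(cleaned))
    return "\n".join(out)


rows = [{"a": 1, "b": None}, {"a": float("nan"), "b": "x"}]
print(records_to_json_lines(rows, ignore_null_fields=True))
print(records_to_json_lines(rows, ignore_null_fields=False))
```

The omission is per object and per field, which is why the suggested wording talks about values rather than "columns with all NaN or Null values".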

##########
File path: python/pyspark/pandas/generic.py
##########
@@ -889,6 +889,7 @@ def to_json(
         lines: bool = True,
         partition_cols: Optional[Union[str, List[str]]] = None,
         index_col: Optional[Union[str, List[str]]] = None,
+        ignoreNullFields: bool = False,

Review comment:
       ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


