bjornjorgensen commented on code in PR #42270:
URL: https://github.com/apache/spark/pull/42270#discussion_r1281115230


##########
python/pyspark/pandas/base.py:
##########
@@ -1625,11 +1625,10 @@ def factorize(
         Parameters
         ----------
         sort : bool, default True
-        na_sentinel : int or None, default -1
-            Value to mark "not found". If None, will not drop the NaN
-            from the uniques of the values.
-
-            .. deprecated:: 3.4.0
+        use_na_sentinel : bool, default True

Review Comment:
   I know this is the same wording that pandas has, but what if we change it to:
   
   If True, the sentinel -1 will be used for NaN values, effectively assigning 
them a distinct category. If False, NaN values will be encoded as non-negative 
integers, treating them as unique categories in the encoding process and 
retaining them in the set of unique categories in the data.
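
To make the two cases concrete, here is a minimal pure-Python sketch of the behavior the wording above is meant to describe. `factorize_sketch` is a hypothetical helper written only for illustration, not the pandas or pyspark implementation; it keeps values in order of appearance and ignores `sort`. It assumes that with the sentinel enabled, NaN is coded as -1 and left out of the uniques.

```python
import math


def factorize_sketch(values, use_na_sentinel=True):
    """Hypothetical model of the proposed docstring; not library code."""
    codes, uniques = [], []
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        if is_nan and use_na_sentinel:
            # NaN gets the sentinel and is not added to the uniques.
            codes.append(-1)
            continue
        for i, u in enumerate(uniques):
            # NaN != NaN in Python, so NaN has to be matched explicitly.
            same_nan = is_nan and isinstance(u, float) and math.isnan(u)
            if u == v or same_nan:
                codes.append(i)
                break
        else:
            # First time this value is seen: give it the next code.
            uniques.append(v)
            codes.append(len(uniques) - 1)
    return codes, uniques
```

For the input `["a", nan, "b", "a", nan]`, the sketch returns codes `[0, -1, 1, 0, -1]` and uniques `["a", "b"]` with `use_na_sentinel=True`, and codes `[0, 1, 2, 0, 1]` with NaN kept in the uniques when it is False.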



##########
python/pyspark/pandas/base.py:
##########
@@ -1625,11 +1625,10 @@ def factorize(
         Parameters
         ----------
         sort : bool, default True

Review Comment:
   If True, the encoding will be sorted; otherwise, the order of the encoding 
depends on the order of appearance of the values. 
   Is that right?
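
A small pure-Python sketch of that reading, in case it helps the discussion. `factorize_sorted_sketch` is a hypothetical helper, not the pandas or pyspark code, and it assumes hashable, mutually comparable values with no NaN.

```python
def factorize_sorted_sketch(values, sort=True):
    """Hypothetical model of the `sort` flag; not library code."""
    # dict.fromkeys keeps the first occurrence of each value, in order.
    uniques = list(dict.fromkeys(values))
    if sort:
        # Sorting the uniques changes which code each value receives.
        uniques = sorted(uniques)
    position = {u: i for i, u in enumerate(uniques)}
    return [position[v] for v in values], uniques
```

For `["b", "a", "c", "b"]`, this gives codes `[1, 0, 2, 1]` and uniques `["a", "b", "c"]` with `sort=True`, and codes `[0, 1, 2, 0]` and uniques `["b", "a", "c"]` with `sort=False`.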



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

