HyukjinKwon commented on a change in pull request #30181:
URL: https://github.com/apache/spark/pull/30181#discussion_r514863497



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -935,19 +1090,32 @@ def sample(self, withReplacement=None, fraction=None, seed=None):
         jdf = self._jdf.sample(*args)
         return DataFrame(jdf, self.sql_ctx)
 
-    @since(1.5)
     def sampleBy(self, col, fractions, seed=None):
         """
         Returns a stratified sample without replacement based on the
         fraction given on each stratum.
 
-        :param col: column that defines strata
-        :param fractions:
+        .. versionadded:: 1.5.0
+
+        Parameters
+        ----------
+        col : :class:`Column` or str
+            column that defines strata
+
+            .. versionchanged:: 3.0
+               Added sampling by a column of :class:`Column`

Review comment:
       Yeah, because the change was specific to the `col` parameter (SPARK-25381). At least, this looks like what pandas does.
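
For illustration only, here is a minimal sketch of that placement in a numpydoc-style docstring. The function name `sample_by` is a hypothetical stand-in rather than the actual PySpark method, and the `fractions` and `seed` descriptions are placeholder assumptions. The point is that `versionadded` sits at the function level, while `versionchanged` is nested under the one parameter whose behaviour changed.

```python
def sample_by(col, fractions, seed=None):
    """
    Returns a stratified sample without replacement based on the
    fraction given on each stratum.

    .. versionadded:: 1.5.0

    Parameters
    ----------
    col : Column or str
        column that defines strata

        .. versionchanged:: 3.0
           Added sampling by a column of Column
    fractions : dict
        placeholder description (assumption, not taken from the PR)
    seed : int, optional
        placeholder description (assumption, not taken from the PR)
    """
    # Hypothetical stub: only the docstring layout matters here.
    return None


# The directive appears inside the `col` entry, before the next parameter.
doc = sample_by.__doc__
print(doc.index("col :") < doc.index(".. versionchanged:: 3.0") < doc.index("fractions :"))
```

Because the directive is indented under `col`, a reader can tell that the 3.0 change applies to that argument only and not to the method as a whole.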




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



