HyukjinKwon commented on a change in pull request #34210:
URL: https://github.com/apache/spark/pull/34210#discussion_r723891511
##########
File path: python/docs/source/user_guide/pandas_on_spark/typehints.rst
##########
@@ -60,10 +60,10 @@ it as a Spark schema. As an example, you can specify the return type hint as bel
>>> df = ps.DataFrame({'A': ['a', 'a', 'b'], 'B': [1, 2, 3], 'C': [4, 6, 5]})
>>> df.groupby('A').apply(pandas_div)
-The function ``pandas_div`` actually takes and outputs a pandas DataFrame instead of pandas-on-Spark :class:`DataFrame`.
-However, pandas API on Spark has to force to set the mismatched type hints.
+Notice that the function ``pandas_div`` actually takes and outputs a pandas DataFrame instead of
+pandas-on-Spark :class:`DataFrame`. So, technically the correct types should be pandas'.
Review comment:
I think it's the right one :-). I meant the correct types should be pandas'
(i.e., of pandas). Actually, let me write it as "of pandas" since you pointed it out.
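For context, a minimal, self-contained sketch of the pattern this hunk documents. The body of `pandas_div` and the hint `ps.DataFrame[float, float]` are assumptions for illustration (the full definition lives earlier in typehints.rst); the point is exactly the mismatch discussed above: the hint is written with the pandas-on-Spark class while the function actually receives and returns a pandas DataFrame.

```python
import pyspark.pandas as ps

# Assumed definition for illustration: the return hint uses the
# pandas-on-Spark class, but at runtime ``pdf`` is a plain pandas
# DataFrame; that mismatch is what the prose change points out.
def pandas_div(pdf) -> ps.DataFrame[float, float]:
    return pdf[['B', 'C']] / pdf[['B', 'C']]

df = ps.DataFrame({'A': ['a', 'a', 'b'], 'B': [1, 2, 3], 'C': [4, 6, 5]})
# Each group is converted to pandas, passed to pandas_div, and the
# results are combined back into a pandas-on-Spark DataFrame.
df.groupby('A').apply(pandas_div)
```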
##########
File path: python/docs/source/user_guide/pandas_on_spark/typehints.rst
##########
@@ -139,7 +139,8 @@ programmatically generate the return type and schema.
.. code-block:: python
- >>> def transform(pdf) -> pd.DataFrame[zip(pdf.columns, pdf.dtypes)]:
+ >>> def transform(pdf) -> pd.DataFrame[
+ ...     zip(sample.columns, sample.dtypes)]:
Review comment:
We should, but this and the examples below don't have the definition
for `psdf` either .. let's leave it as-is in this PR for now.
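Since the snippet references `sample` (and the neighboring examples reference `psdf`) without defining them, here is a hedged sketch of how a complete example might read; the setup of `psdf` and `sample`, the body of `transform`, and the `apply_batch` call are assumptions for illustration, not the docs' actual code.

```python
import pandas as pd
import pyspark.pandas as ps

# Assumed setup: a pandas-on-Spark frame plus a small pandas sample
# used only to read its columns and dtypes for the return type hint.
psdf = ps.DataFrame({'id': [1, 2, 3], 'value': [0.1, 0.2, 0.3]})
sample = psdf.head(2).to_pandas()

# The annotation is evaluated when the function is defined, so it must
# reference an existing variable (sample), not the parameter (pdf).
def transform(pdf) -> pd.DataFrame[zip(sample.columns, sample.dtypes)]:
    return pdf + 1

# apply_batch hands pandas DataFrame chunks to transform and assembles
# the results into a pandas-on-Spark DataFrame.
psdf.pandas_on_spark.apply_batch(transform)
```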
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]