Hellsen83 commented on issue #23877: [SPARK-26449][PYTHON] Add transform method to DataFrame API
URL: https://github.com/apache/spark/pull/23877#issuecomment-466663621
 
 
   
   > We should leave a reference to the original PR: #23414
   > 
   
   Agreed.
   
   > I wonder if it's worth showing chaining at least two functions to highlight the point of the function?
   
   Fine with me.
   
   > No big deal but the doctest needs fixing anyway:
   > 
   > ```
   > **********************************************************************
   > File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 2055, in pyspark.sql.dataframe.DataFrame.transform
   > Failed example:
   >     df = spark.createDataFrame([Row(a=170.1, b=75.0)])
   > Exception raised:
   >     Traceback (most recent call last):
   >       File "/usr/lib64/pypy-2.5.1/lib-python/2.7/doctest.py", line 1315, in __run
   >         compileflags, 1) in test.globs
   >       File "<doctest pyspark.sql.dataframe.DataFrame.transform[0]>", line 1, in <module>
   >         df = spark.createDataFrame([Row(a=170.1, b=75.0)])
   >     NameError: global name 'Row' is not defined
   > **********************************************************************
   > File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 2058, in pyspark.sql.dataframe.DataFrame.transform
   > Failed example:
   >     df.transform(cast_all_to_int).collect()
   > Exception raised:
   >     Traceback (most recent call last):
   >       File "/usr/lib64/pypy-2.5.1/lib-python/2.7/doctest.py", line 1315, in __run
   >         compileflags, 1) in test.globs
   >       File "<doctest pyspark.sql.dataframe.DataFrame.transform[2]>", line 1, in <module>
   >         df.transform(cast_all_to_int).collect()
   >       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 2061, in transform
   >         result = func(self)
   >       File "<doctest pyspark.sql.dataframe.DataFrame.transform[1]>", line 2, in cast_all_to_int
   >         return input_df.select([col(c_name).cast("int") for c_name in input_df.columns])
   >     NameError: global name 'col' is not defined
   > **********************************************************************
   > ```
   > No need to use `Row` anyway. Maybe best to copy the example in the other PR.
   
   Even without `Row`, the example still needs an import of `col` to work.
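
Both `NameError`s in the traceback have the same apparent cause: the docstring examples use names (`Row`, `col`) that are not in the globals the doctest runs with. Below is a minimal, Spark-free sketch of that failure mode using only the standard `doctest` module. The names `double`, `helper`, and `run_doctest` are hypothetical and exist only for illustration; they are not part of PySpark.

```python
import doctest


def helper(x):
    """Hypothetical helper, playing the role of `col` in the failing example."""
    return x * 2


def double(x):
    """Hypothetical function whose docstring example depends on `helper`.

    >>> double(helper(2))
    8
    """
    return 2 * x


def run_doctest(globs):
    """Run double's docstring example against `globs`; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        double.__doc__, globs, "double", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


# `helper` is missing from the globals: the example raises NameError and fails.
missing = run_doctest({"double": double})

# With `helper` supplied, the same example passes.
supplied = run_doctest({"double": double, "helper": helper})
```

For the PySpark docstring, the analogous fix would presumably be either a `>>> from pyspark.sql.functions import col` line inside the example itself or adding the name to the globals used by the module's doctest setup.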
   
   I will provide a chained example that will pass the doctest later this weekend.
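
Until then, here is a rough, Spark-free sketch of what chaining two functions through `transform` looks like. `TinyFrame` and `sort_columns_asc` are hypothetical stand-ins invented for this illustration; the only behavior taken from the PR is the `result = func(self)` call visible in the traceback above. The real method may do more than this.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: rows stored as a list of dicts."""

    def __init__(self, rows):
        self.rows = rows

    def transform(self, func):
        # Mirrors the `result = func(self)` line shown in the traceback.
        result = func(self)
        return result


def cast_all_to_int(frame):
    """Cast every value in every row to int."""
    return TinyFrame([{k: int(v) for k, v in row.items()} for row in frame.rows])


def sort_columns_asc(frame):
    """Reorder the columns of each row alphabetically."""
    return TinyFrame([dict(sorted(row.items())) for row in frame.rows])


frame = TinyFrame([{"b": 75.0, "a": 170.1}])

# Each transform returns a new frame, so the calls read top to bottom
# instead of nesting as sort_columns_asc(cast_all_to_int(frame)).
chained = frame.transform(cast_all_to_int).transform(sort_columns_asc)
```

Because every step takes a frame and returns a frame, the functions can be reordered or reused without touching the call sites around them, which is the point the chained doctest should highlight.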

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
