dlindelof commented on a change in pull request #26747: [SPARK-29188][PYTHON]
toPandas (without Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#discussion_r354357933
##########
File path: python/pyspark/sql/tests/test_dataframe.py
##########
@@ -547,6 +547,27 @@ def test_to_pandas_avoid_astype(self):
self.assertEquals(types[1], np.object)
self.assertEquals(types[2], np.float64)
+ @unittest.skipIf(not have_pandas, pandas_requirement_message)
+ def test_to_pandas_from_empty_dataframe(self):
+ # SPARK-29188 test that toPandas() on an empty dataframe had the correct dtypes
+ import numpy as np
+ schema = StructType([
+ StructField('double', DoubleType(), True),
+ StructField('float', FloatType(), True),
+ StructField('byte', ByteType(), True),
+ StructField('integer', IntegerType(), True),
+ StructField('long', LongType(), True),
+ StructField('short', ShortType(), True),
+ ])
Review comment:
Hi,
I've added some more types; I think the most important ones are covered now.
I've also checked how this behaves in the presence of nulls.
Let me know if you think I'm missing something or if I should have done
something differently.
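
For context, here is a minimal pandas-only sketch of the dtype issue the test targets. It does not use Spark. The `expected_dtypes` mapping is an assumption that mirrors the column names in the schema above; it is not taken from the patch itself.

```python
import numpy as np
import pandas as pd

# Assumed column -> dtype mapping, mirroring the schema in the test above.
expected_dtypes = {
    "double": np.float64,
    "float": np.float32,
    "byte": np.int8,
    "integer": np.int32,
    "long": np.int64,
    "short": np.int16,
}

# A frame built from zero records carries no type information,
# so pandas falls back to a generic dtype for every column.
empty = pd.DataFrame.from_records([], columns=list(expected_dtypes))

# Casting explicitly restores the intended dtypes even with zero rows.
typed_empty = empty.astype(expected_dtypes)
print(typed_empty.dtypes)

# With a null present, pandas stores the integer column as float64
# (None becomes NaN), and casting back to a NumPy integer dtype fails.
with_null = pd.DataFrame({"integer": [1, None]})
try:
    with_null["integer"].astype(np.int32)
    cast_failed = False
except ValueError:
    cast_failed = True
print(with_null["integer"].dtype, cast_failed)
```

The second half is why the null case deserves its own check: a NumPy integer column cannot represent a missing value, so any dtype correction has to skip or special-case columns that contain nulls.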