allisonwang-db commented on code in PR #42161:
URL: https://github.com/apache/spark/pull/42161#discussion_r1274354197


##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -385,37 +385,28 @@ def _create_struct_array(self, df, arrow_struct_type):
         """
         import pyarrow as pa
 
-        # Input partition and result pandas.DataFrame empty, make empty Arrays with struct
-        if len(df) == 0 and len(df.columns) == 0:
-            arrs_names = [
-                (pa.array([], type=field.type), field.name) for field in arrow_struct_type
-            ]
+        if len(df.columns) == 0:
+            return pa.array([{}] * len(df), arrow_struct_type)

Review Comment:
   Under what scenario could a DataFrame have rows but no columns, i.e., `len(df.columns) == 0` but `len(df) > 0`?



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -487,6 +487,14 @@ def eval(self, x: int):
 
         self.assertEqual(TestUDTF(lit(1)).collect(), [Row(x={1: "1"})])
 
+    def test_udtf_with_empty_output_types(self):
+        @udtf(returnType=StructType())
+        class TestUDTF:
+            def eval(self):
+                yield tuple()
+
+        self.assertEqual(TestUDTF().collect(), [Row()])

Review Comment:
   Let's use assertDataFrameEqual :)



##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -385,37 +385,28 @@ def _create_struct_array(self, df, arrow_struct_type):
         """
         import pyarrow as pa
 
-        # Input partition and result pandas.DataFrame empty, make empty Arrays with struct
-        if len(df) == 0 and len(df.columns) == 0:
-            arrs_names = [
-                (pa.array([], type=field.type), field.name) for field in arrow_struct_type
-            ]
+        if len(df.columns) == 0:
+            return pa.array([{}] * len(df), arrow_struct_type)
         # Assign result columns by schema name if user labeled with strings
         elif self._assign_cols_by_name and any(isinstance(name, str) for name in df.columns):

Review Comment:
   ```suggestion
           if self._assign_cols_by_name and any(isinstance(name, str) for name in df.columns):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
