[GitHub] [spark] pralabhkumar commented on pull request #34401: [SPARK-30537][PYTHON], Fix toPandas wrong dtypes when applied on empty DF when Arrow enabled

GitBox Thu, 11 Nov 2021 03:59:17 -0800


pralabhkumar commented on pull request #34401:
URL: https://github.com/apache/spark/pull/34401#issuecomment-966244527



   @HyukjinKwon 
   
   There is another way to do the same thing using pyarrow; the code is 
below. I am OK with either approach. 
   Please review the PR and suggest which one you prefer. If you are OK 
with it, I will resolve the merge conflicts. 
   
   tmp_schema = [
       StructField(
           tmp_column_names[i],
           TimestampNTZType()
           if isinstance(self.schema.fields[i].dataType, TimestampType)
           else self.schema.fields[i].dataType,
       )
       for i in range(len(self.columns))
   ]
   
   table = pyarrow.Table.from_arrays(
       arrays=[pyarrow.array([])] * len(self.schema.fields),
       schema=to_arrow_schema(StructType(tmp_schema)),
   )
   pdf = table.to_pandas().set_index([pd.Index([])])
   pdf.columns = self.columns
   return pdf
   
   
   
   




