[GitHub] spark pull request #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of ti...

ueshin Mon, 27 Nov 2017 19:12:55 -0800

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19607#discussion_r153385979
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -444,11 +445,30 @@ def _get_numpy_record_dtype(self, rec):
                 record_type_list.append((str(col_names[i]), curr_type))
             return np.dtype(record_type_list) if has_rec_fix else None
     
    -    def _convert_from_pandas(self, pdf):
    +    def _convert_from_pandas(self, pdf, schema, timezone):
             """
              Convert a pandas.DataFrame to list of records that can be used to make a DataFrame
              :return list of records
             """
    +        if timezone is not None:
    +            from pyspark.sql.types import _check_series_convert_timestamps_tz_local
    +            copied = False
    +            if isinstance(schema, StructType):
    +                for field in schema:
    +                    # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
    +                    if isinstance(field.dataType, TimestampType):
    +                        s = _check_series_convert_timestamps_tz_local(pdf[field.name], timezone)
    +                        if not copied and s is not pdf[field.name]:
    +                            pdf = pdf.copy()
    +                            copied = True
    --- End diff --
    
    Yes, the copy is there to prevent the original DataFrame from being updated.
    I'll add some comments.
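
A minimal sketch of the copy-on-first-change pattern discussed above, not the code from the pull request. The names `convert_columns` and `convert` are hypothetical; `convert` stands in for a per-column conversion such as the timestamp helper in the diff and is assumed to return its input object unchanged when no conversion is needed. The assignment back into the copy is also an assumption, since the quoted diff ends before that step.

```python
import pandas as pd


def convert_columns(pdf, columns, convert):
    """Apply `convert` to each named column, copying `pdf` at most once.

    The caller's DataFrame is never mutated: the copy is made lazily,
    right before the first converted column is written back.
    """
    copied = False
    for name in columns:
        original = pdf[name]
        s = convert(original)
        # Identity check: only pay for a copy when something really changed.
        if s is not original:
            if not copied:
                pdf = pdf.copy()  # rebinding the local name leaves the caller's object alone
                copied = True
            pdf[name] = s
    return pdf
```

If no column changes, the function returns the very same object it was given, so the common case costs no copy; otherwise the caller receives a new DataFrame and the original keeps its values.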


