Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19607#discussion_r148601935
--- Diff: python/pyspark/serializers.py ---
@@ -274,12 +278,13 @@ def load_stream(self, stream):
"""
Deserialize ArrowRecordBatches to an Arrow table and return as a
list of pandas.Series.
"""
- from pyspark.sql.types import _check_dataframe_localize_timestamps
+ from pyspark.sql.types import _check_dataframe_localize_timestamps, from_arrow_schema
import pyarrow as pa
reader = pa.open_stream(stream)
+ schema = from_arrow_schema(reader.schema)
for batch in reader:
# NOTE: changed from pa.Columns.to_pandas, timezone issue in conversion fixed in 0.7.1
- pdf = _check_dataframe_localize_timestamps(batch.to_pandas())
+ pdf = _check_dataframe_localize_timestamps(batch.to_pandas(), schema, self._timezone)
--- End diff --
Oh, maybe I misunderstood the purpose of the conf
"spark.sql.execution.pandas.respectSessionTimeZone". If it is set to true, then
what is the behavior of Spark?
1) convert timestamps in Pandas to remove the timezone and localize to SESSION_LOCAL_TIMEZONE
2) show Pandas timestamps with SESSION_LOCAL_TIMEZONE set as the timezone
It seems this change is doing (1), but what's wrong with doing (2)? I
think that would be a lot cleaner.
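
To make the difference concrete, here is a rough pandas sketch of the two
options. The series and the zone string are stand-ins (the zone plays the role
of SESSION_LOCAL_TIMEZONE), not the actual Spark code path:

import pandas as pd

# Hypothetical UTC timestamps standing in for what Arrow hands back.
tz = "America/Los_Angeles"  # stand-in for SESSION_LOCAL_TIMEZONE
s = pd.Series(pd.to_datetime(["2017-11-01 12:00:00"])).dt.tz_localize("UTC")

# Option (1): convert to the session time zone, then strip the tz info,
# leaving tz-naive values localized to SESSION_LOCAL_TIMEZONE.
opt1 = s.dt.tz_convert(tz).dt.tz_localize(None)
print(opt1.dtype)  # datetime64[ns]

# Option (2): convert to the session time zone but keep it attached,
# so the values stay tz-aware and the zone is visible on the dtype.
opt2 = s.dt.tz_convert(tz)
print(opt2.dtype)  # datetime64[ns, America/Los_Angeles]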
---