Github user angelini commented on the pull request:

    https://github.com/apache/spark/pull/7950#issuecomment-130473017
  
    So I did some more digging into this issue and found that the updated Pyrolite version was only necessary because we were skipping the `map(schema.toInternal)` found here: https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L304
    
    We're forced to "reimplement" `sqlContext.createDataFrame` because of this line: https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L296
    
    Our RDDs are already type-checked, and we don't want to call an action on them and force materialization.
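    
    Roughly, the conversion we end up skipping looks like the sketch below. This is only an approximation of what `createDataFrame` does through that `map(schema.toInternal)` call, not the exact upstream code, and the helper name is made up:
    
    ```python
    from pyspark.sql.types import StructType, StructField, TimestampType
    
    schema = StructType([StructField('a', TimestampType(), True)])
    
    def rows_to_internal(rdd, schema):
        # schema.toInternal converts each Python row into Spark's internal
        # representation (e.g. a datetime becomes an epoch offset), which is
        # what the map() call in context.py applies per row.
        return rdd.map(schema.toInternal)
    ```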
    
    ----
    
    Another issue is that even when datetimes + timezones work, the timezone is erased from the datetime object.
    
    ```python
    from datetime import datetime
    from pytz import UTC  # any tz-aware UTC tzinfo works here
    from pyspark.sql.types import StructType, StructField, TimestampType
    
    schema = StructType([StructField('a', TimestampType(), True)])
    df = sqlContext.createDataFrame([(datetime(2015, 1, 1, tzinfo=UTC),)], schema)
    assert df.first().a.tzinfo is None
    ```
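    
    As far as I can tell, the tzinfo loss falls out of the internal representation: the timestamp is stored as an epoch offset, which carries no timezone, so converting it back can only produce a naive datetime. A standalone illustration (plain Python, not Spark code):
    
    ```python
    import calendar
    from datetime import datetime
    from pytz import UTC
    
    aware = datetime(2015, 1, 1, tzinfo=UTC)
    # Store as microseconds since the epoch -- the timezone is gone at this point.
    micros = calendar.timegm(aware.utctimetuple()) * 1000000 + aware.microsecond
    # Converting back yields a naive (local-time) datetime, as in the assert above.
    roundtripped = datetime.fromtimestamp(micros // 1000000)
    assert roundtripped.tzinfo is None
    ```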
    
    Your test suite should catch that, but it looks like there might have been a typo: https://github.com/apache/spark/blob/master/python/pyspark/sql/tests.py#L844
    
    If that line had been `self.assertEqual(utcnow, utcnow1)`, the following exception would be raised:
    
    `TypeError: can't compare offset-naive and offset-aware datetimes`
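    
    A quick way to see that outside the test suite (on Python 2, even the equality comparison used by `assertEqual` raises; on Python 3, only ordering comparisons do):
    
    ```python
    from datetime import datetime
    from pytz import UTC
    
    naive = datetime(2015, 1, 1)
    aware = datetime(2015, 1, 1, tzinfo=UTC)
    # Ordering comparisons between naive and aware datetimes raise on both
    # Python 2 and 3; on Python 2, `naive == aware` raises the same TypeError.
    naive < aware  # TypeError: can't compare offset-naive and offset-aware datetimes
    ```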

