Github user angelini commented on the pull request:
https://github.com/apache/spark/pull/7950#issuecomment-130473017
So I did some more digging into this issue and found that the updated
Pyrolite version was only necessary because we were skipping the
`map(schema.toInternal)` found here:
https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L304
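For anyone following along, here's a rough illustration of what that skipped conversion does for a timestamp column (a sketch based on the linked code; the exact internal encoding is an implementation detail of `pyspark.sql.types`):
```python
from datetime import datetime
from pyspark.sql.types import TimestampType

# toInternal converts a datetime into Spark's internal representation
# (roughly, an integer count of microseconds since the epoch).
# Skipping this step means raw datetime objects get pickled over to the
# JVM, which is presumably why the newer Pyrolite was needed to unpickle them.
print(TimestampType().toInternal(datetime(2015, 1, 1)))
```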
We're forced to "reimplement" `sqlContext.createDataFrame` because of this
line:
https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L296
Our RDDs are already type-checked, and we don't want to call an action on
them and force materialization.
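To be concrete, the only part of `createDataFrame` we actually need is the lazy conversion, something like this sketch (`rows_rdd` and `schema` stand in for our own objects):
```python
# rdd.map is lazy, so applying the schema conversion ourselves does not
# trigger an action or force materialization of the RDD
internal_rdd = rows_rdd.map(schema.toInternal)
```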
----
Another issue is that even when datetimes + timezones work, the timezone
is erased from the resulting datetime object.
```python
from datetime import datetime
from pytz import UTC  # assuming pytz here; any aware tzinfo shows the same behaviour
from pyspark.sql.types import StructType, StructField, TimestampType

schema = StructType([StructField('a', TimestampType(), True)])
df = sqlContext.createDataFrame([(datetime(2015, 1, 1, tzinfo=UTC),)], schema)

# the timezone has been stripped from the round-tripped value
assert df.first().a.tzinfo is None
```
Your test suite should catch that, but it looks like there might have been
a typo:
https://github.com/apache/spark/blob/master/python/pyspark/sql/tests.py#L844
If that line had been `self.assertEqual(utcnow, utcnow1)`, the following
exception would have been raised:
`TypeError: can't compare offset-naive and offset-aware datetimes`
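For reference, that message comes straight from the standard library's mixed naive/aware comparison. A small standalone illustration (not the Spark test itself); note that under Python 2 the equality check in `assertEqual` raises the same way, while Python 3 only raises for ordering comparisons:
```python
from datetime import datetime
from pytz import UTC  # assuming pytz; any aware tzinfo will do

aware = datetime(2015, 1, 1, tzinfo=UTC)
naive = datetime(2015, 1, 1)

try:
    aware < naive  # ordering a naive against an aware datetime raises
except TypeError as e:
    print(e)  # can't compare offset-naive and offset-aware datetimes
```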