Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19646#discussion_r148779399
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -512,9 +557,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             except Exception:
                 has_pandas = False
             if has_pandas and isinstance(data, pandas.DataFrame):
    -            if schema is None:
    -                schema = [str(x) for x in data.columns]
    -            data = [r.tolist() for r in data.to_records(index=False)]
    --- End diff --
    
    It seems `r.tolist` is the problem; how about `r[i] for i in xrange(r.size)`? Then we can get `numpy.datetime64`:
    ```
    >>> pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]}).to_records(index=False)[0].tolist()[0]
    1509411661000000000L
    >>> pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]}).to_records(index=False)[0][0]
    numpy.datetime64('2017-10-31T02:01:01.000000000+0100')
    >>>
    ```
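
    To make the suggestion concrete, here is a minimal, self-contained sketch of the per-field indexing approach (the variable names are illustrative, and the exact scalar coercion of `.tolist()` has shifted across numpy/pandas versions; `range`/`len` are used in place of the Python 2 `xrange`/`r.size`):
    ```
    from datetime import datetime

    import numpy as np
    import pandas as pd

    pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], "x": [42]})
    records = pdf.to_records(index=False)

    # Indexing each field of the record individually keeps the numpy
    # scalar types, whereas r.tolist() coerces datetime64 values.
    rows = [[r[i] for i in range(len(r))] for r in records]

    # Expect a numpy.datetime64 rather than a plain integer here.
    print(type(rows[0][0]))
    ```
    With the original `r.tolist()` the timestamp field comes back as a raw epoch value, so the type information needed for correct conversion is lost before Spark sees the data.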

