Semet created SPARK-17360:
-----------------------------

             Summary: PySpark can create dataframe from a Python generator
                 Key: SPARK-17360
                 URL: https://issues.apache.org/jira/browse/SPARK-17360
             Project: Spark
          Issue Type: Improvement
            Reporter: Semet
            Priority: Trivial


It looks like one can create a DataFrame from a Python generator, which might 
be more efficient than building the list of rows first and passing it to 
createDataFrame:

{code}
>>> # On Python 3, use "range" instead of "xrange" on the following line
>>> d = ({'name': 'Alice-{}'.format(i), 'age': i} for i in xrange(0, 10000000))
>>> d  # 'd' is a generator, not a structure holding 10000000 elements
<generator object <genexpr> at 0x7f1234b92af0>
>>> sqlContext.createDataFrame(d).take(5)
[Row(age=0, name=u'Alice-0'), Row(age=1, name=u'Alice-1'), Row(age=2, name=u'Alice-2'), Row(age=3, name=u'Alice-3'), Row(age=4, name=u'Alice-4')]
{code}
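The point of the example is that `d` holds no rows until it is iterated. The minimal plain-Python sketch below (no Spark involved; the names `rows_gen`, `rows_list` and the 100,000-element comparison list are illustrative assumptions) shows what is allocated up front in each case. Whether `createDataFrame` preserves this laziness internally is not exercised here and depends on the PySpark implementation.

{code}
import itertools
import sys

N = 10000000

# Same generator expression as in the report (Python 3 spelling, with range).
rows_gen = ({'name': 'Alice-{}'.format(i), 'age': i} for i in range(N))

# The generator object has a small, fixed size: no dicts have been built yet.
gen_size = sys.getsizeof(rows_gen)

# Consuming only the first five items builds exactly five dicts.
first_five = list(itertools.islice(rows_gen, 5))

# For comparison, a list comprehension allocates every element up front.
# (Illustrative size of 100,000 rows; getsizeof counts only the list's
# internal pointer array, not the dicts it references.)
rows_list = [{'name': 'Alice-{}'.format(i), 'age': i} for i in range(100000)]
list_size = sys.getsizeof(rows_list)

print(gen_size, list_size)
print(first_five[0], first_five[-1])
{code}

Because the generator is consumed lazily, calling next(rows_gen) after the slice continues from 'Alice-5' rather than restarting.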

Looking at the code, nothing significant needs to change in the implementation; 
only the documentation and unit tests need updating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
