GitHub user 0x0FFF opened a pull request:

    https://github.com/apache/spark/pull/8574

    [SPARK-10417][SQL] Iterating through Column results in infinite loop

    The `pyspark.sql.column.Column` object has a `__getitem__` method, which
    makes it iterable in Python. `__getitem__` is actually there so that, when
    a column holds a list or a dict, you can access individual elements of it
    through the DataFrame API. Being iterable is just a side effect, and it can
    confuse people getting familiar with Spark DataFrames, since iterating this
    way does work on a Pandas DataFrame, for instance.
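
    This `__getitem__` is meant for element access, not iteration. A small
    illustration (hypothetical column names `numbers` and `attrs`, assuming an
    array column and a map column):
    ```
    df = sqlContext.createDataFrame([([1, 2, 3], {"a": "b"})],
                                    ["numbers", "attrs"])
    # element access is the intended use of Column.__getitem__
    df.select(df["numbers"][0], df["attrs"]["a"]).show()
    ```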
    
    Issue reproduction:
    ```
    df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
    for i in df["name"]: print i
    ```
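
    One way to close this loophole is to give `Column` an explicit `__iter__`
    that fails fast: without it, Python falls back to the old sequence
    protocol and keeps calling `__getitem__(0)`, `__getitem__(1)`, ..., which
    never raises `IndexError`, hence the infinite loop. A minimal sketch of
    such a guard (not necessarily the exact change in this patch):
    ```
    class Column(object):
        # ... existing methods ...
        def __iter__(self):
            # __getitem__ never raises IndexError, so implicit iteration
            # would run forever; refuse it outright instead.
            raise TypeError("Column is not iterable")
    ```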

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/0x0FFF/spark SPARK-10417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8574
    
----
commit ea2e9d4e5e1abf7c4913ad33cb89424f444b80b7
Author: 0x0FFF <[email protected]>
Date:   2015-09-02T14:27:04Z

    [SPARK-10417][SQL] Iterating through Column results in infinite loop

----

