Very cool! Thank you, Michael.
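
For anyone following along, the same filter can also be written with Column
expressions instead of a SQL string. A minimal sketch, assuming Spark 1.4+
and an existing SparkContext named sc:

from pyspark.sql import SQLContext
from pyspark.sql.functions import col, rand

sqlContext = SQLContext(sc)

# build a toy DataFrame with two random columns, as in Michael's example
df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))

# equivalent to df.where("a > b"), using Column expressions
df.where(col("a") > col("b")).show()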

On Thu, Sep 17, 2015 at 11:00 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> from pyspark.sql.functions import *
>
> ​
>
> df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))
>
> df.where("a > b").show()
>
> +------------------+-------------------+
> |                 a|                  b|
> +------------------+-------------------+
> |0.6697439215581628|0.23420961030968923|
> |0.9248996796756386| 0.4146647917936366|
> +------------------+-------------------+
>
> On Thu, Sep 17, 2015 at 9:32 AM, Rex X <dnsr...@gmail.com> wrote:
>
>> With Pandas dataframe
>> <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html>,
>> we can do query:
>>
>> >>> from numpy.random import randn
>> >>> from pandas import DataFrame
>> >>> df = DataFrame(randn(10, 2), columns=list('ab'))
>> >>> df.query('a > b')
>>
>>
>> This SQL-select-like query is very convenient. Can we do a similar thing
>> with the new DataFrame in Spark?
>>
>>
>> Best,
>> Rex
>>
>
>
