Very cool! Thank you, Michael.
On Thu, Sep 17, 2015 at 11:00 AM, Michael Armbrust <mich...@databricks.com> wrote:

> from pyspark.sql.functions import *
>
> df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))
> df.where("a > b").show()
>
> +------------------+-------------------+
> |                 a|                  b|
> +------------------+-------------------+
> |0.6697439215581628|0.23420961030968923|
> |0.9248996796756386| 0.4146647917936366|
> +------------------+-------------------+
>
> On Thu, Sep 17, 2015 at 9:32 AM, Rex X <dnsr...@gmail.com> wrote:
>
>> With a Pandas DataFrame
>> <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html>,
>> we can run a query:
>>
>> >>> from numpy.random import randn
>> >>> from pandas import DataFrame
>> >>> df = DataFrame(randn(10, 2), columns=list('ab'))
>> >>> df.query('a > b')
>>
>> This SQL-select-like query is very convenient. Can we do a similar thing
>> with the new DataFrame of Spark?
>>
>> Best,
>> Rex
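
For what it's worth, the same filter can also be written with column expressions instead of a SQL string. A minimal sketch, assuming the same sqlContext as in Michael's example:

from pyspark.sql.functions import col, rand

df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))

# The following three are equivalent ways to express the filter:
df.where("a > b").show()              # SQL expression string, as above
df.where(col("a") > col("b")).show()  # column expressions
df.filter(df.a > df.b).show()         # attribute access; filter() is an alias of where()

The column-expression forms catch misspelled column names at analysis time rather than at string-parsing time, which can be handy in larger pipelines.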