DataFrames are quite a different API: more SQL-like in its operations, not functional. The equivalent would be more like df.filter("value > 2"), passing the condition as a SQL expression string.
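A minimal sketch of the options, assuming a spark-shell session (so spark.implicits._ is already in scope; in a standalone app you would import it from your SparkSession):

```scala
// Assumes spark-shell, where spark.implicits._ is pre-imported,
// giving us toDF, the $"..." column syntax, and Encoders.
val df = Seq(3, 2, 1, 4, 0).toDF("value")

df.filter($"value" > 2).show()   // Column expression
df.filter("value > 2").show()    // SQL expression string

// To get the functional, RDD-like API back, convert the single-column
// DataFrame to a typed Dataset[Int]:
df.as[Int].filter(_ > 2).show()
```

All three produce the same result; the Dataset conversion is the closest analogue to the RDD-style filter(_ > 2).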
On Sun, Feb 6, 2022 at 5:51 AM <capitnfrak...@free.fr> wrote:
> For example, this works for an RDD object:
>
> scala> val li = List(3,2,1,4,0)
> li: List[Int] = List(3, 2, 1, 4, 0)
>
> scala> val rdd = sc.parallelize(li)
> rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at
> parallelize at <console>:24
>
> scala> rdd.filter(_ > 2).collect()
> res0: Array[Int] = Array(3, 4)
>
> After I convert the RDD to a DataFrame, the filter won't work:
>
> scala> val df = rdd.toDF
> df: org.apache.spark.sql.DataFrame = [value: int]
>
> scala> df.filter(_ > 2).show()
> <console>:24: error: value > is not a member of org.apache.spark.sql.Row
>        df.filter(_ > 2).show()
>
> But this does work:
>
> scala> df.filter($"value" > 2).show()
> +-----+
> |value|
> +-----+
> |    3|
> |    4|
> +-----+
>
> Where can I check all the methods supported by DataFrame?
>
> Thank you.
> Frakass