DataFrames are quite a different API, more SQL-like in its operations than
functional. The equivalent would be df.filter("value > 2") — the filter
overload that takes an SQL expression string — or df.filter($"value" > 2).
The full list of methods is in the Scala API docs for Dataset/DataFrame at
https://spark.apache.org/docs/latest/api/scala/
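To sketch the options side by side (assuming the same df with a single int
"value" column as in your session below; in spark-shell the implicits are
already in scope):

```scala
// Untyped DataFrame filters take a Column or an SQL expression string:
df.filter($"value" > 2).show()   // Column-based predicate
df.filter("value > 2").show()    // SQL expression string

// To get the functional, lambda style back, convert to a typed Dataset:
val ds = df.as[Int]              // Dataset[Int] instead of Dataset[Row]
ds.filter(_ > 2).show()          // now _ is an Int, so _ > 2 compiles
```

Note the typed route goes through lambdas the optimizer can't see into, so
the Column/expression forms are generally preferred for performance.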

On Sun, Feb 6, 2022 at 5:51 AM <capitnfrak...@free.fr> wrote:

> for example, this works for an RDD object:
>
> scala> val li = List(3,2,1,4,0)
> li: List[Int] = List(3, 2, 1, 4, 0)
>
> scala> val rdd = sc.parallelize(li)
> rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at
> parallelize at <console>:24
>
> scala> rdd.filter(_ > 2).collect()
> res0: Array[Int] = Array(3, 4)
>
>
> After I convert the RDD to a DataFrame, the filter won't work:
>
> scala> val df = rdd.toDF
> df: org.apache.spark.sql.DataFrame = [value: int]
>
> scala> df.filter(_ > 2).show()
> <console>:24: error: value > is not a member of org.apache.spark.sql.Row
>         df.filter(_ > 2).show()
>
>
> But this can work:
>
> scala> df.filter($"value" > 2).show()
> +-----+
> |value|
> +-----+
> |    3|
> |    4|
> +-----+
>
>
> Where can I check all the methods supported by DataFrame?
>
>
> Thank you.
> Frakass
>
>
