Hi Shawn,

Could you do it as below?
For "any column matches":

scala> val df = spark.range(10).selectExpr("id as a", "id / 2 as b")
df: org.apache.spark.sql.DataFrame = [a: bigint, b: double]

scala> df.filter(_.toSeq.exists(v => v == 1)).show()
+---+---+
|  a|  b|
+---+---+
|  1|0.5|
|  2|1.0|
+---+---+

Or, for "all columns match":

scala> val df = spark.range(10).selectExpr("id as a", "id / 2 as b")
df: org.apache.spark.sql.DataFrame = [a: bigint, b: double]

scala> df.filter(_.toSeq.forall(v => v == 0)).show()
+---+---+
|  a|  b|
+---+---+
|  0|0.0|
+---+---+

2017-01-17 7:27 GMT+09:00 Shawn Wan <shawn...@gmail.com>:

> I need to filter out outliers from a dataframe by all columns. I can
> manually list all columns like:
>
> df.filter(x => math.abs(x.get(0).toString().toDouble - means(0)) <= 3 * stddevs(0))
>   .filter(x => math.abs(x.get(1).toString().toDouble - means(1)) <= 3 * stddevs(1))
>   ...
>
> But I want to turn it into a general function which can handle a variable
> number of columns. How could I do that? Thanks in advance!
>
> Regards,
>
> Shawn
>
> ------------------------------
> View this message in context: filter rows by all columns
> <http://apache-spark-user-list.1001560.n3.nabble.com/filter-rows-by-all-columns-tp28309.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
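To make the outlier filter generic over the number of columns, the same Row.toSeq idea can be combined with zipWithIndex and forall, so every per-column check collapses into one predicate. A sketch, assuming `means` and `stddevs` are `Array[Double]`s you have already computed (one entry per column, in column order):

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: keep a row only if every column value is
// within 3 standard deviations of that column's mean.
def withinThreeSigma(means: Array[Double], stddevs: Array[Double])(row: Row): Boolean =
  row.toSeq.zipWithIndex.forall { case (v, i) =>
    math.abs(v.toString.toDouble - means(i)) <= 3 * stddevs(i)
  }

// Usage: same shape as your chained filters, but for any column count.
val filtered = df.filter(withinThreeSigma(means, stddevs) _)
```

Note this assumes every column converts cleanly to Double via toString, as in your original snippet; for typed safety you could match on the value type inside the forall instead.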