Re: Filtering on multiple columns in spark

2020-04-29 Thread Edgardo Szrajber
Maybe create a column with the "lit" function for the variables you are comparing against. Bentzi On Wed, Apr 29, 2020 at 18:40, Mich Talebzadeh wrote: The below line works val c =

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
The below line works: val c = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'") But not the following, when the values are passed as parameters: val rejectedDF =

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
OK, how do you pass variables for 10 and '7' in the line below, in Scala? val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'") Neither $ value below nor lit()
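One way to parameterise a SQL-string filter like the one above is Scala's `s` string interpolator. The sketch below is only an illustration under assumptions: the object `FilterExpr`, the helper `rejectedPredicate`, and the parameter names are hypothetical and do not come from the thread. It builds just the predicate string, so it runs without Spark on the classpath.

```scala
object FilterExpr {
  // Hypothetical helper (not from the thread): builds the SQL predicate
  // from the expected length and the expected first character.
  def rejectedPredicate(column: String, expectedLength: Int, expectedPrefix: String): String =
    s"length($column) != $expectedLength OR substring($column,1,1) != '$expectedPrefix'"

  def main(args: Array[String]): Unit = {
    val predicate = rejectedPredicate("target_mobile_no", 10, "7")
    // prints: length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'
    println(predicate)
  }
}
```

The resulting string could then be passed to the DataFrame as `filter(predicate)`. Since the values are spliced into SQL text, this is probably only appropriate for trusted, program-defined parameters.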

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
Hi Zhang, Yes, the SQL way worked fine: val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'") Many thanks, Dr Mich Talebzadeh

Re: Filtering on multiple columns in spark

2020-04-29 Thread ZHANG Wei
AFAICT, maybe Spark SQL built-in functions[1] can help as below:

scala> df.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> df.filter("length(name) == 4 or substring(name, 1, 1) == 'J'").show()
+---+------+
|age|  name|

Re: Filtering on multiple columns in spark

2020-04-29 Thread Som Lima
From your email, the obvious issue seems to be that 10 is an Int because it is not surrounded by quotes "": 10 should be "10". Although I can't imagine a telephone number with only 10, because that is what you are trying to program. In *Scala*, you can check *if* two operands *are equal* ( == ) or *not*

Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
Hi, Trying to filter a dataframe with multiple conditions using OR "||" as below val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)). filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !==
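The snippet above is cut off, so the actual error is not visible here. One thing worth noting is that the Column operator `!==` has been deprecated since Spark 2.0 in favour of `=!=`, because `!==` does not get the same operator precedence as `===` in Scala and so combines badly with `||`. The following is a minimal sketch of the Column-based form with the two values passed as parameters. It assumes Spark 2.x or later on the classpath; the object `RejectedFilter`, the method `rejected`, its parameter names, and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.types.StringType

object RejectedFilter {
  // Hypothetical helper: keeps rows whose number has the wrong length
  // OR does not start with the expected prefix.
  def rejected(df: DataFrame, expectedLength: Int, expectedPrefix: String): DataFrame =
    df.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
      .filter(
        length(col("target_mobile_no")) =!= expectedLength ||
          substring(col("target_mobile_no"), 1, 1) =!= expectedPrefix
      )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("rejected-filter").getOrCreate()
    import spark.implicits._

    // Made-up sample data: one number that passes both checks, two that fail one each.
    val newDF = Seq(7123456789L, 8123456789L, 712345L).toDF("target_mobile_no")

    rejected(newDF, 10, "7").show()
    spark.stop()
  }
}
```

Because `=!=` and `||` both accept plain Scala values, the parameters can be passed directly without wrapping them in `lit()`.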