Hi,


I am trying to filter a DataFrame on multiple conditions combined with OR ("||"), as below:



  val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
                   filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !== "7")



This throws the following error:



res12: org.apache.spark.sql.DataFrame = []

<console>:49: error: value || is not a member of Int

                          filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !== "7")



I then tried another way, using "!===":



  val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
                   filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")

  rejectedDF.createOrReplaceTempView("tmp")



I tried a few options but I am still getting these errors:



<console>:49: error: value !=== is not a member of org.apache.spark.sql.Column

                          filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")

                                                                 ^

<console>:49: error: value || is not a member of Int

                          filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")



I could create a separate DataFrame for each filter, but that does not look
efficient to me. Is there a way to apply both conditions in a single filter?
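
For reference, here is a minimal sketch of how both conditions might be combined in one filter. It assumes Spark 2.x, where Column offers "=!=" as its inequality operator. The likely cause of "value || is not a member of Int" is Scala's operator precedence: "!==" ends in "=" without starting with "=", so Scala treats it like an assignment operator with the lowest precedence and evaluates 10 || substring(...) first. The SparkSession setup, the object name and the sample rows are hypothetical and only there to make the sketch self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.types.StringType

object RejectedFilterSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rejected-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for newDF.
    val newDF = Seq(7123456789L, 8123456789L, 71234L).toDF("target_mobile_no")

    val rejectedDF = newDF
      .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
      // =!= binds tighter than ||; the parentheses make the grouping explicit.
      .filter(
        (length(col("target_mobile_no")) =!= 10) ||
        (substring(col("target_mobile_no"), 1, 1) =!= "7")
      )

    // Keeps the rows that fail either check (wrong length or wrong first digit).
    rejectedDF.show()
    spark.stop()
  }
}
```

Both predicates end up in a single Column expression, so one filter covers them and no intermediate DataFrame per condition is needed.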



Thanks



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
