Hi Spark Users,

I am trying to achieve the 'IN' functionality of SQL using the isin
function in pyspark
E.g.:   select count(*) from tableA
          where (col1, col2) in ((1, 100), (2, 200), (3, 300))

We can very well have 1 column isin statements like:
    df.filter(df[0].isin(1,2,3)).count()

But can I use multiple columns in that statement, like:
    df.filter((df[0], df[1]).isin((1, 100), (2, 200), (3, 300))).count()

Is this possible to achieve?
Or do I have to build a separate equality condition for each tuple,
combine the column equalities with '&' within a tuple, merge the tuples
with '|', and then execute that statement to get the final result?
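
To be concrete, the fallback I have in mind is roughly the sketch below
(df[0]/df[1] and the tuple list are just the placeholders from my example
above):

    from functools import reduce
    from pyspark.sql import functions as F

    # The allowed (col1, col2) tuples from my example
    pairs = [(1, 100), (2, 200), (3, 300)]

    # '&' ties the two column equalities together within one tuple;
    # '|' merges the per-tuple conditions into a single filter
    cond = reduce(
        lambda acc, p: acc | ((df[0] == p[0]) & (df[1] == p[1])),
        pairs,
        F.lit(False),
    )
    df.filter(cond).count()

This works, but it gets verbose for long tuple lists, which is why I am
hoping a direct multi-column isin exists.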

Any help would be really appreciated.

-- 
Thanks,
Shuporno Choudhury
