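I think you want array_contains, which tests whether an array column contains a given value. isin compares the column for equality against each argument, so passing the array column ends up comparing a string with an array<string>, hence the type mismatch. Docs: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.array_contains.html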

On Tue, Jan 17, 2023 at 4:18 PM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:

> Hello,
>
> I have data originally stored as JSON. Column gene contains a string,
> column nearest an array of strings. How can I check whether the value of
> gene is an element of the array of nearest?
>
> I tried: genes_joined.gene.isin(genes_joined.nearest)
>
> But I get an error that says:
>
> pyspark.sql.utils.AnalysisException: cannot resolve '(gene IN (nearest))'
> due to data type mismatch: Arguments must be same type but were: string !=
> array<string>;
>
> How do I do this? Thanks!
>
> Best, Oliver
>
> --
> Oliver Ruebenacker, Ph.D. (he)
> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
> Flannick Lab <http://www.flannicklab.org/>, Broad Institute
> <http://www.broadinstitute.org/>