Awesome, thanks, this was exactly what I needed!

On Tue, Jan 17, 2023 at 5:23 PM Sean Owen <sro...@gmail.com> wrote:
> I think you want array_contains:
>
> https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.array_contains.html
>
> On Tue, Jan 17, 2023 at 4:18 PM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:
>
>> Hello,
>>
>> I have data originally stored as JSON. Column gene contains a string,
>> column nearest an array of strings. How can I check whether the value of
>> gene is an element of the array of nearest?
>>
>> I tried: genes_joined.gene.isin(genes_joined.nearest)
>>
>> But I get an error that says:
>>
>> pyspark.sql.utils.AnalysisException: cannot resolve '(gene IN (nearest))'
>> due to data type mismatch: Arguments must be same type but were: string !=
>> array<string>;
>>
>> How do I do this? Thanks!
>>
>> Best, Oliver
>>
>> --
>> Oliver Ruebenacker, Ph.D. (he)
>> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
>> Flannick Lab <http://www.flannicklab.org/>, Broad Institute
>> <http://www.broadinstitute.org/>

--
Oliver Ruebenacker, Ph.D. (he)
Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
Flannick Lab <http://www.flannicklab.org/>, Broad Institute
<http://www.broadinstitute.org/>