Here is the complete error:

```
Traceback (most recent call last):
  File "nearest-gene.py", line 74, in <module>
    main()
  File "nearest-gene.py", line 62, in main
    distances = joined.withColumn("distance", max(col("start") - col("position"), col("position") - col("end"), 0))
  File "/mnt/yarn/usercache/hadoop/appcache/application_1677167576690_0001/container_1677167576690_0001_01_000001/pyspark.zip/pyspark/sql/column.py", line 907, in __nonzero__
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
```
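For what it's worth, the last frame (`column.py`, in `__nonzero__`) suggests the error is raised when Python's built-in `max` compares two column expressions and then asks the result for a truth value. Below is a minimal sketch of that mechanism in plain Python. `ToyColumn`, `toy_greatest` and `builtin_max_fails` are hypothetical names made up for illustration, not pyspark's actual classes or functions, and the real `Column` may differ in detail.

```python
class ToyColumn:
    """Hypothetical stand-in for a lazy column expression; not pyspark's Column."""

    def __init__(self, expr):
        self.expr = expr

    def __sub__(self, other):
        return ToyColumn(f"({self.expr} - {_expr(other)})")

    def __gt__(self, other):
        # Comparison builds a new expression instead of returning True/False.
        return ToyColumn(f"({self.expr} > {_expr(other)})")

    def __lt__(self, other):
        return ToyColumn(f"({self.expr} < {_expr(other)})")

    def __bool__(self):
        # An expression over a whole column has no single truth value.
        raise ValueError("Cannot convert column into bool")


def _expr(value):
    """Render either a ToyColumn or a plain Python literal as text."""
    return value.expr if isinstance(value, ToyColumn) else repr(value)


def toy_greatest(*values):
    """Build one 'greatest' expression rather than comparing eagerly in Python."""
    return ToyColumn("greatest(" + ", ".join(_expr(v) for v in values) + ")")


def builtin_max_fails(*values):
    """Return True if the built-in max raises ValueError on these values."""
    try:
        max(*values)
    except ValueError:
        return True
    return False


start, end, position = ToyColumn("start"), ToyColumn("end"), ToyColumn("position")
terms = (start - position, position - end, 0)

print(builtin_max_fails(*terms))   # built-in max needs a bool from each comparison
print(toy_greatest(*terms).expr)   # an expression-level maximum avoids that
```

In PySpark itself, the column-wise counterpart appears to be `pyspark.sql.functions.greatest`, so the failing line would presumably become `joined.withColumn("distance", greatest(col("start") - col("position"), col("position") - col("end"), lit(0)))`, with `lit(0)` wrapping the constant as a column. I have not run this against the data above.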
On Thu, Feb 23, 2023 at 2:00 PM Sean Owen <sro...@gmail.com> wrote:

> That error sounds like it's from pandas not spark. Are you sure it's this
> line?
>
> On Thu, Feb 23, 2023, 12:57 PM Oliver Ruebenacker <
> oliv...@broadinstitute.org> wrote:
>
>>
>> Hello,
>>
>> I'm trying to calculate the distance between a gene (with start and
>> end) and a variant (with position), so I joined gene and variant data by
>> chromosome and then tried to calculate the distance like this:
>>
>> ```
>> distances = joined.withColumn("distance", max(col("start") -
>> col("position"), col("position") - col("end"), 0))
>> ```
>>
>> Basically, the distance is the maximum of three terms.
>>
>> This line causes an obscure error:
>>
>> ```
>> ValueError: Cannot convert column into bool: please use '&' for 'and',
>> '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
>> ```
>>
>> How can I do this? Thanks!
>>
>> Best, Oliver
>>
>> --
>> Oliver Ruebenacker, Ph.D. (he)
>> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
>> Flannick Lab <http://www.flannicklab.org/>, Broad Institute
>> <http://www.broadinstitute.org/>
>>
>

--
Oliver Ruebenacker, Ph.D. (he)
Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
Flannick Lab <http://www.flannicklab.org/>, Broad Institute
<http://www.broadinstitute.org/>