Here is the complete error:

```
Traceback (most recent call last):
  File "nearest-gene.py", line 74, in <module>
    main()
  File "nearest-gene.py", line 62, in main
    distances = joined.withColumn("distance", max(col("start") - col("position"), col("position") - col("end"), 0))
  File "/mnt/yarn/usercache/hadoop/appcache/application_1677167576690_0001/container_1677167576690_0001_01_000001/pyspark.zip/pyspark/sql/column.py", line 907, in __nonzero__
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
```
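
Reading the traceback, the last frame is in pyspark's `column.py`, inside `__nonzero__`, so it looks as if Python's builtin `max` is comparing the column expressions and then asking for the truth value of the result. Below is a minimal sketch of that mechanism in plain Python. `FakeColumn` is a hypothetical stand-in written only for this illustration; it is not the real `pyspark.sql.Column`. The `distance` function shows the intended per-row arithmetic on plain integers.

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark column expression (illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    @staticmethod
    def _expr(other):
        # Render either another FakeColumn or a plain Python value.
        return other.expr if isinstance(other, FakeColumn) else repr(other)

    def __sub__(self, other):
        return FakeColumn(f"({self.expr} - {self._expr(other)})")

    def __gt__(self, other):
        # A comparison builds a new expression instead of returning a bool.
        return FakeColumn(f"({self.expr} > {self._expr(other)})")

    def __lt__(self, other):
        return FakeColumn(f"({self.expr} < {self._expr(other)})")

    def __bool__(self):
        # Assumed to mirror the behaviour seen in the traceback.
        raise ValueError("Cannot convert column into bool")


def try_builtin_max():
    """Call builtin max() on expression objects and return the error message."""
    start, end, position = FakeColumn("start"), FakeColumn("end"), FakeColumn("position")
    try:
        # max() compares its arguments and needs a bool from each comparison.
        max(start - position, position - end, 0)
    except ValueError as err:
        return str(err)
    return None


def distance(start, end, position):
    """Per-row distance on plain integers: 0 inside the gene, else the gap."""
    return max(start - position, position - end, 0)


print(try_builtin_max())
print(distance(100, 200, 50), distance(100, 200, 150), distance(100, 200, 260))
```

If this reading is right, the column-wise equivalent in Spark would probably be `pyspark.sql.functions.greatest`, with the constant wrapped as `lit(0)` so that all three arguments are columns. I have not run that against this job.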

On Thu, Feb 23, 2023 at 2:00 PM Sean Owen <sro...@gmail.com> wrote:

> That error sounds like it's from pandas not spark. Are you sure it's this
> line?
>
> On Thu, Feb 23, 2023, 12:57 PM Oliver Ruebenacker <
> oliv...@broadinstitute.org> wrote:
>
>>
>>      Hello,
>>
>>   I'm trying to calculate the distance between a gene (with start and
>> end) and a variant (with position), so I joined gene and variant data by
>> chromosome and then tried to calculate the distance like this:
>>
>> ```
>> distances = joined.withColumn("distance", max(col("start") -
>> col("position"), col("position") - col("end"), 0))
>> ```
>>
>>   Basically, the distance is the maximum of three terms.
>>
>>   This line causes an obscure error:
>>
>> ```
>> ValueError: Cannot convert column into bool: please use '&' for 'and',
>> '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
>> ```
>>
>>   How can I do this? Thanks!
>>
>>      Best, Oliver
>>
>> --
>> Oliver Ruebenacker, Ph.D. (he)
>> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>, 
>> Flannick
>> Lab <http://www.flannicklab.org/>, Broad Institute
>> <http://www.broadinstitute.org/>
>>
>

-- 
Oliver Ruebenacker, Ph.D. (he)
Senior Software Engineer, Knowledge Portal Network
<http://kp4cd.org/>, Flannick
Lab <http://www.flannicklab.org/>, Broad Institute
<http://www.broadinstitute.org/>
