Re: [PySpark] Join using condition where each record may be joined multiple times

2022-11-28 Thread Oliver Ruebenacker
Hello,
Thanks, I can do that. What I was hoping to hear is whether what I'm trying to do is even considered possible and, if so, what the correct 'how' parameter would be.
Best, Oliver
On Sun, Nov 27, 2022 at 2:50 PM Artemis User wrote:
> What if you just do a join with the first conditio

Re: [PySpark] Join using condition where each record may be joined multiple times

2022-11-27 Thread Artemis User
What if you just do a join with the first condition (equal chromosome) and append a select with the rest of the conditions after the join? This will allow you to test your query step by step, maybe with a visual inspection to figure out what the problem is. It may be a data quality problem as well