1. Given following CSV file > $cat data.csv > > ID,City,Zip,Flag > 1,A,95126,0 > 2,A,95126,1 > 3,A,95126,1 > 4,B,95124,0 > 5,B,95124,1 > 6,C,95124,0 > 7,C,95127,1 > 8,C,95127,0 > 9,C,95127,1
(a) where "ID" above is a primary key (unique), (b) for each "City" and "Zip" combination, there is one ID in max with Flag=0; while it can contain multiple IDs with Flag=1 for each "City" and "Zip" combination. (c) Flag can be 0 or 1 2. For each ID with Flag=0, we want to pair it with another ID with Flag=1 but with the same City - Zip. If one cannot find another paired ID with Flag=1 and matched City - Zip, we just delete that record. Here is the expected result: > ID,City,Zip,Flag > 1,A,95126,0 > 2,A,95126,1 > 4,B,95124,0 > 5,B,95124,1 > 7,C,95127,1 > 8,C,95127,0 Any valuable tips how to do this pairing in Python or Scala? Great thanks! Rex