Why 3 and 9 should be deleted? 3 can be paired with 1and 9 can be paired with 8. On 26 Aug 2016 11:00, "Rex X" <dnsr...@gmail.com> wrote:
> 1. Given following CSV file > > > $cat data.csv > > > > ID,City,Zip,Flag > > 1,A,95126,0 > > 2,A,95126,1 > > 3,A,95126,1 > > 4,B,95124,0 > > 5,B,95124,1 > > 6,C,95124,0 > > 7,C,95127,1 > > 8,C,95127,0 > > 9,C,95127,1 > > > (a) where "ID" above is a primary key (unique), > > (b) for each "City" and "Zip" combination, there is one ID in max with > Flag=0; while it can contain multiple IDs with Flag=1 for each "City" and > "Zip" combination. > > (c) Flag can be 0 or 1 > > > 2. For each ID with Flag=0, we want to pair it with another ID with > Flag=1 but with the same City - Zip. If one cannot find another paired ID > with Flag=1 and matched City - Zip, we just delete that record. > > Here is the expected result: > > > ID,City,Zip,Flag > > 1,A,95126,0 > > 2,A,95126,1 > > 4,B,95124,0 > > 5,B,95124,1 > > 7,C,95127,1 > > 8,C,95127,0 > > > Any valuable tips how to do this pairing in Python or Scala? > > Great thanks! > > Rex >