I'm writing back because I thought I might have time to deal with this, but it turns out that I won't. :)
Basically, the algorithm I described is a reasonable ground truth. It probably won't be very fast without NumPy or hardware acceleration, and I can't really debug your implementation for you, but it should at least give you an idea of what to look for. Good luck!
Ian