John Nagle wrote:
The parser at PyParsing:
http://pyparsing.wikispaces.com/file/view/streetAddressParser.py

... Bad cases ...

487 E. Middlefield Rd. -> streetnumber = 487, streetname = E. MIDDLEFIELD
487 East Middlefield Road -> streetnumber = 487, streetname = EAST MIDDLEFIELD
226 West Wayne Street -> streetnumber = 226, streetname = WEST WAYNE
New Orchard Road -> streetnumber = , streetname = NEW
1 New Orchard Road -> streetnumber = 1 , streetname = NEW
390 Park Avenue -> streetnumber =, streetname = 390
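The cases above all turn on three things: a leading directional, a trailing street type, and a house number that may be missing. As a rough illustration of the split one might expect instead, here is a minimal table-driven sketch in plain Python. It is not the PyParsing example and not any particular parser's method; the function name, the field names (number, predir, name, suffix), and the tiny word tables are assumptions made up for this illustration.

```python
# Illustrative word tables only; a real parser would need far larger ones.
DIRECTIONALS = {"N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
                "E": "E", "EAST": "E", "W": "W", "WEST": "W"}
SUFFIXES = {"RD": "RD", "ROAD": "RD", "ST": "ST", "STREET": "ST",
            "AVE": "AVE", "AVENUE": "AVE"}


def parse_street_line(line):
    """Split a street line into number, predirectional, name, and suffix."""
    # Uppercase each token and drop surrounding periods and commas.
    tokens = [t.strip(".,").upper() for t in line.split()]
    tokens = [t for t in tokens if t]
    result = {"number": "", "predir": "", "name": "", "suffix": ""}

    # A leading all-digit token is taken as the house number.
    if tokens and tokens[0].isdigit():
        result["number"] = tokens.pop(0)

    # The last token is a suffix only if something remains to be the name.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        result["suffix"] = SUFFIXES[tokens.pop()]

    # A leading directional is a prefix only if a name remains after it.
    if len(tokens) > 1 and tokens[0] in DIRECTIONALS:
        result["predir"] = DIRECTIONALS[tokens.pop(0)]

    result["name"] = " ".join(tokens)
    return result


if __name__ == "__main__":
    for line in ["487 E. Middlefield Rd.", "226 West Wayne Street",
                 "New Orchard Road", "390 Park Avenue"]:
        print(line, "->", parse_street_line(line))
```

The "something must remain" checks are the design choice that matters here: the number, suffix, and directional are peeled off only while at least one token is left over for the street name, so a line like "123 East Street" keeps EAST as the name rather than consuming it as a directional.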
Here's a system that gets all the above cases right: the USC Deterministic Address Parser.

https://webgis.usc.edu/Services/AddressNormalization/Interactive/DeterministicNormalization.aspx

This will parse a street address line alone, without a city, state, or ZIP code, so it's not using a big database. There's a technical paper

http://gislab.usc.edu/i/publications/gislabtr11.pdf

but it doesn't have that much detail. However, now we know a solution exists. I've asked USC if they'll make the code available.

John Nagle
