On Apr 17, 2:23 pm, John Nagle <na...@animats.com> wrote:
> Is there a usable street address parser available? There are some
> bad ones out there, but nothing good that I've found other than commercial
> products with large databases. I don't need 100% accuracy, but I'd like
> to be able to extract street name and street number for at least 98% of
> US mailing addresses.
>
> There's pyparsing, of course. There's a street address parser as an
> example at "http://pyparsing.wikispaces.com/file/view/streetAddressParser.py".
> It's not very good. It gets all of the following wrong:
>
>     1500 Deer Creek Lane (Parses "Creek" as a street type)
>     186 Avenue A (NYC street)
>     2081 N Webb Rd (Parses N Webb as a street name)
>     2081 N. Webb Rd (Parses N as street name)
>     1515 West 22nd Street (Parses "West" as name)
>     2029 Stierlin Court (Street names starting with "St" misparse.)
>
> Some special cases that don't work, unsurprisingly.
>     P.O. Box 33170
>     The Landmark @ One Market, Suite 200
>     One Market, Suite 200
>     One Market
>
Please take a look at the updated form of this parser. It turns out there
actually *were* some bugs in the old form, plus there was no provision for
PO boxes, avenues that start with "Avenue" instead of ending with it, or
house numbers spelled out as words. The only one I consider a "special
case" is the support for "Avenue X" instead of "X Avenue" - support for the
rest was added in a fairly general way. With these bug fixes, I hope your
hit rate improves.

(There are also some simple attempts at handling apt/suite numbers, and APO
and AFP in addition to PO boxes - if they are not exactly what you need, the
parser should be pretty straightforward to extend to support other options.)

-- Paul
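
For anyone reading without the updated file to hand, here is a minimal sketch of the cases discussed above: PO boxes, "Avenue X", leading directions such as "N." or "West", house numbers spelled out as words, and street names that merely begin with "St". It uses the standard `re` module rather than pyparsing, so it is not the updated parser itself. The pattern names, the short street-type and number-word lists, and `parse_address` are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately short vocabularies; a real parser needs far more.
STREET_TYPES = (r"(?:street|st|avenue|ave|road|rd|lane|ln|court|ct|"
                r"boulevard|blvd|drive|dr|way|place|pl)")
NUMBER_WORDS = r"(?:one|two|three|four|five|six|seven|eight|nine|ten)"
HOUSE_NUMBER = rf"(?P<number>\d+[A-Za-z]?|{NUMBER_WORDS})"
DIRECTION = r"(?:(?P<dir>north|south|east|west|NE|NW|SE|SW|N|S|E|W)\.?\s+)?"
UNIT = r"(?:,?\s+(?:suite|ste|apt|unit)\.?\s+(?P<unit>\w+))?"

# "P.O. Box 33170", "PO Box 12", "APO 123", "AFP 45"
PO_BOX = re.compile(
    r"^(?:P\.?\s*O\.?|APO|AFP)\s+(?:box\s+)?(?P<box>\w+)$", re.I)

# The one special case: "Avenue X" rather than "X Avenue".
AVENUE_X = re.compile(
    rf"^{HOUSE_NUMBER}\s+(?P<street>avenue\s+[A-Za-z]){UNIT}$", re.I)

# General form: number, optional direction, name, street type, optional unit.
# The name is matched lazily and the pattern is anchored at the end, so only
# the last street-type word ("Lane" in "Deer Creek Lane") is taken as the type.
GENERAL = re.compile(
    rf"^{HOUSE_NUMBER}\s+{DIRECTION}(?P<street>.+?)\s+"
    rf"(?P<type>{STREET_TYPES})\.?{UNIT}$", re.I)

PATTERNS = (("po_box", PO_BOX), ("avenue_x", AVENUE_X), ("street", GENERAL))


def parse_address(text):
    """Return a dict of the fields found, or None if nothing matches."""
    text = text.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            # Drop optional groups that did not take part in the match.
            fields = {k: v for k, v in match.groupdict().items()
                      if v is not None}
            fields["kind"] = kind
            return fields
    return None


if __name__ == "__main__":
    for line in ("1500 Deer Creek Lane", "186 Avenue A", "2081 N. Webb Rd",
                 "1515 West 22nd Street", "2029 Stierlin Court",
                 "P.O. Box 33170", "One Market"):
        print(line, "->", parse_address(line))
```

Addresses with no street type at all, such as "One Market", still come back as None in this sketch; handling them would need either a looser grammar or a list of known street names.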