On Mon, Sep 21, 2009 at 5:59 PM, Rich Shepard <[email protected]> wrote:
> I'm sure it will help. Some strings are single words (such as the city
> name), others are multiple words (such as the facility description).
> Probably only the first word should be capitalized, not all of them. But,
> tomorrow I'll look again at the data and see what I really want.

This task belongs in the category of "things that seem like they should be
really simple but aren't." You're attempting to discover information in data
that doesn't contain it. Unless you have a very narrow and specific data set,
you may not be able to do much better than applying a simple and consistent
*format* such as .upper() to all data.

Consider the following entries that you might find in a "street address" line:

- Dept. of Motor Vehicles
- Attn: Guido van Rossum
- Attn: Henry Higgins III
- San Francisco Chapter
- PO Box 1234
- Mail Stop C5A
- Attn: A/R

It's going to be *really* difficult to develop rules that handle those
examples correctly, and that's even before you get to military addresses and
non-US addresses. The more sophisticated you attempt to be, the more glaring
and difficult the exceptions will become.

FWIW,
Dylan

_______________________________________________
Portland mailing list
[email protected]
http://mail.python.org/mailman/listinfo/portland
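
To make the point above concrete, here is a minimal sketch (added for illustration, not part of the original message) that runs the sample address lines through three built-in string methods. The names `samples` and `compare` are hypothetical, and the code assumes Python 3.

```python
# Sample values taken from the "street address" examples in the message.
samples = [
    "Dept. of Motor Vehicles",
    "Attn: Guido van Rossum",
    "Attn: Henry Higgins III",
    "San Francisco Chapter",
    "PO Box 1234",
    "Mail Stop C5A",
    "Attn: A/R",
]


def compare(text):
    """Return the results of three built-in str methods for one value.

    Hypothetical helper, used only to show the outputs side by side.
    """
    return {
        "capitalize": text.capitalize(),  # first character upper, rest lower
        "title": text.title(),            # first letter of each word upper
        "upper": text.upper(),            # everything upper
    }


if __name__ == "__main__":
    for s in samples:
        r = compare(s)
        print(s, "|", r["capitalize"], "|", r["title"], "|", r["upper"])
```

With these inputs, .title() turns "III" into "Iii" and "van" into "Van", and .capitalize() turns "PO Box 1234" into "Po box 1234". Only .upper() produces a uniform result without implying case information that the data does not contain.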
