Re: Stripping ASCII codes when parsing

Tony Nelson Mon, 17 Oct 2005 20:16:41 -0700

In article <[EMAIL PROTECTED]>,
 David Pratt <[EMAIL PROTECTED]> wrote:


> This is very nice :-)  Thank you Tony.  I think this will be the way to  
> go.  My concern ATM is where it will be best to unicode. The data after  
> this will go into dict and a few processes and into database. Because  
> input source if not explicit encoding, I will have to assume ISO-8859-1  
> I believe but could well be cp1252 for most part ( because it says no  
> ASCII (0-30) but alright ASCII chars 128-254) and because most are  
> Windows users.  Am thinking to unicode after stripping these characters  
> and validating text, then unicoding (utf-8) so it is unicode in dict.  
> Then when I perform these other processes it should be uniform and then  
> it will go into database as unicode.  I think this should be ok.

Definitely "".translate() then unicode().  See the docs for 
"".translate().  As far as charset, well, if you can't know in advance 
you'll want to have some way to configure it for when it's wrong.  Also, 
maybe 255 is not allowed and should be checked for?
________________________________________________________________________
TonyN.:'                        [EMAIL PROTECTED]
      '                                  <http://www.georgeanelson.com/>
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Stripping ASCII codes when parsing

Reply via email to