Re: Stripping ASCII codes when parsing

2005-10-17 Thread Steve Holden
David Pratt wrote: I am working with a text format that advises to strip any ascii control characters (0 - 30) as part of parsing data and also the ascii pipe character (124) from the data. I think many of these characters are from a different time. Since I have never seen most of these

Re: Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
Many thanks Steve. This is good information. I think this should work fine. I was doing a string.replace in a cleanData() method with the following characters but don't know if that would have done it. This contains all the control characters that I really know about in normal use. ord(c) < 32

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Steve Holden
David Pratt wrote: [about ord(), chr() and stripping control characters] Many thanks Steve. This is good information. I think this should work fine. I was doing a string.replace in a cleanData() method with the following characters but don't know if that would have done it. This contains
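The ord()-based approach discussed here can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the code posted in the thread: the helper name strip_codes and the constant STRIP_CODES are hypothetical, and the code range (0 - 30 plus the pipe, 124) is taken literally from the original question.

```python
# Code points to drop: 0-30 and the pipe (124), exactly as stated in the
# original question. Both names below are hypothetical, for illustration only.
STRIP_CODES = set(range(0, 31)) | {124}

def strip_codes(text):
    """Return text with every character whose ord() is in STRIP_CODES removed."""
    return "".join(c for c in text if ord(c) not in STRIP_CODES)

# Tabs and newlines fall inside 0-30, so they are removed as well.
print(strip_codes("name|value\tmore\x00data\n"))
```

Note that the stated range stops at 30, so chr(31) (the unit separator) is kept; a format that really means "all control characters" would test ord(c) < 32 instead.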

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Tony Nelson
David Pratt wrote: I am working with a text format that advises to strip any ascii control characters (0 - 30) as part of parsing data and also the ascii pipe character (124) from the data. I think many of these characters are from a

Re: Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
Hi Steve. My plan is to parse the data removing the control characters and validate the data as records are being added to a dictionary. I am going to Unicode after this step but before it gets into storage (in which case I think the translate method could work well). The encoding itself is

Re: Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
This is very nice :-) Thank you Tony. I think this will be the way to go. My concern ATM is where it will be best to convert to Unicode. The data after this will go into a dict and a few processes and into a database. Because the input source has no explicit encoding, I will have to assume ISO-8859-1 I
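One possible ordering, sketched in modern Python 3 under the assumption that the input really is ISO-8859-1: decode the raw bytes first, then delete the unwanted characters with str.translate(). The names DELETE_TABLE and clean_record are hypothetical and do not come from the thread.

```python
# Map each unwanted code point (0-30 and the pipe, 124) to None so that
# str.translate() deletes it. Both names here are hypothetical.
DELETE_TABLE = {code: None for code in list(range(0, 31)) + [124]}

def clean_record(raw):
    """Decode raw bytes as ISO-8859-1 (an assumption), then strip the codes."""
    # ISO-8859-1 maps every byte value to a character, so this decode
    # cannot raise, but it will silently misread data in another encoding.
    text = raw.decode("iso-8859-1")
    return text.translate(DELETE_TABLE)

print(clean_record(b"caf\xe9|\x07 ok\r\n"))
```

Because ISO-8859-1 is a single-byte encoding, stripping before or after decoding should give the same result; decoding first simply keeps everything downstream (dict, processing, database) working on Unicode text.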

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Erik Max Francis
David Pratt wrote: I am working with a text format that advises to strip any ascii control characters (0 - 30) as part of parsing data and also the ascii pipe character (124) from the data. I think many of these characters are from a different time. Since I have never seen most of these

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Tony Nelson
David Pratt wrote: This is very nice :-) Thank you Tony. I think this will be the way to go. My concern ATM is where it will be best to convert to Unicode. The data after this will go into a dict and a few processes and into a database. Because the input