Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Steven D'Aprano
On Mon, 31 Oct 2011 20:44:45 -0400, Terry Reedy wrote: [...] def is_ascii_text(text): for c in text: if c not in LEGAL: return False return True If text is 3.x bytes, this does not work ;-). OP did not specify bytes or unicode or Python version. The OP
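Terry's objection is concrete: in Python 3, iterating over bytes yields ints, so `c not in LEGAL` (with LEGAL a str) raises a TypeError instead of returning False. The following is a minimal sketch that accepts both str and bytes. It assumes the LEGAL set from the thread (32-127 plus newline, CR, tab and formfeed). `LEGAL_BYTES_SET` is a hypothetical name introduced here.

```python
# Legal characters as described in the thread: ASCII 32-127 plus
# newline, carriage return, tab and formfeed.
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
LEGAL_SET = frozenset(LEGAL)
# Hypothetical companion set of byte values for Python 3 bytes input.
LEGAL_BYTES_SET = frozenset(ord(c) for c in LEGAL)


def is_ascii_text(text):
    """Return True if every character (or byte) in text is legal.

    Iterating over a str yields 1-character strings, but iterating over
    bytes or bytearray yields ints, so pick the matching lookup set.
    """
    if isinstance(text, (bytes, bytearray)):
        legal = LEGAL_BYTES_SET
    else:
        legal = LEGAL_SET
    for c in text:
        if c not in legal:
            return False  # stop at the first illegal item
    return True
```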

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Steven D'Aprano
On Mon, 31 Oct 2011 22:12:26 -0400, Dave Angel wrote: I would claim that a well-written (in C) translate function, without using the delete option, should be much quicker than any python loop, even if it does copy the data. I think you are selling short the speed of the Python interpreter.
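Whether the C-level `translate()` or a Python loop wins depends on the input, so it is best measured rather than argued. What follows is a rough `timeit` harness, assuming Python 3 str input. Note that the loop can stop at the first illegal character, while `translate()` always scans and copies the whole string. The function names and sample text are illustrative, and no timings are claimed here.

```python
import timeit

LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
LEGAL_SET = frozenset(LEGAL)
# str.translate() deletes any code point that maps to None.
DELETE_LEGAL = dict.fromkeys(map(ord, LEGAL))


def loop_check(text):
    # Pure-Python loop: returns early on the first illegal character.
    for c in text:
        if c not in LEGAL_SET:
            return False
    return True


def translate_check(text):
    # C-level translate: processes the whole string, then checks whether
    # anything illegal survived the deletion.
    return not text.translate(DELETE_LEGAL)


if __name__ == "__main__":
    sample = "The quick brown fox\tjumps\r\n" * 2000
    for func in (loop_check, translate_check):
        seconds = timeit.timeit(lambda: func(sample), number=10)
        print(f"{func.__name__}: {seconds:.4f}s")
```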

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Peter Otten
Steven D'Aprano wrote: On Mon, 31 Oct 2011 22:12:26 -0400, Dave Angel wrote: I would claim that a well-written (in C) translate function, without using the delete option, should be much quicker than any python loop, even if it does copy the data. I think you are selling short the speed

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Duncan Booth
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f' MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128)) # Untested def is_ascii_text(text): for c in text: n = ord(c) if n =
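The quoted MASK version is cut off at `if n =`. The following is one plausible completion, and it is an assumption rather than Steven's actual code. It uses `ord(c)` to index a 128-entry mask and treats anything at or above 128 as illegal.

```python
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
# MASK[n] is '\01' if chr(n) is legal and '\0' otherwise, for n in 0..127.
MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128))


def is_ascii_text(text):
    for c in text:
        n = ord(c)
        # Assumed completion: non-ASCII code points, or ASCII ones flagged
        # '\0' in the mask, are illegal.
        if n >= 128 or MASK[n] == '\0':
            return False
    return True
```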

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Ian Kelly
On Mon, Oct 31, 2011 at 6:32 PM, Patrick Maupin pmau...@gmail.com wrote: On Oct 31, 5:52 pm, Ian Kelly ian.g.ke...@gmail.com wrote: For instance, split() will split on vertical tab, which is not one of the characters the OP wanted. That's just the default behavior. You can explicitly specify
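Ian's point can be reproduced directly. If every legal character is translated to a space and the result is checked with a default `split()`, an illegal vertical tab is swallowed as whitespace. The sketch below assumes that passing an explicit `' '` separator is where the truncated reply is heading. `split_check` and `TO_SPACE` are hypothetical names.

```python
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
# Map every legal code point to a space; illegal characters pass through.
TO_SPACE = {ord(c): ' ' for c in LEGAL}


def split_check(text, sep=None):
    """Return True if nothing but separators remains after translating.

    With sep=None, str.split() treats any whitespace as a separator, so an
    illegal vertical tab ('\\v') left behind by translate() is silently
    dropped. Passing sep=' ' splits on the space character only.
    """
    parts = text.translate(TO_SPACE).split(sep)
    return all(part == '' for part in parts)
```

With the default separator, `split_check("a\x0bb")` wrongly returns True. With `sep=' '`, it correctly returns False.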

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread MRAB
On 01/11/2011 18:54, Duncan Booth wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f' MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128)) # Untested def is_ascii_text(text): for c in text:

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Stefan Behnel
pyt...@bdurham.com, 31.10.2011 20:54: Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab? I know I can look at the chars of a string individually and compare them against a set of legal chars using

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Duncan Booth
MRAB pyt...@mrabarnett.plus.com wrote: On 01/11/2011 18:54, Duncan Booth wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f' MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128)) # Untested def

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-11-01 Thread Terry Reedy
On 11/1/2011 2:56 AM, Steven D'Aprano wrote: On Mon, 31 Oct 2011 20:44:45 -0400, Terry Reedy wrote: [...] def is_ascii_text(text): for c in text: if c not in LEGAL: return False return True If text is 3.x bytes, this does not work ;-). OP did not specify

Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread python
Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab? I know I can look at the chars of a string individually and compare them against a set of legal chars using standard Python code (and this works fine),
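For comparison with the loop and translate() ideas in the replies, a compiled regular expression is another built-in option. It is not proposed in the visible snippets and is swapped in here as an alternative. This sketch assumes str input and reads the question literally: tab, LF, CR and characters 32-127 are allowed, so formfeed is rejected and DEL (127) is accepted.

```python
import re

# Matches any character that is NOT tab, LF, CR, or in the range 32-127.
ILLEGAL = re.compile(r'[^\t\n\r\x20-\x7f]')


def has_illegal_chars(text):
    # search() stops at the first illegal character it finds.
    return ILLEGAL.search(text) is not None
```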

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Dave Angel
On 10/31/2011 03:54 PM, pyt...@bdurham.com wrote: Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab? I know I can look at the chars of a string individually and compare them against a set of legal chars

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Dave Angel
On 10/31/2011 05:47 PM, Dave Angel wrote: On 10/31/2011 03:54 PM, pyt...@bdurham.com wrote: Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab? I know I can look at the chars of a string individually

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Ian Kelly
On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel d...@davea.name wrote: I was wrong once again. But a simple combination of translate() and split() methods might do it. Here I'm suggesting that the table replace all valid characters with space, so the split() can use its default behavior. That

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Steven D'Aprano
On Mon, 31 Oct 2011 17:47:06 -0400, Dave Angel wrote: On 10/31/2011 03:54 PM, pyt...@bdurham.com wrote: Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab? I know I can look at the chars of a string

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Terry Reedy
On 10/31/2011 3:54 PM, pyt...@bdurham.com wrote: Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab? I presume you also want to disallow the other ascii control chars? I know I can look at the chars
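Terry's question concerns the other ASCII control characters (0-31). One way to express "printable ASCII plus tab, LF and CR" without spelling out a table is to combine `str.isascii()` (Python 3.7+) with `str.isprintable()`. This is a sketch under one stated assumption: `isprintable()` is False for DEL (127), so this version rejects chr(127) even though the question says "32-127".

```python
ALLOWED_CONTROLS = frozenset('\t\n\r')


def is_clean_text(text):
    # str.isascii() (Python 3.7+) rejects anything above chr(127) up front.
    if not text.isascii():
        return False
    # Within ASCII, isprintable() is False for chr(0)-chr(31) and chr(127),
    # so tab, LF and CR are the only control characters let through.
    return all(c.isprintable() or c in ALLOWED_CONTROLS for c in text)
```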

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Tim Chase
On 10/31/11 18:02, Steven D'Aprano wrote: # Define legal characters: LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f' # everybody forgets about formfeed... \f # and are you sure you want to include chr(127) as a text char? def is_ascii_text(text): for c in text:
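Tim's two caveats (formfeed is easy to forget, and DEL may not belong in "text") are policy choices. The sketch below makes both choices explicit. `make_legal` and its flags are hypothetical, and the defaults shown (formfeed allowed, DEL rejected) are one reading of Tim's comments, not a rule from the thread.

```python
def make_legal(allow_formfeed=True, allow_del=False):
    """Build the set of legal characters under two explicit policy flags."""
    upper = 128 if allow_del else 127       # range() excludes the upper bound
    legal = {chr(n) for n in range(32, upper)} | set('\n\r\t')
    if allow_formfeed:
        legal.add('\f')
    return frozenset(legal)


def is_ascii_text(text, legal):
    # all() short-circuits on the first character not in the legal set.
    return all(c in legal for c in text)
```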

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Patrick Maupin
On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel d...@davea.name wrote: Yes. Actually, you don't even need the split() -- you can pass an optional deletechars parameter to translate(). On Oct 31, 5:52 pm, Ian Kelly ian.g.ke...@gmail.com wrote: That sounds overly complicated and error-prone. Not
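Patrick's suggestion drops the split() entirely: delete every legal character and see whether anything is left. In Python 3 this form exists on bytes as `bytes.translate(None, delete)`. In Python 2.6+, `str.translate(None, deletechars)` is the analogous call. The following is a minimal sketch for bytes input, assuming the LEGAL set used elsewhere in the thread.

```python
LEGAL_BYTES = bytes(range(32, 128)) + b'\n\r\t\f'


def is_ascii_bytes(data):
    # With table=None, translate() only deletes the bytes listed in the
    # second argument; anything that survives is illegal.
    return not data.translate(None, LEGAL_BYTES)
```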

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Terry Reedy
On 10/31/2011 7:02 PM, Steven D'Aprano wrote: On Mon, 31 Oct 2011 17:47:06 -0400, Dave Angel wrote: On 10/31/2011 03:54 PM, pyt...@bdurham.com wrote: Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab?

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Dave Angel
On 10/31/2011 08:32 PM, Patrick Maupin wrote: On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel d...@davea.name wrote: Yes. Actually, you don't even need the split() -- you can pass an optional deletechars parameter to translate(). On Oct 31, 5:52 pm, Ian Kelly ian.g.ke...@gmail.com wrote: That

Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

2011-10-31 Thread Patrick Maupin
On Oct 31, 9:12 pm, Dave Angel d...@davea.name wrote: I would claim that a well-written (in C) translate function, without using the delete option, should be much quicker than any python loop, even if it does copy the data. Are you arguing with me? I was agreeing with you, I thought, that
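Dave's variant uses translate() "without using the delete option", accepting a full copy of the data. For bytes, one way to do that is a 256-entry table that maps legal bytes to a harmless marker and illegal bytes to a sentinel, then searches the copy for the sentinel. This is a sketch. The choice of `b'.'` and `b'\x00'` as marker and sentinel is arbitrary.

```python
LEGAL_BYTES = frozenset(range(32, 128)) | frozenset(b'\n\r\t\f')
# 256-byte translation table: legal bytes become b'.', all others b'\x00'.
TABLE = bytes(0x2E if b in LEGAL_BYTES else 0x00 for b in range(256))


def is_ascii_bytes(data):
    # translate() copies the data through the table (no delete argument);
    # any zero byte in the copy marks an illegal input byte.
    return b'\x00' not in data.translate(TABLE)
```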