Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-29 Thread Stefan Behnel
Dun Peal, 28.10.2010 09:10: I find myself surprised at the relatively little use that Cython is seeing. I don't think it's being used that little. It just doesn't show that easily. We get a lot of feedback on the mailing list that suggests that it's actually used by all sorts of people in

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-28 Thread Dun Peal
On Wed, Oct 20, 2010 at 6:52 AM, Stefan Behnel stefan...@behnel.de wrote: Well, the estimate is about one man-month, so it would be doable in about three months time if we had the money to work on it. So far, no one has made a serious offer to support that project, though. I find myself

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-19 Thread Dun Peal
On Mon, Oct 18, 2010 at 1:41 AM, Stefan Behnel stefan...@behnel.de wrote: Or, a bit shorter, using Cython 0.13: def only_allowed_characters(list strings): cdef unicode s return any((c < 31 or c > 127) for s in strings for c in s) Very cool, this caused me to

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-19 Thread Stefan Behnel
Dun Peal, 20.10.2010 02:07: On Mon, Oct 18, 2010 at 1:41 AM, Stefan Behnel wrote: Or, a bit shorter, using Cython 0.13: def only_allowed_characters(list strings): cdef unicode s return any((c < 31 or c > 127) for s in strings for c in s) Very cool, this
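For readers without Cython, here is a rough pure-Python counterpart of the quoted one-liner. It is only a sketch: it assumes the comparison in the snippet reads `c < 31 or c > 127`, and the function name `has_disallowed_character` is made up here to reflect that the expression is true when a character falls outside that range. Plain Python needs `ord()` because iterating a `str` yields one-character strings rather than integer code points.

```python
def has_disallowed_character(strings):
    """Return True if any character has a code point below 31 or above 127.

    Hypothetical pure-Python analogue of the Cython snippet quoted above;
    the bounds 31 and 127 are taken from that snippet as reconstructed.
    """
    # any() stops at the first character that fails the range test.
    return any(ord(c) < 31 or ord(c) > 127
               for s in strings for c in s)


print(has_disallowed_character(["plain text"]))
print(has_disallowed_character(["tab\tseparated"]))
```

This version will be much slower than the compiled Cython loop, but it short-circuits in the same way.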

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Stefan Behnel
Dun Peal, 17.10.2010 21:59: `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise. What's the fastest way to implement `all_ascii(L)`? My ideas so far are: 1. Match against a regexp with a character

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Grant Edwards
On 2010-10-18, Steven D'Aprano steve-remove-t...@cybersource.com.au wrote: Neither is accurate. all_ascii would be: all(ord(c) <= 127 for c in string for string in L) Definitely. all_printable would be considerably harder. As far as I can tell, there's no simple way to tell if a character
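A minimal runnable sketch of the generator-expression check quoted above. One detail matters when writing it out: the first `for` clause is the outermost loop, so the loop over `L` has to come before the loop over each string's characters. The function name `all_ascii` comes from the thread; the sample inputs are illustrative.

```python
def all_ascii(L):
    """Return True if every character of every string in L is in 0-127."""
    # Outer loop first (strings in L), inner loop second (characters).
    # all() stops at the first character whose code point exceeds 127.
    return all(ord(c) <= 127 for s in L for c in s)


print(all_ascii(["spam", "eggs\n"]))
print(all_ascii(["spam", "caf\u00e9"]))
```

On Python 3.7 and later, `str.isascii()` performs the same test on a single string, so `all(s.isascii() for s in L)` is an alternative there.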

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Tim Chase
On 10/18/10 09:28, Grant Edwards wrote: There's no easy way to even define what printable means. Ask three different people, and you'll get at least four different answers. I don't have a printer...that makes *all* characters unprintable, right? Now I can convert the algorithm to

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Felipe Bastos Nunes
Printable in the screen, all of them are, except for blank spaces ehhehehe 2010/10/18, Tim Chase python.l...@tim.thechases.com: On 10/18/10 09:28, Grant Edwards wrote: There's no easy way to even define what printable means. Ask three different people, and you'll get at least four different

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Seebs
On 2010-10-17, Dun Peal dunpea...@gmail.com wrote: What's the fastest way to implement `all_ascii(L)`? Start by defining it. 1. Match against a regexp with a character range: `[ -~]` What about tabs and newlines? For that matter, what about DEL and BEL? Seems to me that the entire 0-127
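The difference between the `[ -~]` range and the full 0-127 range can be seen in a short sketch. The pattern names and the helper `matches_all` are invented for illustration; only the `[ -~]` character class comes from the thread.

```python
import re

# The character class from the thread: space (0x20) through tilde (0x7E).
PRINTABLE_RANGE = re.compile(r"[ -~]*")
# Illustrative alternative: the whole 7-bit range, 0x00 through 0x7F.
SEVEN_BIT = re.compile(r"[\x00-\x7f]*")


def matches_all(pattern, L):
    """Return True if every string in L consists only of characters in pattern."""
    return all(pattern.fullmatch(s) is not None for s in L)


samples = ["tab\there", "bell\x07", "del\x7f"]
print(matches_all(PRINTABLE_RANGE, samples))
print(matches_all(SEVEN_BIT, samples))
```

Tabs, newlines, BEL and DEL are all ASCII, yet each one falls outside `[ -~]`, which is why the definition has to be settled before the implementation.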

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Rhodri James
On Sun, 17 Oct 2010 20:59:22 +0100, Dun Peal dunpea...@gmail.com wrote: `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise. What's the fastest way to implement `all_ascii(L)`? My ideas so far are:

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Carl Banks
On Oct 17, 12:59 pm, Dun Peal dunpea...@gmail.com wrote: `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise. What's the fastest way to implement `all_ascii(L)`? My ideas so far are: 1. Match

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Tim Chase
On 10/17/10 19:04, Rhodri James wrote: import string return set("".join(L)) <= set(string.printable) I've no idea whether this is faster or slower than any of your suggestions. For set("".join(L)) to return, it has to scan the entire input list/string. Imagine s = UNPRINTABLE_CHAR +
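The trade-off raised here can be sketched as two functions: one builds a set from the joined input and therefore always reads everything, the other stops at the first offending character. The function names and the `PRINTABLE` constant are illustrative, not from the thread, and no timing claim is made; which one wins depends on the data.

```python
import string

# Precomputed once so membership tests are cheap.
PRINTABLE = frozenset(string.printable)


def all_printable_set(L):
    # Joins every string and builds a set: the whole input is scanned
    # even if the very first character is already unprintable.
    return set("".join(L)) <= PRINTABLE


def all_printable_lazy(L):
    # all() short-circuits at the first character outside PRINTABLE.
    return all(c in PRINTABLE for s in L for c in s)


data = ["\x00" + "a" * 1000, "more text"]
print(all_printable_set(data), all_printable_lazy(data))
```

Both return the same answer; the lazy version only does less work when an unprintable character appears early.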

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Steven D'Aprano
On Mon, 18 Oct 2010 01:04:09 +0100, Rhodri James wrote: On Sun, 17 Oct 2010 20:59:22 +0100, Dun Peal dunpea...@gmail.com wrote: `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise. What's the

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Albert Hopkins
On Sun, 2010-10-17 at 14:59 -0500, Dun Peal wrote: `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise. What's the fastest way to implement `all_ascii(L)`? My ideas so far are: 1. Match