Dun Peal, 17.10.2010 21:59:
`all_ascii(L)` is a function that accepts a list of strings L, and
returns True if all of those strings contain only ASCII chars, False
otherwise.

What's the fastest way to implement `all_ascii(L)`?

My ideas so far are:

1. Match against a regexp with a character range: `[ -~]`
2. Use `s.decode('ascii')`
3. `return all(31 < ord(c) < 127 for s in L for c in s)`

Any other ideas?  Which one do you think will be fastest?
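
For reference, here is a minimal pure-Python sketch of the three ideas above, written for Python 3, where `str` has no `decode()` and `s.encode('ascii')` is the closest analogue of idea 2. The function names are made up for illustration. Note that the variants do not accept exactly the same input: the regexp and the `ord()` test allow only printable ASCII (space through tilde), while the codec approach accepts every code point below 128, control characters included.

```python
import re

# Idea 1: printable ASCII only (space through tilde).
_PRINTABLE = re.compile(r'[ -~]*')

def all_ascii_regex(strings):
    # fullmatch() requires the whole string to consist of allowed characters.
    return all(_PRINTABLE.fullmatch(s) is not None for s in strings)

def all_ascii_encode(strings):
    # Idea 2, Python 3 analogue: let the ASCII codec do the check.
    # Accepts all of 0..127, so tabs and newlines pass here.
    for s in strings:
        try:
            s.encode('ascii')
        except UnicodeEncodeError:
            return False
    return True

def all_ascii_ord(strings):
    # Idea 3: explicit range test on every character.
    return all(31 < ord(c) < 127 for s in strings for c in s)
```

Which of these is fastest depends on the interpreter version and on the data, so it is worth measuring with `timeit` on representative input rather than guessing.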

You can't beat Cython for this kind of task. If it's really a list of (unicode) strings, you can do this:

    def only_allowed_characters(list strings):
        cdef unicode s
        for s in strings:
            for c in s:
                if c < 32 or c > 126:
                    return False
        return True

Or, a bit shorter, using Cython 0.13:

    def only_allowed_characters(list strings):
        cdef unicode s
        return not any((c < 32 or c > 126)
                       for s in strings for c in s)

Both are untested. Basically the same should work for byte strings. You can also support both string types efficiently with an isinstance() type test inside of the outer loop.
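
The isinstance() dispatch could look roughly like the following plain-Python sketch (Python 3 semantics assumed, where iterating over `bytes` yields integers). The name `only_allowed_characters_py` is hypothetical, the range 32..126 is taken from the `[ -~]` regexp in the question, and this shows only the control flow, not the typed Cython version.

```python
def only_allowed_characters_py(strings):
    """Return True if every item contains only printable ASCII (32..126)."""
    for s in strings:
        if isinstance(s, str):
            # Text string: convert each character to its code point.
            codes = map(ord, s)
        elif isinstance(s, (bytes, bytearray)):
            # Byte string: iteration already yields integers in Python 3.
            codes = s
        else:
            raise TypeError("expected str or bytes, got %s" % type(s).__name__)
        for code in codes:
            if code < 32 or code > 126:
                return False
    return True
```

In Cython, the two branches would each assign to a typed variable (`unicode` or `bytes`) so that the inner loop can be compiled to plain C.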

Also see here:

http://behnel.de/cgi-bin/weblog_basic/index.php?p=49
http://docs.cython.org/src/tutorial/strings.html

Stefan
