Re: Newbie: Check first two non-whitespace characters
otaksoftspamt...@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { '.
>
> Best to use re and how? Something else?

No comment on whether re is good for your use case, but another comment
on how.

First, some test data:

>>> import re
>>> data = '\r\n {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment: there's a special regex sequence, \S, to match a
non-whitespace character, and a method to produce matches on demand:

>>> black = re.compile(r'\S')
>>> matches = black.finditer(data)

Then the demonstration. This accesses the first, then second, then third
match:

>>> empty = re.match('', '')
>>> next(matches, empty).group()
'{'
>>> next(matches, empty).group()
'['
>>> next(matches, empty).group()
'"'

The empty match object provides an appropriate .group() when there is no
first or second (and so on) non-whitespace character in the data:

>>> matches = black.finditer('\r\t\n')
>>> next(matches, empty).group()
''
>>> next(matches, empty).group()
''

--
https://mail.python.org/mailman/listinfo/python-list
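A self-contained version of the finditer idea, wrapped in a function. This is only a sketch: the name first_non_whitespace and the use of itertools.islice (instead of repeated next() calls) are assumptions for illustration, not part of the original post.

```python
import re
from itertools import islice

# Matches any single non-whitespace character.
NON_WS = re.compile(r'\S')

def first_non_whitespace(text, n=2):
    """Return up to the first n non-whitespace characters of text.

    finditer yields matches lazily, and islice stops asking for more
    after n of them, so the scan stops early on long strings.
    """
    return ''.join(m.group() for m in islice(NON_WS.finditer(text), n))

data = '\r\n {\r\n\t[ "etc" ]}\n\n\n'
print(first_non_whitespace(data))                # {[
print(first_non_whitespace(' [ { ') == '[{')     # True
print(repr(first_non_whitespace('\r\t\n')))      # ''
```

Returning a shorter (possibly empty) string plays the same role as the empty match object above: the caller can compare it with '[{' without special-casing missing characters.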
Re: Newbie: Check first two non-whitespace characters
On 01/01/2016 00:25, Mark Lawrence wrote:
> On 31/12/2015 18:54, Karim wrote:
>> On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
>>> I need to check a string over which I have no control for the first 2
>>> non-white space characters (which should be '[{').
>>>
>>> The string would ideally be: '[{...' but could also be something like
>>> ' [ { '.
>>>
>>> Best to use re and how? Something else?
>>
>> Use pyparsing, it is straightforward:
>>
>> >>> from pyparsing import Suppress, restOfLine
>> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>> >>> result = mystring.parse(' [ { I am learning pyparsing')
>> >>> print(result.asList())
>> [' I am learning pyparsing']
>>
>> You'll get your string inside the list.
>>
>> Hope this helps; see the pyparsing docs for in-depth study.
>>
>> Karim
>
> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.

You're welcome! The intent was to give a simple introduction to pyparsing,
which is a powerful tool for building more complex parsers.

Karim
Re: Newbie: Check first two non-whitespace characters
I would personally use re here.

import re

test_string = ' [{blah blah blah'
matches = re.findall(r'[^\s]', test_string)
result = ''.join(matches)[:2]  # '[{'

On Thu, Dec 31, 2015 at 10:18 AM, wrote:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { '.
>
> Best to use re and how? Something else?
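re.findall collects every non-whitespace character in the string before the slice keeps two, which is extra work on long input. A regex-free sketch that stops after two characters, using str.isspace() and itertools.islice; this is a swapped-in technique, not the poster's, and the function name is hypothetical.

```python
from itertools import islice

def first_two_non_ws(s):
    # Generator expression: yields non-whitespace characters one at a time.
    non_ws = (ch for ch in s if not ch.isspace())
    # islice stops pulling from the generator after two items.
    return ''.join(islice(non_ws, 2))

print(first_two_non_ws(' [{blah blah blah'))   # [{
print(first_two_non_ws(' [ { ') == '[{')        # True
```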
Re: Newbie: Check first two non-whitespace characters
On 2015-12-31 18:18, otaksoftspamt...@gmail.com wrote:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { '.
>
> Best to use re and how? Something else?
>
I would use .split and then ''.join:

>>> ''.join(' [ { '.split())
'[{'

It might be faster if you provide a maximum for the number of splits:

>>> ''.join(' [ { '.split(None, 1))
'[{ '
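Wrapped up as a yes/no check, the maxsplit version is enough for this question: split(None, 1) drops the leading whitespace and consumes the whole first whitespace run, so the first two characters of the joined result are exactly the first two non-whitespace characters of the input. A sketch; the function name is hypothetical.

```python
def starts_with_brackets_split(s):
    # split(None, 1) strips leading whitespace and splits once at the
    # first whitespace run; joining the pieces removes only that run.
    return ''.join(s.split(None, 1)).startswith('[{')

for sample in ['[{...', ' [ { ', '\t[\n{"a": 1}]', '{[', ' [ x {']:
    print(repr(sample), starts_with_brackets_split(sample))
```

Everything after the second bracket is left as it was, which matters if the rest of the string contains whitespace that should be preserved.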
Re: Newbie: Check first two non-whitespace characters
On 31/12/2015 19:54, Karim wrote:
>
> On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> ' [ { '.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing, it is straightforward:
>
> >>> from pyparsing import Suppress, restOfLine
> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> >>> result = mystring.parse(' [ { I am learning pyparsing')
> >>> print(result.asList())
> [' I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this helps; see the pyparsing docs for in-depth study.
>
> Karim

Sorry, the method to parse a string is parseString, not parse. Please
replace that line with this one:

>>> result = mystring.parseString(' [ { I am learning pyparsing')

Regards
Re: Newbie: Check first two non-whitespace characters
On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { '.
>
> Best to use re and how? Something else?

Use pyparsing, it is straightforward:

>>> from pyparsing import Suppress, restOfLine
>>> mystring = Suppress('[') + Suppress('{') + restOfLine
>>> result = mystring.parse(' [ { I am learning pyparsing')
>>> print(result.asList())
[' I am learning pyparsing']

You'll get your string inside the list.

Hope this helps; see the pyparsing docs for in-depth study.

Karim
Re: Newbie: Check first two non-whitespace characters
Thanks much to both of you!

On Thursday, December 31, 2015 at 11:05:26 AM UTC-8, Karim wrote:
> On 31/12/2015 19:54, Karim wrote:
> >
> > On 31/12/2015 19:18, snailp...@gmail.com wrote:
> >> I need to check a string over which I have no control for the first 2
> >> non-white space characters (which should be '[{').
> >>
> >> The string would ideally be: '[{...' but could also be something like
> >> ' [ { '.
> >>
> >> Best to use re and how? Something else?
> >
> > Use pyparsing, it is straightforward:
> >
> > >>> from pyparsing import Suppress, restOfLine
> > >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> > >>> result = mystring.parse(' [ { I am learning pyparsing')
> > >>> print(result.asList())
> > [' I am learning pyparsing']
> >
> > You'll get your string inside the list.
> >
> > Hope this helps; see the pyparsing docs for in-depth study.
> >
> > Karim
>
> Sorry, the method to parse a string is parseString, not parse. Please
> replace that line with this one:
>
> >>> result = mystring.parseString(' [ { I am learning pyparsing')
>
> Regards
Re: Newbie: Check first two non-whitespace characters
On Fri, 1 Jan 2016 05:18 am, otaksoftspamt...@gmail.com wrote:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { '.
>
> Best to use re and how? Something else?

This should work, and be very fast, for moderately sized strings:

def starts_with_brackets(the_string):
    the_string = the_string.replace(" ", "")
    return the_string.startswith("[{")

It might be a bit slow for huge strings (tens of millions of characters),
but for short strings it will be fine.

Alternatively, use a regex:

import re
regex = re.compile(r' *\[ *\{')
if regex.match(the_string):
    print("string starts with [{ as expected")
else:
    raise ValueError("invalid string")

This will probably be slower for small strings, but faster for HUGE
strings (tens of millions of characters), since match() only has to
examine the start of the string while replace() copies all of it. But I
expect it will be fast enough.

It is simple enough to skip tabs as well as spaces. The easiest way is to
match on any whitespace:

regex = re.compile(r'\s*\[\s*\{')

--
Steven
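A runnable version of the whitespace-tolerant regex check, using \s (whitespace) rather than \w (word characters). re.match only tries to match at the start of the string, so \s* absorbs any leading spaces, tabs or newlines. A sketch; the function name and the choice to return a bool instead of raising are assumptions.

```python
import re

# \s* skips any whitespace (spaces, tabs, newlines) before each bracket.
OPENING = re.compile(r'\s*\[\s*\{')

def starts_with_brackets_re(the_string):
    # match() is anchored at the start of the string and returns None on failure.
    return OPENING.match(the_string) is not None

print(starts_with_brackets_re('[{...'))               # True
print(starts_with_brackets_re('\r\n [\t{ "a": 1}'))   # True
print(starts_with_brackets_re('x [{'))                # False
```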
Re: Newbie: Check first two non-whitespace characters
otaksoftspamt...@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { '.
>
> Best to use re and how? Something else?

Is it an arbitrary string, or is it JSON: a list whose first element is a
dictionary? If you're planning on parsing it as JSON later anyway, you
could just validate the types after you've parsed it.
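If the string is JSON, a sketch of parsing first and validating the structure afterwards with the standard json module. The expected shape (a list whose first element is a dict) comes from the question above; the function name and the decision to return None on failure are hypothetical.

```python
import json

def parse_list_of_dicts(text):
    """Return the parsed list if text is a JSON array starting with an object, else None."""
    try:
        # JSON permits whitespace around values, so ' [ { ... ' is fine.
        value = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if isinstance(value, list) and value and isinstance(value[0], dict):
        return value
    return None

print(parse_list_of_dicts(' [ {"a": 1}, 2 ] '))  # [{'a': 1}, 2]
print(parse_list_of_dicts('[1, {"a": 1}]'))       # None
```

Unlike a prefix check, this also rejects strings that start with '[{' but are not valid JSON overall.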
Re: Newbie: Check first two non-whitespace characters
On 31/12/2015 18:54, Karim wrote:
> On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> ' [ { '.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing, it is straightforward:
>
> >>> from pyparsing import Suppress, restOfLine
> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> >>> result = mystring.parse(' [ { I am learning pyparsing')
> >>> print(result.asList())
> [' I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this helps; see the pyparsing docs for in-depth study.
>
> Karim

Congratulations for writing up one of the most overengineered pile of
cobblers I've ever seen.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
Re: Newbie: Check first two non-whitespace characters
On Fri, 1 Jan 2016 10:25 am, Mark Lawrence wrote:

> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.

You should get out more.

--
Steven
Re: Newbie: Check first two non-whitespace characters
On Thu, 31 Dec 2015 10:18:52 -0800, otaksoftspamtrap wrote:

> Best to use re and how? Something else?

Split the string on the space character and check the first two non-blank
elements of the resulting list? Maybe something similar to the following:

if [x for x in s.split(' ') if x != ''][0:2] == ['[', '{']:
    # string starts '[{'
    ...

--
Denis McMahon, denismfmcma...@gmail.com
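A sketch of the token-splitting idea applied to the '[{' case, using str.split() with no argument so tabs and newlines also count as separators. Note the assumption it carries: it compares whole tokens, so it only recognises the brackets when whitespace separates them, and a compact '[{...' is rejected. The function name is hypothetical.

```python
def starts_with_bracket_tokens(s):
    # str.split() with no argument splits on runs of any whitespace and
    # never yields empty strings, so no filtering is needed.
    return s.split()[:2] == ['[', '{']

print(starts_with_bracket_tokens(' [ { '))      # True
print(starts_with_bracket_tokens('\t[\n{ x'))   # True
print(starts_with_bracket_tokens('[{...'))      # False: a single token '[{...'
```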
Re: Newbie: Check first two non-whitespace characters
On 31Dec2015 18:38, MRAB wrote:
> On 2015-12-31 18:18, otaksoftspamt...@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> ' [ { '.
>>
>> Best to use re and how? Something else?
>
> I would use .split and then ''.join:
>
>   ''.join(' [ { '.split())
>   '[{'

This presumes it is OK to drop/mangle/lose the whitespace elsewhere in the
string. If it contains quoted text I'd expect that to be very bad.

> It might be faster if you provide a maximum for the number of splits:
>
>   ''.join(' [ { '.split(None, 1))
>   '[{ '

Not to mention safer.

I would use lstrip and startswith:

    s = s.lstrip()
    if s.startswith('['):
        s = s[1:].lstrip()
        if s.startswith('{'):
            ... deal with s[1:] here ...

It is wordier, but far more basic and direct.

Cheers,
Cameron Simpson
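The lstrip/startswith steps as a complete function. A sketch: returning the remainder after '[{' (or None when the prefix is absent) is an assumption about what "deal with s[1:]" means, and the function name is hypothetical.

```python
def after_opening_brackets(s):
    """Return the text following the leading '[' and '{', or None if absent.

    Whitespace is skipped only before each bracket; the rest of the
    string, including whitespace inside quoted text, is left untouched.
    """
    s = s.lstrip()
    if not s.startswith('['):
        return None
    s = s[1:].lstrip()
    if not s.startswith('{'):
        return None
    return s[1:]

print(repr(after_opening_brackets(' [ { "a  b": 1}]')))  # ' "a  b": 1}]'
print(after_opening_brackets('{['))                      # None
```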