Re: Newbie: Check first two non-whitespace characters

2016-01-01 Thread Jussi Piitulainen
otaksoftspamt...@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  '.
>
> Best to use re and how? Something else?

No comment on whether re is good for your use case but another comment
on how. First, some test data:

  >>> data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment: there's a special regex sequence, \S, that matches
any non-whitespace character, and a function, re.finditer, that produces
matches on demand:

  >>> import re
  >>> black = re.compile(r'\S')
  >>> matches = re.finditer(black, data)

Then the demonstration. This accesses the first, then second, then third
match:

  >>> empty = re.match('', '')
  >>> next(matches, empty).group()
  '{'
  >>> next(matches, empty).group()
  '['
  >>> next(matches, empty).group()
  '"'

The empty match object provides an appropriate .group() when there is no
first or second (and so on) non-whitespace character in the data:

  >>> matches = re.finditer(black, '\r\t\n')
  >>> next(matches, empty).group()
  ''
  >>> next(matches, empty).group()
  ''
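The same idea can be wrapped in a small helper. This is a minimal sketch: the names NON_WHITESPACE and first_non_whitespace, and the n parameter, are hypothetical. It uses itertools.islice so that finditer is asked for at most n matches.

```python
import re
from itertools import islice

# Any single non-whitespace character, as in the session above.
NON_WHITESPACE = re.compile(r'\S')


def first_non_whitespace(text, n=2):
    """Return up to n leading non-whitespace characters of text, joined.

    finditer yields matches lazily and islice stops after n of them,
    so the rest of a long string is never searched.
    """
    matches = islice(NON_WHITESPACE.finditer(text), n)
    return ''.join(m.group() for m in matches)


data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'
print(first_non_whitespace(data))            # {[
print(first_non_whitespace('\r\t\n') == '')  # True
```

For the original question, the check then becomes first_non_whitespace(s) == '[{'.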
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Newbie: Check first two non-whitespace characters

2016-01-01 Thread Karim



On 01/01/2016 00:25, Mark Lawrence wrote:
> On 31/12/2015 18:54, Karim wrote:
>>
>> On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
>>> I need to check a string over which I have no control for the first 2
>>> non-white space characters (which should be '[{').
>>>
>>> The string would ideally be: '[{...' but could also be something like
>>> '  [  {  '.
>>>
>>> Best to use re and how? Something else?
>>
>> Use pyparsing; it is straightforward:
>>
>>  >>> from pyparsing import Suppress, restOfLine
>>
>>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>>
>>  >>> result = mystring.parse(' [ {  I am learning pyparsing' )
>>
>>  >>> print(result.asList())
>>
>> [' I am learning pyparsing']
>>
>> You'll get your string inside the list.
>>
>> Hope this helps; see the pyparsing docs for in-depth study.
>>
>> Karim
>
> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.

You're welcome!

The intent was to make a simple introduction to pyparsing, which is a
powerful tool for building more complex parsers.


Karim


Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Cory Madden
I would personally use re here.

import re

test_string = '  [{blah blah blah'
matches = re.findall(r'[^\s]', test_string)
result = ''.join(matches)[:2]   # result == '[{'
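re.findall builds a list of every non-whitespace character in the string before two of them are sliced off. If only the first two matter, a single anchored match can capture them directly. This is a sketch under that assumption. The pattern name FIRST_TWO and the function name first_two are illustrative and do not come from the post.

```python
import re

# \s* skips leading whitespace; each (\S) captures one non-whitespace
# character, and the middle \s* skips any whitespace between them.
FIRST_TWO = re.compile(r'\s*(\S)\s*(\S)')


def first_two(text):
    """Return the first two non-whitespace characters, or None if there are fewer."""
    m = FIRST_TWO.match(text)
    return m.group(1) + m.group(2) if m else None


print(first_two('  [{blah blah blah'))
print(first_two('  [  {  '))
print(first_two(' [ '))
```

Because re.match is anchored at the start of the string, the scan stops after the second non-whitespace character.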

On Thu, Dec 31, 2015 at 10:18 AM,   wrote:
> I need to check a string over which I have no control for the first 2 
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  '.
>
> Best to use re and how? Something else?


Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread MRAB

On 2015-12-31 18:18, otaksoftspamt...@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white
> space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  '.
>
> Best to use re and how? Something else?


I would use .split and then ''.join:

>>> ''.join(' [ { '.split())
'[{'

It might be faster if you provide a maximum for the number of splits:

>>> ''.join(' [ { '.split(None, 1))
'[{ '
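To turn this into the yes/no check the original poster asked for, the joined result only needs a startswith test. This is a minimal sketch, and the function name is hypothetical. With maxsplit=1, split drops the leading whitespace and the first run of whitespace after the first token. The rest of the string is left alone, and that is all this test needs.

```python
def starts_with_bracket_brace(text):
    """Return True if the first two non-whitespace characters are '[' and '{'.

    str.split(None, 1) drops leading whitespace and splits at most once,
    so only the first run of whitespace after the first token is removed.
    """
    return ''.join(text.split(None, 1)).startswith('[{')


for s in ['[{...', '  [  {  ', ' [ { "a": 1 }]', '{[', '   ']:
    print(repr(s), starts_with_bracket_brace(s))
```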



Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Karim



On 31/12/2015 19:54, Karim wrote:
>
> On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  '.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing; it is straightforward:
>
>  >>> from pyparsing import Suppress, restOfLine
>
>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
>  >>> result = mystring.parse(' [ {  I am learning pyparsing' )
>
>  >>> print(result.asList())
>
> [' I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this helps; see the pyparsing docs for in-depth study.
>
> Karim

Sorry, the method to parse a string is parseString, not parse. Please
replace that line with this one:

>>> result = mystring.parseString(' [ {  I am learning pyparsing' )

Regards


Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Karim



On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white
> space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  '.
>
> Best to use re and how? Something else?

Use pyparsing; it is straightforward:

>>> from pyparsing import Suppress, restOfLine

>>> mystring = Suppress('[') + Suppress('{') + restOfLine

>>> result = mystring.parse(' [ {  I am learning pyparsing' )

>>> print(result.asList())

[' I am learning pyparsing']

You'll get your string inside the list.

Hope this helps; see the pyparsing docs for in-depth study.

Karim


Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread cassius . fechter
Thanks much to both of you!


On Thursday, December 31, 2015 at 11:05:26 AM UTC-8, Karim wrote:
> On 31/12/2015 19:54, Karim wrote:
> >
> >
> > On 31/12/2015 19:18, snailp...@gmail.com wrote:
> >> I need to check a string over which I have no control for the first 2 
> >> non-white space characters (which should be '[{').
> >>
> >> The string would ideally be: '[{...' but could also be something like
> >> '  [  {  '.
> >>
> >> Best to use re and how? Something else?
> >
> > Use pyparsing; it is straightforward:
> >
> > >>> from pyparsing import Suppress, restOfLine
> >
> > >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> >
> > >>> result = mystring.parse(' [ {  I am learning pyparsing' )
> >
> > >>> print(result.asList())
> >
> > [' I am learning pyparsing']
> >
> > You'll get your string inside the list.
> >
> > Hope this helps; see the pyparsing docs for in-depth study.
> >
> > Karim
> 
> Sorry, the method to parse a string is parseString, not parse. Please
> replace that line with this one:
> 
>  >>> result = mystring.parseString(' [ {  I am learning pyparsing' )
> 
> Regards



Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Steven D'Aprano
On Fri, 1 Jan 2016 05:18 am, otaksoftspamt...@gmail.com wrote:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
> 
> The string would ideally be: '[{...' but could also be something like
> '  [  {  '.
> 
> Best to use re and how? Something else?

This should work, and be very fast, for moderately-sized strings:


def starts_with_brackets(the_string):
    # Remove spaces only (not tabs or newlines), then test the prefix.
    the_string = the_string.replace(" ", "")
    return the_string.startswith("[{")


It might be a bit slow for huge strings (tens of millions of characters),
but for short strings it will be fine.

Alternatively, use a regex:


import re
regex = re.compile(r' *\[ *\{')

if regex.match(the_string):
    print("string starts with [{ as expected")
else:
    raise ValueError("invalid string")


This will probably be slower for small strings, but faster for HUGE strings
(tens of millions of characters). But I expect it will be fast enough.

It is simple enough to skip tabs as well as spaces. The easiest way is to match
on any whitespace:

regex = re.compile(r'\s*\[\s*\{')
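Putting the whitespace-tolerant pattern into a function shows how it differs from the space-only version. Tabs and newlines before or between the brackets are now accepted. This is a sketch, and the names BRACKETS and starts_with_brackets_re are hypothetical.

```python
import re

# \s* allows any whitespace (spaces, tabs, newlines) before '[' and between
# '[' and '{'; regex.match anchors the pattern at the start of the string.
BRACKETS = re.compile(r'\s*\[\s*\{')


def starts_with_brackets_re(the_string):
    return BRACKETS.match(the_string) is not None


for s in ['[{...', '  [  {  ', '\t[\n{', 'x[{', '[ ]{']:
    print(repr(s), starts_with_brackets_re(s))
```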




-- 
Steven



Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Random832
otaksoftspamt...@gmail.com writes:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  '.
>
> Best to use re and how? Something else?

Is it an arbitrary string, or is it a JSON object consisting of a list
whose first element is a dictionary? Because if you're planning on
reading it as a JSON object later you could just validate the types
after you've parsed it.
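If the string really is JSON, that approach might look like the following sketch using the standard json module. It assumes the expected shape is a JSON array whose first element is an object, which becomes a Python list whose first item is a dict. The function name is hypothetical. Unlike a character check, this also rejects input that starts with '[{' but is not valid JSON.

```python
import json


def parses_as_list_starting_with_dict(text):
    """Parse text as JSON and check it is a list whose first element is a dict.

    Returns False instead of raising when text is not valid JSON.
    """
    try:
        obj = json.loads(text)  # leading/trailing whitespace is allowed by JSON
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(obj, list) and len(obj) > 0 and isinstance(obj[0], dict)


print(parses_as_list_starting_with_dict('  [ {"a": 1}, 2 ]'))
print(parses_as_list_starting_with_dict('[{ not json'))
```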



Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Mark Lawrence

On 31/12/2015 18:54, Karim wrote:
>
> On 31/12/2015 19:18, otaksoftspamt...@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  '.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing; it is straightforward:
>
>  >>> from pyparsing import Suppress, restOfLine
>
>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
>  >>> result = mystring.parse(' [ {  I am learning pyparsing' )
>
>  >>> print(result.asList())
>
> [' I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this helps; see the pyparsing docs for in-depth study.
>
> Karim

Congratulations for writing up one of the most overengineered pile of
cobblers I've ever seen.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Steven D'Aprano
On Fri, 1 Jan 2016 10:25 am, Mark Lawrence wrote:

> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.

You should get out more.


-- 
Steven



Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Denis McMahon
On Thu, 31 Dec 2015 10:18:52 -0800, otaksoftspamtrap wrote:

> Best to use re and how? Something else?

Split the string on the space character and check the first two non-blank
elements of the resulting list?

Maybe something similar to the following:

if [x for x in s.split(' ') if x != ''][0:2] == ['[', '{']:
    print("string starts with '[' then '{'")
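Splitting on ' ' only separates tokens at literal spaces. A string with no space between the brackets, such as the ideal input '[{...', therefore comes back as a single element. Tabs and newlines also stay attached to the brackets. A character-level alternative avoids both problems. It is sketched here with str.isspace and itertools.islice rather than split, and the function name is hypothetical.

```python
from itertools import islice


def first_non_space_chars(s, n=2):
    """Return up to n leading non-whitespace characters of s.

    str.isspace covers spaces, tabs and newlines; the generator plus islice
    stops reading s as soon as n characters have been found.
    """
    return ''.join(islice((c for c in s if not c.isspace()), n))


for s in ['[{...', '  [  {  ', '\t[\r\n{ "etc" }]', '( ( (']:
    print(repr(s), first_non_space_chars(s) == '[{')
```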

-- 
Denis McMahon, denismfmcma...@gmail.com


Re: Newbie: Check first two non-whitespace characters

2015-12-31 Thread Cameron Simpson

On 31Dec2015 18:38, MRAB wrote:
> On 2015-12-31 18:18, otaksoftspamt...@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2 non-white
>> space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  '.
>>
>> Best to use re and how? Something else?
>
> I would use .split and then ''.join:
>
> >>> ''.join(' [ { '.split())
> '[{'

This presumes it is ok to drop/mangle/lose the whitespace elsewhere in the
string. If it contains quoted text I'd expect that to be very bad.

> It might be faster if you provide a maximum for the number of splits:
>
> >>> ''.join(' [ { '.split(None, 1))
> '[{ '

Not to mention safer.

I would use lstrip and startswith:

    s = s.lstrip()
    if s.startswith('['):
        s = s[1:].lstrip()
        if s.startswith('{'):
            ... deal with s[1:] here ...

It is wordier, but far more basic and direct.
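The lstrip/startswith steps above can be packaged as a function. It returns whatever follows the '{', or None when the string does not start that way. This is a minimal sketch: the name strip_leading_brackets and the None-on-failure convention are assumptions, not from the post. Like the steps above, it leaves whitespace elsewhere in the string untouched.

```python
def strip_leading_brackets(s):
    """Return the text after a leading '[' and '{' (whitespace allowed
    before and between them), or None if s does not start that way."""
    s = s.lstrip()
    if not s.startswith('['):
        return None
    s = s[1:].lstrip()
    if not s.startswith('{'):
        return None
    return s[1:]  # whitespace inside the remainder is preserved


print(repr(strip_leading_brackets('  [  {  "key": "a  b" }]')))
print(strip_leading_brackets('{['))
```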

Cheers,
Cameron Simpson 