Re: [Tutor] Re Module

2018-12-27 Thread Avi Gross
Asad,

After reading the replies to you from Alan and Steven, I want to ask if you can
first tell us, in plain words, what exactly the program is meant to do. If
you only want help with one small part, tell us about that.

At first I was fooled into thinking you wanted to show us how you solve the
majority of the entire problem, whatever it was, so I wanted to hear things
like the examples that follow.

An example would be to search two files for error matches of various kinds
and report if they contain any matches. Just report True versus False or
something.

Another goal might be to show the first match in some way then quit.

Another might be to do the same search in two files and report ALL the
matches in some format.
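To make the difference between those goals concrete, here is a rough Python 3 sketch of each one. The patterns and lines are made up, since the real files were never shown:

```python
import re

# Made-up error patterns and log lines, for illustration only.
patterns = [re.compile(p, re.IGNORECASE) for p in (r"fail: \d+", r"warn: \d+")]
lines = ["all good", "WARN: 42 disk low", "Fail: 666 crashed"]

def has_any_match(lines, patterns):
    """Goal 1: just report True or False."""
    return any(p.search(line) for line in lines for p in patterns)

def first_match(lines, patterns):
    """Goal 2: return the first (line, pattern text) pair found, or None."""
    for line in lines:
        for p in patterns:
            if p.search(line):
                return line, p.pattern
    return None

def all_matches(lines, patterns):
    """Goal 3: collect every (line, pattern text) pair."""
    return [(line, p.pattern) for line in lines for p in patterns if p.search(line)]

print(has_any_match(lines, patterns))  # True
print(first_match(lines, patterns))    # ('WARN: 42 disk low', 'warn: \\d+')
print(all_matches(lines, patterns))
```

Each function answers a different question, which is why it helps to decide which question you are asking before writing the loop.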

After being clear on the goal, you might specify the overall algorithm you
want to use. For example, do you process one file to completion and save
some results then process the other the same way then compare and produce
output? Or do you process both nearly simultaneously in one pass, or perhaps in
multiple passes? Do you search for one error type at a time or all at once?
Can there be multiple errors on the same line of the same kind or different
ones? What does error even mean? Is it something like "Fail: 666" versus
"Warn: 42" or something where multiple errors share a part or ...

Once we have some idea of the goal, we could help you see if the approach
seems reasonable even before reading the code. And, when reading the code,
we might check whether your implementation matches the plan, and so spot
where you diverge from it, perhaps by mistake.

If I just look at what you provided, you do some of what I asked. You are
not clear on what the two files contain other than they may have an error
that you can identify with a set of patterns. Can you tell us if you are
looking at one line at a time, assuming it is a text file? Your code shows
no evidence of a file at all. Your focus in what you share with us is mainly
on creating a list of compiled search patterns and applying it to one
uninitialized "st" and trying to figure out which one matched. 

You do not show any examples of the pattern but suggest something is
failing. For all we know one of your patterns just matched the presence of a
single common character or even was not formatted properly and failed to be
compiled.

My impression is you are not actually asking about the overall problem. Your
real question may be how to use a regular expression on a string and find
out what matched. If so, that would be the headline, not about two files.
It may even be that your entire approach could change. An example would be to
store your patterns as a text keyword in a dictionary with the value being
the compiled version so when you evaluate a line using the pattern, you know
which one you matched with. I am NOT saying this is a good solution or a
better one. I am asking you to think what you will need and what techniques
might make life easier in doing it.
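A minimal Python 3 sketch of that dictionary idea, with hypothetical labels and patterns. A compiled pattern already remembers its source text in `.pattern`, so the dictionary pays off mainly when the keys are friendlier labels:

```python
import re

# Hypothetical labels and patterns; the real ones were never posted.
error_patterns = {
    "error code": re.compile(r"error \d+", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "permissions": re.compile(r"permission denied", re.IGNORECASE),
}

def which_patterns_match(line):
    """Return the label of every pattern that matches the line."""
    return [label for label, regex in error_patterns.items() if regex.search(line)]

print(which_patterns_match("Connection TIMEOUT after error 404"))
# ['error code', 'timeout']
```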

So, besides altering your code based on the feedback from others,
could you resubmit the question with a focus on what you are doing and
exactly what is not working that you want looked at? Specifics would be useful,
including at least one pattern and a line of sample text that should be
matched by the pattern, and perhaps one that should not. And
any error messages are vital.

When you do, I am sure Steven and Alan and others might be able to zoom
right in and help you diagnose, if you don't figure it out by yourself first
by being able to see what your goal is and perhaps doing a little debugging.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Re Module

2018-12-27 Thread Steven D'Aprano
On Thu, Dec 27, 2018 at 08:40:12PM +0530, Asad wrote:
> Hi All ,
> 
>   I trying find a solution for my script , I have two files :
> 
> file1 - I need a search a error say x if the error matches
> 
> Look for the same error x in other file 2
> 
> Here is the code :
> I have 10 different patterns therefore I used list comprehension and
> compiling the pattern so I loop over and find the exact pattern matching
> 
> re_comp1 = [re.compile(pattern) for pattern in str1]


You can move the IGNORECASE flag into the call to compile. Also, perhaps 
you can use better names instead of "str1" (one string?).

patterns = [re.compile(pattern, re.IGNORECASE) for pattern in string_patterns]
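Moving the flag matters for more than tidiness. A compiled pattern's method is `search(string[, pos[, endpos]])`, so in Asad's loop `re.IGNORECASE` (whose integer value is 2) is read as a start position, not as a flag. A small Python 3 check with made-up strings:

```python
import re

pat = re.compile("error")
# For a compiled pattern, the second argument is a start position, not a flag.
# re.IGNORECASE has the value 2, so this search starts at index 2 and misses
# the "error" at index 0.
print(pat.search("error here", re.IGNORECASE))  # None

# Put the flag in re.compile() instead; search() then only needs the string.
pat = re.compile("error", re.IGNORECASE)
print(pat.search("ERROR here").group())  # ERROR
```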
 
> for pat in re_comp1:
>     if pat.search(st,re.IGNORECASE):
>         x = pat.pattern
>         print x===> here it gives the expected output it correct match
>         print type(x)
> 

Be careful here: even though you have ten different patterns, only *one* 
will be stored in x. If three patterns match, x will only get the last 
of the three and the others will be ignored.
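If all the matching patterns are wanted, one option is to collect them in a list instead of overwriting `x`. A Python 3 sketch with made-up patterns and text:

```python
import re

# Made-up patterns and input line.
patterns = [re.compile(p, re.IGNORECASE) for p in ("disk", "full", "error")]
st = "Error: disk full"

# Keep every matching pattern instead of overwriting a single variable each time.
matched = [pat.pattern for pat in patterns if pat.search(st)]
print(matched)  # ['disk', 'full', 'error']
```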

 
> if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match

That's because you are trying to match the literal string "x", so it 
will match anything with the letter "x":

box, text, ax, equinox, except, hexadecimal, fix, Kleenex, sixteen ...
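A short Python 3 illustration of the difference between the string `'x'` and the variable `x`. The value of `x` and the sample line are hypothetical:

```python
import re

x = "disk full"       # hypothetical pattern text stored in a variable
line = "fix the box"  # hypothetical line that does not contain "disk full"

print(re.search('x', line) is not None)  # True: the letter x appears in "fix"
print(re.search(x, line) is not None)    # False: this searches for "disk full"
```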


> Instead if I use :
> 
> if re.search(x, line, re.IGNORECASE) is not None: then no match occurs
>   print line

Here you are trying to match the variable called x. That is a very bad 
name for a variable (what does "x" mean?) but it should work.

If no match occurs, it probably means that the value of x doesn't occur 
in the line you are looking at.

Try printing x and line and see if they are what you expect them to be:

print x
print line


-- 
Steve


Re: [Tutor] Re Module

2018-12-27 Thread Alan Gauld via Tutor
On 27/12/2018 15:10, Asad wrote:

> file1 - I need a search a error say x if the error matches
> 
> Look for the same error x in other file 2
> 
> Here is the code :
> I have 10 different patterns therefore I used list comprehension and
> compiling the pattern so I loop over and find the exact pattern matching
> 
> re_comp1 = [re.compile(pattern) for pattern in str1]

I assume str1 is actually a list of strings? You don't
show the definition, but since you say it gives the
expected output I'll hope that it's correct.

> for pat in re_comp1:
>     if pat.search(st,re.IGNORECASE):
>         x = pat.pattern
>         print x===> here it gives the expected output it correct

I assume st comes from your file1? You don't show us that
bit of code either...

But you do realize that the print only shows the last result.
If there is more than one matching pattern the previous results
get thrown away. And if you only care about one match you
could just use a single regex.
On the other hand, if you do only want the last matching
pattern then what you have works.

> if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match
>   print line

Notice that you pass the string 'x' into the search.
I assume it is meant to be x? As written, you are searching
for the single character 'x' in line. You also don't show
us where line comes from; I assume it's the other file?

But why do you switch from using the compiled pattern?
Why not just assign x to the pattern object pat? This can
then be used to search line directly and with greater
efficiency.


> if re.search(x, line, re.IGNORECASE) is not None: then no match occurs
>   print line

And are you sure a match should occur?
It would help debug this if you showed us some sample data.
Such as the value of x and the value of line.

Given you are obviously only showing us a selected segment
of your code, it's hard to be sure. But as written here you
are searching line even if no pattern matches in file1.
That is, you could loop through all your patterns, never
assign anything to x and then go ahead and try to search
for 'x' in line. You should probably check x first.
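Putting Alan's two suggestions together (reuse the compiled pattern object, and only search the second file when something actually matched in the first), a minimal Python 3 sketch might look like this. The patterns and lines are hypothetical stand-ins for the files the thread never showed:

```python
import re

# Hypothetical stand-ins for the pattern list and the two files' contents.
patterns = [re.compile(p, re.IGNORECASE) for p in (r"error \d+", "timeout")]
file1_line = "ERROR 42: write failed"
file2_lines = ["ok", "error 42 again", "Timeout"]

matched = None                # make "no match" an explicit, checkable state
for pat in patterns:
    if pat.search(file1_line):
        matched = pat         # keep the compiled object, not just pat.pattern
        break                 # stop at the first match; drop this to get the last

if matched is not None:       # only search file 2 if file 1 actually matched
    hits = [line for line in file2_lines if matched.search(line)]
    print(hits)  # ['error 42 again']
else:
    print("no pattern matched in file 1")
```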

Also, since you don't show the file looping code we don't
know whether you break out whenever you find a match or
whether the rest of the code is all inside the first
loop over file1. Trying to debug someone else's code
is hard enough. When we only have half the code we are
reduced to guesswork.

Finally, do you get any error messages? If so, please
post them in their entirety. Based on your code I'm
assuming you are working on Python 2.x, but it's always
worth posting the Python version and OS.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] Re Module

2018-12-27 Thread Asad
Hi All,

  I am trying to find a solution for my script. I have two files:

file1 - I need to search for an error, say x; if the error matches,

look for the same error x in the other file, file2.

Here is the code:
I have 10 different patterns, so I used a list comprehension to compile
the patterns, and I loop over them to find the exact pattern that matches.

re_comp1 = [re.compile(pattern) for pattern in str1]

for pat in re_comp1:
    if pat.search(st,re.IGNORECASE):
        x = pat.pattern
        print x        # ===> here it gives the expected output, the correct match
        print type(x)



if re.search('x', line, re.IGNORECASE) is not None:    # ===> gives a wrong match
    print line

Instead, if I use:

if re.search(x, line, re.IGNORECASE) is not None:      # ===> then no match occurs
    print line

Please advise where I am going wrong or what can be done to make it better.

Thanks,


-- 
Asad Hasan
+91 9582111698


Re: [Tutor] re module

2014-08-19 Thread Sunil Tech
Hey, thanks Danny Yoo, Chris “Kwpolska” Warrick, D.V.N. Sarma.

I will take all your inputs.

Thanks a lot.




[Tutor] re module

2014-08-14 Thread Sunil Tech
Hi,

I have string like
stmt = 'pspan style=font-size: 11pt;span style=font-family: times
new roman,times;Patient name:nbsp;Upadhyay Shyam/spanspan
style=font-family: times new roman,times;nbsp;nbsp;br /Date of
birth:nbsp;nbsp;nbsp;08/08/1988 br /Issue(s) to be
analyzed:nbsp;nbsp;tes/span/spanbr /span
style=font-size: 11pt;span style=font-family: times new
roman,times;Nurse Clinical summary:nbsp;nbsp;test1/spanspan
style=font-family: times new roman,times;nbsp;br /br /Date of
injury:nbsp;nbsp;nbsp;12/14/2013/spanbr /span style=font-family:
times new roman,times;Diagnoses:nbsp;nbsp;nbsp;723.4 - 300.02 - 298.3
- 780.50 - 724.4nbsp;Brachial neuritis or radiculitis nos - Generalized
anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance
- Thoracic or lumbosacral neuritis or radiculitis, unspecified/spanbr
/span style=font-family: times new roman,times;Requester
name:nbsp;nbsp;nbsp;Demo Spltycdtestt/spanbr /span
style=font-family: times new roman,times;Phone #:nbsp;nbsp;nbsp;(213)
480-9000/spanbr /br /span style=font-family: times new
roman,times;Medical records reviewed br /__ pages of medical and
administrative records were reviewed including:br /br /br /Criteria
used in analysis br /nbsp;br /br /Reviewer comments br /br /br
/Determinationbr /Based on the clinical information submitted for this
review and using the evidence-based, peer-reviewed guidelines referenced
above, this request isnbsp;br /br /Peer Reviewer
Name/Credentialsnbsp;nbsp;/spanbr /span style=font-family: times
new roman,times;Solis, Test,nbsp;PhD/spanbr /span
style=font-family: times new roman,times;Internal Medicine/spanbr
/span style=font-family: times new roman,times;nbsp;/spanbr /br
/span style=font-family: times new roman,times;Attestationbr /br
/br /Contact Information/spanspan style=font-family: times new
roman,times;nbsp;br //span/span/pbr/font face=\'times new
roman,times\' size=\'3\'Peer to Peer contact attempt 1: 08/13/2014 02:46
PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did
Not Change Determination/font'


I am trying to find the various font sizes and font faces in this string.

I tried

print re.search("span style=\"(.*)\"", stmt).group()


Thank you.


Re: [Tutor] re module

2014-08-14 Thread Chris “Kwpolska” Warrick
On 14 Aug 2014 15:58 Sunil Tech sunil.tech...@gmail.com wrote:

 i am trying to find the various font sizes and font face from this string.

 i tried

 print re.search("span style=\"(.*)\"", stmt).group()


 Thank you.





Don't use regular expressions for HTML. Use lxml instead.

Also, why would you need that exact thing? It's useless. Also, this code is
very ugly, with too many spans and — worse — fonts which should not be
used at all.

-- 
Chris “Kwpolska” Warrick http://chriswarrick.com/


Re: [Tutor] re module

2014-08-14 Thread D . V . N . Sarma డి . వి . ఎన్ . శర్మ
I tested it on IDLE. It works.

regards,
Sarma.




Re: [Tutor] re module

2014-08-14 Thread Albert-Jan Roskam

-
On Thu, Aug 14, 2014 4:07 PM CEST Chris “Kwpolska” Warrick wrote:

Don't use regular expressions for HTML. Use lxml instead.

Also, why would you need that exact thing? It's useless. Also, this code is
very ugly, with too many spans and — worse — fonts which should not be
used at all.

Why lxml and not bs (Beautiful Soup)? I read that bs deals better with malformed HTML. You said
the above HTML is messy, which is not necessarily the same as malformed, but...
Anyway, this reference also seems to favor lxml:
http://stackoverflow.com/questions/4967103/beautifulsoup-and-lxml-html-what-to-prefer


Re: [Tutor] re module

2014-08-14 Thread Danny Yoo
On Thu, Aug 14, 2014 at 8:39 AM, D.V.N.Sarma డి.వి.ఎన్.శర్మ
dvnsa...@gmail.com wrote:
 I tested it on IDLE. It works.


Hi Sarma,


Following up on this one.  I'm pretty sure that:

print re.search("span style=\"(.*)\"", stmt).group()

is going to print something, but it almost certainly will not do what
Sunil wants.  See:

https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy

for why.
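To see what Danny means, here is a Python 3 comparison of greedy `.*` and non-greedy `.*?`. The archive stripped the quotes and angle brackets from Sunil's string, so the snippet below is a hypothetical reconstruction in the same style, not his actual data:

```python
import re

# Hypothetical snippet in the style of Sunil's string, markup restored.
stmt = ('<span style="font-size: 11pt;">'
        '<span style="font-family: times new roman,times;">Hi</span></span>')

# Greedy: .* runs on to the LAST double quote in the whole string.
greedy = re.search(r'span style="(.*)"', stmt).group(1)
# Non-greedy: .*? stops at the first closing quote, once per attribute.
lazy = re.findall(r'span style="(.*?)"', stmt)

print(greedy)  # font-size: 11pt;"><span style="font-family: times new roman,times;
print(lazy)    # ['font-size: 11pt;', 'font-family: times new roman,times;']
```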


Re: [Tutor] re module

2014-08-14 Thread Danny Yoo
Hi Sunil,

Don't use regular expressions for this task.  Use something that knows
about HTML structure.  As others have noted, the Beautiful Soup or
lxml libraries are probably a much better choice here.

There are good reasons to avoid regexp for the task you're trying to
do.  For example, your regular expression:

 "span style=\"(.*)\""

does not respect the string boundaries of attributes.  You may think
that .* matches just content within a string attribute, but this is
not true.  For example, see the following example:

##
>>> import re
>>> m = re.match("'(.*)'", "'quoted' text, but note how it's greedy!")
>>> m.group(1)
"quoted' text, but note how it"
##

and note how the match doesn't limit itself to 'quoted', but goes as
far as it can.

This shows at least one of the problems that you're going to run into.
Fixing this so it doesn't grab so much is doable, of course.  But
there are other issues, all of which are little headaches upon
headaches.  (e.g. attribute values may be single or double quoted, may
use HTML entity references, etc.)
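One of those headaches, values quoted with either single or double quotes, can be handled with a backreference, as in the hypothetical Python 3 sketch below. It still knows nothing about entity references, comments, or nesting, which is the argument for letting a real parser do the work:

```python
import re

# Hypothetical snippet mixing double- and single-quoted attribute values,
# like the font tag near the end of Sunil's string.
html = '<span style="font-size: 11pt;"><font face=\'times new roman,times\' size=\'3\'>'

# (["']) captures whichever quote opened the value; \2 requires the same one to close it.
attr_re = re.compile(r'''(\w+)=(["'])(.*?)\2''')
print([(name, value) for name, _quote, value in attr_re.findall(html)])
# [('style', 'font-size: 11pt;'), ('face', 'times new roman,times'), ('size', '3')]
```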

So don't try to parse HTML by hand.  Let a library do it for you.  For
example with Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

the code should be as straightforward as:

###
from bs4 import BeautifulSoup
soup = BeautifulSoup(stmt, "html.parser")
for span in soup.find_all('span'):
    print(span.get('style'))
###

where you deal with the _structure_ of your document, rather than at
the low-level individual characters of that document.
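If installing Beautiful Soup or lxml is not an option, Python 3's standard-library `html.parser` module can do a more manual version of the same structural walk. This is a swapped-in technique, not what anyone in the thread proposed, and the sample markup is a hypothetical reconstruction:

```python
from html.parser import HTMLParser

class FontInfoParser(HTMLParser):
    """Collect span style attributes and font face/size attributes."""

    def __init__(self):
        super().__init__()
        self.styles = []
        self.fonts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "span" and "style" in attrs:
            self.styles.append(attrs["style"])
        elif tag == "font":
            self.fonts.append((attrs.get("face"), attrs.get("size")))

# Hypothetical markup in the style of Sunil's string.
stmt = ('<p><span style="font-size: 11pt;">'
        '<span style="font-family: times new roman,times;">Patient</span></span></p>'
        "<font face='times new roman,times' size='3'>Peer to Peer</font>")

parser = FontInfoParser()
parser.feed(stmt)
print(parser.styles)  # ['font-size: 11pt;', 'font-family: times new roman,times;']
print(parser.fonts)   # [('times new roman,times', '3')]
```

The parser handles both quote styles and strips the quotes for you, which is exactly the bookkeeping the regex approach has to do by hand.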


Re: [Tutor] re module- puzzling results when matching money

2013-08-04 Thread Dominik George
Hi,

not quite. The moral is to learn about greedy and non-greedy matching ;)!

-nik



Alex Kleider aklei...@sonic.net wrote:
On 2013-08-03 13:38, Dominik George wrote:
 Hi,
 
  \b matches at the boundary between a word character and a non-word
 character, and \w is [A-Za-z0-9_], so \$ is a non-word character and a
 \b in front of it cuts off your sign group.
 
  -nik

I get it now.  I was using it before the '$' to define the beginning of a
word, but I think things are failing because it detects an end of word.
Anyway, the moral is not to use it with anything but \w!

Thanks!



Re: [Tutor] re module- puzzling results when matching money

2013-08-04 Thread Alan Gauld

On 04/08/13 08:45, Alex Kleider wrote:


sorry, my bad. I forgot to delete that backslash; I meant
re.findall(r"\be\b", "d e f"). Same with the other example.


..but the interesting thing is that the presence or absence of the
spurious back slashes seems not to change the results.



It wouldn't, because the backslash says treat the next character as a
literal, and if it's not a metacharacter it's already treated as a literal.

So the \ is effectively a non-operation in that context.
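Alan's point holds for punctuation such as `,`. One caveat for modern Python: since Python 3.7, an unknown escape made of a backslash and an ASCII letter (for example `\e`) raises `re.error` instead of being treated as the plain letter. A quick Python 3 check:

```python
import re

# A backslash before punctuation with no special meaning just gives the literal
# character, so these two patterns find the same thing.
print(re.findall(r"\,", "a,b,c"), re.findall(r",", "a,b,c"))  # [',', ','] [',', ',']

# Since Python 3.7, an unknown escape of an ASCII letter is an error, not a no-op.
try:
    re.compile(r"\e")
except re.error as exc:
    print("re.error:", exc)
```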



Re: [Tutor] re module- puzzling results when matching money

2013-08-03 Thread Dominik George
Hi,

\b matches at the boundary between a word character and a non-word character,
and \w is [A-Za-z0-9_], so \$ is a non-word character and a \b in front of it
cuts off your sign group.

-nik
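A quick Python 3 check of how `\b` behaves in front of a dollar sign, using the first sentence of Alex's target text. `\b` is a zero-width assertion that matches only where a word character meets a non-word character (or the start or end of the string):

```python
import re

text = "Cost is $4.50."

# A space followed by "$" is two non-word characters in a row, so there is
# no word boundary in front of the "$" and the pattern cannot match.
print(re.search(r"\b\$\d+", text))           # None
print(re.search(r"\$\d+", text).group())     # $4

# A \b after the digits is fine: "4" is a word character and "." is not.
print(re.search(r"\$\d+\b", text).group())   # $4
```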



Alex Kleider aklei...@sonic.net wrote:
#!/usr/bin/env python


I've been puzzling over the re module and have a couple of questions
regarding the behaviour of this script.

I've provided two possible patterns (re_US_money):
the one surrounded by the 'word boundary' meta sequence seems not to
work while the other one does. I can't understand why the addition of the
word boundary defeats the match.

I also don't understand why the split method includes the matched text.
Splitting only works as I would have expected if no groupings are used.
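On the split question: the `re.split()` documentation states that when the pattern contains capturing groups, the text of all groups is also returned in the result list. Named groups such as `(?P<sign>...)` are capturing groups, which is why the matched pieces show up. A small Python 3 illustration with a made-up string:

```python
import re

text = "a $4 b $5 c"

# With capturing groups, re.split() also returns the text each group captured.
print(re.split(r"(\$)(\d+)", text))      # ['a ', '$', '4', ' b ', '$', '5', ' c']

# Non-capturing groups (?:...) keep the grouping but drop it from the result.
print(re.split(r"(?:\$)(?:\d+)", text))  # ['a ', ' b ', ' c']
```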

If I've set this up as intended, the full body of this e-mail should be
executable as a script.

Comments appreciated.
alex kleider


# file :  tutor.py (Python 2.7, NOT Python 3)
print 'Running tutor.py on an Ubuntu Linux machine. *'

import re

target = \
"""Cost is $4.50. With a $.30 discount:
Price is $4.15.
The price could be less, say $4 or $4.
Let's see how this plays out:  $4.50.60
"""

# Choose one of the following two alternatives:
re_US_money =\
r"((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})"
# The above provides matches.
# The following does NOT.
# re_US_money =\
# r"\b((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})\b"

pat_object = re.compile(re_US_money)
match_object = pat_object.search(target)
if match_object:
    print "'match_object.group()' and 'match_object.span()' yield:"
    print match_object.group(), match_object.span()
    print
else:
    print "NO MATCH FOUND!!!"
print
print "Now will use 'finditer()':"

print
iterator = pat_object.finditer(target)
i = 1
for iter in iterator:
    print
    print "iter #%d: " % (i, ),
    print iter.group()
    print "'groups()' yields: '%s'." % (iter.groups(), )
    print iter.span()
    i += 1
    sign = iter.group("sign")
    dollars = iter.group("dollars")
    cents = iter.group("cents")
    print sign,
    print "  ",
    if dollars:
        print dollars,
    else:
        print "00",
    print "  ",
    if cents:
        print cents,
    else:
        print "00",

print

t = target
sub_target = pat_object.sub("insert value here", t)
print
print "Printing substitution: "
print sub_target
split_target = pat_object.split(target)
print "Result of splitting on the target: "
print split_target

# End of script.


[Tutor] re module help

2012-01-09 Thread Ganesh Kumar
Hi Gurus,

I have written a regular expression and use the os module: I run sdptool,
write its output to a file named sdptool, match the regular expression
pattern against it, and print the result. I want to get the required output
without creating the file. I tried, but I didn't get the output correctly
when reading the stream.

#! /usr/bin/python
import os,re

def scan():

    cmd = "sdptool -i hci0 search OPUSH > sdptool"
    fp = os.popen(cmd)

    results = []
    l = open("sdptool").read()


    pattern = r"^Searching for OPUSH on (\w\w(:\w\w)+).*?Channel: (\d+)"
    r = re.compile(pattern, flags=re.MULTILINE|re.DOTALL)
    while True:
        for match in r.finditer(l):
            g  = match.groups()

            results.append((g[0],'phone',g[2]))
        return results

## output [('00:15:83:3D:0A:57', 'phone', '1')]


http://dpaste.com/684335/
Please guide me: how can I achieve the required output without creating a file?


Did I learn something today? If not, I wasted it.


Re: [Tutor] re module help

2012-01-09 Thread bodsda
You could call read() directly on the popen call to avoid having to write to a
file:

output = os.popen("sdptool -i hci0 search OPUSH").read()

Bodsda
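In Python 3, the subprocess module is the usual replacement for os.popen. The sketch below keeps Ganesh's regular expression but separates parsing from running the command, so the parsing can be tried without Bluetooth hardware. The sample text is invented to fit the pattern, and sdptool being installed and on PATH is an assumption:

```python
import re
import subprocess

# Ganesh's pattern, compiled once.
OPUSH_RE = re.compile(r"^Searching for OPUSH on (\w\w(:\w\w)+).*?Channel: (\d+)",
                      re.MULTILINE | re.DOTALL)

def parse_opush(text):
    """Turn sdptool output text into (address, 'phone', channel) tuples."""
    return [(m.group(1), "phone", m.group(3)) for m in OPUSH_RE.finditer(text)]

def scan():
    """Run sdptool and parse its output directly, with no temporary file."""
    # Assumes sdptool is installed and on PATH; raises CalledProcessError
    # if the command exits with a non-zero status.
    output = subprocess.check_output(["sdptool", "-i", "hci0", "search", "OPUSH"],
                                     universal_newlines=True)
    return parse_opush(output)

# Invented sample text, shaped only to fit the pattern above.
sample = "Searching for OPUSH on 00:15:83:3D:0A:57 ...\nChannel: 1\n"
print(parse_opush(sample))  # [('00:15:83:3D:0A:57', 'phone', '1')]
```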


Re: [Tutor] RE module is working ?

2011-02-04 Thread Peter Otten
Karim wrote:

 Recall:
 
 >>> re.subn(r'([^\\])?"', r'\1\\"', expression)
 
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
     return _compile(pattern, flags).subn(repl, string, count)
   File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
     return sre_parse.expand_template(template, match)
   File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
     raise error, "unmatched group"
 sre_constants.error: unmatched group
 
 
 Found the solution: '?' needs to be inside the parentheses (the saved group)
 because outside them we don't know whether the saved match, namely '\1',
 will exist or not.
 
   re.subn(r'([^\\]?)', r'\1\\', expression)
 
 ('  ', 2)
 
 The Unix sed command is more permissive: sed 's/\([^\\]\)\?/\1\\/g'
 because '?' can be outside the parentheses (the saved group, escaped for
 sed). \1 does not seem to cause an issue when a match is found. Perhaps it
 is created only when a match occurs.

Thanks for reporting the explanation.
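A small Python 3 sketch of the difference described above, assuming the input is ' "" ' (space, two double quotes, space), which is what the thread appears to be using. With '?' inside the group, group 1 always participates, possibly as an empty string. With '?' outside, group 1 may not participate at all: Python 2.7 raised "unmatched group" in that case, while since Python 3.5 re.sub replaces an unmatched group with an empty string, so both forms give the same result on a modern interpreter.

```python
import re

expression = ' "" '  # space, double quote, double quote, space

# '?' inside the group: group 1 always takes part in the match (maybe as ''),
# so \1 in the replacement is always defined.
inside = re.subn(r'([^\\]?)"', r'\1\\"', expression)

# '?' outside the group: group 1 can be absent. Python 2.7 raised
# "unmatched group" here; Python 3.5+ substitutes an empty string.
outside = re.subn(r'([^\\])?"', r'\1\\"', expression)

print(inside)   # (' \\"\\" ', 2)
print(outside)  # (' \\"\\" ', 2) on Python 3.5 and later
```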



Re: [Tutor] RE module is working ?

2011-02-04 Thread Peter Otten
Karim wrote:

 That is not the thing I want. I want to escape any " that is not
 already escaped.
 The sed regex  '/\([^\\]\)\?/\1\\/g' is exactly what I need (I have
 been writing regexes on Unix for 15 years).

Can the backslash be escaped, too? If so I don't think your regex does what 
you think it does.

r'\\\"' # escaped \ followed by escaped "

should not be altered, but:

$ echo '\\\' | sed 's/\([^\\]\)\?/\1\\/g'
 # two escaped \ followed by a " that is not escaped
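The escaped-backslash case can be checked in Python as well. A minimal sketch, assuming the text to process is three backslashes followed by a double quote (an escaped backslash, then an escaped quote): the ([^\\]?)" pattern from the thread cannot tell that the quote is already escaped, so it inserts another backslash and the quote ends up unescaped.

```python
import re

text = r'\\\"'  # escaped backslash followed by an escaped double quote

# The pattern only looks at the single character before the quote; here that
# character is a backslash, so the group matches empty and a backslash is
# inserted before the quote anyway.
result = re.sub(r'([^\\]?)"', r'\1\\"', text)

print(result)  # \\\\"  (two escaped backslashes, then a bare quote)
```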





Re: [Tutor] RE module is working ?

2011-02-04 Thread Karim

On 02/04/2011 02:36 AM, Steven D'Aprano wrote:

Karim wrote:


*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
It's a Thunderbird issue with bold type (it appears as stars), but I don't know 
how to fix it yet.


A man went to a doctor and said, "Doctor, every time I do this, it 
hurts. What should I do?"


The doctor replied, "Then stop doing that!"

:)


Yes, these words made me laugh. I will keep them in my funny box.




Don't add bold or any other formatting to things which should be 
program code. Even if it looks okay in *your* program, you don't know 
how it will look in other people's programs. If you need to draw 
attention to something in a line of code, add a comment, or talk about 
it in the surrounding text.



[...]
That is not the thing I want. I want to escape any " that is not 
already escaped.
The sed regex  '/\([^\\]\)\?/\1\\/g' is exactly what I need (I have 
been writing regexes on Unix for 15 years).


Mainly sed, awk and Perl, sometimes grep and egrep. I know it is a 
jungle.


Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU 
posix compliant regexes? grep or egrep regexes? They're all different.


In any case, I am sorry, I don't think your regex does what you say. 
When I try it, it doesn't work for me.


[steve@sylar ~]$ echo 'Some \text' | sed -e 's/\([^\\]\)\?/\1\\/g'
Some \\text\


I give you my word on this. Here is the exact output; I redid it:

#MY OS VERSION
karim@Requiem4Dream:~$ uname -a
Linux Requiem4Dream 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 
UTC 2011 x86_64 GNU/Linux

#MY SED VERSION
karim@Requiem4Dream:~$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

GNU sed home page: http://www.gnu.org/software/sed/.
General help using GNU software: http://www.gnu.org/gethelp/.
E-mail bug reports to: bug-gnu-ut...@gnu.org.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
#MY SED OUTPUT COMMAND:
karim@Requiem4Dream:~$  echo 'Some ' | sed -e 's/\([^\\]\)\?/\1\\/g'
Some \\
# THIS IS WHAT I WANT: 2 CONSECUTIVE QUOTES. IF THE FIRST ONE IS ALREADY ESCAPED,
# I DON'T WANT TO ESCAPE IT TWICE.

karim@Requiem4Dream:~$ echo 'Some \' | sed -e 's/\([^\\]\)\?/\1\\/g'
Some \\
# BY THE WAY THIS ONE WORKS:
karim@Requiem4Dream:~$ echo 'Some text' | sed -e 's/\([^\\]\)\?/\1\\/g'
Some \text\
# BUT CERTAINLY NOT THIS ONE, WHICH MY REGEX DOES NOT COVER (I KNOW IT, AND
# ORIGINALLY WANTED TO COVER IT):
karim@Requiem4Dream:~$ echo 'Some \text' | sed -e 's/\([^\\]\)\?/\1\\/g'
Some \\text\

By the way, in all the sed versions I work with, the '?' (zero or one match) 
has to be escaped; that's the reason I have '\?'. Same thing with the saving 
'\(' and '\)' used to store a value. In Perl or grep you don't need to escape them.


# SAMPLE FROM http://www.gnu.org/software/sed/manual/sed.html

\+
   Same as *, but matches one or more. It is a GNU extension.
\?
   Same as *, but only matches zero or one. It is a GNU extension.


I wouldn't expect it to work. See below.

By the way, you don't need to escape the brackets or the question mark:

[steve@sylar ~]$ echo 'Some \text' | sed -re 's/([^\\])?/\1\\/g'
Some \\text\



For me the equivalent python regex is buggy: r'([^\\])?', r'\1\\'


No it is not.



Yes, I know; see my latest post, where I explain that I already found the 
solution. I repeat the solution below:


#Found the solution: '?' needs to be inside the parentheses (the saved group) 
because outside them we don't know whether the saved match, namely '\1',
will exist or not.

 re.subn(r'([^\\]?)', r'\1\\', expression)

('  ', 2)


The pattern you are matching does not do what you think it does. Zero 
or one of not-backslash, followed by a quote will match a single 
quote *regardless* of what is before it. This is true even in sed, as 
you can see above, your sed regex matches both quotes.


\" will match, because the regular expression will match zero 
characters, followed by a quote. So the regex is correct.


 match = r'[^\\]?'  # zero or one not-backslash followed by quote
 re.search(match, r'aaa\aaa').group()
''

Now watch what happens when you call re.sub:


 match = r'([^\\])?'  # group 1 equals a single non-backslash
 replace = r'\1\\'  # group 1 followed by \ followed by 
 re.sub(match, replace, '')  # no matches
''
 re.sub(match, replace, '')  # one match
'aa\\aa'
 re.sub(match, replace, '')  # one match, but there's no group 1
Traceback (most recent call last):
  File stdin, line 1, in module
  File /usr/local/lib/python3.1/re.py, line 166, in sub
return _compile(pattern, flags).sub(repl, string, count)
  File /usr/local/lib/python3.1/re.py, line 303, in filter
return sre_parse.expand_template(template, match)
  File /usr/local/lib/python3.1/sre_parse.py, line 807, in 
expand_template

raise error(unmatched 

Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim


Hello,

Any news on this topic? O:-)

Regards
Karim

On 02/02/2011 08:21 PM, Karim wrote:


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


* *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type help, copyright, credits or license for more information.
 expression = *'  '*
 re.subn(*r'([^\\])?', r'\1\\', expression*)
Traceback (most recent call last):
  File stdin, line 1, in module
  File /home/karim/build/python/install/lib/python2.7/re.py, line 
162, in subn

return _compile(pattern, flags).subn(repl, string, count)
  File /home/karim/build/python/install/lib/python2.7/re.py, line 
278, in filter

return sre_parse.expand_template(template, match)
  File /home/karim/build/python/install/lib/python2.7/sre_parse.py, 
line 787, in expand_template

raise error, unmatched group
sre_constants.error: unmatched group

But if I remove '?' I get the following:

 re.subn(r'([^\\])', r'\1\\', expression)
(' \\ ', 1)

Only one substitution... _But this is not the same REGEX._ And 
count=2 does nothing. By default all occurrences should be substituted.
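Why only one substitution, and why count=2 does not help: count is only an upper limit on the number of replacements, and the pattern without '?' consumes the character in front of each quote. In ' "" ' the first match takes the space and the first quote, so the second quote no longer has an unconsumed non-backslash character in front of it. The sketch below (a negative lookbehind, a common alternative that nobody in this thread used) checks the preceding character without consuming it. Like the original pattern, it only inspects one character, so it still treats a quote after an escaped backslash as escaped.

```python
import re

expression = ' "" '  # space, double quote, double quote, space

# Consuming version: the match includes the character before the quote.
consuming = re.subn(r'([^\\])"', r'\1\\"', expression)
print(consuming)     # (' \\"" ', 1)

# Lookbehind version: (?<!\\) asserts "not preceded by a backslash" without
# consuming anything, so back-to-back quotes are both escaped.
lookbehind = re.subn(r'(?<!\\)"', r'\\"', expression)
print(lookbehind)    # (' \\"\\" ', 2)
```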


* *On linux using my good old sed command, it is working with my
  '?' (0-1 match):*

*$* echo *'  '* | sed *'s/\([^\\]\)\?/\1\\/g*'*
 \\

*Indeed what's the matter with RE module!?*

*Any idea will be welcome!

Regards
Karim*
*




Re: [Tutor] RE module is working ?

2011-02-03 Thread Steven D'Aprano

Karim wrote:


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


You don't have to escape quotes. Just use the other sort of quote:

 print ''




   * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type help, copyright, credits or license for more information.
  expression = *'  '*


No, I'm sorry, that's incorrect -- that gives a syntax error in every 
version of Python I know of, including version 2.7:


 expression = *'  '*
  File stdin, line 1
expression = *'  '*
 ^
SyntaxError: invalid syntax


So what are you really running?




  re.subn(*r'([^\\])?', r'\1\\', expression*)


Likewise here. *r'...' is a syntax error, as is expression*)

I don't understand what you are running or why you are getting the 
results you are.



 *Indeed what's the matter with RE module!?*

There are asterisks all over your post! Where are they coming from?

What makes you think the problem is with the RE module?

We have a saying in English:

The poor tradesman blames his tools.

Don't you think it's more likely that the problem is that you are using 
the module wrongly?


I don't understand what you are trying to do, so I can't tell you how to 
do it. Can you give an example of what you want to start with, and what 
you want to end up with? NOT Python code, just literal text, like you 
would type into a letter.


E.g. ABC means literally A followed by B followed by C.
\ means literally backslash followed by double-quote




--
Steven



Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim


Hello Steven,

I am perhaps a poor tradesman, but I have to blame my Thunderbird tool :-P.
The line expression = *'  '* is in fact expression = '  '.
The bold appears as stars; I don't know why. I need the escapes because I 
pass the string to another language (a Tcl interpreter).

So I will rewrite it not _in bold_:

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type help, copyright, credits or license for more information.
 expression = '  '

 re.subn(r'([^\\])?', r'\1\\', expression)

But if I remove '?' I get the following:

 re.subn(r'([^\\])', r'\1\\', expression)
(' \\ ', 1)

   * On linux using my good old sed command, it is working with my '?'
 (0-1 match):

$ echo '  ' | sed 's/\([^\\]\)\?/\1\\/g'*
* \\

For me, the Linux/Unix sed utility is trustworthy and is the reference.

Regards
Karim




Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim


I forgot something. There is no issue with Python and double quotes.
But I need to pass the string to a Tcl script, and since Tcl is shit, a string is only 
delimited by double quotes.
Thus I need to escape it to avoid a syntax error with nested double 
quotes.
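If the real goal is to hand an arbitrary Python string to Tcl inside double quotes, one option is to escape the raw, not-yet-escaped text in a single pass instead of trying to detect quotes that are already escaped. Inside a double-quoted Tcl word, backslash, command ([...]) and variable ($) substitution still take place, so the sketch below escapes those characters as well as the quote. It assumes the input has not been escaped before; tcl_quote is a hypothetical helper, not part of any library.

```python
import re

# Characters with special meaning inside a double-quoted Tcl word:
# backslash, the double quote itself, $ (variable substitution),
# and [ ] (command substitution).
_TCL_SPECIAL = re.compile(r'([\\"$\[\]])')


def tcl_quote(text):
    """Return text wrapped in double quotes with Tcl specials backslash-escaped."""
    # r'\\\1' in the template means: a literal backslash, then group 1.
    return '"' + _TCL_SPECIAL.sub(r'\\\1', text) + '"'


print(tcl_quote('say "hi" to $user [now]'))
# "say \"hi\" to \$user \[now\]"
```

Because every backslash in the input is escaped too, there is no need to decide whether a quote was "already escaped": the function always works from the plain text.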


Regards
The poor tradesman




Re: [Tutor] RE module is working ?

2011-02-03 Thread Peter Otten
Karim wrote:

 I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
 consecutive double quotes:
 
 * *In Python interpreter:*
 
 $ python
 Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
 [GCC 4.4.3] on linux2
 Type help, copyright, credits or license for more information.
   expression = *'  '*
   re.subn(*r'([^\\])?', r'\1\\', expression*)
 Traceback (most recent call last):
File stdin, line 1, in module
File /home/karim/build/python/install/lib/python2.7/re.py, line
 162, in subn
  return _compile(pattern, flags).subn(repl, string, count)
File /home/karim/build/python/install/lib/python2.7/re.py, line
 278, in filter
  return sre_parse.expand_template(template, match)
File /home/karim/build/python/install/lib/python2.7/sre_parse.py,
 line 787, in expand_template
  raise error, unmatched group
 sre_constants.error: unmatched group
 
 But if I remove '?' I get the following:
 
   re.subn(r'([^\\])', r'\1\\', expression)
 (' \\ ', 1)
 
 Only one substitution... _But this is not the same REGEX._ And
 count=2 does nothing. By default all occurrences should be substituted.
 
 * *On linux using my good old sed command, it is working with my '?'
   (0-1 match):*
 
 *$* echo *'  '* | sed *'s/\([^\\]\)\?/\1\\/g*'*
   \\
 
 *Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first; afterwards 
it's probably a good idea to try and explain your goal clearly, in plain 
English.

Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double 
quotes that can be done with

s = s.replace('', '\\')

but that's probably *not* what you want. Assuming you want to escape two 
consecutive double quotes and make sure that the first one isn't already 
escaped, this is my attempt:

 def sub(m):
... s = m.group()
... return r'\\' if s == '' else s
...
 print re.compile(r'[\\].|').sub(sub, r'\\\ \\ \  \\\ \\ \')
\\\  \ \\ \\\ \\ \

Compare that with

$ echo '\\\ \\ \  \\\ \\ \' | sed 's/\([^\\]\)\?/\1\\/g'
 \\\ \\ \\  \\\ \\
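Peter's tokenizer idea can be written as a self-contained Python 3 function for the goal Karim states later in the thread: escape every double quote that is not already escaped. It is a sketch that assumes backslash is the only escape character. The regex matches either a backslash together with the character it escapes, or a bare quote, so escaped pairs are consumed as a unit and an escaped quote is never seen on its own. escape_unescaped_quotes is a hypothetical name.

```python
import re

# Either a backslash plus the character it escapes, or a bare double quote.
# DOTALL lets "." also match a newline that follows a backslash.
_TOKEN = re.compile(r'\\.|"', re.DOTALL)


def escape_unescaped_quotes(text):
    """Put a backslash before every double quote that is not already escaped."""
    def repl(m):
        token = m.group()
        # Bare quote: escape it. Escape pairs such as \" or \\ stay unchanged.
        return '\\"' if token == '"' else token
    return _TOKEN.sub(repl, text)


print(escape_unescaped_quotes(' "" '))   # both quotes escaped
print(escape_unescaped_quotes(r'\"'))    # \"   (already escaped, unchanged)
print(escape_unescaped_quotes(r'\\"'))   # \\\" (escaped backslash, then escaped quote)
```

Unlike the ([^\\]?)" pattern, this handles an escaped backslash in front of a quote, which is the case Peter raised about sed.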

Concerning the exception and the discrepancy between sed and python's re, I 
suggest that you ask it again on comp.lang.python aka the python-list 
mailing list where at least one regex guru will read it.

Peter



Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim

On 02/03/2011 02:15 PM, Peter Otten wrote:

Karim wrote:


I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
consecutive double quotes:

 * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type help, copyright, credits or license for more information.
expression = *'  '*
re.subn(*r'([^\\])?', r'\1\\', expression*)
Traceback (most recent call last):
File stdin, line 1, inmodule
File /home/karim/build/python/install/lib/python2.7/re.py, line
162, in subn
  return _compile(pattern, flags).subn(repl, string, count)
File /home/karim/build/python/install/lib/python2.7/re.py, line
278, in filter
  return sre_parse.expand_template(template, match)
File /home/karim/build/python/install/lib/python2.7/sre_parse.py,
line 787, in expand_template
  raise error, unmatched group
sre_constants.error: unmatched group

But if I remove '?' I get the following:

re.subn(r'([^\\])', r'\1\\', expression)
(' \\ ', 1)

Only one substitution... _But this is not the same REGEX._ And
count=2 does nothing. By default all occurrences should be substituted.

 * *On linux using my good old sed command, it is working with my '?'
   (0-1 match):*

*$* echo *'  '* | sed *'s/\([^\\]\)\?/\1\\/g*'*
   \\

*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
It's a Thunderbird issue with bold type (it appears as stars), but I don't know how 
to fix it yet.

  afterwards
it's probably a good idea to try and explain your goal clearly, in plain
English.


I already did (cf. the earlier mails in the thread). But to summarize: I pass the 
expression string to a Tcl command, which delimits strings with double 
quotes only.

So I get errors with nested double quotes; that's the key problem.

Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double
quotes that can be done with

s = s.replace('', '\\')

I have already done that as a workaround, but I have to add another 
replacement beforehand to handle all the other cases.

I want to make the original command work so that I can drop the workaround.



but that's probably *not* what you want. Assuming you want to escape two
consecutive double quotes and make sure that the first one isn't already
escaped,


You hit it! :-)


this is my attempt:


def sub(m):

... s = m.group()
... return r'\\' if s == '' else s
...

print re.compile(r'[\\].|').sub(sub, r'\\\ \\ \  \\\ \\ \')


That is not the thing I want. I want to escape any " that is not 
already escaped.
The sed regex  '/\([^\\]\)\?/\1\\/g' is exactly what I need (I have 
been writing regexes on Unix for 15 years).


For me the equivalent Python regex is buggy: r'([^\\])?', r'\1\\'
'?' is not accepted. Why? It is a character which should not be a backslash, with 
0 or 1 occurrence. This is quite simple.


I am a poor tradesman but I don't deny evidence.

Regards
Karim


\\\  \ \\ \\\ \\ \

Compare that with

$ echo '\\\ \\ \  \\\ \\ \' | sed 's/\([^\\]\)\?/\1\\/g'
 \\\ \\ \\  \\\ \\

Concerning the exception and the discrepancy between sed and python's re, I
suggest that you ask it again on comp.lang.python aka the python-list
mailing list where at least one regex guru will read it.

Peter



Re: [Tutor] RE module is working ?

2011-02-03 Thread Dave Angel

On 01/-10/-28163 02:59 PM, Karim wrote:

On 02/03/2011 02:15 PM, Peter Otten wrote:

Karim wrote:
  (snip

*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;

It's a Thunderbird issue with bold type (it appears as stars), but I don't know how
to fix it yet.


The simple fix is not to try to add bold or colors on a text message. 
Python-tutor is a text list, not an html one.  Thunderbird tries to 
accommodate you by adding the asterisks, which is fine if it's regular 
English.  But in program code, it obviously confuses things.


While I've got you, can I urge you not to top-post?  In this message, 
you correctly added your remarks after the part you were quoting.  But 
many times you put your comments at the top, which is backwards.


DaveA

--
--
da...@ieee.org


Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim

Sorry Dave,

I will try and do my best to avoid bold and top-posting in the future.

Regards
Karim


Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim

Recall:

 re.subn(r'([^\\])?', r'\1\\', expression)

Traceback (most recent call last):
File stdin, line 1, inmodule
File /home/karim/build/python/install/lib/python2.7/re.py, line
162, in subn
  return _compile(pattern, flags).subn(repl, string, count)
File /home/karim/build/python/install/lib/python2.7/re.py, line
278, in filter
  return sre_parse.expand_template(template, match)
File /home/karim/build/python/install/lib/python2.7/sre_parse.py,
line 787, in expand_template
  raise error, unmatched group
sre_constants.error: unmatched group


Found the solution: '?' needs to be inside the parentheses (the saved group) 
because outside them we don't know whether the saved match, namely '\1',
will exist or not.

 re.subn(r'([^\\]?)', r'\1\\', expression)

('  ', 2)

The Unix sed command is more permissive: sed 's/\([^\\]\)\?/\1\\/g' 
because '?' can be outside the parentheses (the saved group, escaped for sed).
\1 does not seem to cause an issue when a match is found. Perhaps it is 
created only when a match occurs.


MORAL:

1) Python's behaviour is logical, and I must understand what I am doing with it.
2) sed is a fantastic tool because it handles the match value when it is missing.
3) I am a real poor tradesman.

Regards
Karim




Re: [Tutor] RE module is working ?

2011-02-03 Thread Alan Gauld


Karim karim.liat...@free.fr wrote


Because expression = *'  '*  is in fact expression = '  '.
The bold appear as stars I don't know why. 


Because in the days when email was always sent in plain 
ASCII text the way to show bold was to put asterisks around 
it. Underlining used _underscores_ like so...


Obviously somebody decided that Thunderbird would stick 
with those conventions when translating HTML to text :-)


Quite smart really :-)

Alan G.



Re: [Tutor] RE module is working ?

2011-02-03 Thread Steven D'Aprano

Karim wrote:


*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
It's a Thunderbird issue with bold type (it appears as stars), but I don't know how 
to fix it yet.


A man went to a doctor and said, "Doctor, every time I do this, it 
hurts. What should I do?"


The doctor replied, "Then stop doing that!"

:)

Don't add bold or any other formatting to things which should be program 
code. Even if it looks okay in *your* program, you don't know how it 
will look in other people's programs. If you need to draw attention to 
something in a line of code, add a comment, or talk about it in the 
surrounding text.



[...]
That is not the thing I want. I want to escape any " that is not 
already escaped.
The sed regex  '/\([^\\]\)\?/\1\\/g' is exactly what I need (I have 
been writing regexes on Unix for 15 years).


Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU 
posix compliant regexes? grep or egrep regexes? They're all different.


In any case, I am sorry, I don't think your regex does what you say. 
When I try it, it doesn't work for me.


[steve@sylar ~]$ echo 'Some \text' | sed -e 's/\([^\\]\)\?/\1\\/g'
Some \\text\

I wouldn't expect it to work. See below.

By the way, you don't need to escape the brackets or the question mark:

[steve@sylar ~]$ echo 'Some \text' | sed -re 's/([^\\])?/\1\\/g'
Some \\text\



For me the equivalent python regex is buggy: r'([^\\])?', r'\1\\'


No it is not.

The pattern you are matching does not do what you think it does. Zero 
or one of not-backslash, followed by a quote will match a single quote 
*regardless* of what is before it. This is true even in sed, as you can 
see above, your sed regex matches both quotes.


\" will match, because the regular expression will match zero 
characters, followed by a quote. So the regex is correct.


 match = r'[^\\]?'  # zero or one not-backslash followed by quote
 re.search(match, r'aaa\aaa').group()
''

Now watch what happens when you call re.sub:


>>> match = r'([^\\])?"'  # group 1 equals a single non-backslash
>>> replace = r'\1\\"'  # group 1 followed by \ followed by "
>>> re.sub(match, replace, '')  # no matches
''
>>> re.sub(match, replace, 'aa"aa')  # one match
'aa\\"aa'
>>> re.sub(match, replace, '"')  # one match, but there's no group 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/re.py", line 166, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.1/re.py", line 303, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/local/lib/python3.1/sre_parse.py", line 807, in expand_template
    raise error("unmatched group")
sre_constants.error: unmatched group

Because group 1 was never matched, Python's re.sub raised an error. It 
is not a very informative error, but it is valid behaviour.
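For what it's worth, later Python releases changed this. Starting with Python 3.5, re.sub() replaces an unmatched group with an empty string instead of raising, so a session like the one above only fails on older versions. A version-independent alternative is to pass a function as the replacement and handle the missing group yourself. The following is a sketch, and the helper name add_backslash is hypothetical:

```python
import re

PATTERN = re.compile(r'([^\\])?"')

def add_backslash(match):
    """Keep group 1 if it participated, then emit a backslash and a quote."""
    prefix = match.group(1) or ''  # group(1) is None when it did not match
    return prefix + '\\"'

print(PATTERN.sub(add_backslash, 'aa"aa'))  # aa\"aa
print(PATTERN.sub(add_backslash, '"aa'))    # \"aa
```

Like the original pattern, this still adds a backslash to quotes that are already escaped. It only removes the exception.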


If I try the same thing in sed, I get something different:

[steve@sylar ~]$ echo '"Some text' | sed -re 's/([^\\])?"/\1\\"/g'
\"Some text

It looks like this version of sed defines backreferences on the 
right-hand side to be the empty string, in the case that they don't 
match at all. But this is not standard behaviour. The sed FAQs say that 
this behaviour will depend on the version of sed you are using:


Seds differ in how they treat invalid backreferences where no 
corresponding group occurs.


http://sed.sourceforge.net/sedfaq3.html

So you can't rely on this feature. If it works for you, great, but it 
may not work for other people.



When you delete the ? from the Python regex, group 1 is always valid, 
and you don't get an exception. Or, if you ensure that the input always 
matches group 1, there is no exception:


>>> match = r'([^\\])?"'
>>> replace = r'\1\\"'
>>> re.sub(match, replace, 'a"a"a"a') # group 1 always matches
'a\\"a\\"a\\"a'

(It still won't do what you want, but that's a *different* problem.)
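The underlying requirement, "a double quote that is not immediately preceded by a backslash", can be stated directly with a negative lookbehind, (?<!\\)". This technique was not proposed in the thread, and escape_quotes is a hypothetical name. Because the lookbehind is zero-width, it consumes no character before the quote. That means there is no optional group to go missing, and two adjacent quotes are both handled:

```python
import re

# A double quote NOT immediately preceded by a backslash (zero-width check).
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')

def escape_quotes(text):
    """Insert a backslash before every double quote that lacks one."""
    # In the replacement template, \\ produces one literal backslash.
    return UNESCAPED_QUOTE.sub(r'\\"', text)

print(escape_quotes(r'Some \"text"'))  # Some \"text\"
```

One caveat: a quote that follows an escaped backslash (a doubled backslash) is still treated as already escaped.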



Jamie Zawinski wrote:

  Some people, when confronted with a problem, think "I know,
  I'll use regular expressions." Now they have two problems.

How many hours have you spent trying to solve this problem using 
regexes? This is a *tiny* problem that requires an easy solution, not 
wrestling with a programming language that looks like line-noise.


This should do what you ask for:

def escape(text):
    """Escape any double-quote characters if and only if they
    aren't already escaped."""
    output = []
    escaped = False
    for c in text:
        if c == '"' and not escaped:
            output.append('\\')
        elif c == '\\':
            output.append('\\')
            escaped = True
            continue
        output.append(c)
        escaped = False
    return ''.join(output)


Armed with this helper function, which took me two minutes to write, I 
can do this:


>>> text = 'Some text with backslash-quotes \\" and plain quotes " together.'
>>> print(escape(text))
Some text with backslash-quotes \" and plain quotes \" together.


Most problems that people turn to regexes for are best solved [...]

[Tutor] RE module is working ?

2011-02-02 Thread Karim


Hello,

I am trying to replace a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


   * In the Python interpreter:

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = ' "" '
>>> re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution... _But this is not the same REGEX._ And 
count=2 does nothing. By default all occurrences should be substituted.
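Here is what seems to be happening. This is a sketch that assumes the input is a space, two double quotes, and a space, and it is not an answer taken from the thread. re.subn() does replace all occurrences, but only non-overlapping ones. The first match consumes the space and the first quote, which leaves no unconsumed character in front of the second quote for [^\\] to match. The count argument is only an upper limit on the number of substitutions, so count=2 cannot force a second one:

```python
import re

expression = ' "" '  # assumed input: space, two double quotes, space

# Every non-overlapping match of "one non-backslash, then a quote".
matches = [m.group(0) for m in re.finditer(r'([^\\])"', expression)]
print(matches)  # [' "']

# count caps the number of substitutions; it cannot create extra matches.
result = re.subn(r'([^\\])"', r'\1\\"', expression, count=2)
print(result)  # (' \\"" ', 1)
```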


   * On Linux, using my good old sed command, it works with my '?' (0-1 match):

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
 \"\" 

Indeed, what's the matter with the RE module!?

Any ideas are welcome!

Regards
Karim
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] re module / separator

2009-06-24 Thread Tiago Saboga
Hi!

I am trying to split some lists out of a single text file, and I am
having a hard time. I have reduced the problem to the following one:

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

Of this line of text, I want to take out strings where all words start
with "a", end with "b.". But I don't want a list of words. I want that:

["a2345b.", "a45453b. a325643b. a435643b."]

And I feel I still don't fully understand regular expression's logic. I
do not understand the results below:

In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
Out[33]: 'a45453b. a325643b. '

In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
Out[34]: ['a325643b. ']

In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
Out[35]: 'a2345b. '

In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
Out[36]: ['a2345b. ', 'a435643b. ']


What's the difference between search and findall in [33-34]? And why
can't I generalize [33] to [35]? Out[35] would make sense to me if I had
used a non-greedy +, but why does re get only one word?

Thanks,

Tiago Saboga.


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Hey Tiago,

 text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

 Of this line of text, I want to take out strings where all words start
 with "a", end with "b.". But I don't want a list of words. I want that:

 ["a2345b.", "a45453b. a325643b. a435643b."]


Are you saying you want a list of every item that starts with an "a"
and ends with a "b"? If so, the above list is not what you're after.
It only contains two items:
  a2345b.
  a45453b. a325643b. a435643b.

You can verify this by trying
len(["a2345b.", "a45453b. a325643b. a435643b."]). You can also see that
each item is wrapped in double quotes and separated by a comma.

 And I feel I still don't fully understand regular expression's logic. I
 do not understand the results below:

Try reading this:
http://www.amk.ca/python/howto/regex/

I've found it to be a very gentle and useful introduction to regexes.

It explains, among other things, what the search and findall methods
do. If I'm understanding your problem correctly, you probably want the
findall method.

You should definitely take the time to read up on regexes. Your
patterns grew too complex for this problem (again, if I'm
understanding you right) which is probably why you're not
understanding your results.

In [9]:   re.findall(r'a[a-z0-9]+b',text)
Out[9]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

There are other ways to perform the above, for instance using the \w
metacharacter to match any alphanumeric character (or the underscore).

In [20]: re.findall(r'a\w+b',text)
Out[20]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

Or, to get even more (needlessly) complicated:

In [21]: re.findall(r'\ba\w+b\b',text)
Out[21]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']
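Here is a quick check of what \w actually covers, as an illustrative sketch with a made-up sample string. Besides letters and digits it matches the underscore. For str patterns in Python 3 it also matches non-ASCII letters, unless the re.ASCII flag is given:

```python
import re

sample = 'a_1b a-1b a\u00e91b'  # underscore, hyphen, accented letter (é)

print(re.findall(r'a\w+b', sample))            # ['a_1b', 'aé1b']
print(re.findall(r'a\w+b', sample, re.ASCII))  # ['a_1b']
```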

As you learned, regexes can get really complicated, really quickly if
you don't understand the syntax.  Others with more experience might
offer more elegant solutions to your problem, but I'd still encourage
you to read up on the basics and get comfortable with the re module.
It's a great tool once you understand it.

Best of luck,
Serdar


Re: [Tutor] re module / separator

2009-06-24 Thread Tiago Saboga
Serdar Tumgoren <zstumgo...@gmail.com> writes:

 Hey Tiago,

 text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

 Of this line of text, I want to take out strings where all words start
 with "a", end with "b.". But I don't want a list of words. I want that:

 ["a2345b.", "a45453b. a325643b. a435643b."]


 Are you saying you want a list of every item that starts with an "a"
 and ends with a "b"? If so, the above list is not what you're after.
 It only contains two items:
   a2345b.
   a45453b. a325643b. a435643b.

Yes, I want to find only two items. I want every sequence of words where
every word begins with an "a" and ends with "b.".

 Try reading this:
 http://www.amk.ca/python/howto/regex/

I have read several times, and I thought I understood it quite well ;)

I don't have the time to do it right now, but if it turns out to be
useful, I can show how I came up with the patterns I sent to the list.

Thanks,

Tiago.


Re: [Tutor] re module / separator

2009-06-24 Thread Kent Johnson
On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga <tiagosab...@gmail.com> wrote:
 Hi!

 I am trying to split some lists out of a single text file, and I am
 having a hard time. I have reduced the problem to the following one:

 text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

 Of this line of text, I want to take out strings where all words start
 with "a", end with "b.". But I don't want a list of words. I want that:

 ["a2345b.", "a45453b. a325643b. a435643b."]

 And I feel I still don't fully understand regular expression's logic. I
 do not understand the results below:

 In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
 Out[33]: 'a45453b. a325643b. '

group(0) is the entire match so this returns what you expect. But what
is group(1)?

In [6]: re.search("(a[^.]*?b\.\s?){2}", text).group(1)
Out[6]: 'a325643b. '

Repeated groups are tricky; the returned value contains only the last
match for the group, not the earlier repeats.
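The Python documentation states this rule for match objects: if a group matches multiple times, only the last match is accessible. Here is a minimal illustration, using a made-up pattern and input:

```python
import re

m = re.match(r'(\d)+', '123')
print(m.group(0))  # 123  (the whole match)
print(m.group(1))  # 3    (only the last repetition of group 1 is kept)
```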

 In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
 Out[34]: ['a325643b. ']

When the re contains groups, re.findall() returns the groups. It
doesn't return the whole match. So this is giving group(1), not
group(0). You can get the whole match by explicitly grouping it:

In [4]: re.findall("((a[^.]*?b\.\s?){2})", text)
Out[4]: [('a45453b. a325643b. ', 'a325643b. ')]
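The rule findall() follows can be seen on a toy string. This is a sketch with made-up sample input. With no groups, findall() returns whole matches. With one group, it returns that group. With several groups, it returns tuples. If you want the whole match regardless of grouping, you can iterate over match objects with re.finditer() and call group(0), which avoids the extra outer parentheses. That is an alternative, not something suggested in the thread:

```python
import re

s = 'a1b a2b'
print(re.findall(r'a\db', s))      # ['a1b', 'a2b']            no groups
print(re.findall(r'a(\d)b', s))    # ['1', '2']                one group
print(re.findall(r'(a)(\d)b', s))  # [('a', '1'), ('a', '2')]  two groups

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
whole = [m.group(0) for m in re.finditer(r'(a[^.]*?b\.\s?){2}', text)]
print(whole)  # ['a45453b. a325643b. ']
```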

 In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
 Out[35]: 'a2345b. '

You only get the first match, so this is correct.

 In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
 Out[36]: ['a2345b. ', 'a435643b. ']

This is finding both matches but the grouping has the same difficulty
as the previous findall(). This is closer:

In [7]: re.findall("((a[^.]*?b\.\s?)+)", text)
Out[7]: [('a2345b. ', 'a2345b. '), ('a45453b. a325643b. a435643b. ', 'a435643b. ')]

If you change the inner parentheses to be non-grouping then you get
pretty much what you want:

In [8]: re.findall("((?:a[^.]*?b\.\s?)+)", text)
Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']
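To get exactly the list asked for at the start of the thread, without the trailing spaces, one option is to strip each result of the non-capturing findall(). This is a sketch and not part of Kent's reply:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# (?:...) groups for repetition without capturing, so findall returns whole runs.
runs = [run.strip() for run in re.findall(r'(?:a[^.]*?b\.\s?)+', text)]
print(runs)  # ['a2345b.', 'a45453b. a325643b. a435643b.']
```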

Kent


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
As usual, Kent Johnson has swooped in and untangled the mess with a
clear explanation.

By the time a regex gets this complicated, I typically start thinking
of ways to simplify or avoid them altogether.

Below is the code I came up with. It goes through some gymnastics and
can surely stand improvement, but it seems to get the job done.
Suggestions are welcome.


In [83]: text
Out[83]: 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'

In [84]: textlist = text.split()

In [85]: textlist
Out[85]: ['a2345b.', 'f325.', 'a45453b.', 'a325643b.', 'a435643b.', 'g234324b.']

In [86]: newlist = []

In [87]: pat = re.compile(r'a\w+b\.')

In [88]: for item in textlist:
   ...:     if pat.match(item):
   ...:         newlist.append(item)
   ...:     else:
   ...:         newlist.append("|")
   ...:
   ...:

In [89]: newlist
Out[89]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [90]: lastlist = ''.join(newlist)

In [91]: lastlist
Out[91]: 'a2345b.|a45453b.a325643b.a435643b.|'

In [92]: lastlist.rstrip("|").split("|")
Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Ok -- realized my solution incorrectly strips white space from
multiword strings:

 Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']


So here are some more gymnastics to get the correct result:

In [105]: newlist
Out[105]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [109]: lastlist2 = " ".join(newlist).rstrip("|").split("|")

In [110]: lastlist3 = [item.strip() for item in lastlist2]

In [111]: lastlist3
Out[111]: ['a2345b.', 'a45453b. a325643b. a435643b.']
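The same split-then-regroup idea can be written without the "|" placeholders by grouping consecutive words with itertools.groupby. This standard-library technique is a substitute, not something posted in the thread. The names word_ok and runs are hypothetical, and fullmatch() (Python 3.4+) is used so that each whole word must match:

```python
import re
from itertools import groupby

text = 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'
word_ok = re.compile(r'a\w+b\.')

# groupby yields (key, group) for each run of consecutive words with the same key;
# keep only the runs whose words all match, and rejoin them with single spaces.
runs = [
    ' '.join(words)
    for is_match, words in groupby(text.split(),
                                   key=lambda w: word_ok.fullmatch(w) is not None)
    if is_match
]
print(runs)  # ['a2345b.', 'a45453b. a325643b. a435643b.']
```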


[Tutor] Re: Module Loop doesn't work (Joseph Q.)

2005-04-01 Thread Andrei
Joseph Quigley wrote on Fri, 01 Apr 2005 10:07:08 -0600:

   I have some code on a geek dictionary that I'm making where the command 
 geeker() opens a module for the real geek dictionary (where you can type 
 a word to see what it is geekified). Supposedly, you type lobby() to go 
 back to what I call  the lobby (where you can get info on the web site and 
 email and version). But it just loops back to the Geeker prompt where 
 you type the word that you want geekified. I even tried having it restart 
 the whole program by importing the index module that I wrote. But it still 
 won't restart the program!

Without seeing your code, I doubt anyone will be able to solve your problem
except by pure chance. In addition to that, I'm confused by the use of
function calls in what seems to be essentially a menu system. 

Speaking in general terms, the way you could handle this is as follows:
- have a main menu loop (what you call the lobby) which accepts user input
and based on that input calls other functions which perform certain tasks
(e.g. open a webpage or go to the dictionary part)
- the dictionary part would in turn be another loop accepting words as
input which get 'translated', until the user gives a blank string or
whatever as input in order to terminate the loop (and automatically fall
back into the loop of the lobby)
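The structure described above might look roughly like the following. It is only a minimal sketch: Joseph's code was not posted, so geekify() is a hypothetical placeholder and the menu choices are made up. The read and write parameters default to input() and print(), but they can be replaced, which makes the loops easy to try without typing. Call lobby() to start it interactively.

```python
def geekify(word):
    """Hypothetical stand-in for the real dictionary lookup."""
    return word.upper()


def dictionary_loop(read=input, write=print):
    """Translate words until the user enters a blank line."""
    while True:
        word = read('Geeker> ').strip()
        if not word:
            return  # returning drops the user back into the lobby loop
        write(geekify(word))


def lobby(read=input, write=print):
    """Main menu loop: dispatch on the user's choice until they quit."""
    while True:
        choice = read('Lobby [d=dictionary, i=info, q=quit]> ').strip().lower()
        if choice == 'd':
            dictionary_loop(read, write)
        elif choice == 'i':
            write('Web site, email and version info would go here.')
        elif choice == 'q':
            break
        else:
            write('Unknown choice: ' + choice)
```

For example, with answers = iter(['d', 'leet', '', 'q']), calling lobby(read=lambda prompt: next(answers)) prints LEET once and then returns. No restart or re-import is needed to get back to the lobby, because the dictionary loop simply returns.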

-- 
Yours,

Andrei

=
Real contact info (decode with rot13):
[EMAIL PROTECTED] Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.
