Okay, let me try again.  I need to end up with a string that C++ can
swallow. The way I have it coded now it will find
match(0):  href="http://someurl/page.html";
match(1): href
match(2): ="http://someurl/page.html";

I don't believe this is right and I suspect it has something to do with the
"r" in front of the third line.  Can someone tell me what the r does?
I found that whitespace is really [ \r\t\n].
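
Since the goal is a string C++ can swallow, here is a rough Python sketch that prints the expanded pattern as a C-style literal. The helper name c_literal is made up for illustration, and it assumes the pattern is plain ASCII; control characters are written as three-digit octal escapes so the C++ compiler turns them back into the real characters.

```python
import string

def c_literal(text):
    """Hypothetical helper: return text as a double-quoted C/C++ literal."""
    out = []
    for ch in text:
        if ch == '\\':
            out.append('\\\\')              # a regex backslash must be doubled
        elif ch == '"':
            out.append('\\"')               # escape the literal's own delimiter
        elif ch < ' ' or ch > '~':
            out.append('\\%03o' % ord(ch))  # tab, newline, etc. as octal
        else:
            out.append(ch)
    return '"' + ''.join(out) + '"'

# Same three pieces as the sgmllib expression, joined into one string.
pattern = ('[%s]*([a-zA-Z_][-.a-zA-Z_0-9]*)' % string.whitespace
           + ('([%s]*=[%s]*' % (string.whitespace, string.whitespace))
           + r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./:+*%?!\(\)_#=~]*))?')

print(c_literal(pattern))
```

Note that string.whitespace also contains vertical tab and form feed, so the character class comes out slightly wider than [ \r\t\n].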

Bill




                                                                                       
                 
                                                                                       
                 
From:     Nicolas Huillard <nhuillard@ghs.fr>
Date:     06/06/2001 09:34 AM
To:       "'Plucker Development List'" <[EMAIL PROTECTED]>
cc:       (bcc: Bill Nalen/Towers Perrin)
Subject:  RE: Windows Palm conduit
Please respond to Plucker Development List
                 
                                                                                       
                 
                                                                                       
                 



I don't think the % signs are comments (the first '(' on the second line
matches the last ')' on the same line). They are Python's string-formatting
operator: each %s is replaced by the contents of string.whitespace.
I guess the one-liner should simply be

attrfind = re.compile('[%s]*([a-zA-Z_][-.a-zA-Z_0-9]*)' % string.whitespace
+ ('([%s]*=[%s]*' % (string.whitespace, string.whitespace)) +
r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./:+*%?!\(\)_#=~]*))?')

or maybe just add parens around each line :

attrfind = re.compile(('[%s]*([a-zA-Z_][-.a-zA-Z_0-9]*)' %
string.whitespace) + ('([%s]*=[%s]*' % (string.whitespace,
string.whitespace)) +
(r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./:+*%?!\(\)_#=~]*))?'))
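
To make the pieces concrete, here is a small Python sketch of how I read the expression. The variable names (ws, name_part, equals_part, value_part) are mine, not from the sgml library, and the sample input is Bill's tag without the angle brackets. The r prefix only marks a raw string: backslashes are kept as written, so \( reaches the regex engine as a backslash followed by a parenthesis.

```python
import re
import string

ws = string.whitespace  # space, tab, newline, CR, vertical tab, form feed

# '%' fills each %s with the whitespace characters; it is not a comment.
name_part = '[%s]*([a-zA-Z_][-.a-zA-Z_0-9]*)' % ws
equals_part = '([%s]*=[%s]*' % (ws, ws)
# Raw string: the backslashes stay in the pattern exactly as typed.
value_part = r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./:+*%?!\(\)_#=~]*))?'

attrfind = re.compile(name_part + equals_part + value_part)

m = attrfind.match(' href="http://someurl/page.html"')
print(m.group(1))  # href
print(m.group(2))  # ="http://someurl/page.html"
print(m.group(3))  # "http://someurl/page.html" (quotes still attached)

# A raw string and a doubled backslash spell the same two characters.
assert r'\(' == '\\('
```

If that reading is right, the second group is only the outer "= value" wrapper, and the attribute value itself (still quoted) is in the third group.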

NH


> -----Original Message-----
> From:         Bill Nalen/Towers Perrin [SMTP:[EMAIL PROTECTED]]
> Date:         Wednesday, June 6, 2001 15:22
> To:           Plucker Development List
> Subject:      Re: Windows Palm conduit
>
>
> Can someone translate the following into a single line?
>
> It's from the sgml library and is supposed to allow me to find the
> attributes within a tag.  I have a version of the pcre, but I don't know
> what the second and third lines are supposed to do.
>
> attrfind = re.compile(
>     '[%s]*([a-zA-Z_][-.a-zA-Z_0-9]*)' % string.whitespace
>     + ('([%s]*=[%s]*' % (string.whitespace, string.whitespace))
>     + r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./:+*%?!\(\)_#=~]*))?')
>
> I thought it was
>
> '[ ]*([a-zA-Z_][-.a-zA-Z_0-9]*)([ ]*=[ ]*('[^']*'|\"[^\"]*\"|[-a-zA-Z0-9./:+*%?!\\_#=~]*))?'
>
> but that doesn't seem to work quite right.  I guess I don't know what the r
> does in the third line.
>
> I am assuming that for the tag
> <a href="http://someurl/page.html";>
> I would get href, and http://someurl/page.html for the attributes from the
> re.match command.
>
> Any help would be appreciated.
> Bill
>



