One other thing, Liam.  Download the latest version of pyparsing (1.3.2),
and make this change to the assignment statement:

assignment << pp.Dict( pp.Group( LHS + EQUALS + RHS ) )


Now you can write clean-looking code like:

test = """j = { line = { foo = 10 bar = 20 } }"""
res = assignment.parseString(test)
print(res)
print(res.j)
print(res.j.line)
print(res.j.line.foo)
print(res.j.line.bar)


And get:

[['j', [['line', [['foo', '10'], ['bar', '20']]]]]]
[['line', [['foo', '10'], ['bar', '20']]]]
[['foo', '10'], ['bar', '20']]
10
20
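
For anyone trying this without the rest of the grammar at hand, here is a minimal self-contained sketch. Only the Dict(Group(...)) line comes from the message above; the definitions of identifier, integer, RHS and the brace tokens are assumptions made for illustration and are probably simpler than the real grammar.

```python
import pyparsing as pp

# Assumed token definitions (not taken from the original grammar).
LBRACE, RBRACE, EQUALS = map(pp.Suppress, "{}=")
identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")
integer = pp.Word(pp.nums)

assignment = pp.Forward()

# Assumed RHS: a plain value, or a braced group of nested assignments.
RHS = integer | identifier | pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE)

# The change suggested above: Dict lets each LHS be used as a results name.
assignment << pp.Dict(pp.Group(identifier + EQUALS + RHS))

res = assignment.parseString("j = { line = { foo = 10 bar = 20 } }")
print(res.j.line.foo)
print(res.j.line.bar)
```

Dict uses the first token of each group as the key, which is why nested values can be reached with attribute access.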


-- Paul
 

-----Original Message-----
From: Liam Clarke [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 25, 2005 7:38 AM
To: Paul McGuire
Cc: tutor@python.org
Subject: Re: [Tutor] Parsing problem

Hi Paul, 

Well, with various tweaks and such done, it parses perfectly, so many
thanks. I think I now have a rough understanding of the basics of pyparsing.

Now, onto the fun part of optimising it. At the moment, I'm looking at 2 - 5
minutes to parse a 2000 line country section, and that's with psyco. Only
problem is, I have 157 country sections...

I am running a 650 MHz processor, so that isn't helping either. I read this
quote on http://pyparsing.sourceforge.net.

"Thanks again for your help and thanks for writing pyparser! It seems my
code needed to be optimized and now I am able to parse a 200mb file in 3
seconds. Now I can stick my tongue out at the Perl guys ;)"

I'm jealous, 200mb in 3 seconds, my file's only 4mb.

Are there any general approaches to optimisation that work well?

My current thinking is to use string methods to split the string into each
component section, and then parse each section to a bare minimum key, value.
ie - instead of parsing 

x = { foo = { bar = 10 bob = 20 } type = { z = { } y = { } }}

out fully, just parse to "x":"{ foo = { bar = 10 bob = 20 } type = { z = { }
y = { } }}"

I'm thinking that would avoid the complicated nested structure I have now,
and I could parse data out of the string as needed, if needed at all.
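
A rough sketch of that idea, using only string methods and brace counting. The function name split_top_level is made up for illustration, and it assumes every top-level entry has the form key = { ... } and that braces never appear inside quoted strings.

```python
def split_top_level(text):
    """Return {key: raw_body} for each top-level `key = { ... }` block.

    The body is kept as an unparsed string, braces included, so it can
    be handed to a real parser later and only when it is needed.
    """
    sections = {}
    pos = 0
    while True:
        eq = text.find("=", pos)
        if eq == -1:
            break
        key = text[pos:eq].strip()
        start = text.find("{", eq)
        if start == -1:
            break
        # Walk forward, tracking nesting depth, until the brace that
        # opened this section is closed again.
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unbalanced braces in section %r" % key)
        sections[key] = text[start:i + 1]
        pos = i + 1
    return sections


sample = "x = { foo = { bar = 10 bob = 20 } type = { z = { } y = { } }} y = { a = 1 }"
sections = split_top_level(sample)
print(sections["x"])
```

This makes a single pass over the text and builds no nested structure, so each section costs roughly one scan of its characters.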

Erk, I don't know, I've never had to optimise anything. 

Much thanks for creating pyparsing, and doubly thank-you for your assistance
in learning how to use it. 

Regards, 

Liam Clarke

On 7/25/05, Liam Clarke <[EMAIL PROTECTED]> wrote:

        Hi Paul, 
        
        My apologies, as I was jumping into my car after sending that
        email, it clicked in my brain.
        "Oh yeah... initial & body..."
        
        But good to know about how to accept valid numbers.
        
        Sorry, getting a bit too quick to fire off emails here.
        
        Regards, 
        
        Liam Clarke
        
        
        On 7/25/05, Paul McGuire <[EMAIL PROTECTED]> wrote:
        

                Liam -
                
                The two arguments to Word work this way:
                - the first argument lists valid *initial* characters
                - the second argument lists valid *body* or subsequent
                  characters
                
                For example, in the identifier definition, 
                
                identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")
                
                Identifiers *must* start with an alphabetic character, and
                then may be followed by 0 or more alphanumeric or _/: or .
                characters.  If only one argument is supplied, then the same
                string of characters is used as both initial and body.
                Identifiers are a very typical use of the 2-argument form of
                Word, as they often start with alphas, but then accept
                digits and other punctuation.  No whitespace is permitted
                within a Word.  The Word matching will end when a non-body
                character is seen.
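
A small sketch of the initial/body behaviour described above. The sample input strings are made up for illustration.

```python
import pyparsing as pp

identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")

# Matching stops at the first non-body character (here, the space).
first = identifier.parseString("foo_1/bar baz")[0]
print(first)

# A leading digit is not a valid *initial* character, so this is rejected.
try:
    identifier.parseString("1abc")
    rejected = False
except pp.ParseException:
    rejected = True
print(rejected)

# One-argument form: the same characters serve as both initial and body.
digits = pp.Word(pp.nums)
print(digits.parseString("2005")[0])
```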
                
                Using this definition:
                
                integer = pp.Word(pp.nums+"-+.", pp.nums)
                
                It will accept "+123", "-345", "678", and ".901".  But in a
                real number, a period may occur anywhere in the number, not
                just as the initial character, as in "3.14159".  So your
                bodyCharacters must also include a ".", as in:
                
                integer = pp.Word(pp.nums+"-+.", pp.nums+".")
                
                Let me say, though, that this is a very permissive
                definition of integer - for one thing, we really should
                rename it something like "number", since it now accepts
                non-integers as well!  But also, there is no restriction on
                the frequency of body characters.  This definition would
                accept a "number" that looks like "3.4.3234.111.123.3234".
                If you are certain that you will only receive valid inputs,
                then this simple definition will be fine.  But if you will
                have to handle and reject erroneous inputs, then you might
                do better with a number definition like:
                
                number = Combine( Word( "+-"+nums, nums ) +
                                  Optional( point + Optional( Word( nums ) ) ) )
                
                This will handle "+123", "-345", "678", and "0.901", but not
                ".901".  If you want to accept numbers that begin with "."s,
                then you'll need to tweak this a bit further.
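
To compare the permissive and the stricter definitions side by side, something like the following sketch can be used. It assumes point is Literal("."), which the message does not show, and the accepts() helper is made up for illustration; parseAll=True makes a partial match count as a failure.

```python
import pyparsing as pp

point = pp.Literal(".")  # assumed definition; not shown in the message

# Permissive form: any mix of digits and periods after the first character.
permissive = pp.Word(pp.nums + "-+.", pp.nums + ".")

# Stricter form: digits, then at most one period with optional digits.
number = pp.Combine(
    pp.Word("+-" + pp.nums, pp.nums)
    + pp.Optional(point + pp.Optional(pp.Word(pp.nums)))
)


def accepts(expr, text):
    """Return True if expr matches the whole of text (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


for text in ["+123", "-345", "678", "0.901", ".901", "3.4.3234.111.123.3234"]:
    print(text, accepts(permissive, text), accepts(number, text))
```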
                
                One last thing: you may want to start using setName() on
                some of your expressions, as in:
                
                number = Combine( Word( "+-"+nums, nums ) +
                                  Optional( point + Optional( Word( nums ) ) )
                                ).setName("number")
                
                Note, this is *not* the same as setResultsName.  Here
                setName is attaching a name to this pattern, so that when it
                appears in an exception, the name will be used instead of an
                encoded pattern string (such as W:012345...).  No need to do
                this for Literals, the literal string is used when it
                appears in an exception.
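
A quick way to see the effect of setName is to trigger a failure on purpose. This sketch uses a plain Word rather than the Combine expression above, because the exact message text for compound expressions can differ between pyparsing versions; the name "integer" and the input "abc" are made up for illustration.

```python
import pyparsing as pp

integer = pp.Word(pp.nums).setName("integer")

try:
    integer.parseString("abc")
    message = ""
except pp.ParseException as err:
    # The message names the pattern "integer" rather than its character set.
    message = str(err)

print(message)
```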
                
                -- Paul
                
                
                




        -- 
        
        'There is only one basic human right, and that is to do as you damn
        well please.
        And with it comes the only basic human duty, to take the
        consequences.'




--
'There is only one basic human right, and that is to do as you damn well
please.
And with it comes the only basic human duty, to take the consequences.' 

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
