[Python-announce] TatSu 5.10.2 - PEG parser generator

2023-11-23 Thread Juancarlo Añez
竜 TatSu is a tool that takes grammars in a variation of EBNF as input, and
outputs memoizing (Packrat) PEG parsers in Python.

Please take a look at the release log for the latest improvements and fixes:

https://github.com/neogeny/TatSu/releases

--
Juancarlo Añez
mailto:apal...@gmail.com


[Python-announce] TatSu v5.9.0 - PEG parser generator

2023-10-22 Thread Juancarlo Añez
竜 TatSu is a tool that takes grammars in a variation of EBNF as input, and
outputs memoizing (Packrat) PEG parsers in Python.

Why use a PEG parser? Because regular languages (those parsable with
Python's `re` package) "cannot count". Any input with nested structures or
with balancing of demarcations requires more than regular expressions to be
parsed.
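
To make the "cannot count" point concrete, here is a quick sketch (not
from the announcement): a fixed regular expression can match one level
of parentheses, but no finite pattern can track arbitrary nesting depth.

import re

# One level of parentheses is a regular language; arbitrarily nested
# parentheses are not, so the second match fails.
flat = re.compile(r'^\([^()]*\)$')
print(bool(flat.match('(abc)')))    # True
print(bool(flat.match('((abc))')))  # False: the pattern cannot "count"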

竜 TatSu can compile a grammar stored in a string into a
`tatsu.grammars.Grammar` object that can be used to parse any given input,
much like the `re` module does with regular expressions, or it can generate
a Python module that implements the parser.
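
A minimal sketch of that workflow using TatSu's documented API (the
arithmetic grammar below is illustrative, not from the announcement):

import tatsu

GRAMMAR = r'''
    start = expression $ ;
    expression = term { ('+' | '-') term } ;
    term = factor { ('*' | '/') factor } ;
    factor = '(' expression ')' | number ;
    number = /\d+/ ;
'''

model = tatsu.compile(GRAMMAR)            # a tatsu.grammars.Grammar
ast = model.parse('3 + 5 * ( 10 - 20 )')  # parse input with that grammar
print(ast)

# To generate a standalone parser module instead:
# source = tatsu.to_python_sourcecode(GRAMMAR, name='Calc')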

In this release:


   - validate with Python 3.12 (#313)
   - drop support for Python 3.10 (#313)
   - move build configuration to pyproject.toml (#316, #317)
   - evaluate constant to a Python literal when possible (#304, #320)
   - fix comments_re and eol_comments_re so they effectively can be None
     (#307, #312, #314)
   - skip over whitespace and comments before memoizing (#305, #306,
     #309, #318)
   - verify that () parses to None or is ignored (#308)


--
Juancarlo Añez
mailto:apal...@gmail.com


TatSu v4.4.0 PEG parser generator released

2019-04-25 Thread Juancarlo Añez
竜 TatSu v4.4.0 has been released. Thanks to Vic Nightfall, support for
left-recursion in PEG grammars is now complete.

def WARNING():
    return 'v4.4.0 is the last version of 竜TatSu supporting Python 2.7'

竜 TatSu (the successor to Grako) is a tool that takes grammars in a
variation of EBNF as input, and outputs memoizing (Packrat) PEG parsers
in Python.

竜 TatSu can compile a grammar stored in a string into a
tatsu.grammars.Grammar object that can be used to parse any given input,
much like the re module does with regular expressions, or it can
generate a Python module that implements the parser.

竜 TatSu supports left-recursive rules in PEG grammars using the
algorithm by _Laurent_ and _Mens_. The generated AST has the expected
left associativity.


LINKS
*   https://pypi.org/project/TatSu/4.4.0/
*   https://tatsu.readthedocs.io/
*   https://github.com/neogeny/TatSu


CHANGELOG
*   The default regexp for whitespace was changed to `(?s)\s+`
*   Allow empty patterns (//) like Python does
*   #65 Allow initial, consecutive, and trailing @namechars
*   #73 Allow @@whitespace :: None and @@whitespace :: False
*   #75 Complete implementation of left recursion (@Victorious3)
*   #77 Allow @keyword throughout the grammar
*   #89 Make all attributes defined in the rule present in the resulting
AST or Node even if the associated expression was not parsed
*   #93 Fix trace colorization on Windows
*   #96 Documented each @@directive
*   Switched the documentation to the "Alabaster" theme
*   Various code and documentation fixes (@davesque, @nicholasbishop,
@rayjolt)


-- 
Juancarlo *Añez*


PEP/GSoC idea: built-in parser generator module for Python?

2014-03-14 Thread Peter Mawhorter
First of all, hi everyone, I'm new to this list.

I'm a grad student who's worked on and off with Python on various
projects for 8ish years now. I recently wanted to construct a parser
for another programming language in Python and was disappointed that
Python doesn't have a built-in module for building parsers, which
seems like a common-enough task. There are plenty of different
3rd-party parsing libraries available, specialized in lots of
different ways (see e.g., [1]). I happened to pick one that seemed
suitable for my needs but didn't turn out to support the recursive
structures that I needed to parse. Rather than pick a different one I
just built my own parser generator module, and used that to build my
parser: problem solved.
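
For what it's worth, a hand-written recursive-descent parser for that
kind of nested structure can be quite small; a toy sketch (not Peter's
actual module) that turns LISP-ish token lists into nested Python lists:

def parse(tokens, i=0):
    # Build nested lists from tokens like ['(', 'a', '(', 'b', ')', ')'].
    result = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == '(':
            sub, i = parse(tokens, i + 1)   # recurse into the group
            result.append(sub)
        elif tok == ')':
            return result, i + 1            # close the current group
        else:
            result.append(tok)
            i += 1
    return result, i

tree, _ = parse('( a ( b c ) d )'.split())
print(tree)   # [['a', ['b', 'c'], 'd']]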

It would have been much nicer if there were a fully-featured builtin
parser generator module in Python, however, and the purpose of this
email is to test the waters a bit: is this something that other people
in the Python community would be interested in? I imagine the route to
providing a built-in parser generator module would be to first canvass
the community to figure out what third-party libraries they use, and
then contact the developers of some of the top libraries to see if
they'd be happy integrating as a built-in module. At that point
someone would need to work to integrate the chosen third-party library
as a built-in module (ideally with its developers).

From what I've looked at, PyParsing and PLY seem to be standout parser
generators for Python; PyParsing has a bit more Pythonic syntax from
what I've seen. One important issue would be speed though: an
implementation mostly written in C for low-level parsing tasks would
probably be much preferable to one written in pure Python, since a
builtin module should be geared towards efficiency, but I don't
actually know exactly how that would work (I've both extended and
embedded Python with/in C before, but I'm not sure how that kind of
project relates to writing a built-in module in C).

Sorry if this is a bit rambly, but I'm interested in feedback from the
community on this idea: is a builtin parser generator module
desirable? If so, would integrating PyParsing as a builtin module be a
good solution? What 3rd-party parsing module do you think would serve
best for this purpose?

-Peter Mawhorter

[1] http://nedbatchelder.com/text/python-parsers.html


Re: PEP/GSoC idea: built-in parser generator module for Python?

2014-03-14 Thread Terry Reedy

On 3/14/2014 2:51 PM, Peter Mawhorter wrote:

First of all, hi everyone, I'm new to this list.


Welcome.


I'm a grad student who's worked on and off with Python on various
projects for 8ish years now. I recently wanted to construct a parser
for another programming language in Python and was disappointed that
Python doesn't have a built-in module for building parsers, which
seems like a common-enough task. There are plenty of different
3rd-party parsing libraries available, specialized in lots of
different ways (see e.g., [1]). I happened to pick one that seemed
suitable for my needs but didn't turn out to support the recursive
structures that I needed to parse. Rather than pick a different one I
just built my own parser generator module, and used that to build my
parser: problem solved.

It would have been much nicer if there were a fully-featured builtin
parser generator module in Python, however, and the purpose of this
email is to test the waters a bit: is this something that other people
in the Python community would be interested in? I imagine the route to
providing a built-in parser generator module would be to first canvass
the community to figure out what third-party libraries they use, and
then contact the developers of some of the top libraries to see if
they'd be happy integrating as a built-in module. At that point
someone would need to work to integrate the chosen third-party library
as a built-in module (ideally with its developers).


I think the idea has been raised before, but I am not sure which list 
(this one, pydev, or python-ideas).


My first reaction, as a core developer, is that the stdlib is, if 
anything, too large. It is already not as well-maintained as we would like.


My second is that parser generation is an application, not a library. A 
parser generator is used by running it with an input specification, not 
by importing it and using specific functions and classes.



 From what I've looked at, PyParsing and PLY seem to be standout parser
generators for Python; PyParsing has a bit more Pythonic syntax from
what I've seen. One important issue would be speed though: an
implementation mostly written in C for low-level parsing tasks would
probably be much preferable to one written in pure Python, since a
builtin module should be geared towards efficiency, but I don't
actually know exactly how that would work (I've both extended and
embedded Python with/in C before, but I'm not sure how that kind of
project relates to writing a built-in module in C).


Something written in Python can be run with any implementation of 
Python. Something written in C tends to be tied to CPython.




Sorry if this is a bit rambly, but I'm interested in feedback from the
community on this idea: is a builtin parser generator module
desirable? If so, would integrating PyParsing as a builtin module be a
good solution? What 3rd-party parsing module do you think would serve
best for this purpose?



[1] http://nedbatchelder.com/text/python-parsers.html


Perhaps something like this should be in the wiki, if not already.

--
Terry Jan Reedy



Re: Parser Generator?

2007-08-27 Thread Paul McGuire
On Aug 26, 10:48 pm, Steven Bethard [EMAIL PROTECTED] wrote:
 Paul McGuire wrote:
  On Aug 26, 8:05 pm, Ryan Ginstrom [EMAIL PROTECTED] wrote:
  The only caveat being that since Chinese and Japanese scripts don't
  typically delimit words with spaces, I think you'd have to pass the text
  through a tokenizer (like ChaSen for Japanese) before using PyParsing.

  Did you think pyparsing is so mundane as to require spaces between
  tokens?  Pyparsing has been doing this type of token-recognition since
  Day 1.  Looking for tokens without delimiting spaces was one of the
  first applications for pyparsing.  This issue is not unique to Chinese
  or Japanese text.  Pyparsing will easily find the tokens in this
  string:

  y=a*x**2+b*x+c

  as

  ['y','=','a','*','x','**','2','+','b','*','x','+','c']

 The difference is that in the expression above (and in many other
 tokenization problems) you can determine word boundaries by looking at
 the class of character, e.g. alphanumeric vs. punctuation vs. whatever.

 In Japanese and Chinese tokenization, word boundaries are not marked by
 different classes of characters. They only exist in the mind of the
 reader who knows which sequences of characters could be words given the
 context, and which sequences of characters couldn't.

 The closest analog would be to ask pyparsing to find the words in the
 following sentence:

 ThepyparsingmoduleprovidesalibraryofclassesthatclientcodeusestoconstructthegrammardirectlyinPythoncode.

 Most approaches that have been even marginally successful on these kinds
 of tasks have used statistical machine learning approaches.

 STeVe

Steve -

You mean like this?

from pyparsing import *

knownWords = ['of', 'grammar', 'construct', 'classes', 'a',
'client', 'pyparsing', 'directly', 'the', 'module', 'uses',
'that', 'in', 'python', 'library', 'provides', 'code', 'to']

knownWord = oneOf( knownWords, caseless=True )
sentence = OneOrMore( knownWord ) + "."

mush = "ThepyparsingmoduleprovidesalibraryofclassesthatclientcodeusestoconstructthegrammardirectlyinPythoncode."

print sentence.parseString( mush )

prints:

['the', 'pyparsing', 'module', 'provides', 'a', 'library', 'of',
'classes', 'that', 'client', 'code', 'uses', 'to', 'construct',
'the', 'grammar', 'directly', 'in', 'python', 'code', '.']

In fact, this is almost the exact scheme used by Zhpy for extracting
Chinese versions of Python keywords, and mapping them back to English/
Latin words.  Of course, this is not practical for natural language
processing, as the vocabulary gets too large. And you can get
ambiguous matches, such as a vocabulary containing the words ['in',
'to', 'into'] - the run-together "into" will always be assumed to be
"into", and never "in to".  Fortunately (for pyparsing), your example
was sufficiently friendly as to avoid ambiguities.  But if you can
select a suitable vocabulary, even a run-on mush is parseable.
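
A sketch of that ambiguity with a hypothetical three-word vocabulary
(pyparsing's oneOf orders alternatives longest-first, which is exactly
what forces the "into" reading):

from pyparsing import OneOrMore, oneOf

# oneOf tries longer alternatives first, so the run-together string
# always parses as 'into', never as 'in' followed by 'to'.
vocab = oneOf(['in', 'to', 'into'])
print(OneOrMore(vocab).parseString('into'))   # ['into']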

-- Paul



Re: Parser Generator?

2007-08-27 Thread Steven Bethard
Paul McGuire wrote:
 On Aug 26, 10:48 pm, Steven Bethard [EMAIL PROTECTED] wrote:
 In Japanese and Chinese tokenization, word boundaries are not marked by
 different classes of characters. They only exist in the mind of the
 reader who knows which sequences of characters could be words given the
 context, and which sequences of characters couldn't.

 The closest analog would be to ask pyparsing to find the words in the
 following sentence:

 ThepyparsingmoduleprovidesalibraryofclassesthatclientcodeusestoconstructthegrammardirectlyinPythoncode.

 Most approaches that have been even marginally successful on these kinds
 of tasks have used statistical machine learning approaches.
 
 You mean like this?
 
 [pyparsing example snipped; see Paul's message above]
 
 In fact, this is almost the exact scheme used by Zhpy for extracting
 Chinese versions of Python keywords, and mapping them back to English/
 Latin words.  Of course, this is not practical for natural language
 processing, as the vocabulary gets too large. And you can get
 ambiguous matches, such as a vocabulary containing the words ['in',
 'to', 'into'] - the run-together "into" will always be assumed to be
 "into", and never "in to".

Yep, and these kinds of things occur quite frequently with Chinese and 
Japanese. The point was not that pyparsing couldn't do it for a small 
subset of characters/words, but that pyparsing is probably not the right 
solution for general purpose Japanese/Chinese tokenization.

Steve


Re: Parser Generator?

2007-08-26 Thread Jason Evans
On Aug 24, 1:21 pm, Jack [EMAIL PROTECTED] wrote:
 Jason Evans [EMAIL PROTECTED] wrote in message
 http://www.canonware.com/Parsing/

 Thanks Jason. Does Parsing.py support Unicode characters (especially CJK)?
 I'll take a look.

Parsers typically deal with tokens rather than individual characters,
so the scanner that creates the tokens is the main thing that Unicode
matters to.  I have written Unicode-aware scanners for use with
Parsing-based parsers, with no problems.  This is pretty easy to do,
since Python has built-in support for Unicode strings.
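
An illustrative Unicode-aware scanner along those lines (a generic
sketch, not Parsing.py's actual API):

import re

# \w matches CJK ideographs under Unicode matching, so the same pattern
# tokenizes Latin words and Chinese/Japanese runs alike.
TOKEN_RE = re.compile(r'\w+|[^\w\s]', re.UNICODE)

def tokenize(text):
    # Return plain string tokens for a downstream parser to consume.
    return TOKEN_RE.findall(text)

print(tokenize('weather of 東京?'))   # ['weather', 'of', '東京', '?']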

Jason



Re: Parser Generator?

2007-08-26 Thread Jack
Good to know, thanks Paul!
Paul McGuire [EMAIL PROTECTED] wrote in message

 Pyparsing was already mentioned once on this thread.  Here is an
 application using pyparsing that parses Chinese characters to convert
 to English Python.

 http://pypi.python.org/pypi/zhpy/0.5

 -- Paul 




Re: Parser Generator?

2007-08-26 Thread Jack
Thanks Jason. There seem to be a few options that I can pursue. Having a hard
time choosing one now :)

Jason Evans [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Aug 24, 1:21 pm, Jack [EMAIL PROTECTED] wrote:
 Jason Evans [EMAIL PROTECTED] wrote in message
 http://www.canonware.com/Parsing/

 Thanks Jason. Does Parsing.py support Unicode characters (especially 
 CJK)?
 I'll take a look.

 Parsers typically deal with tokens rather than individual characters,
 so the scanner that creates the tokens is the main thing that Unicode
 matters to.  I have written Unicode-aware scanners for use with
 Parsing-based parsers, with no problems.  This is pretty easy to do,
 since Python has built-in support for Unicode strings.

 Jason
 




RE: Parser Generator?

2007-08-26 Thread Ryan Ginstrom
 On Behalf Of Jason Evans
 Parsers typically deal with tokens rather than individual 
 characters, so the scanner that creates the tokens is the 
 main thing that Unicode matters to.  I have written 
 Unicode-aware scanners for use with Parsing-based parsers, 
 with no problems.  This is pretty easy to do, since Python 
 has built-in support for Unicode strings.

The only caveat being that since Chinese and Japanese scripts don't
typically delimit words with spaces, I think you'd have to pass the text
through a tokenizer (like ChaSen for Japanese) before using PyParsing.

Regards,
Ryan Ginstrom



Re: Parser Generator?

2007-08-26 Thread Paul McGuire
On Aug 26, 8:05 pm, Ryan Ginstrom [EMAIL PROTECTED] wrote:
  On Behalf Of Jason Evans
  Parsers typically deal with tokens rather than individual
  characters, so the scanner that creates the tokens is the
  main thing that Unicode matters to.  I have written
  Unicode-aware scanners for use with Parsing-based parsers,
  with no problems.  This is pretty easy to do, since Python
  has built-in support for Unicode strings.

 The only caveat being that since Chinese and Japanese scripts don't
 typically delimit words with spaces, I think you'd have to pass the text
 through a tokenizer (like ChaSen for Japanese) before using PyParsing.

 Regards,
 Ryan Ginstrom

Did you think pyparsing is so mundane as to require spaces between
tokens?  Pyparsing has been doing this type of token-recognition since
Day 1.  Looking for tokens without delimiting spaces was one of the
first applications for pyparsing.  This issue is not unique to Chinese
or Japanese text.  Pyparsing will easily find the tokens in this
string:

y=a*x**2+b*x+c

as

['y','=','a','*','x','**','2','+','b','*','x','+','c']

even though there is not a single delimiting space.  But pyparsing
will also render this as a nested parse tree, reflecting the
precedence of operations:

['y', '=', [['a', '*', ['x', '**', 2]], '+', ['b', '*', 'x'], '+',
'c']]

and will allow you to access individual tokens by field name:
- lhs: y
- rhs: [['a', '*', ['x', '**', 2]], '+', ['b', '*', 'x'], '+', 'c']
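
A sketch of wiring up that field access in today's pyparsing
(infixNotation is the modern name for operatorPrecedence; the grammar
and names here are illustrative):

from pyparsing import Word, alphas, nums, infixNotation, oneOf, opAssoc

operand = Word(alphas) | Word(nums)
expr = infixNotation(operand, [
    ('**', 2, opAssoc.RIGHT),        # exponentiation binds tightest
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
])
assign = operand('lhs') + '=' + expr('rhs')

result = assign.parseString('y=a*x**2+b*x+c')
print(result['lhs'])   # y
print(result['rhs'])   # nested groups reflecting operator precedence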

Please feel free to look through the posted examples on the pyparsing
wiki at http://pyparsing.wikispaces.com/Examples, or some of the
applications currently using pyparsing at 
http://pyparsing.wikispaces.com/WhosUsingPyparsing,
and you might get a better feel for what kind of tasks pyparsing is
capable of.

-- Paul



Re: Parser Generator?

2007-08-26 Thread Steven Bethard
Paul McGuire wrote:
 On Aug 26, 8:05 pm, Ryan Ginstrom [EMAIL PROTECTED] wrote:
 The only caveat being that since Chinese and Japanese scripts don't
 typically delimit words with spaces, I think you'd have to pass the text
 through a tokenizer (like ChaSen for Japanese) before using PyParsing.
 
 Did you think pyparsing is so mundane as to require spaces between
 tokens?  Pyparsing has been doing this type of token-recognition since
 Day 1.  Looking for tokens without delimiting spaces was one of the
 first applications for pyparsing.  This issue is not unique to Chinese
 or Japanese text.  Pyparsing will easily find the tokens in this
 string:
 
 y=a*x**2+b*x+c
 
 as
 
 ['y','=','a','*','x','**','2','+','b','*','x','+','c']

The difference is that in the expression above (and in many other 
tokenization problems) you can determine word boundaries by looking at 
the class of character, e.g. alphanumeric vs. punctuation vs. whatever.

In Japanese and Chinese tokenization, word boundaries are not marked by 
different classes of characters. They only exist in the mind of the 
reader who knows which sequences of characters could be words given the 
context, and which sequences of characters couldn't.

The closest analog would be to ask pyparsing to find the words in the 
following sentence:

ThepyparsingmoduleprovidesalibraryofclassesthatclientcodeusestoconstructthegrammardirectlyinPythoncode.

Most approaches that have been even marginally successful on these kinds 
of tasks have used statistical machine learning approaches.

STeVe


RE: Parser Generator?

2007-08-26 Thread Ryan Ginstrom
 On Behalf Of Paul McGuire
 
 On Aug 26, 8:05 pm, Ryan Ginstrom [EMAIL PROTECTED] wrote:
  The only caveat being that since Chinese and Japanese scripts don't
  typically delimit words with spaces, I think you'd have to pass the
  text through a tokenizer (like ChaSen for Japanese) before using PyParsing.

 Did you think pyparsing is so mundane as to require spaces between
 tokens?  Pyparsing has been doing this type of token-recognition since
 Day 1.

Cool! I stand happily corrected. I did write "I think" because although I
couldn't find a way to do it, there might well actually be one <g>. I'll
keep looking to find some examples of parsing Japanese.

BTW, I think PyParsing is great, and I use it for several tasks. I just
could never figure out a way to use it with Japanese (at least on the
applications I had in mind).

Regards,
Ryan Ginstrom



Re: Parser Generator?

2007-08-24 Thread Jack
Thanks Jason. Does Parsing.py support Unicode characters (especially CJK)?
I'll take a look.

Jason Evans [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Aug 18, 3:22 pm, Jack [EMAIL PROTECTED] wrote:
 Hi all, I need to do syntax parsing of simple natural languages,
 for example, "weather of London" or "what is the time", simple
 things like these, with Unicode support in the syntax.

 In Java, there are JavaCC, Antlr, etc. I wonder what people use
 in Python? Antlr also has Python support but I'm not sure how good
 it is. Comments/hints are welcome.

 I use Parsing.py.  I like it a lot, probably because I wrote it.

http://www.canonware.com/Parsing/

 Jason
 




Re: Parser Generator?

2007-08-24 Thread Paul McGuire
On Aug 18, 11:37 pm, Jack [EMAIL PROTECTED] wrote:
 Thanks for all the replies!

 SPARK looks promising. Its doc doesn't say if it handles unicode
 (CJK in particular) encoding though.

 Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/

 There's also PyGgy: http://lava.net/~newsham/pyggy/

 I may also give Antlr a try.

 If anyone has experiences using any of the parser generators with CJK
 languages, I'd be very interested in hearing that.

 Jack

 Jack [EMAIL PROTECTED] wrote in message

 news:[EMAIL PROTECTED]



  Hi all, I need to do syntax parsing of simple natural languages,
  for example, "weather of London" or "what is the time", simple
  things like these, with Unicode support in the syntax.

  In Java, there are JavaCC, Antlr, etc. I wonder what people use
  in Python? Antlr also has Python support but I'm not sure how good
  it is. Comments/hints are welcome.

Jack -

Pyparsing was already mentioned once on this thread.  Here is an
application using pyparsing that parses Chinese characters to convert
to English Python.

http://pypi.python.org/pypi/zhpy/0.5

-- Paul



Re: Parser Generator?

2007-08-22 Thread Jason Evans
On Aug 18, 3:22 pm, Jack [EMAIL PROTECTED] wrote:
 Hi all, I need to do syntax parsing of simple natural languages,
 for example, "weather of London" or "what is the time", simple
 things like these, with Unicode support in the syntax.

 In Java, there are JavaCC, Antlr, etc. I wonder what people use
 in Python? Antlr also has Python support but I'm not sure how good
 it is. Comments/hints are welcome.

I use Parsing.py.  I like it a lot, probably because I wrote it.

http://www.canonware.com/Parsing/

Jason



Re: Parser Generator?

2007-08-19 Thread samwyse
Jack wrote:
 Thanks for all the replies!
 
 SPARK looks promising. Its doc doesn't say if it handles unicode
 (CJK in particular) encoding though.
 
 Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/
 
 There's also PyGgy http://lava.net/~newsham/pyggy/
 
 I may also give Antlr a try.
 
 If anyone has experiences using any of the parser generators with CJK
 languages, I'd be very interested in hearing that.

I'm going to echo Tommy's reply.  If you want to parse natural language, 
conventional parsers are going to be worse than useless (because you'll 
keep thinking, "Just one more tweak and this time it'll work for
sure!").  Instead, go look at what the interactive fiction community
uses.  They analyse the statement in multiple passes, first picking out 
the verbs, then the noun phrases.  Some of their parsers can do 
on-the-fly domain-specific spelling correction, etc, and all of them can 
ask the user for clarification.  (I'm currently cobbling together 
something similar for pre-teen users.)
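
A toy sketch of that multi-pass idea (vocabulary and behavior invented
for illustration; real IF parsers are far richer):

# Pass 1 picks out the verb; pass 2 treats what remains as a noun phrase.
VERBS = {'take', 'drop', 'look', 'open'}

def parse_command(text):
    words = text.lower().split()
    verbs = [w for w in words if w in VERBS]
    if not verbs:
        return None   # a real IF parser would ask for clarification here
    noun_phrase = ' '.join(w for w in words if w not in VERBS)
    return verbs[0], noun_phrase

print(parse_command('take the brass lantern'))   # ('take', 'the brass lantern')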


Re: Parser Generator?

2007-08-19 Thread Jack
Thanks for the suggestion. I understand that more work is needed for natural
language understanding. What I want to do is actually very simple - I
pre-screen the user-typed text. If it's a simple syntax my code understands,
like "Weather in London", I'll redirect it to a weather site. Or, if it's
"What is ...", I'll probably redirect it to Wikipedia. Otherwise, I'll throw
it to a search engine. So, extremely simple stuff ...
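
A minimal sketch of that pre-screening (the target URLs are placeholders):

def route(query):
    # Dispatch recognized simple syntaxes; fall through to a search engine.
    q = query.strip()
    low = q.lower()
    if low.startswith('weather in '):
        return 'http://weather.example.com/' + q[len('weather in '):]
    if low.startswith('what is '):
        return 'http://en.wikipedia.org/wiki/' + q[len('what is '):]
    return 'http://search.example.com/?q=' + q

print(route('Weather in London'))    # weather site
print(route('What is a parser?'))    # wikipedia
print(route('cheap flights'))        # search engine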

samwyse [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Jack wrote:
 Thanks for all the replies!

 SPARK looks promising. Its doc doesn't say if it handles unicode
 (CJK in particular) encoding though.

 Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/

 There's also PyGgy http://lava.net/~newsham/pyggy/

 I may also give Antlr a try.

 If anyone has experiences using any of the parser generators with CJK
 languages, I'd be very interested in hearing that.

 I'm going to echo Tommy's reply.  If you want to parse natural language, 
 conventional parsers are going to be worse than useless (because you'll
 keep thinking, "Just one more tweak and this time it'll work for sure!").
 Instead, go look at what the interactive fiction community uses.  They 
 analyse the statement in multiple passes, first picking out the verbs, 
 then the noun phrases.  Some of their parsers can do on-the-fly 
 domain-specific spelling correction, etc, and all of them can ask the user 
 for clarification.  (I'm currently cobbling together something similar for 
 pre-teen users.) 




Re: Parser Generator?

2007-08-19 Thread Alex Martelli
Jack [EMAIL PROTECTED] wrote:

 Thanks for the suggestion. I understand that more work is needed for natural
 language
 understanding. What I want to do is actually very simple - I pre-screen the
 user
 typed text. If it's a simple syntax my code understands, like "Weather in
 London", I'll
 redirect it to a weather site. Or, if it's "What is ...", I'll probably
 redirect it to Wikipedia.
 Otherwise, I'll throw it to a search engine. So, extremely simple stuff ...

http://nltk.sourceforge.net/index.php/Main_Page


NLTK — the Natural Language Toolkit — is a suite of open source Python
modules, data sets and tutorials supporting research and development in
natural language processing.
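
For a concrete taste, today's NLTK entry point (which postdates this
thread; it requires the 'punkt' tokenizer data):

import nltk

# nltk.download('punkt')   # one-time data download
print(nltk.word_tokenize('What is the weather in London?'))
# ['What', 'is', 'the', 'weather', 'in', 'London', '?']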



Alex

Re: Parser Generator?

2007-08-19 Thread Jack
Very interesting work. Thanks for the link!

Alex Martelli [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]

 http://nltk.sourceforge.net/index.php/Main_Page

 
 NLTK — the Natural Language Toolkit — is a suite of open source Python
 modules, data sets and tutorials supporting research and development in
 natural language processing.
 


 Alex 



Parser Generator?

2007-08-18 Thread Jack
Hi all, I need to do syntax parsing of simple natural languages,
for example, "weather of London" or "what is the time", simple
things like these, with Unicode support in the syntax.

In Java, there are JavaCC, Antlr, etc. I wonder what people use
in Python? Antlr also has Python support but I'm not sure how good
it is. Comments/hints are welcome. 




Re: Parser Generator?

2007-08-18 Thread Diez B. Roggisch
Jack wrote:
 Hi all, I need to do syntax parsing of simple natural languages,
 for example, "weather of London" or "what is the time", simple
 things like these, with Unicode support in the syntax.
 
 In Java, there are JavaCC, Antlr, etc. I wonder what people use
 in Python? Antlr also has Python support but I'm not sure how good
 it is. Comments/hints are welcome. 

There are several options. I personally like spark.py, the most common 
answer is pyparsing, and don't forget to check out NLTK, the natural 
language toolkit.

Diez


Re: Parser Generator?

2007-08-18 Thread beginner
On Aug 18, 5:22 pm, Jack [EMAIL PROTECTED] wrote:
 Hi all, I need to do syntax parsing of simple natural languages,
 for example, "weather of London" or "what is the time", simple
 things like these, with Unicode support in the syntax.

 In Java, there are JavaCC, Antlr, etc. I wonder what people use
 in Python? Antlr also has Python support but I'm not sure how good
 it is. Comments/hints are welcome.

Antlr seems to be able to generate python code, too.



Re: Parser Generator?

2007-08-18 Thread Tommy Nordgren

On 19 aug 2007, at 00.22, Jack wrote:

 Hi all, I need to do syntax parsing of simple natural languages,
 for example, "weather of London" or "what is the time", simple
 things like these, with Unicode support in the syntax.

 In Java, there are JavaCC, Antlr, etc. I wonder what people use
 in Python? Antlr also has Python support but I'm not sure how good
 it is. Comments/hints are welcome.


Antlr can generate Python code.
However, I don't think a parser generator is suitable for generating
natural language parsers; they are intended to generate code for
computer language parsers. For examples of parsing imperative English
sentences, I suggest taking a look at the class library for TADS 3
(Text Adventure Development System): http://www.tads.org
The language has a syntax reminiscent of C++ and Java.
-
An astronomer to a colleague:
-I can't understand how you can go to the brothel as often as you
do. Not only is it a filthy habit, but it must cost a lot of money too.
-That's no problem. I've got a big government grant for the study of
black holes.
Tommy Nordgren
[EMAIL PROTECTED]





Re: Parser Generator?

2007-08-18 Thread Jack
Thanks for all the replies!

SPARK looks promising. Its doc doesn't say if it handles unicode
(CJK in particular) encoding though.

Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/

There's also PyGgy http://lava.net/~newsham/pyggy/

I may also give Antlr a try.

If anyone has experiences using any of the parser generators with CJK
languages, I'd be very interested in hearing that.

Jack


Jack [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Hi all, I need to do syntax parsing of simple natural languages,
 for example, "weather of London" or "what is the time", simple
 things like these, with Unicode support in the syntax.

 In Java, there are JavaCC, Antlr, etc. I wonder what people use
 in Python? Antlr also has Python support but I'm not sure how good
 it is. Comments/hints are welcome.
 




ANN: Pyrr 0.1 - Lexer and LR(1)-Parser Generator for Python

2006-04-21 Thread Heiko Wundram
Hi list!

Not long ago I was looking for an easy-to-use but powerful parser and lexer
generating tool for Python, and to my dismay, I found quite a number of 
Python projects implementing an (LA)LR(1) parser generator, but none of them 
seemed quite finished, or even pythonic.

As I required a parser generator for Python for one of my work projects, I set 
out to write (yet another one), and currently am at (release-)version 0.1 for 
Pyrr.ltk and ptk.

An example for Pyrr.ltk and ptk usage implementing a (very) simple calculator:


# -*- coding: iso-8859-15 -*-

from ltk import LexerBase, IgnoreMatch
from ptk import ParserBase
from operator import add, sub, mul, div

class NumLexer(LexerBase):

    def number(self,value):
        """number -> r/[0-9]+/"""
        return float(value)

    def ws(self,*args):
        """ws -> r/\\s+/"""
        raise IgnoreMatch

    def ops(self,op):
        """addop -> /+/
                 -> /-/
           mulop -> /*/
                 -> r/\\//"""
        return op

class NumParser(ParserBase):
    """/mulop/: left
       /addop/: left"""

    __start__ = "term"

    def term(self,value1,op,value2):
        """term -> term /addop/ term
                -> term /mulop/ term"""
        return {"+":add,"-":sub,"*":mul,"/":div}[op](value1,value2)

    def value(self,value):
        """term -> /number/"""
        return value

print NumParser.parse(NumLexer.tokenize("3 + 4 - 123 / 23"))


Grammar rules and lexemes are specified in docstrings, where lines not 
matching a definition of a rule or lexeme are ignored. The resulting lexer 
and parser class is, thus, very much self-documenting, which was one of my 
biggest goals for the project.

I'm currently in the process of writing documentation for both packages (and 
especially documenting the extensions to BNF-grammars that Pyrr.ptk allows, 
such as your usual RE-operators ?, *, + and {x,y}, and forward arguments, and 
documenting the stateful lexer support that Pyrr.ltk implements), but I 
thought that I'd release early and often, so that people interested in this 
project might have a look at it now to input suggestions and extensions that 
they'd like me to add to make this a fully featured Python parser generating 
toolkit which might be offered as a Python package.

Anyway, the sources can be downloaded (via subversion) from:

http://svn.modelnine.org/svn/Pyrr/trunk

where I'll check in the documentation that I've written so far and a Python 
distutils distribution over the weekend, and make sure that I don't check in 
broken code from now on. And, Pyrr.* is Python 2.4 only at the moment, and I
have no plans to make it backwards-compatible, but if you're interested in 
backporting it, feel free to mail me patches.

--- Heiko.


Re: ANN: Pyrr 0.1 - Lexer and LR(1)-Parser Generator for Python

2006-04-21 Thread Norman Shelley
FWIW: This has a similar look/feel to how sabbey wrapped dparser.
http://staff.washington.edu/sabbey/py_dparser/

Heiko Wundram wrote:
 [announcement and calculator example snipped]


Re: BisonGen parser generator. Newbie question

2005-11-29 Thread uche . ogbuji

ishtar2020 wrote:
 I'm trying to run the calculator example included with the BisonGen
 parser generator, but I've been unable to put it to work.

 When I compile the XML file "simple.bgen" with the script
 "BisonGen.bat", the only parser I get is a C file. I've heard BisonGen
 also generates a Python file, which is, I believe, the one imported
 by test.py to run the testing.

Apologies for the late reply.  Holidays and all that...

Anyway, this is strange.  You should get both a C and a .py file (and .java
files if you're using a recent CVS version).  Here is what I get:

$ BisonGen simple.bgen
Generate parser simple.c
Generate parser simple.java
Generate constants simpleConstants.java
Generate handler simpleHandler.java
Generate handler DefaultsimpleHandler.java

What do you get for output?  BTW, if you want to try a recent CVS
version, grab the snapshot:

ftp://ftp.fourthought.com/pub/cvs-snapshots/BisonGen-CVS.tar.gz (.zip
also available).

Also, you might want to ask BGen questions on the 4Suite mailing list,
where other BGen developers hang out.

http://lists.fourthought.com/pipermail/4suite/

--
Uche Ogbuji   Fourthought, Inc.
http://uche.ogbuji.nethttp://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/



BisonGen parser generator. Newbie question

2005-11-18 Thread ishtar2020
Hello everybody!

I'm trying to run the calculator example included with the BisonGen
parser generator, but I've been unable to put it to work.

When I compile the XML file "simple.bgen" with the script
"BisonGen.bat", the only parser I get is a C file. I've heard BisonGen
also generates a Python file, which is, I believe, the one imported
by test.py to run the testing.

Does anybody know what I'm doing wrong here?

Thank you in advance!
