[Python-announce] pyparsing 3.1.0 released

Paul McGuire Sun, 18 Jun 2023 18:02:58 -0700

After several alpha and beta releases, I've finally pushed out version 3.1.0 of 
pyparsing. Here are the highlights:


NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as 
`ParserElement.parseString`) will start to raise `DeprecationWarnings`. 3.2.0 
should get released some time later in 2023. I currently plan to completely 
drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release 
until at least late 2023 if not 2024. So there is plenty of time to convert 
existing parsers to the new function names before the old functions are 
completely removed. (Big help from Devin J. Pohly in structuring the code to 
enable this peaceful transition.)

Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.


Version 3.1.0 - June, 2023
--------------------------

API CHANGES
-----------

- A slight change has been implemented when unquoting a quoted string parsed 
using the `QuotedString` class. Formerly, when unquoting and processing 
whitespace markers such as \t and \n, these substitutions would occur first, 
and then any additional '\' escaping would be done on the resulting string. 
This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are 
all processed in a single pass working left to right, so the quoted string 
"\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue 
#474 raised by jakeanq, thanks!

- Reworked `delimited_list` function into the new `DelimitedList` class. 
`DelimitedList` has the same constructor interface as `delimited_list`, and in 
this release, `delimited_list` changes from a function to a synonym for 
`DelimitedList`. `delimited_list` and the older `delimitedList` function will 
be deprecated in a future release, in favor of `DelimitedList`.

- `ParserElement.validate()` is deprecated. It predates the support for 
left-recursive parsers, and was prone to false positives (warning that a 
grammar was invalid when it was in fact valid).  It will be removed in a future 
pyparsing release. In its place, developers should use debugging and analytical 
tools, such as `ParserElement.set_debug()` and 
`ParserElement.create_diagram()`. (Raised in Issue #444, thanks Andrea Micheli!)


NEW FEATURES AND ENHANCEMENTS
-----------------------------

- `Optional(expr)` may now be written as `expr | ""`

  For example, this expression:

      "{" + Optional(Literal("A") | Literal("a")) + "}"

  can now be written as:

      "{" + (Literal("A") | Literal("a") | "") + "}"

  Some related changes implemented as part of this work:
  - `Literal("")` now internally generates an `Empty()` (and no longer raises 
an exception)
  - `Empty` is now a subclass of `Literal`

  Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

- Added new class method `ParserElement.using_each`, to simplify code that 
creates a sequence of `Literals`, `Keywords`, or other `ParserElement` 
subclasses.

  For instance, to define suppressible punctuation, you would previously write:

      LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")

  You can now write:

      LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")

  `using_each` will also accept optional keyword args, which it will pass 
through to the class initializer. Here is an expression for single-letter 
variable names that might be used in an algebraic expression:

      algebra_var = MatchFirst(
          Char.using_each(string.ascii_lowercase, as_keyword=True)
      )

- Added new builtin `python_quoted_string`, which will match any form of 
single-line or multiline quoted strings defined in Python. (Inspired by 
discussion with Andreas Schörgenhumer in Issue #421.)

- Extended `expr[]` notation for repetition of `expr` to accept a slice, where 
the slice's stop value indicates a `stop_on` expression:

      test = "BEGIN aaa bbb ccc END"
      BEGIN, END = Keyword.using_each("BEGIN END".split())
      body_word = Word(alphas)

      expr = BEGIN + Group(body_word[...:END]) + END
      # equivalent to
      # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END

      print(expr.parse_string(test))

  Prints:

      ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']

- Added named field "url" to `pyparsing.common.url`, returning the entire 
parsed URL string.

- Added bool `embed` argument to `ParserElement.create_diagram()`. When passed 
as True, the resulting diagram will omit the `<DOCTYPE>`, `<HEAD>`, and 
`<BODY>` tags so that it can be embedded in other HTML source. (Useful when 
embedding a call to `create_diagram()` in a PyScript HTML page.)

- Added `recurse` argument to `ParserElement.set_debug` to set the debug flag 
on an expression and all of its sub-expressions. Requested by multimeric in 
Issue #399.

- Added '·' (Unicode MIDDLE DOT) to the set of `Latin1.identbodychars`.

- `ParseResults` now has a new method `deepcopy()`, in addition to the current 
`copy()` method. `copy()` makes only a shallow copy: any contained 
`ParseResults` are copied as references, so changes made to them through the 
copy will also be seen in the original. In many cases, a shallow copy is 
sufficient, but some applications require a deep copy. `deepcopy()` makes a 
deeper copy: any contained `ParseResults` or other mappings or containers are 
built with copies from the original, and do not get changed if the original is 
later changed. 
Addresses issue #463, reported by Bryn Pickering.

- Added new class property `identifier` to all Unicode set classes in 
`pyparsing.unicode`, using the class's values for `cls.identchars` and 
`cls.identbodychars`. Unicode-aware parsers that formerly wrote:

      ppu = pyparsing.unicode
      ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)

  can now write:

      ident = ppu.Greek.identifier
      # or
      # ident = ppu.Ελληνικά.identifier

- Error messages from `MatchFirst` and `Or` expressions will try to give more 
details if one of the alternatives matches better than the others, but still 
fails. Question raised in Issue #464 by msdemlei, thanks!


BUG FIXES AND GENERAL CHANGES
-----------------------------

- Added support for Python 3.12.

- Updated `ci.yml` permissions to limit default access to source - submitted by 
Joyce Brum of Google. Thanks so much!

- Updated `create_diagram()` code to be compatible with `railroad-diagrams` 
package version 3.0. Fixes Issue #477 (railroad diagrams generated with black 
bars), reported by Sam Morley-Short.

- Fixed bug in `NotAny`, where parse actions on the negated expr were not being 
run. This could cause `NotAny` to fail incorrectly when the negated expr 
matched the input text, but a condition attached to it as a parse action 
returned False (so the expr should have counted as not matching). Fixes Issue 
#482, raised by byaka, thank you!

- Fixed `create_diagram()` to accept keyword args, to be passed through to the 
`template.render()` method to generate the output HTML (PR submitted by Aussie 
Schnore, good catch!)

- Fixed bug in `python_quoted_string` regex.

- Fixed bug where a results name was not saved if a parse action returned an 
empty string for the named expression. That is:

      expr = Literal("X").add_parse_action(lambda tokens: "")("value")
      result = expr.parse_string("X")
      print(result["value"])

  would raise a `KeyError`. Now empty strings will be saved with the associated 
results name. Raised in Issue #470 by Nicco Kunzmann, thank you.

- Fixed bug in `SkipTo` where ignore expressions were not properly handled 
while scanning for the target expression. Issue #475, reported by elkniwt, 
thanks (this bug has been there for a looooong time!).

- Fixed bug in `Word` when `max=2`. Also added performance enhancement when 
specifying `exact` argument. Reported in issue #409 by panda-34, nice catch!

- `Word` arguments are now validated: if `min` and `max` are both given, `min` 
must be <= `max`; raises `ValueError` if the values are invalid.

- Fixed bug in `srange`, when parsing escaped '/' and '\' inside a range set.

- Fixed exception messages for some `ParserElements` with custom names, which 
showed the names of their contained expressions instead of the custom name.

- Fixed bug in `pyparsing.common.url`, when input URL is not alone on an input 
line. Fixes Issue #459, reported by David Kennedy.

- Multiple added and corrected type annotations. With much help from Stephen 
Rosen, thanks!

- Some documentation and error message clarifications on pyparsing's keyword 
logic, cited by Basil Peace.

- General docstring cleanup for Sphinx doc generation, PRs submitted by Devin 
J. Pohly. A dirty job, but someone has to do it - much appreciated!


EXAMPLE UPDATES
---------------

- Added `tag_emitter.py` to examples. This example demonstrates how to insert 
tags into your parsed results that are not part of the original parsed text.

- Added `bf.py` Brainf*ck parser/executor example. Illustrates using a 
pyparsing grammar to parse language syntax, and attach executable AST nodes to 
the parsed results.

- `invRegex.py` example renamed to `inv_regex.py` and updated to PEP-8 variable 
and method naming. PR submitted by Ross J. Duff, thanks!

- Removed examples `sparser.py` and `pymicko.py`, since each included its own 
GPL license in the header. Since this conflicts with pyparsing's MIT license, 
they were removed from the distribution to avoid confusion among those making 
use of them in their own projects.

- Updated the `lucene_grammar.py` example (better support for '*' and '?' 
wildcards) and corrected the test cases - brought to my attention by Elijah 
Nicol, good catch!