Re: [pygnulib] simplify cache configure.ac parsing

2017-09-12 Thread Bruno Haible
Hi Dmitry,

> BTW, the version from the pygnulib differs a bit already from the
> gnulib-tool shell script

The shell script contains this code (unchanged since 2009):

guessed_auxdir="."
guessed_libtool=false
my_sed_traces='
  s,#.*$,,
  s,^dnl .*$,,
  s, dnl .*$,,
  /AC_CONFIG_AUX_DIR/ {
    s,^.*AC_CONFIG_AUX_DIR([[ ]*\([^]"$`\\)]*\).*$,guessed_auxdir="\1",p
  }
  /A[CM]_PROG_LIBTOOL/ {
    s,^.*$,guessed_libtool=true,p
  }'
eval `sed -n -e "$my_sed_traces" < "$configure_ac"`

The first 3 lines of the sed script remove comments; I guess gnulib-tool.py
ought to do the same, because we really don't want to be fooled by invocations
that have been commented out.
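
A minimal Python sketch of the same comment stripping, for illustration only.
The helper name `strip_comments` and the sample input are hypothetical and not
part of pygnulib; the three patterns simply mirror the three sed substitutions
quoted above, applied line by line.

```python
import re

# Hypothetical helper, not taken from pygnulib: each pattern mirrors one of
# the three comment-stripping sed substitutions in gnulib-tool.
_COMMENT_PATTERNS = (
    re.compile(r"#.*$"),      # s,#.*$,,
    re.compile(r"^dnl .*$"),  # s,^dnl .*$,,
    re.compile(r" dnl .*$"),  # s, dnl .*$,,
)


def strip_comments(text):
    """Return text with shell-style and dnl comments removed, line by line."""
    lines = []
    for line in text.splitlines():
        for pattern in _COMMENT_PATTERNS:
            line = pattern.sub("", line)
        lines.append(line)
    return "\n".join(lines)


# Made-up configure.ac fragment with two commented-out invocations.
sample = """\
# AC_PROG_LIBTOOL
dnl AC_CONFIG_AUX_DIR([old-aux])
AC_CONFIG_AUX_DIR([build-aux]) dnl trailing comment
AC_PREREQ([2.67])
"""
cleaned = strip_comments(sample)
```

Running the macro regexes on `cleaned` instead of on the raw file contents
would keep the commented-out AC_PROG_LIBTOOL and the old auxdir from being
picked up.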

> "hello([AC_PREREQ([2.67])])"
> "AC_PREREQ([2.67])"
> "helloAC_PREREQ([2.67])world"

You can reasonably assume that a configure.ac will not contain the first
or third of these lines, because AC_PREREQ (and likewise A[CM]_PROG_LIBTOOL)
are usually used at the top level only. The second line can occur, though,
even indented, as nothing obliges programmers to avoid indentation.
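
If only top-level invocations should be recognized, one possible approach is
to anchor the pattern at the start of a line and allow optional leading
whitespace. This is a sketch under that assumption, not the pattern pygnulib
uses; the name `TOPLEVEL_PREREQ` and the indented sample line are made up for
the example.

```python
import re

# Assumption: accept the macro only at the start of a line, optionally
# preceded by spaces or tabs. re.M makes ^ match at every line start.
TOPLEVEL_PREREQ = re.compile(r"^[ \t]*AC_PREREQ\(\[(.*?)\]\)", re.M)

samples = [
    "hello([AC_PREREQ([2.67])])",   # nested in another macro call
    "  AC_PREREQ([2.67])",          # top level, indented
    "helloAC_PREREQ([2.67])world",  # embedded in other text
]

# True where the sample is recognized as a top-level invocation.
results = [bool(TOPLEVEL_PREREQ.search(sample)) for sample in samples]
version = TOPLEVEL_PREREQ.search(samples[1]).group(1)
```

With this pattern only the second sample matches, which corresponds to the
case that can actually be expected in a configure.ac.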

Bruno




[pygnulib] simplify cache configure.ac parsing

2017-09-12 Thread Dmitry Selyutin
NOTE
This change does not affect the current gnulib-tool.py, only the `python`
branch.
Still, it is going to be integrated into gnulib-tool.py later.


I've been testing the new command-line parsing along with the parsing of the
cached configuration (configure.ac, gnulib-cache.m4 and gnulib-comp.m4 processing).
I've noticed that we spend a lot of time processing the contents of the
AC_PREREQ and AC_CONFIG_AUX_DIR macros. The regular expressions have the
following form (I've removed some junk):

".*AC_PREREQ\\(\\[(.*?)\\]\\)"
".*AC_CONFIG_AUX_DIR\\(\\[(.*?)\\]\\)"

In Python, however, it seems to be enough to just use the following form:

"AC_PREREQ\\(\\[(.*?)\\]\\)"
"AC_CONFIG_AUX_DIR\\(\\[(.*?)\\]\\)"

Once I started using the latter form, the time required to process each of
these regular expressions decreased by about half a second. The regex works
even in the following cases:

"hello([AC_PREREQ([2.67])])"
"AC_PREREQ([2.67])"
"helloAC_PREREQ([2.67])world"
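
The comparison can be sketched as follows, assuming the patterns are applied
with `search` (the call site is not shown here, so this is an assumption). The
names `OLD`, `NEW`, `cases` and `twice` are made up for the example.

```python
import re

FLAGS = re.S | re.M
# Pattern with and without the leading ".*".
OLD = re.compile(r".*AC_PREREQ\(\[(.*?)\]\)", FLAGS)
NEW = re.compile(r"AC_PREREQ\(\[(.*?)\]\)", FLAGS)

cases = [
    "hello([AC_PREREQ([2.67])])",
    "AC_PREREQ([2.67])",
    "helloAC_PREREQ([2.67])world",
]

# Both forms capture the same version string in all three cases.
old_versions = [OLD.search(case).group(1) for case in cases]
new_versions = [NEW.search(case).group(1) for case in cases]

# The forms differ when the macro occurs more than once: with re.S the
# greedy ".*" runs to the end of the text and backtracks, so OLD captures
# the last occurrence, whereas NEW captures the first one.
twice = "AC_PREREQ([2.60])\nAC_PREREQ([2.67])"
old_pick = OLD.search(twice).group(1)
new_pick = NEW.search(twice).group(1)
```

The backtracking caused by the greedy `.*` under `re.S` is a plausible reason
for the slowdown on a large file. The two forms are interchangeable only as
long as the macro occurs once in the input.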

I suspect that the original form was just a copy-paste from the original
gnulib-tool, where it may have been needed because sed is used to parse
the contents of the configure.ac file. So the questions are:
1. Is the new behavior correct?
2. Shall I push this small optimization?


I'd like to do it, because right now everything else I've rewritten works
almost instantly, but I still have some doubts. What do you think?
BTW, the version from the pygnulib differs a bit already from the gnulib-tool
shell script; I've attached the patch. I've also decided to use raw string
literals just to make the regexes less verbose.


-- 
With best regards,
Dmitry Selyutin
From 71a8d4a82caf17350cd3fad4ba6feb7b7fdb3e94 Mon Sep 17 00:00:00 2001
From: Dmitry Selyutin 
Date: Tue, 12 Sep 2017 18:47:55 +0300
Subject: [PATCH] config: simplify cache regular expressions

---
 pygnulib/config.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pygnulib/config.py b/pygnulib/config.py
index d08181db6..b174b166a 100644
--- a/pygnulib/config.py
+++ b/pygnulib/config.py
@@ -430,9 +430,9 @@ class Base:
 class Cache(Base):
 """gnulib cached configuration"""
 _AUTOCONF_ = {
-"autoconf" : _re_.compile(".*AC_PREREQ\\(\\[(.*?)\\]\\)", _re_.S | _re_.M),
-"auxdir"   : _re_.compile("^AC_CONFIG_AUX_DIR\\(\\[(.*?)\\]\\)$", _re_.S | _re_.M),
-"libtool"  : _re_.compile("A[CM]_PROG_LIBTOOL", _re_.S | _re_.M)
+"autoconf" : _re_.compile(r"AC_PREREQ\(\[(.*?)\]\)", _re_.S | _re_.M),
+"auxdir"   : _re_.compile(r"AC_CONFIG_AUX_DIR\(\[(.*?)\]\)$", _re_.S | _re_.M),
+"libtool"  : _re_.compile(r"A[CM]_PROG_LIBTOOL", _re_.S | _re_.M)
 }
 _GNULIB_CACHE_ = {
 "local" : (str, "gl_LOCAL_DIR"),
@@ -470,7 +470,7 @@ class Cache(Base):
 _GNULIB_CACHE_STR_ += [_key_]
 else:
 _GNULIB_CACHE_LIST_ += [_key_]
-_GNULIB_CACHE_PATTERN_ = _re_.compile("^(gl_.*?)\\(\\[(.*?)\\]\\)$", _re_.S | _re_.M)
+_GNULIB_CACHE_PATTERN_ = _re_.compile(r"^(gl_.*?)\(\[(.*?)\]\)$", _re_.S | _re_.M)
 
 
 def __init__(self, root, m4_base, autoconf=None, **kwargs):
-- 
2.13.4