Hi,

Den fre 8 dec. 2023 kl 05:40 skrev Yasuhito FUTATSUKI <
futat...@yf.bsdclub.org>:

>
>
> On 2023/12/07 19:33, Daniel Sahlberg wrote:
> > Den tors 7 dec. 2023 kl 09:51 skrev Ruediger Pluem <rpl...@apache.org>:
> >
> >> I stumbled accross a Python 3 compatibility issue in
> >> tools/hook-scripts/mailer/mailer.py.
> >>
> >> The call of self.cfg.which_groups in line 565 passes an empty string as
> >> first parameter.
> >> In which_groups this empty string is passed to to_str in line 1489.
> >> In line 88 to_str does x.decode('utf-8')
> >> In Python 2 str objects have a decode method at least in later Python 3
> >> versions they have not. Hence I propose the following patch which fixes
> the
> >> issue for me:
> <snip>
>
> > There was some work done on mailer.py by Greg Stein, started here:
> > https://lists.apache.org/thread/v5bh1j8qz8kk0jv1tlctxqt1k454tz0h
> > However I don't think that touched these parts of the code.
>
> Yes, I also think the issue has existed before it.
>
> > I think there is more to this than that simple fix, but I'm no expert in
> > Python or in the Python bindings. There are two more calls to
> > which_groups():
> >
> > Line 491, called with the return from
> > svn.repos.ChangeCollector(...).get_changes().items() (first argument in a
> > tuple). I don't know which encoding this uses.
>
> We decided that for all C APIs which returns char * values, their Python 3
> wrapper functions return them as bytes object, and also we don't convert
> those values into str within helper functions in svn modules. Thus their
> path elements should be bytes object, and as they are internal paths,
> their encoding is UTF-8, regardless of locale.
>
> > Line 663, called with the return from sys.stdin.buffer.readlines(),
> already
> > processed by to_str() once.
> >
> > In particular the call on 663 should be double decoded unless I'm missing
> > something.
>
> Yes, there is inconsistency here.
>
> > So depending on what get_changes() return, maybe we should remove the
> > to_str() from which_groups() instead?
>
> As Config.which_group() call from Commit.__init__() has worked correctly,
> the pathes as matching pattern in which_group() are expected as str.
> So if remove to_str() from which_groups(), path argment for
> Config.which_group() should be a str object.
>
> > Note that we are still somewhat supporting Python 2 so we need to make
> sure
> > any changes does work on Python 2 as well.
>
> Then, how about this patch? It at least mailer-t1.sh passed both
> on python=python2.7 and on python=python3.9.
>
> [[[
> Fix inconsistency in path argment on Config.which_group()
>
> Previously, we call Config.which_group() with path as bytes in
> Commit.__init__() and with path as str in Lock.__init__() and
> in PropChange.__init__(), but Config.which_group handled path
> argment as bytes. To fix it we only use str as path argment on
> Config.wich_group().
>
> * tools/hook-scripts/mailer/mailer.py
>   (Config.which_groups): Treat path as str.
>   (Commit.__init__): convert bytes path into str before calling above.
>
> Found by: Ruediger Pluem (rpluem {_AT_} apache.org)
>
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1913728)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -488,7 +488,7 @@
>      # collect the set of groups and the unique sets of params for the
> options
>      self.groups = { }
>      for path, change in self.changelist:
> -      for (group, params) in self.cfg.which_groups(path, log):
> +      for (group, params) in self.cfg.which_groups(to_str(path), log):
>          # turn the params into a hashable object and stash it away
>          param_list = sorted(params.items())
>          # collect the set of paths belonging to this group
> @@ -1486,9 +1486,9 @@
>      "Return the path's associated groups."
>      groups = []
>      for group, pattern, exclude_pattern, repos_params, search_logmsg_re
> in self._group_re:
> -      match = pattern.match(to_str(path))
> +      match = pattern.match(path)
>        if match:
> -        if exclude_pattern and exclude_pattern.match(to_str(path)):
> +        if exclude_pattern and exclude_pattern.match(path):
>            continue
>          params = repos_params.copy()
>          params.update(match.groupdict())
> ]]]
>
> Cheers,
> --
> Yasuhito FUTATSUKI <futat...@yf.bsdclub.org>
>

This looks good to me! Thanks for the detailed explaination!

Kind regards,
Daniel Sahlberg

Reply via email to