Re: WIP patch to reduce rewrite_filename duplication

2024-04-26 Thread Bruno Haible
Collin Funk wrote:
> > Correct. Feel free to add tests for --copy-file, if you like.
> > I didn't spend time on it, since --copy-file is simple enough.
> 
> Sure, I can have a look at adding some. I see it is part of the Gnulib
> Savannah group so I can commit changes there. I assume same rules go
> there too, always send patches for review?

In the beginning, yes, the same rules hold as for the main gnulib
repository. People with more experience can commit right away.
And you have seen that I announced my test suite changes in an
informal way; code review is less important in this case because
possible regressions will only affect us ourselves.

Bruno






Re: WIP patch to reduce rewrite_filename duplication

2024-04-25 Thread Collin Funk
On 4/25/24 5:14 PM, Bruno Haible wrote:
> Yes, the other heuristic is that when many functions operate on the
> same object, like methods in class do, the pointer to that object
> is often passed first. And, as you noticed, the two heuristics collide...
> 
> Here, I would not think of 'rewrite_file_name' as a method pertaining
> to the class GLConfig. Rather, GLConfig is just one of the parameters.

Yeah, that makes sense. I'll keep that in mind.

> Perfect. OK to push.

Done.

> Correct. Feel free to add tests for --copy-file, if you like.
> I didn't spend time on it, since --copy-file is simple enough.

Sure, I can have a look at adding some. I see it is part of the Gnulib
Savannah group so I can commit changes there. I assume same rules go
there too, always send patches for review?

> Ah, that's because exitfail.h was modified in Gnulib a few days ago.
> Will adapt the override in gettext. Thanks for the notice.

Thanks! I see now that the #ifdef __cplusplus change messed up the
patch.

Collin



Re: WIP patch to reduce rewrite_filename duplication

2024-04-25 Thread Bruno Haible
Hi Collin,

> > Yep, that's the right way to do it. Maybe file_name should come first
> > (by the usual heuristic that the argument that shows most variation
> > comes first)?
> 
> Sure, that is fine with me. I think the other way is based on how I
> like to write C. Usually I like to place structures (or pointers to
> them) first. The best example I can think of off the top of my head
> is:
> 
> int fprintf (FILE *stream, const char *format, ...);
> 
> vs.
> 
>int fputs (const char *str, FILE *stream);
> 
> I always feel awkward using 'fputs' because of that.

Yes, the other heuristic is that when many functions operate on the
same object, like methods in class do, the pointer to that object
is often passed first. And, as you noticed, the two heuristics collide...

Here, I would not think of 'rewrite_file_name' as a method pertaining
to the class GLConfig. Rather, GLConfig is just one of the parameters.

> How does this patch look?

Perfect. OK to push.

> I ran all tests with Python 3.7 and they pass. I don't think
> gnulib-tool --copy-file is tested there though.

Correct. Feel free to add tests for --copy-file, if you like.
I didn't spend time on it, since --copy-file is simple enough.

> I tried to use gettext
> since I remember autogen.sh does it a few times but I see the
> following failure:
> 
> $ cd gettext
> $ env GNULIB_TOOL_IMPL=sh+py sh -x ./autogen.sh
> [...]
> Copying file gnulib-lib/exitfail.c
> 1 out of 1 hunk FAILED -- saving rejects to file /tmp/glDMpS4d/exitfail.h.rej
> /home/collin/.local/src/gnulib/gnulib-tool.sh: *** patch file 
> gnulib-local/lib/exitfail.h.diff didn't apply cleanly
> /home/collin/.local/src/gnulib/gnulib-tool.sh: *** Stop.
> + exit 1

Ah, that's because exitfail.h was modified in Gnulib a few days ago.
Will adapt the override in gettext. Thanks for the notice.

Bruno






Re: WIP patch to reduce rewrite_filename duplication

2024-04-25 Thread Collin Funk
Hi Bruno,

On 4/25/24 1:18 PM, Bruno Haible wrote:
> Yep, that's the right way to do it. Maybe file_name should come first
> (by the usual heuristic that the argument that shows most variation
> comes first)?

Sure, that is fine with me. I think the other way is based on how I
like to write C. Usually I like to place structures (or pointers to
them) first. The best example I can think of off the top of my head
is:

int fprintf (FILE *stream, const char *format, ...);

vs.

   int fputs (const char *str, FILE *stream);

I always feel awkward using 'fputs' because of that. But that is a
different language so I can adapt. :)

How does this patch look? I'll wait for you to review it before
pushing any changes.

I ran all tests with Python 3.7 and they pass. I don't think
gnulib-tool --copy-file is tested there though. I tried to use gettext
since I remember autogen.sh does it a few times but I see the
following failure:

$ cd gettext
$ env GNULIB_TOOL_IMPL=sh+py sh -x ./autogen.sh
[...]
Copying file gnulib-lib/exitfail.c
1 out of 1 hunk FAILED -- saving rejects to file /tmp/glDMpS4d/exitfail.h.rej
/home/collin/.local/src/gnulib/gnulib-tool.sh: *** patch file 
gnulib-local/lib/exitfail.h.diff didn't apply cleanly
/home/collin/.local/src/gnulib/gnulib-tool.sh: *** Stop.
+ exit 1

I assume that is unrelated to these changes and will be an easy fix
for you. When I remove the diff everything passes with
GNULIB_TOOL_IMPL=sh+py.

CollinFrom 95f83aabfcd2cd0624ae85357f7f40f34bb1a01b Mon Sep 17 00:00:00 2001
From: Collin Funk 
Date: Thu, 25 Apr 2024 15:30:29 -0700
Subject: [PATCH] gnulib-tool.py: Reduce code duplication in file name
 transformations.

* pygnulib/functions.py: New file for shared functions between modules.
Add a function based on functions removed from GLImport and GLTestDir.
Accepts a single file name instead of a list.
* pygnulib/GLImport.py (GLImport.prepare): Use the new function.
(GLImport.rewrite_new_files, GLImport.rewrite_old_files): Remove
functions.
* pygnulib/GLTestDir.py (GLTestDir.execute): Use the new function.
(GLTestDir.rewrite_files): Remove functions.
* pygnulib/main.py (main): Remove unused function import. Use the new
function.
---
 ChangeLog | 14 +++
 pygnulib/GLImport.py  | 97 +--
 pygnulib/GLTestDir.py | 55 +---
 pygnulib/functions.py | 52 +++
 pygnulib/main.py  | 17 +---
 5 files changed, 88 insertions(+), 147 deletions(-)
 create mode 100644 pygnulib/functions.py

diff --git a/ChangeLog b/ChangeLog
index abefd3bae2..276258b885 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2024-04-25  Collin Funk  
+
+	gnulib-tool.py: Reduce code duplication in file name transformations.
+	* pygnulib/functions.py: New file for shared functions between modules.
+	Add a function based on functions removed from GLImport and GLTestDir.
+	Accepts a single file name instead of a list.
+	* pygnulib/GLImport.py (GLImport.prepare): Use the new function.
+	(GLImport.rewrite_new_files, GLImport.rewrite_old_files): Remove
+	functions.
+	* pygnulib/GLTestDir.py (GLTestDir.execute): Use the new function.
+	(GLTestDir.rewrite_files): Remove functions.
+	* pygnulib/main.py (main): Remove unused function import. Use the new
+	function.
+
 2024-04-25  Bruno Haible  
 
 	doc: Remove documentation of IRIX as supported platform.
diff --git a/pygnulib/GLImport.py b/pygnulib/GLImport.py
index cf1ebd10ec..47c0e83555 100644
--- a/pygnulib/GLImport.py
+++ b/pygnulib/GLImport.py
@@ -38,6 +38,7 @@
 relativize,
 rmtree,
 )
+from .functions import rewrite_file_name
 from .GLError import GLError
 from .GLConfig import GLConfig
 from .GLModuleSystem import GLModuleTable
@@ -303,80 +304,6 @@ def __repr__(self) -> str:
 result = '' % hex(id(self))
 return result
 
-def rewrite_old_files(self, files: list[str]) -> list[str]:
-'''Replace auxdir, docbase, sourcebase, m4base and testsbase from default
-to their version from cached config.'''
-if type(files) is not list:
-raise TypeError('files argument must has list type, not %s'
-% type(files).__name__)
-for file in files:
-if type(file) is not str:
-raise TypeError('each file must be a string instance')
-files = sorted(set(files))
-files = [ '%s%s' % (file, os.path.sep)
-  for file in files ]
-auxdir = self.cache['auxdir']
-docbase = self.cache['docbase']
-sourcebase = self.cache['sourcebase']
-m4base = self.cache['m4base']
-testsbase = self.cache['testsbase']
-result = []
-for file in files:
-if file.startswith('build-aux/'):
-path = substart('build-aux/', '%s/' % auxdir, file)
-elif file.startswith('doc/'):
-path = substart('doc/', '%s/' % docbase, file)
-elif file.startswith('lib/'):

Re: WIP patch to reduce rewrite_filename duplication

2024-04-25 Thread Bruno Haible
Hi Collin,

> > IMO, the creation of a rewrite_file_name method (singular! you mentioned
> > a couple of days ago that the ability to rewrite a single file name is
> > one of the motivations for this refactoring) in a central place goes in
> > the correct direction. However, moving knowledge about 'cache' and 'config'
> > into GLFileTable does not.
> 
> Sure, I agree. Though I'm not sure the best place to put it. All of
> our non-private functions are in constants.py, but this function seems
> out of place there. Since all of the functions there are string
> helpers and such. In other words, no GL* classes are imported there.

I agree, it's out-of-place in constants.py.

> Maybe it would be best to create a functions.py file?

Sounds reasonable, yes.

> For the actual function it would be something like this:
> 
> def rewrite_file_name(config: GLConfig,
>   file_name: str,
>   also_tests_lib: bool = False) -> str:
> ...
> if also_tests_lib and file_name.startswith('tests=lib/'):
># OK

Yep, that's the right way to do it. Maybe file_name should come first
(by the usual heuristic that the argument that shows most variation
comes first)?

Bruno






Re: WIP patch to reduce rewrite_filename duplication

2024-04-25 Thread Collin Funk
Hi Bruno,

Thank you for the detailed explanations as always.

On 4/25/24 1:48 AM, Bruno Haible wrote:
> Object-oriented programming is not easy, and it comes with constant
> hesitations and deliberations.

:)

My least favorite part of my degree was object-oriented design and
database normalization, for similar reasons.

> Another, bigger warning sign is that you can no longer easily explain
> what the GLFileTable is.
> Before, it was a data structure that holds file lists during an operation.
> After, it is a helper class that holds file lists and also meddles with
> file name rewriting as required by the caller. (That's the best description
> I can come up with!)

I see what you mean. I guess the strange setters would be a symptom of
that warning sign that GLFileTable is doing to much.

The way you described the GLFileTable reminds me of the reasoning for
adding the 'dataclasses' module added in Python 3.7 [1]. Since we are
both very big fans of decorators of course. :)

> IMO, the creation of a rewrite_file_name method (singular! you mentioned
> a couple of days ago that the ability to rewrite a single file name is
> one of the motivations for this refactoring) in a central place goes in
> the correct direction. However, moving knowledge about 'cache' and 'config'
> into GLFileTable does not.

Sure, I agree. Though I'm not sure the best place to put it. All of
our non-private functions are in constants.py, but this function seems
out of place there. Since all of the functions there are string
helpers and such. In other words, no GL* classes are imported there.

Maybe it would be best to create a functions.py file?

> Neither of the two proprosals smells well. Why not add a boolean
> 'also_tests_lib' parameter to the rewrite_file_name method, that tells
> whether to replace 'tests=lib/' or not? This way, the refactoring does
> not introduce a functional difference.

Sorry, I think I worded that poorly. Since 'tests=lib/' is internally
used I was thinking that maybe we should warn/error if it is used. I
guess that isn't a problem worth losing sleep over.

For the actual function it would be something like this:

def rewrite_file_name(config: GLConfig,
  file_name: str,
  also_tests_lib: bool = False) -> str:
...
if also_tests_lib and file_name.startswith('tests=lib/'):
   # OK

Since 'tests=lib/' is used internally I felt it makes more sense to
default it to False. That way the call in main.py can ignore it and
the calls in GLImport.py and GLTestDir.py can use it internally.

> In a case like this, it's better if you post the patch for review, rather
> than push it too quickly. Trust me: OO takes a long time to learn.

Yes, I agree.

[1] https://docs.python.org/3/library/dataclasses.html#module-dataclasses

Collin



Re: WIP patch to reduce rewrite_filename duplication

2024-04-25 Thread Bruno Haible
Hi Collin,

Object-oriented programming is not easy, and it comes with constant
hesitations and deliberations.

> Less repeated code to maintain.

Yes, this is one of the goals to keep in mind.

The other most relevant questions are:

  - Can the purpose of a class be explained in simple terms?
  - Is a class properly decoupled from the code that uses the class?

  - Are we using the common design patterns (such as getters/setters,
class-private caching, etc.)?

Here you already have noticed a warning sign:
> The setters also take two arguments which feels a bit unorthodox to me.

Another, bigger warning sign is that you can no longer easily explain
what the GLFileTable is.
Before, it was a data structure that holds file lists during an operation.
After, it is a helper class that holds file lists and also meddles with
file name rewriting as required by the caller. (That's the best description
I can come up with!)

The two warning signs together indicate that, in this case, you are
trying to move too much logic from the GLImport and GLTestDir classes into
GLFileTable.

Don't get me wrong: Moving application code to helper classes can be OK
in other cases. But when done so, it should decouple the class and the
caller. Which is not being done here: The fact that the caller has a
'cache' and a 'config' variable had no impact on the GLFileTable class,
and after the patch it would.

> I haven't pushed this patch yet, but it goes in the correct direction,
> in my opinion.

IMO, the creation of a rewrite_file_name method (singular! you mentioned
a couple of days ago that the ability to rewrite a single file name is
one of the motivations for this refactoring) in a central place goes in
the correct direction. However, moving knowledge about 'cache' and 'config'
into GLFileTable does not.

> > 'tests=lib/' should not occur in the scope of mode == 'copy-file'.
> > But this string is an indicator inside gnulib-tool; users of gnulib-tool
> > should not be allowed to pass it.
> 
> Does it make more sense to check for a filename starting with
> 'tests=lib/' and error out or is it safe to assume the input is clean?

Neither of the two proprosals smells well. Why not add a boolean
'also_tests_lib' parameter to the rewrite_file_name method, that tells
whether to replace 'tests=lib/' or not? This way, the refactoring does
not introduce a functional difference.

> Ideally I would like to use a module-level function or @staticmethod
> there.

Module-level functions are OK. @staticmethod raises questions; in particular,
whether the method belongs there, where you are attempting to put it.

> Any opinions on these ideas? If not, I will probably sleep on it and
> push an improved patch.

In a case like this, it's better if you post the patch for review, rather
than push it too quickly. Trust me: OO takes a long time to learn.

   Bruno