Re: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
Collin Funk wrote:
> With open() using binary mode with encoding='utf-8' causes a failure:
>
>     with open('test.txt', 'wb', encoding='utf-8') as file:
>         file.write('abc')
>
>     Traceback (most recent call last):
>       File "", line 1, in
>     ValueError: binary mode doesn't take an encoding argument

Oops, you're right. My mistake. And likewise for 'rb'.

>     # Write files with '\n' as newline character.
>     with open('file.txt', 'w', encoding='utf-8', newline='\n') as file:

Yes, you're right. I've committed a fix now. Sorry.

> From the documentation from open, it seems the best way to deal with
> this is for reading files [2]:
>
>     # Accepts '\n', '\r', '\r\n' as newline.
>     with open('file.txt', 'r', encoding='utf-8') as file:
>         data = file.read()

They can recommend it. But what we want here is to recognize Unix newlines,
not macOS 9 newlines or DOS/Windows newlines. It should behave like
gnulib-tool.sh, and thus newline='\n' is appropriate here.

Bruno
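To illustrate the point about recognizing only Unix newlines, here is a minimal sketch, assuming a throwaway temporary file whose contents are made up for the demonstration (nothing here is taken from gnulib-tool.py). It reads the same bytes once with the default newline=None and once with newline='\n'.

```python
import os
import tempfile

# Illustrative data mixing DOS/Windows, Unix and old Mac OS line endings.
raw = b'one\r\ntwo\nthree\rfour\n'

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as file:
    file.write(raw)

# Default (newline=None): universal newlines, '\r\n' and '\r' become '\n'.
with open(path, 'r', encoding='utf-8') as file:
    translated = file.read()

# newline='\n': nothing is translated; only '\n' terminates a line.
with open(path, 'r', encoding='utf-8', newline='\n') as file:
    untranslated = file.read()

with open(path, 'r', encoding='utf-8', newline='\n') as file:
    lines = file.readlines()

os.remove(path)

print(repr(translated))    # 'one\ntwo\nthree\nfour\n'
print(repr(untranslated))  # 'one\r\ntwo\nthree\rfour\n'
print(lines)               # ['one\r\n', 'two\n', 'three\rfour\n']
```

With newline='\n' a stray '\r' stays in the data instead of being silently normalized, which is the behaviour a shell script reading the file byte for byte would see.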
Re: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
On 3/27/24 6:55 PM, Collin Funk wrote:
> [1] https://docs.python.org/3/library/functions.html#open

This link was supposed to be:

https://docs.python.org/3/library/codecs.html#codecs.unregister

Can't type today I guess...

Collin
Re: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
On 3/27/24 6:24 PM, Bruno Haible wrote:
> Thanks! Applied, with one tweak: Let's continue to use 'rb' and 'wb' as
> file open() modes, not 'r' and 'w'. If gnulib-tool ever gets used on
> Windows, we don't want the trouble caused by Windows CRLF newlines.
> We want all generated files to use Unix LF newlines. (Some of the
> constants.nlconvert nonsense will have to go away as well.)

Oops, I've been using the standard open() function since I'm not too
familiar with the 'codecs' module. I believe they work a bit
differently.

With open() using binary mode with encoding='utf-8' causes a failure:

    with open('test.txt', 'wb', encoding='utf-8') as file:
        file.write('abc')

    Traceback (most recent call last):
      File "", line 1, in
    ValueError: binary mode doesn't take an encoding argument

I pass encoding='utf-8' explicitly since the default encoding is None.
I assume in that case it is left up to the operating system. I know
previous versions of Windows liked UTF-16, but maybe it is different
now.

The codecs module doesn't seem to have that restriction, but Python
says that the regular open() and 'io' module should be used for text
files [1].

From the documentation from open, it seems the best way to deal with
this is for reading files [2]:

    # Accepts '\n', '\r', '\r\n' as newline.
    with open('file.txt', 'r', encoding='utf-8') as file:
        data = file.read()

And then for writing files:

    # Write files with '\n' as newline character.
    with open('file.txt', 'w', encoding='utf-8', newline='\n') as file:
        file.write(data)

These changes are pretty simple though. I can get a Windows virtual
machine running at some point to test these changes.

[1] https://docs.python.org/3/library/functions.html#open
[2] https://docs.python.org/3/library/functions.html#open

Collin
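The two behaviours described in this message can be reproduced with a short sketch. It assumes a temporary directory and a made-up file name, and it is only a demonstration of the built-in open(), not code from pygnulib.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'test.txt')

    # Binary mode rejects an encoding argument, as in the traceback above.
    try:
        open(path, 'wb', encoding='utf-8')
    except ValueError as error:
        message = str(error)

    # Text mode with newline='\n' writes '\n' unchanged on every platform.
    with open(path, 'w', encoding='utf-8', newline='\n') as file:
        file.write('abc\ndef\n')

    # Read the raw bytes back to see exactly what reached the disk.
    with open(path, 'rb') as file:
        written = file.read()

print(message)
print(written)
```

The raw bytes read back contain no '\r', whichever platform the script runs on, because newline='\n' disables the translation to os.linesep.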
Re: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
I updated the gnulib-tool.py.TODO file again. The other 'sed' invocation
can stay, since it occurs only once per gnulib-tool.py invocation.

Bruno
Re: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
Hi Collin,

> Here is a fixed version, sorry.

Thanks! Applied, with one tweak: Let's continue to use 'rb' and 'wb' as
file open() modes, not 'r' and 'w'. If gnulib-tool ever gets used on
Windows, we don't want the trouble caused by Windows CRLF newlines.
We want all generated files to use Unix LF newlines. (Some of the
constants.nlconvert nonsense will have to go away as well.)

Bruno
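One way to get the effect described here with the built-in open() is to stay in binary mode and do the UTF-8 decoding and encoding by hand. This is only a sketch under that assumption: the helper name rewrite_in_binary_mode and the sample file are hypothetical and do not appear in pygnulib.

```python
import os
import tempfile


def rewrite_in_binary_mode(path: str, old: str, new: str) -> None:
    """Hypothetical helper: replace text in a UTF-8 file without touching newlines."""
    # 'rb' returns the bytes exactly as stored; decoding is done by hand.
    with open(path, 'rb') as file:
        text = file.read().decode('utf-8')
    text = text.replace(old, new)
    # 'wb' writes the bytes exactly as given, so '\n' never becomes '\r\n'.
    with open(path, 'wb') as file:
        file.write(text.encode('utf-8'))


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'generated.h')
    with open(path, 'wb') as file:
        file.write(b'#define OLD 1\n#define OTHER 2\n')
    rewrite_in_binary_mode(path, 'OLD', 'NEW')
    with open(path, 'rb') as file:
        result = file.read()

print(result)
```

Because no text layer sits between the program and the file, the output uses LF on Windows as well as on Unix.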
Re: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
On 3/27/24 5:51 PM, Collin Funk wrote:
> Hi Bruno,
>
> Here is the 'sed' inlining I mentioned earlier.

Oops, I accidentally sent the patch with this:

     # Determine script to apply to library files that go into $testsbase/.
    -sed_transform_testsrelated_lib_file = sed_transform_lib_file
    +sed_transform_testsrelated_lib_file = None

Here is a fixed version, sorry.

Collin

From 83fe701bf90750f0743e0e6677489431b18d60b2 Mon Sep 17 00:00:00 2001
From: Collin Funk
Date: Wed, 27 Mar 2024 17:39:58 -0700
Subject: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.

* pygnulib/GLFileSystem.py (GLFileAssistant.__init__): Update type hints
and docstrings to reflect changes necessary for using re.sub() instead
of 'sed'.
(GLFileAssistant.add_or_update): Use re.sub() instead of invoking 'sed'.
* pygnulib/GLImport.py (GLImport.prepare): Update transformation
variables to reflect changes to GLFileAssistant.
---
 ChangeLog                | 10 ++
 gnulib-tool.py.TODO      |  1 -
 pygnulib/GLFileSystem.py | 43 +---
 pygnulib/GLImport.py     |  9 -
 4 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index bc31b14bd2..aa7df63120 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2024-03-27  Collin Funk
+
+        gnulib-tool.py: Inline 'sed' invocations used on library files.
+        * pygnulib/GLFileSystem.py (GLFileAssistant.__init__): Update type hints
+        and docstrings to reflect changes necessary for using re.sub() instead
+        of 'sed'.
+        (GLFileAssistant.add_or_update): Use re.sub() instead of invoking 'sed'.
+        * pygnulib/GLImport.py (GLImport.prepare): Update transformation
+        variables to reflect changes to GLFileAssistant.
+
 2024-03-27  Bruno Haible

         obstack: Work around ICE with Oracle cc 12.6 (regr. 2023-12-01).
diff --git a/gnulib-tool.py.TODO b/gnulib-tool.py.TODO
index d7da337f20..48c9e39fa8 100644
--- a/gnulib-tool.py.TODO
+++ b/gnulib-tool.py.TODO
@@ -9,7 +9,6 @@ Optimize:
 - os.chdir around subprocess creation -> cwd=... argument instead.
 - Inline all 'sed' invocations:
   main.py:1387:args = ['sed', '-e', sed_table, tempname]
-  GLFileSystem.py:382:args = ['sed', '-e', transformer]

 Various other refactorings, as deemed useful.
diff --git a/pygnulib/GLFileSystem.py b/pygnulib/GLFileSystem.py
index 0397af4df3..a155c7f0a0 100644
--- a/pygnulib/GLFileSystem.py
+++ b/pygnulib/GLFileSystem.py
@@ -19,6 +19,7 @@ from __future__ import annotations
 # Define global imports
 #===
 import os
+import re
 import codecs
 import filecmp
 import subprocess as sp
@@ -163,8 +164,13 @@ class GLFileSystem(object):
 class GLFileAssistant(object):
     '''GLFileAssistant is used to help with file processing.'''

-    def __init__(self, config: GLConfig, transformers: dict = dict()):
-        '''Create GLFileAssistant instance.'''
+    def __init__(self, config: GLConfig, transformers: dict[str, tuple[re.Pattern, str] | None] = {}) -> None:
+        '''Create GLFileAssistant instance.
+
+        config stores information shared between classes.
+        transformers is a dictionary which uses a file category as the key. The
+          value accessed is a tuple containing arguments for re.sub() or None if
+          no transformations are needed.'''
         if type(config) is not GLConfig:
             raise TypeError('config must be a GLConfig, not %s'
                             % type(config).__name__)
@@ -173,11 +179,11 @@ class GLFileAssistant(object):
                             % type(transformers).__name__)
         for key in ['lib', 'aux', 'main', 'tests']:
             if key not in transformers:
-                transformers[key] = 's,x,x,'
+                transformers[key] = None
             else:  # if key in transformers
                 value = transformers[key]
-                if type(value) is not str:
-                    raise TypeError('transformers[%s] must be a string, not %s'
+                if type(value) is not tuple and value != None:
+                    raise TypeError('transformers[%s] must be a tuple or None, not %s'
                                     % (key, type(value).__name__))
         self.original = None
         self.rewritten = None
@@ -341,10 +347,10 @@ class GLFileAssistant(object):
         xoriginal = substart('tests=lib/', 'lib/', original)
         lookedup, tmpflag = self.filesystem.lookup(xoriginal)
         tmpfile = self.tmpfilename(rewritten)
-        sed_transform_lib_file = self.transformers.get('lib', '')
-        sed_transform_build_aux_file = self.transformers.get('aux', '')
-        sed_transform_main_lib_file = self.transformers.get('main', '')
-        sed_transform_testsrelated_lib_file = self.transformers.get('tests', '')
+        sed_transform_lib_file = self.transf
[PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.
Hi Bruno,

Here is the 'sed' inlining I mentioned earlier.

While adding the type hints I noticed that some docstrings are missing
or are out of date. Some my fault, some not. :)

I'll work on updating them as I make changes. It is easier to do that
than go out of my way to fix them all at once.

To test this change you can use the coreutils import test case. It
uses the config-h module for lib/vfprintf.c and friends.

Collin

From 639d368ed6378e71a5a5478e04c2422239287ad2 Mon Sep 17 00:00:00 2001
From: Collin Funk
Date: Wed, 27 Mar 2024 17:39:58 -0700
Subject: [PATCH] gnulib-tool.py: Inline 'sed' invocations used on library files.

* pygnulib/GLFileSystem.py (GLFileAssistant.__init__): Update type hints
and docstrings to reflect changes necessary for using re.sub() instead
of 'sed'.
(GLFileAssistant.add_or_update): Use re.sub() instead of invoking 'sed'.
* pygnulib/GLImport.py (GLImport.prepare): Update transformation
variables to reflect changes to GLFileAssistant.
---
 ChangeLog                | 10 ++
 gnulib-tool.py.TODO      |  1 -
 pygnulib/GLFileSystem.py | 43 +---
 pygnulib/GLImport.py     | 11 +-
 4 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index bc31b14bd2..aa7df63120 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2024-03-27  Collin Funk
+
+        gnulib-tool.py: Inline 'sed' invocations used on library files.
+        * pygnulib/GLFileSystem.py (GLFileAssistant.__init__): Update type hints
+        and docstrings to reflect changes necessary for using re.sub() instead
+        of 'sed'.
+        (GLFileAssistant.add_or_update): Use re.sub() instead of invoking 'sed'.
+        * pygnulib/GLImport.py (GLImport.prepare): Update transformation
+        variables to reflect changes to GLFileAssistant.
+
 2024-03-27  Bruno Haible

         obstack: Work around ICE with Oracle cc 12.6 (regr. 2023-12-01).
diff --git a/gnulib-tool.py.TODO b/gnulib-tool.py.TODO
index d7da337f20..48c9e39fa8 100644
--- a/gnulib-tool.py.TODO
+++ b/gnulib-tool.py.TODO
@@ -9,7 +9,6 @@ Optimize:
 - os.chdir around subprocess creation -> cwd=... argument instead.
 - Inline all 'sed' invocations:
   main.py:1387:args = ['sed', '-e', sed_table, tempname]
-  GLFileSystem.py:382:args = ['sed', '-e', transformer]

 Various other refactorings, as deemed useful.
diff --git a/pygnulib/GLFileSystem.py b/pygnulib/GLFileSystem.py
index 0397af4df3..a155c7f0a0 100644
--- a/pygnulib/GLFileSystem.py
+++ b/pygnulib/GLFileSystem.py
@@ -19,6 +19,7 @@ from __future__ import annotations
 # Define global imports
 #===
 import os
+import re
 import codecs
 import filecmp
 import subprocess as sp
@@ -163,8 +164,13 @@ class GLFileSystem(object):
 class GLFileAssistant(object):
     '''GLFileAssistant is used to help with file processing.'''

-    def __init__(self, config: GLConfig, transformers: dict = dict()):
-        '''Create GLFileAssistant instance.'''
+    def __init__(self, config: GLConfig, transformers: dict[str, tuple[re.Pattern, str] | None] = {}) -> None:
+        '''Create GLFileAssistant instance.
+
+        config stores information shared between classes.
+        transformers is a dictionary which uses a file category as the key. The
+          value accessed is a tuple containing arguments for re.sub() or None if
+          no transformations are needed.'''
         if type(config) is not GLConfig:
             raise TypeError('config must be a GLConfig, not %s'
                             % type(config).__name__)
@@ -173,11 +179,11 @@ class GLFileAssistant(object):
                             % type(transformers).__name__)
         for key in ['lib', 'aux', 'main', 'tests']:
             if key not in transformers:
-                transformers[key] = 's,x,x,'
+                transformers[key] = None
             else:  # if key in transformers
                 value = transformers[key]
-                if type(value) is not str:
-                    raise TypeError('transformers[%s] must be a string, not %s'
+                if type(value) is not tuple and value != None:
+                    raise TypeError('transformers[%s] must be a tuple or None, not %s'
                                     % (key, type(value).__name__))
         self.original = None
         self.rewritten = None
@@ -341,10 +347,10 @@ class GLFileAssistant(object):
         xoriginal = substart('tests=lib/', 'lib/', original)
         lookedup, tmpflag = self.filesystem.lookup(xoriginal)
         tmpfile = self.tmpfilename(rewritten)
-        sed_transform_lib_file = self.transformers.get('lib', '')
-        sed_transform_build_aux_file = self.transformers.get('aux', '')
-        sed_transform_main_lib_file = self.transformers.get('main', '')
-        sed_transform_testsrelated_lib_file = self.transformers.get('tests', '')
+        sed_transform_lib_file = self.transforme
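The rest of the patch is cut off above, so the following is only a rough sketch of the idea stated in the docstring: each file category maps either to a (pattern, replacement) tuple that is handed to re.sub(), or to None when no transformation is needed. The helper apply_transformer and the sample pattern are hypothetical and are not taken from pygnulib.

```python
import re
from typing import Dict, Optional, Tuple

# A transformer is (compiled pattern, replacement) or None.
Transformer = Optional[Tuple[re.Pattern, str]]

# Hypothetical table in the shape the docstring describes; the 'lib'
# pattern is made up for this demonstration.
transformers: Dict[str, Transformer] = {
    'lib': (re.compile(r'^#include "config\.h"$', re.MULTILINE),
            '#include <config.h>'),
    'aux': None,
}


def apply_transformer(transformer: Transformer, src: str) -> str:
    """Return src with the transformer applied, or unchanged if it is None."""
    if transformer is None:
        return src
    pattern, replacement = transformer
    # re.sub() accepts a compiled pattern object as its first argument.
    return re.sub(pattern, replacement, src)


source = '#include "config.h"\nint x;\n'
transformed = apply_transformer(transformers['lib'], source)
untouched = apply_transformer(transformers['aux'], source)

print(transformed)
print(untouched)
```

Using None for "no transformation" avoids running a no-op substitution such as the old 's,x,x,' script on every file.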