Re: [PATCH] read-file: add variants that clear internal memory

2020-05-29 Thread Daiki Ueno
Bruno Haible  writes:

>> On a different note, it was suggested to disable stdio buffering if
>> RF_SENSITIVE is set.  I am attaching a patch for this.
>
> Reading from a regular file in an unbuffered way can be terribly slow.
> But here, fread_file reads in large chunks, therefore it's OK.
>
> Also, in the specification of fread_file, I would add a note. Maybe like
> this?

Thank you, amended the documentation and pushed.

Regards,
-- 
Daiki Ueno



Re: [PATCH] read-file: add variants that clear internal memory

2020-05-29 Thread Bruno Haible
Hi Daiki,

> On a different note, it was suggested to disable stdio buffering if
> RF_SENSITIVE is set.  I am attaching a patch for this.

Reading from a regular file in an unbuffered way can be terribly slow.
But here, fread_file reads in large chunks, therefore it's OK.

Also, in the specification of fread_file, I would add a note. Maybe like
this?

diff --git a/lib/read-file.c b/lib/read-file.c
index 36780cc..f13c528 100644
--- a/lib/read-file.c
+++ b/lib/read-file.c
@@ -43,8 +43,11 @@
*LENGTH.  On errors, *LENGTH is undefined, errno preserves the
values set by system functions (if any), and NULL is returned.
 
-   If the RF_SENSITIVE flag is set in FLAGS, the memory buffer
-   internally allocated will be cleared upon failure.  */
+   If the RF_SENSITIVE flag is set in FLAGS:
+ - You should control the buffering of STREAM using 'setvbuf'.  Either
+   clear the buffer of STREAM after closing it, or disable buffering of
+   STREAM before calling this function.
+ - The memory buffer internally allocated will be cleared upon failure.  */
 char *
 fread_file (FILE *stream, int flags, size_t *length)
 {




Re: [PATCH] read-file: add variants that clear internal memory

2020-05-28 Thread Daiki Ueno
Bruno Haible  writes:

> Let me update the uses of the module 'read-file' in Gnulib.
> I think the next weekly CI run would have caught this.

Thank you; I completely missed those uses in Gnulib.

On a different note, it was suggested to disable stdio buffering if
RF_SENSITIVE is set.  I am attaching a patch for this.

>From 9165e495461db91b8abc42661fc543784d26d0d6 Mon Sep 17 00:00:00 2001
From: Daiki Ueno 
Date: Fri, 29 May 2020 05:45:40 +0200
Subject: [PATCH] read-file: disable buffering if RF_SENSITIVE is set

* lib/read-file.c (read_file): Call setvbuf if RF_SENSITIVE.
Suggested by Glenn Strauss.
---
 ChangeLog   | 6 ++
 lib/read-file.c | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 77c637414..0a0e2301a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2020-05-29  Daiki Ueno  
+
+	read-file: disable buffering if RF_SENSITIVE is set
+	* lib/read-file.c (read_file): Call setvbuf if RF_SENSITIVE.
+	Suggested by Glenn Strauss.
+
 2020-05-29  Daiki Ueno  
 
 	fopen-gnu-tests: fix "\x" escape usage
diff --git a/lib/read-file.c b/lib/read-file.c
index 36780cc15..3520cbb7b 100644
--- a/lib/read-file.c
+++ b/lib/read-file.c
@@ -195,6 +195,9 @@ read_file (const char *filename, int flags, size_t *length)
   if (!stream)
 return NULL;
 
+  if (flags & RF_SENSITIVE)
+setvbuf (stream, NULL, _IONBF, 0);
+
   out = fread_file (stream, flags, length);
 
   save_errno = errno;
-- 
2.26.2


Regards,
-- 
Daiki Ueno


Re: [PATCH] read-file: add variants that clear internal memory

2020-05-28 Thread Bruno Haible
> Both has been fixed and pushed.  Thank you for the review!

Let me update the uses of the module 'read-file' in Gnulib.
I think the next weekly CI run would have caught this.


2020-05-28  Bruno Haible  

Fix build errors due to read-file changes (regression from 2020-05-27).
* lib/git-merge-changelog.c (read_changelog_file): Update read_file
invocation.
* tests/test-sameacls.c (main): Likewise.
* tests/test-pipe-filter-gi1.c (main): Call read_file instead of
read_binary_file.
* tests/test-pipe-filter-ii1.c (main): Likewise.

diff --git a/lib/git-merge-changelog.c b/lib/git-merge-changelog.c
index 1e6dae1..7b74a49 100644
--- a/lib/git-merge-changelog.c
+++ b/lib/git-merge-changelog.c
@@ -1,5 +1,5 @@
 /* git-merge-changelog - git "merge" driver for GNU style ChangeLog files.
-   Copyright (C) 2008-2010 Bruno Haible 
+   Copyright (C) 2008-2020 Bruno Haible 
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -300,7 +300,7 @@ read_changelog_file (const char *filename, struct 
changelog_file *result)
   /* Read the file in text mode, otherwise it's hard to recognize empty
  lines.  */
   size_t length;
-  char *contents = read_file (filename, );
+  char *contents = read_file (filename, 0, );
   if (contents == NULL)
 {
   fprintf (stderr, "could not read file '%s'\n", filename);
diff --git a/tests/test-pipe-filter-gi1.c b/tests/test-pipe-filter-gi1.c
index 4ee9375..0994610 100644
--- a/tests/test-pipe-filter-gi1.c
+++ b/tests/test-pipe-filter-gi1.c
@@ -80,7 +80,7 @@ main (int argc, char *argv[])
 
   /* Read some text from a file.  */
   input_filename = argv[2];
-  input = read_binary_file (input_filename, _size);
+  input = read_file (input_filename, RF_BINARY, _size);
   ASSERT (input != NULL);
 
   /* Convert it to uppercase, line by line.  */
diff --git a/tests/test-pipe-filter-ii1.c b/tests/test-pipe-filter-ii1.c
index 5f31d37..5a56c55 100644
--- a/tests/test-pipe-filter-ii1.c
+++ b/tests/test-pipe-filter-ii1.c
@@ -102,7 +102,7 @@ main (int argc, char *argv[])
 
   /* Read some text from a file.  */
   input_filename = argv[2];
-  input = read_binary_file (input_filename, _size);
+  input = read_file (input_filename, RF_BINARY, _size);
   ASSERT (input != NULL);
 
   /* Convert it to uppercase, line by line.  */
diff --git a/tests/test-sameacls.c b/tests/test-sameacls.c
index cdb10f4..6aad92f 100644
--- a/tests/test-sameacls.c
+++ b/tests/test-sameacls.c
@@ -55,14 +55,14 @@ main (int argc, char *argv[])
 size_t size2;
 char *contents2;
 
-contents1 = read_file (file1, );
+contents1 = read_file (file1, 0, );
 if (contents1 == NULL)
   {
 fprintf (stderr, "error reading file %s: errno = %d\n", file1, errno);
 fflush (stderr);
 abort ();
   }
-contents2 = read_file (file2, );
+contents2 = read_file (file2, 0, );
 if (contents2 == NULL)
   {
 fprintf (stderr, "error reading file %s: errno = %d\n", file2, errno);




Re: [PATCH] read-file: add variants that clear internal memory

2020-05-27 Thread Daiki Ueno
Bruno Haible  writes:

> The first one is nearly perfect. Only the NEWS entry should go into section
> "User visible incompatible changes", not into section "Important general 
> notes".
>
> The second one:
>   - In fread_file, around line 100, you use a realloc workaround that appears
> to be not as reliable as the malloc+free based workaround around line 135.
>   - In read_file, please don't use tabs for indentation. We've switched to
> spaces-only indentation long ago.
>
> Once this is fixed, please push.

Both has been fixed and pushed.  Thank you for the review!

Regards,
-- 
Daiki Ueno



Re: [PATCH] read-file: add variants that clear internal memory

2020-05-27 Thread Bruno Haible
Hi Daiki,

> Here are the two separate commits for this.

The first one is nearly perfect. Only the NEWS entry should go into section
"User visible incompatible changes", not into section "Important general notes".

The second one:
  - In fread_file, around line 100, you use a realloc workaround that appears
to be not as reliable as the malloc+free based workaround around line 135.
  - In read_file, please don't use tabs for indentation. We've switched to
spaces-only indentation long ago.

Once this is fixed, please push.

Bruno




Re: [PATCH] read-file: add variants that clear internal memory

2020-05-27 Thread Daiki Ueno
Bruno Haible  writes:

> It would be useful to first concentrate on the first part, the refactoring
> that introduces flags and RF_BINARY. This would provide a patch that is easier
> to review and does not have the following problems:
>   - internal_fread_file still exists, although fread_file is a no-op wrapper
> around it.
>   - In tests/test-read-file.c, please terminate the main() function with a
> return statement. We assume C99 only in modules that explicitly list 'c99'
> as a dependency. If it's trivial to avoid this dependency, let's do it.
>   - The NEWS file needs an entry.

Indeed, I missed the last paragraph of your suggestion in the previous
mail; sorry about that.  Here are the two separate commits for this.

>From 60608590e2b106708dd74fd31331567af5166d2e Mon Sep 17 00:00:00 2001
From: Daiki Ueno 
Date: Wed, 27 May 2020 08:14:44 +0200
Subject: [PATCH 1/2] read-file: add flags to modify reading behavior

* lib/read-file.h (RF_BINARY): New define.
(fread_file, read_file): Take FLAGS argument.
(read_binary_file): Remove.
* lib/read-file.c (internal_read_file): Merge into ...
(read_file): ... here.
* modules/read-file-tests (Files): Add "tests/macros.h".
* tests/test-read-file.c (main): Refactor using ASSERT macro.
* NEWS: Mention this change.
---
 ChangeLog   | 12 
 NEWS|  5 +
 lib/read-file.c | 43 ++---
 lib/read-file.h |  7 ---
 modules/read-file-tests |  1 +
 tests/test-read-file.c  | 17 +---
 6 files changed, 50 insertions(+), 35 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 07d4d5124..94faf6984 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2020-05-27  Daiki Ueno  
+
+	read-file: add flags to modify reading behavior
+	* lib/read-file.h (RF_BINARY): New define.
+	(fread_file, read_file): Take FLAGS argument.
+	(read_binary_file): Remove.
+	* lib/read-file.c (internal_read_file): Merge into ...
+	(read_file): ... here.
+	* modules/read-file-tests (Files): Add "tests/macros.h".
+	* tests/test-read-file.c (main): Refactor using ASSERT macro.
+	* NEWS: Mention this change.
+
 2020-05-26  Daiki Ueno  
 
 	read-file: make use of fopen-gnu
diff --git a/NEWS b/NEWS
index 99973c5c3..c559a65e9 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,11 @@ Important general notes
 
 DateModules Changes
 
+2020-05-27  read-file   The functions provided by this module now take an
+'int flags' argument to modify the file reading
+behavior.  The read_binary_file function has been
+removed as it is no longer necessary.
+
 2019-12-11  Support for These modules are now supported in C++ mode as well.
 ISO C or POSIX  This means, while the autoconfiguration uses the C
 functions   compiler, the resulting header files and function
diff --git a/lib/read-file.c b/lib/read-file.c
index 293bc3e8a..904f1c901 100644
--- a/lib/read-file.c
+++ b/lib/read-file.c
@@ -40,7 +40,7 @@
*LENGTH.  On errors, *LENGTH is undefined, errno preserves the
values set by system functions (if any), and NULL is returned.  */
 char *
-fread_file (FILE *stream, size_t *length)
+fread_file (FILE *stream, int flags _GL_UNUSED, size_t *length)
 {
   char *buf = NULL;
   size_t alloc = BUFSIZ;
@@ -134,9 +134,19 @@ fread_file (FILE *stream, size_t *length)
   }
 }
 
-static char *
-internal_read_file (const char *filename, size_t *length, const char *mode)
+/* Open and read the contents of FILENAME, and return a newly
+   allocated string with the content, and set *LENGTH to the length of
+   the string.  The string is zero-terminated, but the terminating
+   zero byte is not counted in *LENGTH.  On errors, *LENGTH is
+   undefined, errno preserves the values set by system functions (if
+   any), and NULL is returned.
+
+   If the RF_BINARY flag is set in FLAGS, the file is opened in binary
+   mode.  */
+char *
+read_file (const char *filename, int flags, size_t *length)
 {
+  const char *mode = (flags & RF_BINARY) ? "rbe" : "re";
   FILE *stream = fopen (filename, mode);
   char *out;
   int save_errno;
@@ -144,7 +154,7 @@ internal_read_file (const char *filename, size_t *length, const char *mode)
   if (!stream)
 return NULL;
 
-  out = fread_file (stream, length);
+  out = fread_file (stream, flags, length);
 
   save_errno = errno;
 
@@ -161,28 +171,3 @@ internal_read_file (const char *filename, size_t *length, const char *mode)
 
   return out;
 }
-
-/* Open and read the contents of FILENAME, and return a newly
-   allocated string with the content, and set *LENGTH to the length of
-   the string.  The string is zero-terminated, but the terminating
-   zero byte is not counted in *LENGTH.  On errors, *LENGTH is
-   undefined, errno preserves the values set by system functions (if
-   any), and NULL is returned.  */
-char *
-read_file (const char *filename, size_t 

Re: [PATCH] read-file: add variants that clear internal memory

2020-05-26 Thread Bruno Haible
Hi Daiki,

> > If you agree, I'd like to see two commits:
> >   1. the introduction of the flags and RF_BINARY,
> >   2. the RF_SENSITIVE flag.
> >
> > Do you want me to code the first commit, or do you want to do it?
> 
> Sure, that would make things much simpler.  I'm attaching a patch along
> these lines.

It would be useful to first concentrate on the first part, the refactoring
that introduces flags and RF_BINARY. This would provide a patch that is easier
to review and does not have the following problems:
  - internal_fread_file still exists, although fread_file is a no-op wrapper
around it.
  - In tests/test-read-file.c, please terminate the main() function with a
return statement. We assume C99 only in modules that explicitly list 'c99'
as a dependency. If it's trivial to avoid this dependency, let's do it.
  - The NEWS file needs an entry.

Bruno




Re: [PATCH] read-file: add variants that clear internal memory

2020-05-26 Thread Daiki Ueno
Bruno Haible  writes:

> If you agree, I'd like to see two commits:
>   1. the introduction of the flags and RF_BINARY,
>   2. the RF_SENSITIVE flag.
>
> Do you want me to code the first commit, or do you want to do it?

Sure, that would make things much simpler.  I'm attaching a patch along
these lines.

Regards,
-- 
Daiki Ueno
>From 5a40a392b3b551cf92d4be022b4efd900f339e39 Mon Sep 17 00:00:00 2001
From: Daiki Ueno 
Date: Tue, 26 May 2020 10:22:37 +0200
Subject: [PATCH] read-file: add flags to modify reading behavior

* lib/read-file.h (RF_BINARY, RF_SENSITIVE): New define.
(fread_file, read_file): Take FLAGS argument.
(read_binary_file): Remove.
* lib/read-file.c (internal_fread_file): Take into account of
RF_SENSITIVE flag.
(internal_read_file): Take into account of RF_BINARY and
RF_SENSITIVE flags.
* modules/read-file (Depends-on): Add explicit_bzero.
This adds an alternative behavior of those functions to explicitly
clear the internal memory block when it becomes unused.  This is
useful for reading sensitive information from a file.
---
 ChangeLog   | 15 
 lib/read-file.c | 85 -
 lib/read-file.h | 10 +++--
 modules/read-file   |  1 +
 modules/read-file-tests |  1 +
 tests/test-read-file.c  | 17 +++--
 6 files changed, 97 insertions(+), 32 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 07d4d5124..d1ee01a54 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2020-05-26  Daiki Ueno  
+
+	read-file: add flags to modify reading behavior
+	* lib/read-file.h (RF_BINARY, RF_SENSITIVE): New define.
+	(fread_file, read_file): Take FLAGS argument.
+	(read_binary_file): Remove.
+	* lib/read-file.c (internal_fread_file): Take into account of
+	RF_SENSITIVE flag.
+	(internal_read_file): Take into account of RF_BINARY and
+	RF_SENSITIVE flags.
+	* modules/read-file (Depends-on): Add explicit_bzero.
+	This adds an alternative behavior of those functions to explicitly
+	clear the internal memory block when it becomes unused.  This is
+	useful for reading sensitive information from a file.
+
 2020-05-26  Daiki Ueno  
 
 	read-file: make use of fopen-gnu
diff --git a/lib/read-file.c b/lib/read-file.c
index 293bc3e8a..d95aae168 100644
--- a/lib/read-file.c
+++ b/lib/read-file.c
@@ -31,16 +31,17 @@
 /* Get malloc, realloc, free. */
 #include 
 
+/* Get explicit_bzero, memcpy. */
+#include 
+
 /* Get errno. */
 #include 
 
-/* Read a STREAM and return a newly allocated string with the content,
-   and set *LENGTH to the length of the string.  The string is
-   zero-terminated, but the terminating zero byte is not counted in
-   *LENGTH.  On errors, *LENGTH is undefined, errno preserves the
-   values set by system functions (if any), and NULL is returned.  */
-char *
-fread_file (FILE *stream, size_t *length)
+/* Get assert. */
+#include 
+
+static char *
+internal_fread_file (FILE *stream, int flags, size_t *length)
 {
   char *buf = NULL;
   size_t alloc = BUFSIZ;
@@ -94,7 +95,12 @@ fread_file (FILE *stream, size_t *length)
 /* Shrink the allocated memory if possible.  */
 if (size < alloc - 1)
   {
-char *smaller_buf = realloc (buf, size + 1);
+char *smaller_buf;
+
+if (flags & RF_SENSITIVE)
+  explicit_bzero (buf + size, alloc - size);
+
+smaller_buf = realloc (buf, size + 1);
 if (smaller_buf != NULL)
   buf = smaller_buf;
   }
@@ -106,6 +112,7 @@ fread_file (FILE *stream, size_t *length)
 
 {
   char *new_buf;
+  size_t save_alloc = alloc;
 
   if (alloc == PTRDIFF_MAX)
 {
@@ -118,7 +125,21 @@ fread_file (FILE *stream, size_t *length)
   else
 alloc = PTRDIFF_MAX;
 
-  if (!(new_buf = realloc (buf, alloc)))
+  if (flags & RF_SENSITIVE)
+{
+  new_buf = malloc (alloc);
+  if (!new_buf)
+{
+  /* BUF should be cleared below after the loop.  */
+  save_errno = errno;
+  break;
+}
+  memcpy (new_buf, buf, save_alloc);
+  explicit_bzero (buf, save_alloc);
+  free (buf);
+  buf = new_buf;
+}
+  else if (!(new_buf = realloc (buf, alloc)))
 {
   save_errno = errno;
   break;
@@ -128,14 +149,32 @@ fread_file (FILE *stream, size_t *length)
 }
   }
 
+if (flags & RF_SENSITIVE)
+  explicit_bzero (buf, alloc);
+
 free (buf);
 errno = save_errno;
 return NULL;
   }
 }
 
+/* Read a STREAM and return a newly allocated string with the content,
+   and set *LENGTH to the length of the string.  The string is
+   zero-terminated, but the terminating zero byte is not counted in
+   *LENGTH.  On errors, *LENGTH is undefined, errno preserves the
+   values set by 

Re: [PATCH] read-file: add variants that clear internal memory

2020-05-26 Thread Bruno Haible
Hi Daiki,

> The functions provided by the read-file module are handy, but they are
> suboptimal for reading sensitive materials, because they do not clear
> the allocated memory blocks upon failure.
> ...
> It's tempting to make this behavior enabled by default, but I worry that
> it may cause any performance drawback.

Correct. For sensitive data, often different algorithms need to be used.
Explicit clearing of memory, avoiding algorithms whose running time depends
on the data, and possibly more.

> The attached patch adds a set of variants that deal with that.

Instead of doubling the number of functions of this header file, how about
adding a flags argument to the functions?

  #define RF_BINARY0x1
  #define RF_SENSITIVE 0x2

  extern char *fread_file (FILE * stream, int flags, size_t * length);

  extern char *read_file (const char *filename, int flags, size_t * length);

This way, the public interface of this header file even shrinks to 2 functions.

Yes, this breaks source-code backward compatibility, but Gnulib policy allows
this [1], and the users will have an easy migration path: just add a zero
argument for the flags.

If you agree, I'd like to see two commits:
  1. the introduction of the flags and RF_BINARY,
  2. the RF_SENSITIVE flag.

Do you want me to code the first commit, or do you want to do it?

Bruno

[1] https://www.gnu.org/software/gnulib/manual/html_node/Steady-Development.html