bug#49741: basenc --base64url decoding bug

2021-09-03 Thread Emil Lundberg
Thanks a lot everyone!

/Emil

On 2021-08-30 06:14, Assaf Gordon wrote:
> tag 49741 fixed
> close 49741
> stop
>
> On 2021-08-22 4:15 p.m., Assaf Gordon wrote:
> Attached a suggested fix.
>
> pushed in:
>
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=709d1f8253072804cc27189a6f2b873d8d563399
>
>





bug#49741: basenc --base64url decoding bug

2021-08-29 Thread Assaf Gordon

tag 49741 fixed
close 49741
stop

On 2021-08-22 4:15 p.m., Assaf Gordon wrote:

Attached a suggested fix.


pushed in:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=709d1f8253072804cc27189a6f2b873d8d563399






bug#49741: basenc --base64url decoding bug

2021-08-22 Thread Assaf Gordon

On 2021-08-17 3:37 a.m., Jim Meyering wrote:

On Tue, Aug 17, 2021 at 2:02 AM Pádraig Brady  wrote:

On 16/08/2021 22:17, Assaf Gordon wrote:


Attached a suggested fix.


minor nit in NEWS:

a nit in the commit log:


Thanks, attached updated patch.
Will push this week if there are no other comments.

-assaf



>From 090663068a23662b36ddc0603fc1c2c752b6aff1 Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Mon, 16 Aug 2021 15:03:36 -0600
Subject: [PATCH] basenc: fix bug49741: using wrong decoding buffer length

Emil Lundberg  reports in
https://bugs.gnu.org/49741 about a 'basenc --base64 -d' decoding bug.
The input buffer length was not divisible by 3, resulting in
decoding errors.

* NEWS: Mention fix.
* src/basenc.c (DEC_BLOCKSIZE): Change from 1024*5 to 4200 (35*3*5*8)
which is divisible by 3,4,5,8 - satisfying both base32 and base64;
Use compile-time verify() macro to enforce the above.
* tests/misc/basenc.pl: Add test.
---
 NEWS | 4 
 src/basenc.c | 4 +++-
 tests/misc/basenc.pl | 9 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index ddec56bdf..efdb1450e 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,10 @@ GNU coreutils NEWS-*- outline -*-
   invalid combinations of case character classes.
   [bug introduced in coreutils-8.6]
 
+  basenc --base64 --decode no longer silently discards decoded characters
+  on (1024*5) buffer boundaries
+  [bug introduced in coreutils-8.31]
+
 ** Changes in behavior
 
   cp and install now default to copy-on-write (COW) if available.
diff --git a/src/basenc.c b/src/basenc.c
index 5c97a3652..2ffdb2d27 100644
--- a/src/basenc.c
+++ b/src/basenc.c
@@ -213,7 +213,9 @@ verify (DEC_BLOCKSIZE % 12 == 0);  /* So complete encoded blocks are used.  */
 
 /* Note that increasing this may decrease performance if --ignore-garbage
is used, because of the memmove operation below.  */
-# define DEC_BLOCKSIZE (1024*5)
+# define DEC_BLOCKSIZE (4200)
+verify (DEC_BLOCKSIZE % 40 == 0); /* complete encoded blocks for base32 */
+verify (DEC_BLOCKSIZE % 12 == 0); /* complete encoded blocks for base64 */
 
 static int (*base_length) (int i);
 static bool (*isbase) (char ch);
diff --git a/tests/misc/basenc.pl b/tests/misc/basenc.pl
index 3383aaeef..ac5394731 100755
--- a/tests/misc/basenc.pl
+++ b/tests/misc/basenc.pl
@@ -37,6 +37,13 @@ my $base64url_out_nl = $base64url_out;
 $base64url_out_nl =~ s/(..)/\1\n/g; # add newline every two characters
 
 
+# Bug 49741:
+# The input  is 'abc' in base64, in an 8K buffer (larger than 1024*5,
+# the buffer size which caused the bug).
+my $base64_bug49741_in = "YWJj" x 2000 ;
+my $base64_bug49741_out = "abc" x 2000 ;
+
+
 my $base32_in = "\xfd\xd8\x07\xd1\xa5";
 my $base32_out = "7XMAPUNF";
 my $x = $base32_out;
@@ -111,6 +118,8 @@ my @Tests =
  ['b64u_7', '--base64url -d',  {IN=>$base64_out},
   {EXIT=>1},  {ERR=>"$prog: invalid input\n"}],
 
+ ['b64_bug49741', '--base64 -d',  {IN=>$base64_bug49741_in},
+  {OUT=>$base64_bug49741_out}],
 
 
 
-- 
2.20.1



bug#49741: basenc --base64url decoding bug

2021-08-17 Thread Jim Meyering
On Tue, Aug 17, 2021 at 2:02 AM Pádraig Brady  wrote:
> On 16/08/2021 22:17, Assaf Gordon wrote:
> > Hello Emil and all,
> >
> > Thanks for the clear and easily reproducible bug report.
> >
> > Attached a suggested fix.
> > Comments very welcomed,
>
> minor nit in NEWS:
> s/silently discard/silently discards/
>
> Otherwise it looks good!

Nice, indeed!
a nit in the commit log:

-The input buffer was not divisible by 3
+The input buffer length was not divisible by 3





bug#49741: basenc --base64url decoding bug

2021-08-16 Thread Pádraig Brady

On 16/08/2021 22:17, Assaf Gordon wrote:

Hello Emil and all,

Thanks for the clear and easily reproducible bug report.

Attached a suggested fix.
Comments very welcomed,


minor nit in NEWS:
s/silently discard/silently discards/

Otherwise it looks good.

thanks!
Pádraig





bug#49741: basenc --base64url decoding bug

2021-08-16 Thread Assaf Gordon

Hello Emil and all,

Thanks for the clear and easily reproducible bug report.

Attached a suggested fix.
Comments very welcomed,

- Assaf

>From 11330058443e7cc92b4a53322d810725d42b4e34 Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Mon, 16 Aug 2021 15:03:36 -0600
Subject: [PATCH] basenc: fix bug49741: using wrong decoding buffer length

Emil Lundberg  reports in
https://bugs.gnu.org/49741 about a 'basenc --base64 -d' decoding bug.
The input buffer was not divisible by 3, resulting in decoding errors.

* NEWS: Mention fix.
* src/basenc.c (DEC_BLOCKSIZE): Change from 1024*5 to 4200 (35*3*5*8)
which is divisible by 3,4,5,8 - satisfying both base32 and base64;
Use compile-time verify() macro to enforce the above.
* tests/misc/basenc.pl: Add test.
---
 NEWS | 4 
 src/basenc.c | 4 +++-
 tests/misc/basenc.pl | 9 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index ddec56bdf..d490ed101 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,10 @@ GNU coreutils NEWS-*- outline -*-
   invalid combinations of case character classes.
   [bug introduced in coreutils-8.6]
 
+  basenc --base64 --decode no longer silently discard decoded characters
+  on (1024*5) buffer boundaries
+  [bug introduced in coreutils-8.31]
+
 ** Changes in behavior
 
   cp and install now default to copy-on-write (COW) if available.
diff --git a/src/basenc.c b/src/basenc.c
index 5c97a3652..2ffdb2d27 100644
--- a/src/basenc.c
+++ b/src/basenc.c
@@ -213,7 +213,9 @@ verify (DEC_BLOCKSIZE % 12 == 0);  /* So complete encoded blocks are used.  */
 
 /* Note that increasing this may decrease performance if --ignore-garbage
is used, because of the memmove operation below.  */
-# define DEC_BLOCKSIZE (1024*5)
+# define DEC_BLOCKSIZE (4200)
+verify (DEC_BLOCKSIZE % 40 == 0); /* complete encoded blocks for base32 */
+verify (DEC_BLOCKSIZE % 12 == 0); /* complete encoded blocks for base64 */
 
 static int (*base_length) (int i);
 static bool (*isbase) (char ch);
diff --git a/tests/misc/basenc.pl b/tests/misc/basenc.pl
index 3383aaeef..ac5394731 100755
--- a/tests/misc/basenc.pl
+++ b/tests/misc/basenc.pl
@@ -37,6 +37,13 @@ my $base64url_out_nl = $base64url_out;
 $base64url_out_nl =~ s/(..)/\1\n/g; # add newline every two characters
 
 
+# Bug 49741:
+# The input  is 'abc' in base64, in an 8K buffer (larger than 1024*5,
+# the buffer size which caused the bug).
+my $base64_bug49741_in = "YWJj" x 2000 ;
+my $base64_bug49741_out = "abc" x 2000 ;
+
+
 my $base32_in = "\xfd\xd8\x07\xd1\xa5";
 my $base32_out = "7XMAPUNF";
 my $x = $base32_out;
@@ -111,6 +118,8 @@ my @Tests =
  ['b64u_7', '--base64url -d',  {IN=>$base64_out},
   {EXIT=>1},  {ERR=>"$prog: invalid input\n"}],
 
+ ['b64_bug49741', '--base64 -d',  {IN=>$base64_bug49741_in},
+  {OUT=>$base64_bug49741_out}],
 
 
 
-- 
2.20.1



bug#49741: basenc --base64url decoding bug

2021-08-13 Thread Assaf Gordon

Hi,

I will also work on it this weekend.

 -assaf


On 2021-08-12 7:37 p.m., Paul Eggert wrote:
Simon, this looks like some sort of minor buffering problem in 'basenc 
--base64', since plain 'base64' works correctly. Is this something you 
have time to look into?


https://bugs.gnu.org/49741










bug#49741: basenc --base64url decoding bug

2021-08-12 Thread Paul Eggert
Simon, this looks like some sort of minor buffering problem in 'basenc 
--base64', since plain 'base64' works correctly. Is this something you 
have time to look into?


https://bugs.gnu.org/49741





bug#49741: basenc --base64url decoding bug

2021-07-26 Thread Emil Lundberg
Hi! I seem to have encountered a bug in basenc. While decoding a large
base64url-encoded JSON blob, the decoder drops some characters,
rendering the output invalid JSON. I've verified against Python's
built-in base64url decoder, which correctly produces the expected result
while basenc does not.

I've attached the test case, which I've tried to minimize as much as I
can. All my attempts to remove more of the JSON values have made the bug
not trigger, or at least not as easily detectable.

Reproduction instructions:

$ uname -a
Linux HOST 5.13.4-arch1-1 #1 SMP PREEMPT Tue, 20 Jul 2021 16:58:51 +
x86_64 GNU/Linux

$ basenc --version
basenc (GNU coreutils) 8.32

$ cat expected-output.txt | sha256sum
fdb9a77c44e9cd612ad3a3cc210e03ea9782e342bb8293b49530e032b2e4ed0e  -

$ cat actual-output.txt | sha256sum
86bce7aa1d0c2da8432cfbb6da4ad2e559012dadbd1abde711e96b2c518d2b11  -

$ basenc -d --base64 input.txt | sha256sum
86bce7aa1d0c2da8432cfbb6da4ad2e559012dadbd1abde711e96b2c518d2b11  -

$ diff actual-output.txt expected-output.txt
160c160
< "minor: 0
---
> "minor": 0

Installed from Arch Linux official repos, package version coreutils 8.32-1.

Thanks for making basenc, and please let me know if I can do anything
more to help!

/Emil