Re: Feature request - base64 Filename Safe Alphabet

2008-06-18 Thread Bo Borgerson
Simon Josefsson wrote:
 Christopher Kerr [EMAIL PROTECTED] writes:
 
 After being burned by using `head -c6 /dev/urandom | base64` as part of a 
 directory name, I realised that it would be useful if base64 had an option to
 generate URL and Filename safe encodings, as specified in RFC 3548 section 4.

 This would make
 cat FILE | base64 --filename-safe
 equivalent to
 cat FILE | base64 | tr '+/' '-_'
 using the current coreutils tools.
 
 I think --filename-safe is a good idea.  The documentation should
 discuss the potential for generating files starting with '-' or '--'.
 Patching gnulib's base64.c to support an arbitrary alphabet seems messy.
 Patches welcome though.

Hi Simon,

I thought I'd take a stab at this and see where it goes.

What I've done is exposed an additional set of functions, *_a, which
take an arbitrary alphabet as an extra parameter.  Each historical
function now calls one of these with the 'main' alphabet.  I then added
a parallel set of functions, *_filesafe, which call the *_a functions
with the alphabet described above.
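To make the *_a / *_filesafe split concrete, here is a minimal, self-contained sketch of an encoder that takes its 64-character alphabet as a parameter, with two thin wrappers that pass the standard and the filename-safe alphabets. The names (b64_encode_a, b64_encode, b64_encode_filesafe, alphabet_main, alphabet_filesafe) are hypothetical and this is not the code from the patch. It assumes the caller supplies an output buffer of 4 * ((inlen + 2) / 3) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alphabets: standard base64 and the RFC 3548 section 4
   "URL and Filename safe" variant.  Only the last two characters differ.  */
static const char alphabet_main[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char alphabet_filesafe[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encode IN[0..INLEN) into OUT using the 64-character ALPHABET, padding
   with '='.  OUT must hold 4 * ((INLEN + 2) / 3) + 1 bytes; the result
   is NUL-terminated.  */
static void
b64_encode_a (const unsigned char *in, size_t inlen, char *out,
              const char *alphabet)
{
  while (inlen >= 3)
    {
      *out++ = alphabet[in[0] >> 2];
      *out++ = alphabet[((in[0] << 4) | (in[1] >> 4)) & 0x3f];
      *out++ = alphabet[((in[1] << 2) | (in[2] >> 6)) & 0x3f];
      *out++ = alphabet[in[2] & 0x3f];
      in += 3;
      inlen -= 3;
    }

  if (inlen > 0)
    {
      /* One or two trailing bytes: emit 2 or 3 characters plus padding.  */
      *out++ = alphabet[in[0] >> 2];
      if (inlen == 1)
        {
          *out++ = alphabet[(in[0] << 4) & 0x3f];
          *out++ = '=';
        }
      else
        {
          *out++ = alphabet[((in[0] << 4) | (in[1] >> 4)) & 0x3f];
          *out++ = alphabet[(in[1] << 2) & 0x3f];
        }
      *out++ = '=';
    }

  *out = '\0';
}

/* The historical entry point keeps its signature...  */
static void
b64_encode (const unsigned char *in, size_t inlen, char *out)
{
  b64_encode_a (in, inlen, out, alphabet_main);
}

/* ...and the new variant is just another thin wrapper.  */
static void
b64_encode_filesafe (const unsigned char *in, size_t inlen, char *out)
{
  b64_encode_a (in, inlen, out, alphabet_filesafe);
}
```

For example, the two bytes 0xfb 0xff encode to "+/8=" with the standard alphabet and to "-_8=" with the filename-safe one. Every other character is identical, which is why the `tr '+/' '-_'` pipeline from the original request gives the same result.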

It is a little messy, I think, because the large hand-initialized
data structures are duplicated.  The messiness could be reduced by
having base64 just expose the *_a interface for using an arbitrary
alphabet, and adding a second module (base64_filesafe?) that provided
that specific alternate (with all its attendant bulk).
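One way to avoid duplicating the large hand-initialized decode tables (the B64 macro chain in the patch) for every alphabet is to build the reverse table at run time from the alphabet string. This is a minimal sketch under stated assumptions: a caller-owned 256-entry table, hypothetical names (b64_build_decode_table, b64_decode_quad), and only complete, unpadded quadruples handled. It trades the compile-time table for a small initialization cost.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Fill TABLE so that TABLE[c] is the 6-bit value of byte c in the
   64-character ALPHABET, or -1 if c is not part of the alphabet.  */
static void
b64_build_decode_table (signed char table[UCHAR_MAX + 1],
                        const char *alphabet)
{
  for (int c = 0; c <= UCHAR_MAX; c++)
    table[c] = -1;
  for (int i = 0; i < 64; i++)
    table[(unsigned char) alphabet[i]] = (signed char) i;
}

/* Decode one complete, unpadded quadruple IN[0..3] into OUT[0..2].
   Return false if any of the four characters is not in the table's
   alphabet.  */
static bool
b64_decode_quad (const signed char table[UCHAR_MAX + 1], const char in[4],
                 unsigned char out[3])
{
  int v[4];
  for (int i = 0; i < 4; i++)
    {
      v[i] = table[(unsigned char) in[i]];
      if (v[i] < 0)
        return false;
    }
  out[0] = (unsigned char) ((v[0] << 2) | (v[1] >> 4));
  out[1] = (unsigned char) (((v[1] << 4) & 0xf0) | (v[2] >> 2));
  out[2] = (unsigned char) (((v[2] << 6) & 0xc0) | v[3]);
  return true;
}
```

With a table built from the filename-safe alphabet, "-_-_" decodes to the bytes 0xfb 0xff 0xbf, while the same input is rejected by a table built from the standard alphabet (and vice versa for "+/+/").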

In any case, as with my previous patches I've tried not to alter the
behavior of any already existing functions.

I've also attached a small patch against coreutils' base64 utility that
provides the desired behavior.  There are no documentation/tests/etc
yet.  It's only for demonstration purposes.

How does this look to you?

Thanks,

Bo
From fcb70d9fdd1c7979f0e3ee499a48248fd771 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Wed, 18 Jun 2008 19:16:01 -0400
Subject: [PATCH] base64: Provide an interface for alphabet configuration and a filesafe alphabet.

* lib/base64.c (base64_encode_a): Was base64_encode.  Takes an alphabet.
(base64_encode_alloc_a): Was base64_encode_alloc. Takes an alphabet.
(isbase64_a): Was isbase64.  Takes an alphabet.
(isbase64, isbase64_filesafe): Call isbase64_a with appropriate alphabet.
(decode_4): Takes an alphabet.
(base64_decode_ctx_a): Was base64_decode_ctx. Takes an alphabet.
(base64_decode_alloc_ctx_a): Was base64_decode_alloc_ctx. Takes an alphabet.
* lib/base64.h (base64_encode): Now a wrapper around base64_encode_a.
(base64_encode_filesafe): Likewise.
(base64_encode_alloc): Now a wrapper around base64_encode_alloc_a.
(base64_encode_alloc_filesafe): Likewise.
(base64_decode_ctx): Now a wrapper around base64_decode_ctx_a.
(base64_decode_ctx_filesafe): Likewise.
(base64_decode): Likewise.
(base64_decode_alloc_ctx): Now a wrapper around base64_decode_alloc_ctx_a.
(base64_decode_alloc_ctx_filesafe): Likewise.
(base64_decode_alloc): Likewise.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 lib/base64.c |  327 ++---
 lib/base64.h |   54 --
 2 files changed, 287 insertions(+), 94 deletions(-)

diff --git a/lib/base64.c b/lib/base64.c
index 8aff430..01baa62 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -61,17 +61,22 @@ to_uchar (char ch)
   return ch;
 }
 
+const char b64str_main[64] =
+  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+
+const char b64str_filesafe[64] =
+  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
+
 /* Base64 encode IN array of size INLEN into OUT array of size OUTLEN.
    If OUTLEN is less than BASE64_LENGTH(INLEN), write as many bytes as
    possible.  If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
    terminate the output buffer. */
 void
-base64_encode (const char *restrict in, size_t inlen,
-	   char *restrict out, size_t outlen)
+base64_encode_a (const char *restrict in, size_t inlen,
+		 char *restrict out, size_t outlen,
+		 const char *b64str)
 {
-  static const char b64str[64] =
-    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
-
   while (inlen && outlen)
     {
       *out++ = b64str[(to_uchar (in[0]) >> 2) & 0x3f];
@@ -113,7 +118,8 @@ base64_encode (const char *restrict in, size_t inlen,
    indicates length of the requested memory block, i.e.,
    BASE64_LENGTH(inlen) + 1. */
 size_t
-base64_encode_alloc (const char *in, size_t inlen, char **out)
+base64_encode_alloc_a (const char *in, size_t inlen, char **out,
+		   const char *b64str)
 {
   size_t outlen = 1 + BASE64_LENGTH (inlen);
 
@@ -153,7 +159,7 @@ base64_encode_alloc (const char *in, size_t inlen, char **out)
 
    IBM C V6 for AIX mishandles #define B64(x) ...'x'..., so use _
    as the formal parameter rather than x.  */
-#define B64(_)	\
+#define B64M(_)	\
   ((_) == 'A' ? 0 \
    : (_) == 'B' ? 1 \
    : (_) == 'C' ? 2 \
@@ -220,71 +226,206 @@ base64_encode_alloc (const char *in, size_t inlen, char **out)
    : (_) == '/' ? 63 \

Re: Feature request - base64 Filename Safe Alphabet

2008-06-01 Thread Simon Josefsson
Christopher Kerr [EMAIL PROTECTED] writes:

 After being burned by using `head -c6 /dev/urandom | base64` as part of a 
 directory name, I realised that it would be useful if base64 had an option to 
 generate URL and Filename safe encodings, as specified in RFC 3548 section 4.

 This would make
 cat FILE | base64 --filename-safe
 equivalent to
 cat FILE | base64 | tr '+/' '-_'
 using the current coreutils tools.

I think --filename-safe is a good idea.  The documentation should
discuss the potential for generating files starting with '-' or '--'.
Patching gnulib's base64.c to support an arbitrary alphabet seems messy.
Patches welcome though.

Regarding the discussion of different characters to use, let me add that
'+' is not a URI safe character, so it would be unsafe from that aspect.
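Simon's point can be checked mechanically against RFC 3986, whose "unreserved" set (section 2.3) is letters, digits, and "-", ".", "_", "~". The sketch below uses hypothetical helper names and assumes an ASCII-compatible character set. It lists the characters of an alphabet that fall outside that set and so may need percent-encoding, depending on where in a URI they appear. It is a rough check, not a URI-encoding routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if C is in the RFC 3986 (section 2.3) "unreserved" set:
   ALPHA / DIGIT / "-" / "." / "_" / "~".  Assumes ASCII, and avoids
   <ctype.h> so the result does not depend on the current locale.  */
static bool
uri_unreserved (char c)
{
  return (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')
          || ('0' <= c && c <= '9')
          || c == '-' || c == '.' || c == '_' || c == '~');
}

/* Store in BAD (NUL-terminated; needs strlen (ALPHABET) + 1 bytes) each
   character of ALPHABET outside the unreserved set, and return the count.  */
static size_t
uri_unsafe_chars (const char *alphabet, char *bad)
{
  size_t n = 0;
  for (const char *p = alphabet; *p != '\0'; p++)
    if (!uri_unreserved (*p))
      bad[n++] = *p;
  bad[n] = '\0';
  return n;
}
```

Run over the standard alphabet it reports "+/"; over the filename-safe alphabet it reports nothing. The '=' padding character is also outside the unreserved set, which matters if padded output is placed in a URI.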

I believe the parameter name clash is the least problematic consequence
that we can choose.

/Simon


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Feature request - base64 Filename Safe Alphabet

2008-05-05 Thread Jim Meyering
Bo Borgerson [EMAIL PROTECTED] wrote:
 Jim Meyering wrote:
 if you make him happy, I'll probably be happy, too ;-)

 Hi Simon,

 This is an attempt to merge the coreutils and gnulib base64 libraries.

 My goal is to preserve the gnulib interface and behavior while also
 supporting the coreutils extensions.

 This version of the patch should have good performance in both cases, as
 well.

 Please let me know if this meets your requirements.

It meets mine, with one small change:
I found strict_newlines to be a little unclear.
If you use something like ignore_newlines instead, that's not
only clearer to me, but with its reversed semantics it also lets
you avoid three negations.

Simon?

From a302f7beca7d0e2bfcb7770ff31947e3d2965db2 Mon Sep 17 00:00:00 2001
 From: Bo Borgerson [EMAIL PROTECTED]
 Date: Wed, 30 Apr 2008 17:40:38 -0400
 Subject: [PATCH] An upstream compatible base64

 * gl/lib/base64.c (base64_decode_ctx): If no context structure was passed in,
 treat newlines as garbage (this is the historical behavior).  Formerly
 base64_decode.
 (base64_decode_alloc_ctx): Formerly base64_decode_alloc.
 * gl/lib/base64.h (base64_decode): Macro for four-argument calls.
 (base64_decode_alloc): Likewise.
 * src/base64.c (do_decode): Call base64_decode_ctx instead of base64_decode.




Re: Feature request - base64 Filename Safe Alphabet

2008-05-05 Thread Bo Borgerson
Jim Meyering wrote:
 I found strict_newlines to be a little unclear.
 If you use something like ignore_newlines instead, that's not
 only clearer to me, but with its reversed semantics it also lets
 you avoid three negations.


Thanks, that's much nicer.

The attached patch contains this change and is rebased against the
current HEAD.

I've also made this available via:

$ git fetch git://repo.or.cz/coreutils/bo.git base64-merge:base64-merge


Thanks,

Bo
From 9131d82c32e00b606eb79d083ef8309178460ac5 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Wed, 30 Apr 2008 17:40:38 -0400
Subject: [PATCH] An upstream compatible base64

* gl/lib/base64.c (base64_decode_ctx): If no context structure was passed in,
treat newlines as garbage (this is the historical behavior).  Formerly
base64_decode.
(base64_decode_alloc_ctx): Formerly base64_decode_alloc.
* gl/lib/base64.h (base64_decode): Macro for four-argument calls.
(base64_decode_alloc): Likewise.
* src/base64.c (do_decode): Call base64_decode_ctx instead of base64_decode.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 gl/lib/base64.c |   45 +++--
 gl/lib/base64.h |   19 +--
 src/base64.c    |    2 +-
 3 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/gl/lib/base64.c b/gl/lib/base64.c
index 43f12c6..a33f102 100644
--- a/gl/lib/base64.c
+++ b/gl/lib/base64.c
@@ -449,20 +449,32 @@ decode_4 (char const *restrict in, size_t inlen,
    Initially, CTX must have been initialized via base64_decode_ctx_init.
    Subsequent calls to this function must reuse whatever state is recorded
    in that buffer.  It is necessary for when a quadruple of base64 input
-   bytes spans two input buffers.  */
+   bytes spans two input buffers.
+
+   If CTX is NULL then newlines are treated as garbage and the input
+   buffer is processed as a unit.  */
 
 bool
-base64_decode (struct base64_decode_context *ctx,
-	   const char *restrict in, size_t inlen,
-	   char *restrict out, size_t *outlen)
+base64_decode_ctx (struct base64_decode_context *ctx,
+		   const char *restrict in, size_t inlen,
+		   char *restrict out, size_t *outlen)
 {
   size_t outleft = *outlen;
-  bool flush_ctx = inlen == 0;
+  bool ignore_newlines = ctx != NULL;
+  bool flush_ctx = false;
+  unsigned int ctx_i = 0;
+
+  if (ignore_newlines)
+    {
+      ctx_i = ctx->i;
+      flush_ctx = inlen == 0;
+    }
+

   while (true)
     {
       size_t outleft_save = outleft;
-      if (ctx->i == 0 && !flush_ctx)
+      if (ctx_i == 0 && !flush_ctx)
 	{
 	  while (true)
 	{
@@ -482,7 +494,7 @@ base64_decode (struct base64_decode_context *ctx,
 
       /* Handle the common case of 72-byte wrapped lines.
 	 This also handles any other multiple-of-4-byte wrapping.  */
-      if (inlen && *in == '\n')
+      if (inlen && *in == '\n' && ignore_newlines)
 	{
 	  ++in;
 	  --inlen;
@@ -495,12 +507,17 @@ base64_decode (struct base64_decode_context *ctx,
 
       {
 	char const *in_end = in + inlen;
-	char const *non_nl = get_4 (ctx, &in, in_end, &inlen);
+	char const *non_nl;
+
+	if (ignore_newlines)
+	  non_nl = get_4 (ctx, in, in_end, inlen);
+	else
+	  non_nl = in;  /* Might have nl in this case. */
 
 	/* If the input is empty or consists solely of newlines (0 non-newlines),
 	   then we're done.  Likewise if there are fewer than 4 bytes when not
-	   flushing context.  */
-	if (inlen == 0 || (inlen < 4 && !flush_ctx))
+	   flushing context and not treating newlines as garbage.  */
+	if (inlen == 0 || (inlen < 4 && !flush_ctx && ignore_newlines))
 	  {
 	    inlen = 0;
 	    break;
@@ -529,9 +546,9 @@ base64_decode (struct base64_decode_context *ctx,
    input was invalid, in which case *OUT is NULL and *OUTLEN is
    undefined. */
 bool
-base64_decode_alloc (struct base64_decode_context *ctx,
-		 const char *in, size_t inlen, char **out,
-		 size_t *outlen)
+base64_decode_alloc_ctx (struct base64_decode_context *ctx,
+			 const char *in, size_t inlen, char **out,
+			 size_t *outlen)
 {
   /* This may allocate a few bytes too many, depending on input,
      but it's not worth the extra CPU time to compute the exact size.
@@ -544,7 +561,7 @@ base64_decode_alloc (struct base64_decode_context *ctx,
   if (!*out)
     return true;

-  if (!base64_decode (ctx, in, inlen, *out, &needlen))
+  if (!base64_decode_ctx (ctx, in, inlen, *out, &needlen))
     {
       free (*out);
       *out = NULL;
diff --git a/gl/lib/base64.h b/gl/lib/base64.h
index ba436e0..fa242c8 100644
--- a/gl/lib/base64.h
+++ b/gl/lib/base64.h
@@ -42,12 +42,19 @@ extern void base64_encode (const char *restrict in, size_t inlen,
 extern size_t base64_encode_alloc (const char *in, size_t inlen, char **out);
 
 extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
-extern bool base64_decode (struct base64_decode_context *ctx,
-			   const char *restrict in, size_t inlen,
-			   char *restrict out, size_t *outlen);
 
-extern bool base64_decode_alloc 

Re: Feature request - base64 Filename Safe Alphabet

2008-05-05 Thread Simon Josefsson
Bo Borgerson [EMAIL PROTECTED] writes:

 Jim Meyering wrote:
 I found strict_newlines to be a little unclear.
 If you use something like ignore_newlines instead, that's not
 only clearer to me, but with its reversed semantics it also lets
 you avoid three negations.


 Thanks, that's much nicer.

 The attached patch contains this change and is rebased against the
 current HEAD.

 I've also made this available via:

 $ git fetch git://repo.or.cz/coreutils/bo.git base64-merge:base64-merge

Hi Bo.  Many thanks for picking up the slack here; I should have
worked on aligning gnulib's base64 with coreutils a long time ago.

Your patch is rather difficult to read for me, since I'm not that
familiar with the coreutils changes, and more importantly: to be applied
to gnulib, I need a patch against gnulib.

Would you mind creating a patchset that applies to the gnulib git
repository?  If it looks fine to me, I'll apply it and then coreutils
can sync against it without any local modifications.

I suspect your patch does things the way I suggested in the post to the
gnulib list some time ago, which is nice.

Thanks,
Simon




Re: Feature request - base64 Filename Safe Alphabet

2008-05-05 Thread Bo Borgerson
Simon Josefsson wrote:
 Your patch is rather difficult to read for me, since I'm not that
 familiar with the coreutils changes, and more importantly: to be applied
 to gnulib, I need a patch against gnulib.


Hi Simon,

Thanks for looking at this.


 Would you mind creating a patchset that applies to the gnulib git
 repository?


Not at all.

It wasn't very easy to read as a single revision, so I did it in two
steps.  The first step is pure addition: New functions and a definition
of the decode context structure.  The second step is still not the most
legible diff, but it should be a little easier to get your bearings in.


 I suspect your patch does things the way I suggested in the post to the
 gnulib list some time ago, which is nice.


Yes, I think so, at least in terms of interface.


Thanks again,

Bo
From 3a9bdc6228eba0645bb482f88502bdf19aff609f Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Mon, 5 May 2008 10:54:31 -0400
Subject: [PATCH] A coreutils compatible base64 - part 1

* lib/base64.c (get_4): Get four non-newline characters from the input buffer.
Use the context structure's buffer to create a contiguous block if necessary.
Currently unused.
(decode_4): Helper function to be used by base64_decode_ctx.  Currently unused.
(base64_decode_ctx_init): Initialize a decode context structure.
* lib/base64.h (struct base64_decode_context): To be used by base64_decode_ctx.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 lib/base64.c |  135 ++
 lib/base64.h |8 +++
 2 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/lib/base64.c b/lib/base64.c
index f237cd6..40ae640 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -300,6 +300,141 @@ isbase64 (char ch)
   return uchar_in_range (to_uchar (ch)) && 0 <= b64[to_uchar (ch)];
 }
 
+/* Initialize decode-context buffer, CTX.  */
+void
+base64_decode_ctx_init (struct base64_decode_context *ctx)
+{
+  ctx->i = 0;
+}
+
+/* If CTX->i is 0 or 4, there are four or more bytes in [*IN..IN_END), and
+   none of those four is a newline, then return *IN.  Otherwise, copy up to
+   4 - CTX->i non-newline bytes from that range into CTX->buf, starting at
+   index CTX->i and setting CTX->i to reflect the number of bytes copied,
+   and return CTX->buf.  In either case, advance *IN to point to the byte
+   after the last one processed, and set *N_NON_NEWLINE to the number of
+   verified non-newline bytes accessible through the returned pointer.  */
+static inline char *
+get_4 (struct base64_decode_context *ctx,
+       char const *restrict *in, char const *restrict in_end,
+       size_t *n_non_newline)
+{
+  if (ctx->i == 4)
+    ctx->i = 0;
+
+  if (ctx->i == 0)
+    {
+      char const *t = *in;
+      if (4 <= in_end - *in && memchr (t, '\n', 4) == NULL)
+	{
+	  /* This is the common case: no newline.  */
+	  *in += 4;
+	  *n_non_newline = 4;
+	  return (char *) t;
+	}
+    }
+
+  {
+    /* Copy non-newline bytes into BUF.  */
+    char const *p = *in;
+    while (p < in_end)
+      {
+	char c = *p++;
+	if (c != '\n')
+	  {
+	    ctx->buf[ctx->i++] = c;
+	    if (ctx->i == 4)
+	      break;
+	  }
+      }
+
+    *in = p;
+    *n_non_newline = ctx->i;
+    return ctx->buf;
+  }
+}
+
+#define return_false \
+  do \
+    { \
+      *outp = out; \
+      return false; \
+    } \
+  while (false)
+
+/* Decode up to four bytes of base64-encoded data, IN, of length INLEN
+   into the output buffer, *OUT, of size *OUTLEN bytes.  Return true if
+   decoding is successful, false otherwise.  If *OUTLEN is too small,
+   as many bytes as possible are written to *OUT.  On return, advance
+   *OUT to point to the byte after the last one written, and decrement
+   *OUTLEN to reflect the number of bytes remaining in *OUT.  */
+static inline bool
+decode_4 (char const *restrict in, size_t inlen,
+	  char *restrict *outp, size_t *outleft)
+{
+  char *out = *outp;
+  if (inlen < 2)
+    return false;
+
+  if (!isbase64 (in[0]) || !isbase64 (in[1]))
+    return false;
+
+  if (*outleft)
+    {
+      *out++ = ((b64[to_uchar (in[0])] << 2)
+		| (b64[to_uchar (in[1])] >> 4));
+      --*outleft;
+    }
+
+  if (inlen == 2)
+    return_false;
+
+  if (in[2] == '=')
+    {
+      if (inlen != 4)
+	return_false;
+
+      if (in[3] != '=')
+	return_false;
+    }
+  else
+    {
+      if (!isbase64 (in[2]))
+	return_false;
+
+      if (*outleft)
+	{
+	  *out++ = (((b64[to_uchar (in[1])] << 4) & 0xf0)
+		    | (b64[to_uchar (in[2])] >> 2));
+	  --*outleft;
+	}
+
+      if (inlen == 3)
+	return_false;
+
+      if (in[3] == '=')
+	{
+	  if (inlen != 4)
+	    return_false;
+	}
+      else
+	{
+	  if (!isbase64 (in[3]))
+	    return_false;
+
+	  if (*outleft)
+	    {
+	      *out++ = (((b64[to_uchar (in[2])] << 6) & 0xc0)
+			| b64[to_uchar (in[3])]);
+	      --*outleft;
+	    }
+	}
+    }
+
+  *outp = out;
+  return true;
+}
+
 /* Decode base64 encoded input array IN of length INLEN to output
array OUT that can hold 
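The get_4 idea in the patch (gathering four non-newline bytes, possibly across two input buffers, in the context's small buffer) can be sketched in isolation as follows. struct quad_ctx and quad_fill are hypothetical names. Unlike the patch's get_4, this sketch always copies into the buffer and omits the fast path that returns a pointer directly into the input when four contiguous non-newline bytes are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context: up to four pending non-newline input bytes.  */
struct quad_ctx
{
  unsigned int i;  /* number of bytes currently held in BUF */
  char buf[4];
};

/* Consume bytes from *IN (up to IN_END), skipping '\n', until CTX->buf
   holds four bytes or the input is exhausted, and advance *IN past every
   byte consumed.  Return true if CTX->buf now holds a complete quadruple;
   the caller decodes it and resets CTX->i to 0.  A partial quadruple stays
   in CTX until the next buffer arrives.  */
static bool
quad_fill (struct quad_ctx *ctx, const char **in, const char *in_end)
{
  const char *p = *in;
  while (ctx->i < 4 && p < in_end)
    {
      char c = *p++;
      if (c != '\n')
        ctx->buf[ctx->i++] = c;
    }
  *in = p;
  return ctx->i == 4;
}
```

Feeding "Zm\n9" and then "v\nYmFy" yields "Zm9v" (completed across the buffer boundary) followed by "YmFy".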

Re: Feature request - base64 Filename Safe Alphabet

2008-05-05 Thread Jim Meyering
Bo Borgerson [EMAIL PROTECTED] wrote:
 Jim Meyering wrote:
 I found strict_newlines to be a little unclear.
 If you use something like ignore_newlines instead, that's not
 only clearer to me, but with its reversed semantics it also lets
 you avoid three negations.

 Thanks, that's much nicer.

 The attached patch contains this change and is rebased against the
 current HEAD.

 I've also made this available via:

 $ git fetch git://repo.or.cz/coreutils/bo.git base64-merge:base64-merge
...
From 9131d82c32e00b606eb79d083ef8309178460ac5 Mon Sep 17 00:00:00 2001
 From: Bo Borgerson [EMAIL PROTECTED]
 Date: Wed, 30 Apr 2008 17:40:38 -0400
 Subject: [PATCH] An upstream compatible base64

 * gl/lib/base64.c (base64_decode_ctx): If no context structure was passed in,
 treat newlines as garbage (this is the historical behavior).  Formerly
 base64_decode.
 (base64_decode_alloc_ctx): Formerly base64_decode_alloc.
 * gl/lib/base64.h (base64_decode): Macro for four-argument calls.
 (base64_decode_alloc): Likewise.
 * src/base64.c (do_decode): Call base64_decode_ctx instead of base64_decode.

Thanks.  I've applied and pushed that.
With that, updating from gnulib (assuming no changes required there)
will be a no-op.




Re: Feature request - base64 Filename Safe Alphabet

2008-04-30 Thread Bo Borgerson
Jim Meyering wrote:
 Beware:
 there are two versions of base64.c.
 The one in gnulib and another in coreutils/gl/lib.
 
 Simon and I have been thinking about how to merge these
 two for some time, but I haven't found time since our last exchange.
 
 Volunteers welcome ;-)


Hi,

This is an attempt at making a base64.c that supports the context
structure for coreutils but still presents a four-argument decode
interface for gnulib.

It doesn't address the differences in newline handling, and it's
definitely less efficient for four-argument decode calls.  Is this the
direction you were thinking for a merge of the two?

Thanks,

Bo
From e63ed95710560a7da7f4fd681add4f0e8172bc7a Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Wed, 30 Apr 2008 17:40:38 -0400
Subject: [PATCH] A step toward an upstream compatible base64

* gl/lib/base64.c (base64_decode_ctx): If no context structure was passed in,
initialize a local one and use it.  Be sure to flush.  Formerly base64_decode.
(base64_decode_alloc_ctx): Formerly base64_decode_alloc.
* gl/lib/base64.h (base64_decode): Macro for four-argument calls.
(base64_decode_alloc): Likewise.
* src/base64.c (do_decode): Call base64_decode_ctx instead of base64_decode.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 gl/lib/base64.c |   22 +++---
 gl/lib/base64.h |   19 +--
 src/base64.c    |    2 +-
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/gl/lib/base64.c b/gl/lib/base64.c
index 43f12c6..4a79eef 100644
--- a/gl/lib/base64.c
+++ b/gl/lib/base64.c
@@ -452,12 +452,20 @@ decode_4 (char const *restrict in, size_t inlen,
bytes spans two input buffers.  */
 
 bool
-base64_decode (struct base64_decode_context *ctx,
-	   const char *restrict in, size_t inlen,
-	   char *restrict out, size_t *outlen)
+base64_decode_ctx (struct base64_decode_context *ctx,
+		   const char *restrict in, size_t inlen,
+		   char *restrict out, size_t *outlen)
 {
   size_t outleft = *outlen;
   bool flush_ctx = inlen == 0;
+  struct base64_decode_context local_ctx;
+
+  if (ctx == NULL)
+    {
+      ctx = &local_ctx;
+      base64_decode_ctx_init (ctx);
+      flush_ctx = true;
+    }

   while (true)
     {
@@ -529,9 +537,9 @@ base64_decode (struct base64_decode_context *ctx,
    input was invalid, in which case *OUT is NULL and *OUTLEN is
    undefined. */
 bool
-base64_decode_alloc (struct base64_decode_context *ctx,
-		 const char *in, size_t inlen, char **out,
-		 size_t *outlen)
+base64_decode_alloc_ctx (struct base64_decode_context *ctx,
+			 const char *in, size_t inlen, char **out,
+			 size_t *outlen)
 {
   /* This may allocate a few bytes too many, depending on input,
      but it's not worth the extra CPU time to compute the exact size.
@@ -544,7 +552,7 @@ base64_decode_alloc (struct base64_decode_context *ctx,
   if (!*out)
     return true;

-  if (!base64_decode (ctx, in, inlen, *out, &needlen))
+  if (!base64_decode_ctx (ctx, in, inlen, *out, &needlen))
     {
       free (*out);
       *out = NULL;
diff --git a/gl/lib/base64.h b/gl/lib/base64.h
index ba436e0..fa242c8 100644
--- a/gl/lib/base64.h
+++ b/gl/lib/base64.h
@@ -42,12 +42,19 @@ extern void base64_encode (const char *restrict in, size_t inlen,
 extern size_t base64_encode_alloc (const char *in, size_t inlen, char **out);
 
 extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
-extern bool base64_decode (struct base64_decode_context *ctx,
-			   const char *restrict in, size_t inlen,
-			   char *restrict out, size_t *outlen);
 
-extern bool base64_decode_alloc (struct base64_decode_context *ctx,
- const char *in, size_t inlen,
- char **out, size_t *outlen);
+extern bool base64_decode_ctx (struct base64_decode_context *ctx,
+			   const char *restrict in, size_t inlen,
+			   char *restrict out, size_t *outlen);
+
+extern bool base64_decode_alloc_ctx (struct base64_decode_context *ctx,
+ const char *in, size_t inlen,
+ char **out, size_t *outlen);
+
+#define base64_decode(in, inlen, out, outlen) \
+	base64_decode_ctx (NULL, in, inlen, out, outlen)
+
+#define base64_decode_alloc(in, inlen, out, outlen) \
+	base64_decode_alloc_ctx (NULL, in, inlen, out, outlen)
 
 #endif /* BASE64_H */
diff --git a/src/base64.c b/src/base64.c
index aa2fc8f..983b8cb 100644
--- a/src/base64.c
+++ b/src/base64.c
@@ -223,7 +223,7 @@ do_decode (FILE *in, FILE *out, bool ignore_garbage)
 	  if (k == 1 && ctx.i == 0)
 	    break;
 	  n = BLOCKSIZE;
-	  ok = base64_decode (&ctx, inbuf, (k == 0 ? sum : 0), outbuf, &n);
+	  ok = base64_decode_ctx (&ctx, inbuf, (k == 0 ? sum : 0), outbuf, &n);

 	  if (fwrite (outbuf, 1, n, out) < n)
 	    error (EXIT_FAILURE, errno, _("write error"));
-- 
1.5.4.3
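The header change in this patch keeps old four-argument callers compiling by turning the old name into a macro that passes NULL for the new context argument, and lets the function pick a default behavior when it sees NULL (here, a local context that is flushed; in the later revision, treating newlines as garbage). The same pattern is sketched below on a deliberately tiny, hypothetical function (strip_nl_ctx, strip_nl, struct nl_ctx) rather than on base64 itself: with a context, newlines are skipped; without one, a newline makes the call fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state that persists across calls.  */
struct nl_ctx
{
  size_t newlines_skipped;
};

/* Copy IN[0..INLEN) to OUT, which has room for *OUTLEN bytes.  With a
   context, '\n' bytes are skipped and counted; with CTX == NULL a '\n'
   is treated as garbage and the call fails.  Also fail if OUT is too
   small.  On success, set *OUTLEN to the number of bytes written.  */
static bool
strip_nl_ctx (struct nl_ctx *ctx, const char *in, size_t inlen,
              char *out, size_t *outlen)
{
  size_t n = 0;
  for (size_t k = 0; k < inlen; k++)
    {
      if (in[k] == '\n')
        {
          if (ctx == NULL)
            return false;
          ctx->newlines_skipped++;
          continue;
        }
      if (n == *outlen)
        return false;
      out[n++] = in[k];
    }
  *outlen = n;
  return true;
}

/* Existing four-argument callers keep compiling unchanged.  */
#define strip_nl(in, inlen, out, outlen) \
  strip_nl_ctx (NULL, (in), (inlen), (out), (outlen))
```

The design choice is the same one discussed in the thread: the old name stays source-compatible, and only callers that want the new behavior need to pass a context.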



Re: Feature request - base64 Filename Safe Alphabet

2008-04-30 Thread Jim Meyering
Bo Borgerson [EMAIL PROTECTED] wrote:
 This is an attempt at making a base64.c that supports the context
 structure for coreutils but still presents a four-argument decode
 interface for gnulib.

 It doesn't address the differences in newline handling, and it's
 definitely less efficient for four-argument decode calls.  Is this the
 direction you were thinking for a merge of the two?

Thanks for working on that.
You be the judge, considering Simon's position from January:

  http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/8670/focus=12523

Sorry I didn't dig that up initially.
Since his packages are the main consumer other than coreutils,
if you make him happy, I'll probably be happy, too ;-)




Re: Feature request - base64 Filename Safe Alphabet

2008-04-30 Thread Bo Borgerson
Jim Meyering wrote:
   http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/8670/focus=12523
 
 Sorry I didn't dig that up initially.
 Since his packages are the main consumer other than coreutils,
 if you make him happy, I'll probably be happy, too ;-)


Ah, thanks, that helps put things in context.

It shouldn't be too hard to make the four-argument decode calls choke on
newlines in a backwards compatible way.  I'll look into it and submit an
updated patch.

Thanks,

Bo




Re: Feature request - base64 Filename Safe Alphabet

2008-04-30 Thread Bo Borgerson
Jim Meyering wrote:
 if you make him happy, I'll probably be happy, too ;-)


Hi Simon,

This is an attempt to merge the coreutils and gnulib base64 libraries.

My goal is to preserve the gnulib interface and behavior while also
supporting the coreutils extensions.

This version of the patch should have good performance in both cases, as
well.

Please let me know if this meets your requirements.

Thanks,

Bo
From a302f7beca7d0e2bfcb7770ff31947e3d2965db2 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Wed, 30 Apr 2008 17:40:38 -0400
Subject: [PATCH] An upstream compatible base64

* gl/lib/base64.c (base64_decode_ctx): If no context structure was passed in,
treat newlines as garbage (this is the historical behavior).  Formerly
base64_decode.
(base64_decode_alloc_ctx): Formerly base64_decode_alloc.
* gl/lib/base64.h (base64_decode): Macro for four-argument calls.
(base64_decode_alloc): Likewise.
* src/base64.c (do_decode): Call base64_decode_ctx instead of base64_decode.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 gl/lib/base64.c |   45 +++--
 gl/lib/base64.h |   19 +--
 src/base64.c    |    2 +-
 3 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/gl/lib/base64.c b/gl/lib/base64.c
index 43f12c6..bfe4ad2 100644
--- a/gl/lib/base64.c
+++ b/gl/lib/base64.c
@@ -449,20 +449,32 @@ decode_4 (char const *restrict in, size_t inlen,
    Initially, CTX must have been initialized via base64_decode_ctx_init.
    Subsequent calls to this function must reuse whatever state is recorded
    in that buffer.  It is necessary for when a quadruple of base64 input
-   bytes spans two input buffers.  */
+   bytes spans two input buffers.
+
+   If CTX is NULL then newlines are treated as garbage and the input
+   buffer is processed as a unit.  */
 
 bool
-base64_decode (struct base64_decode_context *ctx,
-	   const char *restrict in, size_t inlen,
-	   char *restrict out, size_t *outlen)
+base64_decode_ctx (struct base64_decode_context *ctx,
+		   const char *restrict in, size_t inlen,
+		   char *restrict out, size_t *outlen)
 {
   size_t outleft = *outlen;
-  bool flush_ctx = inlen == 0;
+  bool strict_newlines = ctx == NULL;
+  bool flush_ctx = false;
+  unsigned int ctx_i = 0;
+
+  if (!strict_newlines)
+    {
+      ctx_i = ctx->i;
+      flush_ctx = inlen == 0;
+    }
+

   while (true)
     {
       size_t outleft_save = outleft;
-      if (ctx->i == 0 && !flush_ctx)
+      if (ctx_i == 0 && !flush_ctx)
 	{
 	  while (true)
 	{
@@ -482,7 +494,7 @@ base64_decode (struct base64_decode_context *ctx,
 
       /* Handle the common case of 72-byte wrapped lines.
 	 This also handles any other multiple-of-4-byte wrapping.  */
-      if (inlen && *in == '\n')
+      if (inlen && *in == '\n' && !strict_newlines)
 	{
 	  ++in;
 	  --inlen;
@@ -495,12 +507,17 @@ base64_decode (struct base64_decode_context *ctx,
 
       {
 	char const *in_end = in + inlen;
-	char const *non_nl = get_4 (ctx, &in, in_end, &inlen);
+	char const *non_nl;
+
+	if (strict_newlines)
+	  non_nl = in;  /* Might have nl in this case. */
+	else
+	  non_nl = get_4 (ctx, &in, in_end, &inlen);
 
 	/* If the input is empty or consists solely of newlines (0 non-newlines),
 	   then we're done.  Likewise if there are fewer than 4 bytes when not
-	   flushing context.  */
-	if (inlen == 0 || (inlen < 4 && !flush_ctx))
+	   flushing context and not treating newlines as garbage.  */
+	if (inlen == 0 || (inlen < 4 && !flush_ctx && !strict_newlines))
 	  {
 	    inlen = 0;
 	    break;
@@ -529,9 +546,9 @@ base64_decode (struct base64_decode_context *ctx,
    input was invalid, in which case *OUT is NULL and *OUTLEN is
    undefined. */
 bool
-base64_decode_alloc (struct base64_decode_context *ctx,
-		 const char *in, size_t inlen, char **out,
-		 size_t *outlen)
+base64_decode_alloc_ctx (struct base64_decode_context *ctx,
+			 const char *in, size_t inlen, char **out,
+			 size_t *outlen)
 {
   /* This may allocate a few bytes too many, depending on input,
      but it's not worth the extra CPU time to compute the exact size.
@@ -544,7 +561,7 @@ base64_decode_alloc (struct base64_decode_context *ctx,
   if (!*out)
     return true;

-  if (!base64_decode (ctx, in, inlen, *out, &needlen))
+  if (!base64_decode_ctx (ctx, in, inlen, *out, &needlen))
     {
       free (*out);
       *out = NULL;
diff --git a/gl/lib/base64.h b/gl/lib/base64.h
index ba436e0..fa242c8 100644
--- a/gl/lib/base64.h
+++ b/gl/lib/base64.h
@@ -42,12 +42,19 @@ extern void base64_encode (const char *restrict in, size_t inlen,
 extern size_t base64_encode_alloc (const char *in, size_t inlen, char **out);
 
 extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
-extern bool base64_decode (struct base64_decode_context *ctx,
-			   const char *restrict in, size_t inlen,
-			   char *restrict out, size_t *outlen);
 
-extern bool base64_decode_alloc (struct base64_decode_context *ctx,
- const 

Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Christopher Kerr
After being burned by using `head -c6 /dev/urandom | base64` as part of a 
directory name, I realised that it would be useful if base64 had an option to 
generate URL and Filename safe encodings, as specified in RFC 3548 section 4.

This would make
cat FILE | base64 --filename-safe
equivalent to
cat FILE | base64 | tr '+/' '-_'
using the current coreutils tools.
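For reference, the pipeline above amounts to a two-character substitution on already-encoded text. Here is a minimal in-place sketch in C. b64_to_filesafe and b64_from_filesafe are hypothetical names, not coreutils functions. Like tr, it leaves '=' padding and line breaks untouched. With the 6-byte input from the example, base64 emits exactly 8 characters and no padding, so any '/' among those 8 characters turns the intended single directory name into a path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In-place equivalent of `tr '+/' '-_'`: rewrite standard base64 text
   into the URL and filename safe alphabet of RFC 3548 section 4.  Other
   bytes, including '=' padding and newlines, are left alone.  */
static void
b64_to_filesafe (char *s, size_t len)
{
  for (size_t k = 0; k < len; k++)
    {
      if (s[k] == '+')
        s[k] = '-';
      else if (s[k] == '/')
        s[k] = '_';
    }
}

/* The inverse, `tr '-_' '+/'`, for handing the text back to a decoder
   that only knows the standard alphabet.  */
static void
b64_from_filesafe (char *s, size_t len)
{
  for (size_t k = 0; k < len; k++)
    {
      if (s[k] == '-')
        s[k] = '+';
      else if (s[k] == '_')
        s[k] = '/';
    }
}
```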





Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Pádraig Brady
Christopher Kerr wrote:
 After being burned by using `head -c6 /dev/urandom | base64` as part of a 
 directory name, I realised that it would be useful if base64 had an option to 
 generate URL and Filename safe encodings, as specified in RFC 3548 section 4.
 
 This would make
 cat FILE | base64 --filename-safe
 equivalent to
 cat FILE | base64 | tr '+/' '-_'

Not a bad idea. I've needed stuff like that before:
http://www.pixelbeat.org/libs/base64.c

Perhaps `tr '+/' '._'` would be better so that
you don't need to worry about - at the start of a filename?

Pádraig.




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Bo Borgerson
Christopher Kerr wrote:
 After being burned by using `head -c6 /dev/urandom | base64` as part of a 
 directory name, I realised that it would be useful if base64 had an option to 
 generate URL and Filename safe encodings, as specified in RFC 3548 section 4.
 
 This would make
 cat FILE | base64 --filename-safe
 equivalent to
 cat FILE | base64 | tr '+/' '-_'
 using the current coreutils tools.

Hi,

lib/base64.c looks fairly easy to pull apart so that current functions
base64_encode and base64_decode become wrappers around internal
functions that take an additional argument describing the alphabet.

New functions base64_encode_filesafe and base64_decode_filesafe could
then be added without breaking the pre-existing interface or duplicating
a lot of code.
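The refactoring described above can be sketched as follows. This is a minimal illustration, not Bo's patch or gnulib's code. The names b64_encode_alpha, b64_encode and b64_encode_filesafe are hypothetical stand-ins for the internal function and for the base64_encode/base64_encode_filesafe pair. The simplified calling convention is also an assumption: the caller sizes the buffer, and the output is always NUL-terminated.

```c
#include <stddef.h>

/* Standard alphabet (RFC 3548 section 3) and URL and filename safe
   alphabet (RFC 3548 section 4); only positions 62 and 63 differ.  */
static const char b64_std_alphabet[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char b64_safe_alphabet[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Hypothetical internal encoder that takes the alphabet as an extra
   argument.  OUT must have room for 4 * ((INLEN + 2) / 3) + 1 bytes;
   the result is '='-padded and NUL-terminated.  */
static void
b64_encode_alpha (const char *in, size_t inlen, char *out,
                  const char *alphabet)
{
  const unsigned char *p = (const unsigned char *) in;

  /* Each full 3-byte group becomes four 6-bit digits.  */
  for (; inlen >= 3; p += 3, inlen -= 3)
    {
      *out++ = alphabet[p[0] >> 2];
      *out++ = alphabet[((p[0] & 0x03) << 4) | (p[1] >> 4)];
      *out++ = alphabet[((p[1] & 0x0f) << 2) | (p[2] >> 6)];
      *out++ = alphabet[p[2] & 0x3f];
    }

  /* A trailing 1 or 2 bytes become 2 or 3 digits plus '=' padding.  */
  if (inlen > 0)
    {
      *out++ = alphabet[p[0] >> 2];
      if (inlen == 1)
        {
          *out++ = alphabet[(p[0] & 0x03) << 4];
          *out++ = '=';
        }
      else
        {
          *out++ = alphabet[((p[0] & 0x03) << 4) | (p[1] >> 4)];
          *out++ = alphabet[(p[1] & 0x0f) << 2];
        }
      *out++ = '=';
    }
  *out = '\0';
}

/* Hypothetical public wrappers: existing callers keep the standard
   alphabet, and the filesafe variant only swaps the table.  */
void
b64_encode (const char *in, size_t inlen, char *out)
{
  b64_encode_alpha (in, inlen, out, b64_std_alphabet);
}

void
b64_encode_filesafe (const char *in, size_t inlen, char *out)
{
  b64_encode_alpha (in, inlen, out, b64_safe_alphabet);
}
```

Because the alphabet is just a 64-byte table indexed by 6-bit values, the filesafe variant costs one extra string constant, and the encoding loop is shared.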

B




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Eric Blake


According to Pádraig Brady on 4/29/2008 6:59 AM:
| Perhaps `tr '+/' '._'` would be better so that
| you don't need to worry about - at the start of a filename?

Which is worse - a file that can be confused with an option, or a file
that is hidden by default from fnmatch?
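The trade-off can be demonstrated with POSIX fnmatch(). With the FNM_PERIOD flag, which mirrors the shell's globbing rule, the pattern `*` does not match a name beginning with '.'. It does match a name beginning with '-'. The helper name star_matches in this small sketch is hypothetical.

```c
#include <fnmatch.h>

/* Hypothetical helper: return 1 if the glob "*" matches NAME when a
   leading '.' must be matched explicitly (FNM_PERIOD), else 0.  */
int
star_matches (const char *name)
{
  return fnmatch ("*", name, FNM_PERIOD) == 0;
}
```

A '.'-prefixed name therefore silently drops out of `*` expansion. A '-'-prefixed name is expanded but may then be parsed as an option by the command that receives it.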

--
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]


Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Bo Borgerson
Pádraig Brady wrote:
 Perhaps `tr '+/' '._'` would be better so that
 you don't need to worry about - at the start of a filename?


I think `.' at the beginning of a filename also has the potential to
give users unexpected behavior.

Bo





Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Jim Meyering
Bo Borgerson [EMAIL PROTECTED] wrote:
 Christopher Kerr wrote:
 After being burned by using `head -c6 /dev/urandom | base64` as part of a
 directory name, I realised that it would be useful if base64 had an option to
 generate URL and Filename safe encodings, as specified in RFC 3548 section 4.

 This would make
 cat FILE | base64 --filename-safe
 equivalent to
 cat FILE | base64 | tr '+/' '-_'
 using the current coreutils tools.

 Hi,

 lib/base64.c looks fairly easy to pull apart so that current functions
 base64_encode and base64_decode become wrappers around internal
 functions that take an additional argument describing the alphabet.

 New functions base64_encode_filesafe and base64_decode_filesafe could
 then be added without breaking the pre-existing interface or duplicating
 a lot of code.

Beware:
there are two versions of base64.c:
one in gnulib and another in coreutils/gl/lib.

Simon and I have been thinking about how to merge these
two for some time, but I haven't found time since our last exchange.

Volunteers welcome ;-)




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Jim Meyering
Christopher Kerr [EMAIL PROTECTED] wrote:
 After being burned by using `head -c6 /dev/urandom | base64` as part of a
 directory name, I realised that it would be useful if base64 had an option to
 generate URL and Filename safe encodings, as specified in RFC 3548 section 4.

 This would make
 cat FILE | base64 --filename-safe
 equivalent to
 cat FILE | base64 | tr '+/' '-_'
 using the current coreutils tools.

Christopher,

In case you read via means other than direct mail,
[because my reply to you bounced]
this is to inform you that your mail server (Cambridge's)
is *still* rejecting mail from my static IP based on a single
bogus MAPS RBL entry that includes the entire 82.230.0.0/16 network:

  [EMAIL PROTECTED]: host mx.cam.ac.uk[131.111.8.149] said: 550-82.230.74.64 is
  listed at rbl-plus.mail-abuse.ja.net; See 550
  http://mail-abuse.com/cgi-bin/lookup?82.230.74.64 (in reply to RCPT TO
  command)

I've encountered/reported the same problem numerous times for your
domain over the last year or so.

Maybe you can help someone understand that this is not good
for the University's reputation.

Jim




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Pádraig Brady
Bo Borgerson wrote:
 Pádraig Brady wrote:
 Perhaps `tr '+/' '._'` would be better so that
 you don't need to worry about - at the start of a filename?
 
 
 I think `.' at the beginning of a filename also has the potential to
 give users unexpected behavior.

Doh! never thought of that :)

tr '+/' '._' = hidden files
tr '+/' '-_' = awkward option clashes
tr '/' '_' = not POSIX portable

ho hum, the awkward option clashes is probably best.

cheers,
Pádraig.




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Eric Blake
Pádraig Brady P at draigBrady.com writes:

 tr '+/' '._' = hidden files
 tr '+/' '-_' = awkward option clashes
 tr '/' '_' = not POSIX portable
 
 ho hum, the awkward option clashes is probably best.

You can always use ./ prefix to avoid option clashes, and for most tools, you 
can also use -- strategically.
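The `./` advice can also be applied in a C program that builds argument lists, for example before calling exec. The sketch below assumes NAME is a relative file name such as base64 output. The helper option_safe_name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy NAME into BUF (SIZE bytes), prefixing "./"
   when NAME begins with '-', so a command that receives it as an
   argument cannot take it for an option.  Returns snprintf's result,
   the length the full string needs (>= SIZE means it was truncated).  */
int
option_safe_name (char *buf, size_t size, const char *name)
{
  return snprintf (buf, size, "%s%s", name[0] == '-' ? "./" : "", name);
}
```

Prefixing only when needed keeps ordinary names unchanged. Prefixing every relative name with "./" would work equally well.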

Also, while '+' is not allowed in short 8.3 DOS file names, these days, all 
practical platforms that target FAT file systems also support long file names 
where '+' is perfectly fine:
http://www.gnu.org/software/autoconf/manual/autoconf.html#File-System-Conventions

Besides, think of 'g++' as an example of using + in file names.

-- 
Eric Blake






Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Bo Borgerson
Pádraig Brady wrote:
 tr '+/' '._' = hidden files
 tr '+/' '-_' = awkward option clashes
 tr '/' '_' = not POSIX portable
 
 ho hum, the awkward option clashes is probably best.

Yeah, there's no really ideal option, is there...

It almost might be nice to have a totally user-configurable alphabet.

Something like:

$ base64 --62=- --63=_
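The `--62`/`--63` options above are hypothetical; base64 has no such flags. The minimal C sketch below shows what building and validating such an alphabet would involve. The function build_alphabet is likewise hypothetical. It does not address the objection raised in the next message, that scripts could drift apart on incompatible alphabets.

```c
#include <stdbool.h>
#include <string.h>

/* The 62 characters shared by the RFC 3548 section 3 and section 4
   alphabets; only positions 62 and 63 differ between them.  */
static const char b64_common[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/* Hypothetical helper for an option like `--62=C --63=C': build a
   NUL-terminated 64-character alphabet in ALPHABET.  Returns false, and
   leaves ALPHABET untouched, if the choice would make decoding
   ambiguous: NUL, the '=' pad character, a character already in
   b64_common, or the same character twice.  */
bool
build_alphabet (char alphabet[65], char c62, char c63)
{
  if (c62 == '\0' || c63 == '\0' || c62 == '=' || c63 == '='
      || c62 == c63
      || strchr (b64_common, c62) != NULL
      || strchr (b64_common, c63) != NULL)
    return false;

  memcpy (alphabet, b64_common, 62);
  alphabet[62] = c62;
  alphabet[63] = c63;
  alphabet[64] = '\0';
  return true;
}
```

With '-' and '_' this reproduces the RFC 3548 section 4 alphabet.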


Bo




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Gabriel Barazer

On 04/29/2008 4:47:53 PM +0200, Bo Borgerson [EMAIL PROTECTED] wrote:
 Pádraig Brady wrote:
 tr '+/' '._' = hidden files
 tr '+/' '-_' = awkward option clashes
 tr '/' '_' = not POSIX portable

AFAIK, POSIX filenames allow any character except the slash character
and the null byte.

 ho hum, the awkward option clashes is probably best.

Especially when this is the RFC-recommended translation. This would
avoid confusing people with multiple translation sets and stick to the
RFC (considered by many as the authoritative translation).

 Yeah, there's no really ideal option, is there...

It is very easy to escape a dash character, either manually (the tab key
makes it very easy with some shells), or in scripts (all languages have
a shell escape function).

 It almost might be nice to have a totally user-configurable alphabet.

IMHO this is a bad idea because this would confuse even more people
trying to use it. We could end up with dozens of incompatible, non-portable
shell scripts, with none using the same translation set.

A totally user-configurable alphabet is always possible with base64 |
tr, which is designed to do that.


Gabriel




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Bo Borgerson
Gabriel Barazer wrote:
 AFAIK, POSIX filenames allow any character except the slash character
 and the null byte.

 Especially when this is the RFC-recommended translation. This would
 avoid confusing people with multiple translation sets and stick to the
 RFC (considered by many as the authoritative translation).
 
 It is very easy to escape a dash character, either manually (the tab key
 makes it very easy with some shells), or in scripts (all languages have
 a shell escape function).
 
 IMHO this is a bad idea because this would confuse even more people
 trying to use it. We could end up with dozens of incompatible, non-portable
 shell scripts, with none using the same translation set.
 
 A totally user-configurable alphabet is always possible with base64 |
 tr, which is designed to do that.


Yes, you're absolutely right.  All very good points.

I do still think the original poster's suggestion of a `--filename-safe'
option is worth considering.  As you mentioned the inclusion of such a
base64 alphabet in the RFC means it's likely to be a widely accepted
alternative.
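On the decoding side, supporting the RFC filename-safe alphabet mainly means a different reverse lookup table. The minimal C sketch below assumes unwrapped input, with '=' padding only at the end and no handling of line breaks or other whitespace. All names are hypothetical, and this is not gnulib's base64_decode interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* URL and filename safe alphabet, RFC 3548 section 4.  */
static const char b64_filesafe_alphabet[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Hypothetical helper: fill TABLE so that TABLE[byte] is that byte's
   6-bit value in ALPHABET, or -1 if the byte is not in ALPHABET.  */
void
build_decode_table (signed char table[256], const char *alphabet)
{
  memset (table, -1, 256);
  for (int i = 0; i < 64; i++)
    table[(unsigned char) alphabet[i]] = (signed char) i;
}

/* Hypothetical decoder: IN is INLEN bytes, INLEN a multiple of 4, with
   '=' padding allowed only in the final group.  OUT needs room for
   3 * (INLEN / 4) bytes.  On success stores the decoded length in
   *OUTLEN and returns true; returns false on any invalid input.  */
bool
b64_decode_table (const signed char table[256], const char *in,
                  size_t inlen, char *out, size_t *outlen)
{
  size_t n = 0;

  if (inlen % 4 != 0)
    return false;

  for (size_t i = 0; i < inlen; i += 4)
    {
      const unsigned char *q = (const unsigned char *) in + i;
      int pad = 0;
      int d[4] = { 0, 0, 0, 0 };

      /* Only the last group may end in "=" or "==".  */
      if (i + 4 == inlen && q[3] == '=')
        pad = q[2] == '=' ? 2 : 1;

      for (int k = 0; k < 4 - pad; k++)
        {
          d[k] = table[q[k]];
          if (d[k] < 0)
            return false;       /* byte outside the alphabet, or misplaced '=' */
        }

      out[n++] = (char) ((d[0] << 2) | (d[1] >> 4));
      if (pad < 2)
        out[n++] = (char) (((d[1] & 0x0f) << 4) | (d[2] >> 2));
      if (pad < 1)
        out[n++] = (char) (((d[2] & 0x03) << 6) | d[3]);
    }

  *outlen = n;
  return true;
}
```

A decoder built this way rejects '+' and '/' outright, so text in the standard alphabet is not silently accepted as filename-safe input.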


Bo




Re: Feature request - base64 Filename Safe Alphabet

2008-04-29 Thread Andreas Schwab
Gabriel Barazer [EMAIL PROTECTED] writes:

 On 04/29/2008 4:47:53 PM +0200, Bo Borgerson [EMAIL PROTECTED] wrote:
 Pádraig Brady wrote:
 tr '+/' '._' = hidden files
 tr '+/' '-_' = awkward option clashes
 tr '/' '_' = not POSIX portable

 AFAIK, POSIX filenames allow any character except the slash character and
 the null byte.

That is actually a Unix property.  For a portable POSIX file name the
alphabet is much more restricted: [A-Za-z0-9_.-], without leading
hyphen.
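This rule can be checked mechanically. The C sketch below tests a name against the portable set [A-Za-z0-9._-] with no leading hyphen. The function name is hypothetical. Note that '=' is outside the set, so base64 output that carries padding (input length not a multiple of 3) does not qualify. The 6-byte input from `head -c6` encodes to 8 characters with no padding.

```c
#include <stdbool.h>
#include <string.h>

/* The POSIX portable filename character set, spelled out rather than
   tested with isalnum(), whose answer depends on the locale.  */
static const char portable_chars[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-";

/* Hypothetical helper: true if NAME is non-empty, does not start with
   '-', and uses only portable filename characters.  */
bool
is_portable_filename (const char *name)
{
  if (name[0] == '\0' || name[0] == '-')
    return false;
  for (const char *p = name; *p != '\0'; p++)
    if (strchr (portable_chars, *p) == NULL)
      return false;
  return true;
}
```

Under this test, RFC 3548 section 4 output still fails whenever it happens to begin with '-', which is the option-clash case discussed earlier in the thread.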

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.

