Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-14 Thread Rory Campbell-Lange
Thanks for the pointer, Roger.

After finally getting the normalisation to RawStdEncoding base64 to work, I was 
trying to get my head around the fact that base64 content often seems to have 
several newlines around it.

Then I found that encoding/base64 has func (r *newlineFilteringReader) 
Read(p []byte) (int, error), which elegantly resolves this.
https://cs.opensource.google/go/go/+/refs/tags/go1.23.4:src/encoding/base64/base64.go;l=622

I stole the function and simply added '=' in addition to '\n' and '\r' to the 
list of runes to skip. I'll see how I go with that but might need to look at 
your longer list of "garbage" runes. 
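
A minimal sketch of that kind of filter, for readers following along. The names skipReader and decodeLenient are hypothetical, and the loop only mirrors the general shape of the standard library's unexported reader rather than copying it. Note that it drops '=' wherever it appears, so it cannot detect misplaced padding such as "AAA=garbage".

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// skipReader (hypothetical name) drops '\r', '\n' and '=' from the
// wrapped stream so that RawStdEncoding can decode padded input too.
type skipReader struct {
	wrapped io.Reader
}

func (r *skipReader) Read(p []byte) (int, error) {
	n, err := r.wrapped.Read(p)
	for n > 0 {
		offset := 0
		for _, b := range p[:n] {
			if b != '\r' && b != '\n' && b != '=' {
				p[offset] = b // compact kept bytes towards the front of p
				offset++
			}
		}
		if offset > 0 {
			return offset, err
		}
		// Everything in this read was skipped; try again.
		n, err = r.wrapped.Read(p)
	}
	return n, err
}

// decodeLenient decodes padded or unpadded standard base64 from s.
func decodeLenient(s string) (string, error) {
	dec := base64.NewDecoder(base64.RawStdEncoding, &skipReader{wrapped: strings.NewReader(s)})
	b, err := io.ReadAll(dec)
	return string(b), err
}

func main() {
	for _, in := range []string{"SGVsbG8=\r\n", "SGVs\nbG8"} {
		out, err := decodeLenient(in)
		fmt.Printf("%q -> %q, %v\n", in, out, err)
	}
}
```

base64.NewDecoder already ignores '\r' and '\n' on its own, so the only new behaviour here is skipping '='.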

I'm going to enjoy looking through the code. Thank you!

Rory



Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-14 Thread roger peppe
Tangentially related to this thread, a while back, I wrote a Go
implementation of the base64 command that is agnostic about which encoding
it reads (and can write all the possible encodings). It can be installed
with:
go install github.com/rogpeppe/misc/cmd/base64@latest

It's arguably a little too lenient in what it accepts, but it works for me
:)

The source is here
https://github.com/rogpeppe/misc/blob/f64633da4fd4/cmd/base64/base64.go


Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-14 Thread Rory Campbell-Lange
Thanks for finding that foolish error, Brian.

To wrap the thread up, the implementation below seems to work ok for reading 
both base64.RawStdEncoding and base64.StdEncoding encoded data using the 
base64.RawStdEncoding decoder.

Example usage:

b64 := NewB64Translator(bytes.NewReader(encodedBytes))
b, err := io.ReadAll(base64.NewDecoder(base64.RawStdEncoding, b64))

The implementation: 

type B64Translator struct {
    br *bufio.Reader
}

func NewB64Translator(r io.Reader) *B64Translator {
    return &B64Translator{
        br: bufio.NewReader(r),
    }
}

// Read reads off the buffered reader expecting base64.StdEncoding bytes
// with (potentially) 1-3 '=' padding characters at the end.
// RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
// if the padding is removed.
func (b *B64Translator) Read(p []byte) (n int, err error) {
    h := make([]byte, len(p))
    n, err = b.br.Read(h)
    if err != nil {
        return n, err
    }
    // check if there is any padding in the last three bytes
    tail := make([]byte, 3)
    if n > 3 {
        _ = copy(tail, h[n-3:n])
    } else {
        _ = copy(tail, h[:n])
    }
    c := bytes.Count(tail, []byte("="))
    copy(p, h[:n-c])
    return n - c, nil
}

For larger data the "tail" approach seems to have a tiny speed improvement over 
a naive bytes.Count(h, []byte("=")) over the whole buffer.
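
One thing to watch in a Read like this: the io.Reader contract allows a call to return n > 0 together with a non-nil error, and the early return would then hand back a count without copying those bytes into p. The sketch below is one possible variant, not the implementation from this thread. It reads straight into p and trims from the end of each read. The names padTrimmer and decodeTrimmed are hypothetical, and it still assumes '=' only ever appears at the very end of the stream.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padTrimmer is a hypothetical variant of B64Translator that reads
// directly into the caller's buffer instead of a scratch slice.
type padTrimmer struct {
	r io.Reader
}

func (t *padTrimmer) Read(p []byte) (int, error) {
	n, err := t.r.Read(p)
	// Drop '=' (and any line ending after it) from the end of this read.
	// Assumption: padding only ever appears at the end of the stream.
	kept := bytes.TrimRight(p[:n], "=\r\n")
	return len(kept), err
}

// decodeTrimmed decodes padded or unpadded standard base64 from s.
func decodeTrimmed(s string) (string, error) {
	dec := base64.NewDecoder(base64.RawStdEncoding, &padTrimmer{r: strings.NewReader(s)})
	b, err := io.ReadAll(dec)
	return string(b), err
}

func main() {
	for _, in := range []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", // padded, from this thread
		"Qm9uam91ciwgam95ZXV4IGxpb24K",     // needs no padding
	} {
		out, err := decodeTrimmed(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

Because nothing is allocated per call, this also removes the make([]byte, len(p)) on every Read.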

Thanks to everyone for their help.

Rory


Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-14 Thread 'Brian Candler' via golang-nuts
I was more or less right. The input string, which you encoded to 
"Qm9uam91ciwgam95ZXV4IGxpb24K", contains an encoded newline at the end. 
It's not spurious.

Confirmed by the "echo" pipeline I gave above, or in Go itself:
https://go.dev/play/p/6kSxiCfCTo4

You can also confirm it by multiplying the length of the input by 3/4:

% echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | wc -c
  28

28*3/4 = 21
B o n j o u r
, _ j o y e u
x _ l i o n \n
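
The same check can be sketched in Go. This is only an illustration of the arithmetic above; the helper name decodedInfo is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodedInfo (hypothetical helper) returns the decoded length of s
// and the last decoded byte.
func decodedInfo(s string) (int, byte, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	if err != nil || len(b) == 0 {
		return 0, 0, err
	}
	return len(b), b[len(b)-1], nil
}

func main() {
	const in = "Qm9uam91ciwgam95ZXV4IGxpb24K"

	// 28 input characters, 28*3/4 = 21 decoded bytes.
	fmt.Println(len(in), len(in)*3/4, base64.StdEncoding.DecodedLen(len(in)))

	// The 21st byte is the encoded newline.
	n, last, err := decodedInfo(in)
	fmt.Printf("%d %q %v\n", n, last, err)
}
```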



Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-14 Thread 'Brian Candler' via golang-nuts
Sorry, ignore that; I hadn't checked your playground link.


Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-14 Thread 'Brian Candler' via golang-nuts
> As I wrote earlier, I'm trying to avoid reading the entire email part 
> into memory to discover if I should use base64.StdEncoding or 
> base64.RawStdEncoding.

As I asked before, why would you ever need to use RawStdEncoding? It just 
means the MIME part was invalid, most likely corrupted/truncated.
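
For context, a small sketch of how strict the two standard-library encodings are about padding. The helper names stdAccepts and rawAccepts are hypothetical; "SGVsbG8=" and "SGVsbG8" are the padded and unpadded encodings of "Hello".

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// stdAccepts reports whether the padded decoder accepts s.
func stdAccepts(s string) bool {
	_, err := base64.StdEncoding.DecodeString(s)
	return err == nil
}

// rawAccepts reports whether the unpadded decoder accepts s.
func rawAccepts(s string) bool {
	_, err := base64.RawStdEncoding.DecodeString(s)
	return err == nil
}

func main() {
	padded, unpadded := "SGVsbG8=", "SGVsbG8"
	fmt.Println(stdAccepts(padded), stdAccepts(unpadded)) // true false
	fmt.Println(rawAccepts(padded), rawAccepts(unpadded)) // false true
}
```

Neither decoder accepts the other form, which is why a stream of unknown provenance has to be normalised one way or the other before decoding.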

> One odd thing is that I'm getting extraneous newlines (shown by stars in 
> the output), eg:

You are feeding two different inputs which do not differ by truncation 
alone.

% echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | base64 -D | hexdump -c
000   B   o   n   j   o   u   r   ,   j   o   y   e   u   x
010   l   i   o   n  \n
015

% echo -n "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==" | base64 -D | hexdump -c
000   "   B   o   n   j   o   u   r   ,   j   o   y   e   u   x
010   l   i   o   n   "
016

The second one has encoded double-quotes before and after the content.

On Monday, 13 January 2025 at 22:43:51 UTC Rory Campbell-Lange wrote:

> As I wrote earlier, I'm trying to avoid reading the entire email part into 
> memory to discover if I should use base64.StdEncoding or 
> base64.RawStdEncoding.
>
> The following seems to work reasonably well:
>
> type B64Translator struct {
>     br *bufio.Reader
> }
>
> func NewB64Translator(r io.Reader) *B64Translator {
>     return &B64Translator{
>         br: bufio.NewReader(r),
>     }
> }
>
> // Read reads off the buffered reader expecting base64.StdEncoding bytes
> // with (potentially) 1-3 '=' padding characters at the end.
> // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
> // if the padding is removed.
> func (b *B64Translator) Read(p []byte) (n int, err error) {
>     h := make([]byte, len(p))
>     n, err = b.br.Read(h)
>     if err != nil {
>         return n, err
>     }
>     // to be optimised
>     c := bytes.Count(h, []byte("="))
>     copy(p, h[:n-c])
>     // fmt.Println(string(h), n, string(p), n-c)
>     return n - c, nil
> }
>
> https://go.dev/play/p/H6ii7Vy-8as
>
> One odd thing is that I'm getting extraneous newlines (shown by stars in 
> the output), eg:
>
> --
> raw: Bonjour joyeux lion
> Qm9uam91ciwgam95ZXV4IGxpb24K
> ok: false
> decoded: Bonjour, joyeux lion* < e.g. here
> --
> std: "Bonjour, joyeux lion"
> IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
> ok: true
> decoded: "Bonjour, joyeux lion"
> --
>
> Any thoughts on that would be gratefully received. 
>
> Rory
>
>
> On 13/01/25, Rory Campbell-Lange (ro...@campbell-lange.net) wrote:
> > Thanks very much for the playground link and thoughts.
> > 
> > The use case is reading base64 email parts, which could be of a very
> > large size. It is unclear when processing these parts if they are base64
> > padded or not.
> > 
> > I'm trying to avoid reading the entire email part into memory.
> > Consequently I think your earlier idea of adding padding (or removing it)
> > in a wrapper could work. Perhaps wrapping the reader with another using a
> > bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could
> > add padding if needed.
> > 
> > Rory
> > 
> > On 13/01/25, Axel Wagner (axel.wa...@googlemail.com) wrote:
> > > Just realized: If you twist the idea around, you get something easy to
> > > implement and more correct.
> > > Instead of stripping padding if it exist, you can ensure that the body *is*
> > > padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
> > > You can then feed that to base64.StdEncoding. If the wrapped Reader returns
> > > padded Base64, this does nothing. If it returns unpadded Base64, it adds
> > > padding. If it returns incorrect Base64, it will create a padded stream,
> > > that will then get rejected by the Base64 decoder.
> > > 
> > > On Mon, 13 Jan 2025 at 10:31, Axel Wagner wrote:
> > > 
> > > > Hi,
> > > >
> > > > one way to solve your problem is to wrap the body into an io.Reader that
> > > > strips off everything after the first `=` it finds. That can then be fed to
> > > > base64.RawStdEncoding. This approach requires no extra buffering or copying
> > > > and is easy to implement: https://go.dev/play/p/CwcVz7oietI
> > > >
> > > > The downside is, that this will not verify that the body is *either*
> > > > correctly padded Base64 *or* unpadded Base64. So, it will not report an
> > > > error if fed something like "AAA=garbage".
> > > > That can be remedied by buffering up to four bytes and, when encountering
> > > > an EOF, check that there are at most three trailing `=` and that the total
> > > > length of the stream is divisible by four. It's more finicky to implement,
> > > > but it should also be possible without any extra copies and only requires a
> > > > very small extra buffer.
> > > >
> > > > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange <ro...@campbell-lange.net>
> > > > wrote:
> > > >
> > > >> Thanks very much for the links, pointers and possible solution.
> > > >>
> > > >> Trying to read base64 standard (padded) encoded data with
> > > >> base64.RawStdEncoding can produce an error such as

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread robert engels
You wouldn’t get an EOF if the data is properly encoded. Not sure what the 
problem is.

You need to be doing something with the Reader - most likely writing to a file, 
streaming to a database record, etc.

I would simplify the code to a single test case that demonstrates the issue you 
are having with the code.
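
For reference, the padding approach suggested in this thread (count the characters read, then add '=' at EOF until the total is a multiple of four, and always use the padded decoder) could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from any of the playground links: padReader and decodePadded are hypothetical names, and line endings are left out of the count on the assumption that the decoder ignores them.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padReader (hypothetical) passes data through unchanged and, once the
// wrapped reader reports EOF, supplies the '=' needed to make the
// number of significant bytes a multiple of four.
type padReader struct {
	r     io.Reader
	count int    // bytes seen so far, excluding '\r' and '\n'
	pad   []byte // padding still to emit after EOF
	done  bool   // wrapped reader has reported EOF
}

func (p *padReader) Read(b []byte) (int, error) {
	if len(b) == 0 {
		return 0, nil
	}
	if !p.done {
		n, err := p.r.Read(b)
		for _, c := range b[:n] {
			if c != '\r' && c != '\n' {
				p.count++
			}
		}
		if err != io.EOF {
			return n, err
		}
		p.done = true
		p.pad = []byte("===")[:(4-p.count%4)%4]
		if n > 0 {
			return n, nil // hand over the final data first
		}
	}
	if len(p.pad) == 0 {
		return 0, io.EOF
	}
	n := copy(b, p.pad)
	p.pad = p.pad[n:]
	return n, nil
}

// decodePadded decodes padded or unpadded standard base64 from s.
func decodePadded(s string) (string, error) {
	dec := base64.NewDecoder(base64.StdEncoding, &padReader{r: strings.NewReader(s)})
	b, err := io.ReadAll(dec)
	return string(b), err
}

func main() {
	for _, in := range []string{"SGVsbG8", "SGVsbG8=", "SGVsb"} {
		out, err := decodePadded(in)
		fmt.Printf("%q -> %q, %v\n", in, out, err)
	}
}
```

Already padded input gets no extra '=' because the count is then a multiple of four. Input whose length is 1 mod 4 is padded with "===", which the decoder rejects, so malformed data still surfaces as an error.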

> On Jan 13, 2025, at 5:34 PM, Rory Campbell-Lange  
> wrote:
> 
> I'm just doing the reverse of that, I think, by removing the padding.
> 
> I can't seem to trigger an EOF with this code below:
> 
>>>   n, err = b.br.Read(h)
>>>   if err != nil {
>>>   return n, err
>>>   }
> 
> 
> On 13/01/25, robert engels (reng...@ix.netcom.com) wrote:
>> As has been pointed out, you don’t need to read the whole thing into 
>> memory, just wrap the data provider with one that adds the padding if it 
>> doesn’t exist - and always read with the padded decoder.
>> 
>> To add the padding you only need to keep track of the count of characters 
>> read before EOF to determine how many padding characters to synthetically 
>> add - if the original data is padded this will be 0 (if it was padded 
>> correctly).
>> 
>>> On 13/01/25, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
>>>> Thanks very much for the playground link and thoughts.
>>>>
>>>> The use case is reading base64 email parts, which could be of a very large
>>>> size. It is unclear when processing these parts if they are base64 padded
>>>> or not.
>>>>
>>>> I'm trying to avoid reading the entire email part into memory.
>>>> Consequently I think your earlier idea of adding padding (or removing it)
>>>> in a wrapper could work. Perhaps wrapping the reader with another using a
>>>> bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could
>>>> add padding if needed.
>>>>
>>>> Rory
>>>>
>>>> On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
>>>>> Just realized: If you twist the idea around, you get something easy to
>>>>> implement and more correct.
>>>>> Instead of stripping padding if it exist, you can ensure that the body
>>>>> *is*
>>>>> padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
>>>>> You can then feed that to base64.StdEncoding. If the wrapped Reader
>>>>> returns
>>>>> padded Base64, this does nothing. If it returns unpadded Base64, it adds
>>>>> padding. If it returns incorrect Base64, it will create a padded stream,
>>>>> that will then get rejected by the Base64 decoder.
>>>>>
>>>>> On Mon, 13 Jan 2025 at 10:31, Axel Wagner
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> one way to solve your problem is to wrap the body into an io.Reader that
>>>>>> strips off everything after the first `=` it finds. That can then be fed
>>>>>> to
>>>>>> base64.RawStdEncoding. This approach requires no extra buffering or
>>>>>> copying
>>>>>> and is easy to implement: https://go.dev/play/p/C

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread Rory Campbell-Lange
I'm just doing the reverse of that, I think, by removing the padding.

I can't seem to trigger an EOF with this code below:

> >n, err = b.br.Read(h)
> >if err != nil {
> >return n, err
> >}


On 13/01/25, robert engels (reng...@ix.netcom.com) wrote:
> As has been pointing out, you don’t need to read the whole thing into memory, 
> just wrap the data provider with one that adds the padding it doesn’t exist - 
> and always read with the padded decoder.
> 
> To add the padding you only need to keep track of the count of characters 
> read before eof to determine how many padding characters to synthetically add 
> - if the original data is padding this will be 0 (if it was padded correctly).
> 
> > On Jan 13, 2025, at 4:42 PM, Rory Campbell-Lange  
> > wrote:
> > 
> > AS I wrote earlier, I'm trying to avoid reading the entire email part into 
> > memory to discover if I should use base64.StdEncoding or 
> > base64.RawStdEncoding.
> > 
> > The following seems to work reasonably well:
> > 
> >type B64Translator struct {
> >br *bufio.Reader
> >}
> > 
> >func NewB64Translator(r io.Reader) *B64Translator {
> >return &B64Translator{
> >br: bufio.NewReader(r),
> >}
> >}
> > 
> >// Read reads off the buffered reader expecting base64.StdEncoding bytes
> >// with (potentially) 1-3 '=' padding characters at the end.
> >// RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
> >// if the padding is removed.
> >func (b *B64Translator) Read(p []byte) (n int, err error) {
> >h := make([]byte, len(p))
> >n, err = b.br.Read(h)
> >if err != nil {
> >return n, err
> >}
> >// to be optimised
> >c := bytes.Count(h, []byte("="))
> >copy(p, h[:n-c])
> >// fmt.Println(string(h), n, string(p), n-c)
> >return n - c, nil
> >}
> > 
> > https://go.dev/play/p/H6ii7Vy-8as
> > 
> > One odd thing is that I'm getting extraneous newlines (shown by stars in 
> > the output), eg:
> > 
> > --
> >raw: Bonjour joyeux lion
> > Qm9uam91ciwgam95ZXV4IGxpb24K
> > ok: false
> >decoded: Bonjour, joyeux lion* < e.g. here
> > --
> >std: "Bonjour, joyeux lion"
> > IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
> > ok: true
> >decoded: "Bonjour, joyeux lion"
> > --
> > 
> > Any thoughts on that would be gratefully received. 
> > 
> > Rory
> > 
> > 
> > On 13/01/25, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
> >> Thanks very much for the playground link and thoughts.
> >> 
> >> The use case is reading base64 email parts, which could be of a very large 
> >> size. It is unclear when processing these parts if they are base64 padded 
> >> or not.
> >> 
> >> I'm trying to avoid reading the entire email part into memory. 
> >> Consequently I think your earlier idea of adding padding (or removing it) 
> >> in a wrapper could work. Perhaps wrapping the reader with another using a 
> >> bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could 
> >> add padding if needed.
> >> 
> >> Rory
> >> 
> >> On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
> >>> Just realized: If you twist the idea around, you get something easy to
> >>> implement and more correct.
> >>> Instead of stripping padding if it exist, you can ensure that the body 
> >>> *is*
> >>> padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
> >>> You can then feed that to base64.StdEncoding. If the wrapped Reader 
> >>> returns
> >>> padded Base64, this does nothing. If it returns unpadded Base64, it adds
> >>> padding. If it returns incorrect Base64, it will create a padded stream,
> >>> that will then get rejected by the Base64 decoder.
> >>> 
> >>> On Mon, 13 Jan 2025 at 10:31, Axel Wagner
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> one way to solve your problem is to wrap the body into an io.Reader that
> >>>> strips off everything after the first `=` it finds. That can then be fed
> >>>> to
> >>>> base64.RawStdEncoding. This approach requires no extra buffering or
> >>>> copying
> >>>> and is easy to implement: https://go.dev/play/p/CwcVz7oietI
> >>>>
> >>>> The downside is, that this will not verify that the body is *either*
> >>>> correctly padded Base64 *or* unpadded Base64. So, it will not report an
> >>>> error if fed something like "AAA=garbage".
> >>>> That can be remedied by buffering up to four bytes and, when encountering
> >>>> an EOF, check that there are at most three trailing `=` and that the
> >>>> total
> >>>> length of the stream is divisible by four. It's more finicky to
> >>>> implement,
> >>>> but it should also be possible witho

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread robert engels
As has been pointed out, you don't need to read the whole thing into memory;
just wrap the data provider with one that adds the padding if it doesn't exist,
and always read with the padded decoder.

To add the padding you only need to keep track of the count of characters read
before EOF to determine how many padding characters to synthetically add. If
the original data is padded this will be 0 (if it was padded correctly).
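To make the counting concrete, here is a minimal sketch of the arithmetic, assuming the count covers only base64 characters (line breaks excluded). The name padLen is made up for illustration.

```go
package main

import "fmt"

// padLen (hypothetical helper) returns how many '=' characters would have
// to be appended to bring count base64 characters up to a multiple of 4.
// A remainder of 1 can never be valid base64; it yields 3 here, which a
// padded decoder should then reject.
func padLen(count int) int {
	return (4 - count%4) % 4
}

func main() {
	for _, c := range []int{28, 30, 31, 32} {
		fmt.Printf("%d chars -> %d padding\n", c, padLen(c))
	}
}
```

With the two samples from this thread, the 28-character unpadded string needs no padding, and the 32-character padded one is already a multiple of four.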

> On Jan 13, 2025, at 4:42 PM, Rory Campbell-Lange  
> wrote:
> 
> AS I wrote earlier, I'm trying to avoid reading the entire email part into 
> memory to discover if I should use base64.StdEncoding or 
> base64.RawStdEncoding.
> 
> The following seems to work reasonably well:
> 
>type B64Translator struct {
>br *bufio.Reader
>}
> 
>func NewB64Translator(r io.Reader) *B64Translator {
>return &B64Translator{
>br: bufio.NewReader(r),
>}
>}
> 
>// Read reads off the buffered reader expecting base64.StdEncoding bytes
>// with (potentially) 1-3 '=' padding characters at the end.
>// RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
>// if the padding is removed.
>func (b *B64Translator) Read(p []byte) (n int, err error) {
>h := make([]byte, len(p))
>n, err = b.br.Read(h)
>if err != nil {
>return n, err
>}
>// to be optimised
>c := bytes.Count(h, []byte("="))
>copy(p, h[:n-c])
>// fmt.Println(string(h), n, string(p), n-c)
>return n - c, nil
>}
> 
> https://go.dev/play/p/H6ii7Vy-8as
> 
> One odd thing is that I'm getting extraneous newlines (shown by stars in the 
> output), eg:
> 
>   --
>  raw: Bonjour joyeux lion
>   Qm9uam91ciwgam95ZXV4IGxpb24K
>   ok: false
>  decoded: Bonjour, joyeux lion* < e.g. here
>   --
>  std: "Bonjour, joyeux lion"
>   IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
>   ok: true
>  decoded: "Bonjour, joyeux lion"
>   --
> 
> Any thoughts on that would be gratefully received. 
> 
> Rory
> 
> 
> On 13/01/25, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
>> Thanks very much for the playground link and thoughts.
>> 
>> The use case is reading base64 email parts, which could be of a very large 
>> size. It is unclear when processing these parts if they are base64 padded or 
>> not.
>> 
>> I'm trying to avoid reading the entire email part into memory. Consequently 
>> I think your earlier idea of adding padding (or removing it) in a wrapper 
>> could work. Perhaps wrapping the reader with another using a bufio.Reader to 
>> track bytes read and detect EOF. At EOF the wrapper could add padding if 
>> needed.
>> 
>> Rory
>> 
>> On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
>>> Just realized: If you twist the idea around, you get something easy to
>>> implement and more correct.
>>> Instead of stripping padding if it exist, you can ensure that the body *is*
>>> padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
>>> You can then feed that to base64.StdEncoding. If the wrapped Reader returns
>>> padded Base64, this does nothing. If it returns unpadded Base64, it adds
>>> padding. If it returns incorrect Base64, it will create a padded stream,
>>> that will then get rejected by the Base64 decoder.
>>> 
>>> On Mon, 13 Jan 2025 at 10:31, Axel Wagner
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> one way to solve your problem is to wrap the body into an io.Reader that
>>>> strips off everything after the first `=` it finds. That can then be fed to
>>>> base64.RawStdEncoding. This approach requires no extra buffering or copying
>>>> and is easy to implement: https://go.dev/play/p/CwcVz7oietI
>>>>
>>>> The downside is, that this will not verify that the body is *either*
>>>> correctly padded Base64 *or* unpadded Base64. So, it will not report an
>>>> error if fed something like "AAA=garbage".
>>>> That can be remedied by buffering up to four bytes and, when encountering
>>>> an EOF, check that there are at most three trailing `=` and that the total
>>>> length of the stream is divisible by four. It's more finicky to implement,
>>>> but it should also be possible without any extra copies and only requires a
>>>> very small extra buffer.
>>>>
>>>> On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange
>>>> wrote:
>>>>
>>>>> Thanks very much for the links, pointers and possible solution.
>>>>>
>>>>> Trying to read base64 standard (padded) encoded data with
>>>>> base64.RawStdEncoding can produce an error such as
>>>>>
>>>>>    illegal base64 data at input byte
>>>>>
>>>>> Reading base64 raw (unpadded) encoded data produces the EOF error.
>>>>>
>>>>> I'll

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread Rory Campbell-Lange
As I wrote earlier, I'm trying to avoid reading the entire email part into
memory to discover if I should use base64.StdEncoding or base64.RawStdEncoding.

The following seems to work reasonably well:

    type B64Translator struct {
        br *bufio.Reader
    }

    func NewB64Translator(r io.Reader) *B64Translator {
        return &B64Translator{
            br: bufio.NewReader(r),
        }
    }

    // Read reads off the buffered reader expecting base64.StdEncoding bytes
    // with (potentially) 1-3 '=' padding characters at the end.
    // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
    // if the padding is removed.
    func (b *B64Translator) Read(p []byte) (n int, err error) {
        h := make([]byte, len(p))
        n, err = b.br.Read(h)
        if err != nil {
            return n, err
        }
        // to be optimised
        c := bytes.Count(h, []byte("="))
        copy(p, h[:n-c])
        // fmt.Println(string(h), n, string(p), n-c)
        return n - c, nil
    }

https://go.dev/play/p/H6ii7Vy-8as
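A possible variation on the Read above, offered only as a sketch: it filters '=' in place in the caller's slice instead of allocating a second buffer, does not assume the '=' bytes sit at the very end of the chunk, and reads again rather than returning 0 bytes with a nil error. The names padStripper and decodeEither are made up for illustration. It relies on the stream decoder from base64.NewDecoder skipping '\r' and '\n' by itself.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padStripper (hypothetical name) drops every '=' from the wrapped stream so
// that padded and unpadded input can both go through base64.RawStdEncoding.
// Like the original, it does not validate where the '=' bytes appear.
type padStripper struct {
	r io.Reader
}

func (s *padStripper) Read(p []byte) (int, error) {
	for {
		n, err := s.r.Read(p)
		kept := 0
		for _, c := range p[:n] {
			if c != '=' {
				p[kept] = c // compact in place; kept never overtakes the read index
				kept++
			}
		}
		if kept > 0 || err != nil || len(p) == 0 {
			return kept, err
		}
		// The whole chunk was padding: read again instead of returning 0, nil.
	}
}

// decodeEither decodes padded or unpadded standard base64 from a string.
func decodeEither(s string) (string, error) {
	dec := base64.NewDecoder(base64.RawStdEncoding, &padStripper{r: strings.NewReader(s)})
	b, err := io.ReadAll(dec)
	return string(b), err
}

func main() {
	for _, in := range []string{
		"Qm9uam91ciwgam95ZXV4IGxpb24K",     // unpadded sample from this thread
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", // padded sample from this thread
	} {
		out, err := decodeEither(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```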

One odd thing is that I'm getting extraneous newlines (shown by stars in the 
output), eg:

--
   raw: Bonjour joyeux lion
Qm9uam91ciwgam95ZXV4IGxpb24K
ok: false
   decoded: Bonjour, joyeux lion* < e.g. here
--
   std: "Bonjour, joyeux lion"
IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
ok: true
   decoded: "Bonjour, joyeux lion"
--
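A quick check on the first sample, as a sketch: decoding the string directly with base64.RawStdEncoding suggests the newline is part of the encoded payload rather than something the reader adds. The final group "b24K" decodes to 'o', 'n' and a line feed, so the input was possibly encoded with a trailing newline in the first place. decodeRaw is a made-up helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeRaw (hypothetical helper) decodes an unpadded standard base64 string.
func decodeRaw(s string) (string, error) {
	b, err := base64.RawStdEncoding.DecodeString(s)
	return string(b), err
}

func main() {
	out, err := decodeRaw("Qm9uam91ciwgam95ZXV4IGxpb24K")
	// %q makes the trailing newline visible as \n.
	fmt.Printf("%q %v\n", out, err)
}
```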

Any thoughts on that would be gratefully received. 

Rory


On 13/01/25, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
> Thanks very much for the playground link and thoughts.
> 
> The use case is reading base64 email parts, which could be of a very large 
> size. It is unclear when processing these parts if they are base64 padded or 
> not.
> 
> I'm trying to avoid reading the entire email part into memory. Consequently I 
> think your earlier idea of adding padding (or removing it) in a wrapper could 
> work. Perhaps wrapping the reader with another using a bufio.Reader to track 
> bytes read and detect EOF. At EOF the wrapper could add padding if needed.
> 
> Rory
> 
> On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
> > Just realized: If you twist the idea around, you get something easy to
> > implement and more correct.
> > Instead of stripping padding if it exist, you can ensure that the body *is*
> > padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
> > You can then feed that to base64.StdEncoding. If the wrapped Reader returns
> > padded Base64, this does nothing. If it returns unpadded Base64, it adds
> > padding. If it returns incorrect Base64, it will create a padded stream,
> > that will then get rejected by the Base64 decoder.
> > 
> > On Mon, 13 Jan 2025 at 10:31, Axel Wagner 
> > wrote:
> > 
> > > Hi,
> > >
> > > one way to solve your problem is to wrap the body into an io.Reader that
> > > strips off everything after the first `=` it finds. That can then be fed 
> > > to
> > > base64.RawStdEncoding. This approach requires no extra buffering or 
> > > copying
> > > and is easy to implement: https://go.dev/play/p/CwcVz7oietI
> > >
> > > The downside is, that this will not verify that the body is *either*
> > > correctly padded Base64 *or* unpadded Base64. So, it will not report an
> > > error if fed something like "AAA=garbage".
> > > That can be remedied by buffering up to four bytes and, when encountering
> > > an EOF, check that there are at most three trailing `=` and that the total
> > > length of the stream is divisible by four. It's more finicky to implement,
> > > but it should also be possible without any extra copies and only requires 
> > > a
> > > very small extra buffer.
> > >
> > > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange 
> > > 
> > > wrote:
> > >
> > >> Thanks very much for the links, pointers and possible solution.
> > >>
> > >> Trying to read base64 standard (padded) encoded data with
> > >> base64.RawStdEncoding can produce an error such as
> > >>
> > >> illegal base64 data at input byte 
> > >>
> > >> Reading base64 raw (unpadded) encoded data produces the EOF error.
> > >>
> > >> I'll go with trying to read the standard encoded data up to maybe 1MB and
> > >> then switch to base64.RawStdEncoding if I hit the "illegal base64 data"
> > >> problem, maybe with reference to bufio.Reader which has most of the 
> > >> methods
> > >> suggested below.
> > >>
> > >> Yes, the use of a "Rewind" method would be crucial. I guess this would
> > >> need to:
> > >> 1. error if more than one buffer of data has been read
> > >> 2. else re-read from byte 0
> > >>
> > >> Thanks again very much for these suggestions.
> > >>
> > >> Rory
> > >>
> > >> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> > >> > Also, see this
> > >> https://stackoverflow.com/questions/69753478/use

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread Rory Campbell-Lange
Thanks very much for the playground link and thoughts.

The use case is reading base64 email parts, which could be of a very large 
size. It is unclear when processing these parts if they are base64 padded or 
not.

I'm trying to avoid reading the entire email part into memory. Consequently I 
think your earlier idea of adding padding (or removing it) in a wrapper could 
work. Perhaps wrapping the reader with another using a bufio.Reader to track 
bytes read and detect EOF. At EOF the wrapper could add padding if needed.

Rory

On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
> Just realized: If you twist the idea around, you get something easy to
> implement and more correct.
> Instead of stripping padding if it exist, you can ensure that the body *is*
> padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
> You can then feed that to base64.StdEncoding. If the wrapped Reader returns
> padded Base64, this does nothing. If it returns unpadded Base64, it adds
> padding. If it returns incorrect Base64, it will create a padded stream,
> that will then get rejected by the Base64 decoder.
> 
> On Mon, 13 Jan 2025 at 10:31, Axel Wagner 
> wrote:
> 
> > Hi,
> >
> > one way to solve your problem is to wrap the body into an io.Reader that
> > strips off everything after the first `=` it finds. That can then be fed to
> > base64.RawStdEncoding. This approach requires no extra buffering or copying
> > and is easy to implement: https://go.dev/play/p/CwcVz7oietI
> >
> > The downside is, that this will not verify that the body is *either*
> > correctly padded Base64 *or* unpadded Base64. So, it will not report an
> > error if fed something like "AAA=garbage".
> > That can be remedied by buffering up to four bytes and, when encountering
> > an EOF, check that there are at most three trailing `=` and that the total
> > length of the stream is divisible by four. It's more finicky to implement,
> > but it should also be possible without any extra copies and only requires a
> > very small extra buffer.
> >
> > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange 
> > wrote:
> >
> >> Thanks very much for the links, pointers and possible solution.
> >>
> >> Trying to read base64 standard (padded) encoded data with
> >> base64.RawStdEncoding can produce an error such as
> >>
> >> illegal base64 data at input byte 
> >>
> >> Reading base64 raw (unpadded) encoded data produces the EOF error.
> >>
> >> I'll go with trying to read the standard encoded data up to maybe 1MB and
> >> then switch to base64.RawStdEncoding if I hit the "illegal base64 data"
> >> problem, maybe with reference to bufio.Reader which has most of the methods
> >> suggested below.
> >>
> >> Yes, the use of a "Rewind" method would be crucial. I guess this would
> >> need to:
> >> 1. error if more than one buffer of data has been read
> >> 2. else re-read from byte 0
> >>
> >> Thanks again very much for these suggestions.
> >>
> >> Rory
> >>
> >> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> >> > Also, see this
> >> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
> >> as I expected the error should be reported earlier than the end of stream
> >> if the chosen format is wrong.
> >> >
> >> > > On Jan 12, 2025, at 2:57 PM, robert engels 
> >> wrote:
> >> > >
> >> > > Also, this is what Gemini provided which looks basically correct -
> >> but I think encapsulating it with a Rewind() method would be easier to
> >> understand.
> >> > >
> >> > >
> >> > >
> >> > > While Go doesn't have a built-in PushbackReader like some other
> >> languages (e.g., Java), you can implement similar functionality using a
> >> custom struct and a buffer.
> >> > >
> >> > > Here's an example implementation:
> >> > >
> >> > > package main
> >> > >
> >> > > import (
> >> > > "bytes"
> >> > > "io"
> >> > > )
> >> > >
> >> > > type PushbackReader struct {
> >> > > reader io.Reader
> >> > > buffer *bytes.Buffer
> >> > > }
> >> > >
> >> > > func NewPushbackReader(r io.Reader) *PushbackReader {
> >> > > return &PushbackReader{
> >> > > reader: r,
> >> > > buffer: new(bytes.Buffer),
> >> > > }
> >> > > }
> >> > >
> >> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
> >> > > if p.buffer.Len() > 0 {
> >> > > return p.buffer.Read(b)
> >> > > }
> >> > > return p.reader.Read(b)
> >> > > }
> >> > >
> >> > > func (p *PushbackReader) UnreadByte() error {
> >> > > if p.buffer.Len() == 0 {
> >> > > return io.EOF
> >> > > }
> >> > > lastByte := p.buffer.Bytes()[p.buffer.Len()-1]
> >> > > p.buffer.Truncate(p.buffer.Len() - 1)
> >> > > p.buffer.WriteByte(lastByte)
> >> > > return nil
> >> > > }
> >> > >
> >> > > func (p *PushbackReader) Unread(buf []byte) error {
> >> > > if p.buffer.Len() == 0 {
> >> > > return io.EOF
> >> > > }
> >> > > p.buffer.Write(buf)
> >> > >   

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread 'Axel Wagner' via golang-nuts
Just realized: If you twist the idea around, you get something easy to
implement and more correct.
Instead of stripping padding if it exists, you can ensure that the body *is*
padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
You can then feed that to base64.StdEncoding. If the wrapped Reader returns
padded Base64, this does nothing. If it returns unpadded Base64, it adds
padding. If it returns incorrect Base64, it will create a padded stream,
that will then get rejected by the Base64 decoder.
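For readers who cannot open the playground link, a rough sketch of the idea follows. It is not the linked code; padReader and decodePadded are hypothetical names, and it assumes line breaks should not count towards the length, since the stream decoder skips them.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padReader (hypothetical name) passes the wrapped stream through unchanged
// and, once the source reports EOF, appends enough '=' to bring the number
// of base64 characters up to a multiple of four.
type padReader struct {
	r     io.Reader
	count int    // characters seen so far, '\r' and '\n' excluded
	pad   []byte // padding still to be handed out after the source EOF
	done  bool   // the source has reported EOF
}

func (p *padReader) Read(b []byte) (int, error) {
	if len(b) == 0 {
		return 0, nil
	}
	if !p.done {
		n, err := p.r.Read(b)
		for _, c := range b[:n] {
			if c != '\r' && c != '\n' {
				p.count++
			}
		}
		if err != io.EOF {
			return n, err
		}
		p.done = true
		p.pad = []byte("===")[:(4-p.count%4)%4]
		if n > 0 {
			return n, nil // deliver the data now, padding on the next call
		}
	}
	if len(p.pad) == 0 {
		return 0, io.EOF
	}
	n := copy(b, p.pad)
	p.pad = p.pad[n:]
	return n, nil
}

// decodePadded decodes padded or unpadded input with base64.StdEncoding.
func decodePadded(s string) (string, error) {
	dec := base64.NewDecoder(base64.StdEncoding, &padReader{r: strings.NewReader(s)})
	out, err := io.ReadAll(dec)
	return string(out), err
}

func main() {
	for _, in := range []string{"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg", "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", "A"} {
		out, err := decodePadded(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

The single-character input "A" becomes "A===", which the padded decoder rejects, matching the behaviour described above for incorrect Base64.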

On Mon, 13 Jan 2025 at 10:31, Axel Wagner 
wrote:

> Hi,
>
> one way to solve your problem is to wrap the body into an io.Reader that
> strips off everything after the first `=` it finds. That can then be fed to
> base64.RawStdEncoding. This approach requires no extra buffering or copying
> and is easy to implement: https://go.dev/play/p/CwcVz7oietI
>
> The downside is, that this will not verify that the body is *either*
> correctly padded Base64 *or* unpadded Base64. So, it will not report an
> error if fed something like "AAA=garbage".
> That can be remedied by buffering up to four bytes and, when encountering
> an EOF, check that there are at most three trailing `=` and that the total
> length of the stream is divisible by four. It's more finicky to implement,
> but it should also be possible without any extra copies and only requires a
> very small extra buffer.
>
> On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange 
> wrote:
>
>> Thanks very much for the links, pointers and possible solution.
>>
>> Trying to read base64 standard (padded) encoded data with
>> base64.RawStdEncoding can produce an error such as
>>
>> illegal base64 data at input byte 
>>
>> Reading base64 raw (unpadded) encoded data produces the EOF error.
>>
>> I'll go with trying to read the standard encoded data up to maybe 1MB and
>> then switch to base64.RawStdEncoding if I hit the "illegal base64 data"
>> problem, maybe with reference to bufio.Reader which has most of the methods
>> suggested below.
>>
>> Yes, the use of a "Rewind" method would be crucial. I guess this would
>> need to:
>> 1. error if more than one buffer of data has been read
>> 2. else re-read from byte 0
>>
>> Thanks again very much for these suggestions.
>>
>> Rory
>>
>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>> > Also, see this
>> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>> as I expected the error should be reported earlier than the end of stream
>> if the chosen format is wrong.
>> >
>> > > On Jan 12, 2025, at 2:57 PM, robert engels 
>> wrote:
>> > >
>> > > Also, this is what Gemini provided which looks basically correct -
>> but I think encapsulating it with a Rewind() method would be easier to
>> understand.
>> > >
>> > >
>> > >
>> > > While Go doesn't have a built-in PushbackReader like some other
>> languages (e.g., Java), you can implement similar functionality using a
>> custom struct and a buffer.
>> > >
>> > > Here's an example implementation:
>> > >
>> > > package main
>> > >
>> > > import (
>> > > "bytes"
>> > > "io"
>> > > )
>> > >
>> > > type PushbackReader struct {
>> > > reader io.Reader
>> > > buffer *bytes.Buffer
>> > > }
>> > >
>> > > func NewPushbackReader(r io.Reader) *PushbackReader {
>> > > return &PushbackReader{
>> > > reader: r,
>> > > buffer: new(bytes.Buffer),
>> > > }
>> > > }
>> > >
>> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
>> > > if p.buffer.Len() > 0 {
>> > > return p.buffer.Read(b)
>> > > }
>> > > return p.reader.Read(b)
>> > > }
>> > >
>> > > func (p *PushbackReader) UnreadByte() error {
>> > > if p.buffer.Len() == 0 {
>> > > return io.EOF
>> > > }
>> > > lastByte := p.buffer.Bytes()[p.buffer.Len()-1]
>> > > p.buffer.Truncate(p.buffer.Len() - 1)
>> > > p.buffer.WriteByte(lastByte)
>> > > return nil
>> > > }
>> > >
>> > > func (p *PushbackReader) Unread(buf []byte) error {
>> > > if p.buffer.Len() == 0 {
>> > > return io.EOF
>> > > }
>> > > p.buffer.Write(buf)
>> > > return nil
>> > > }
>> > >
>> > > func main() {
>> > > // Example usage
>> > > r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
>> > > buf := make([]byte, 5)
>> > > r.Read(buf)
>> > > r.UnreadByte()
>> > > r.Read(buf)
>> > > }
>> > >
>> > > Explanation:
>> > > PushbackReader struct: This struct holds the underlying io.Reader and
>> a buffer to store the pushed-back bytes.
>> > > NewPushbackReader: This function creates a new PushbackReader from an
>> existing io.Reader.
>> > > Read method: This method reads bytes from either the buffer (if it
>> contains data) or the underlying reader.
>> > > UnreadByte method: This method pushes back a single byte into the
>> buffer.
>> > > Unread method: This method pushes back a slice of bytes into the
>> buffer.
>> > > Important Considerations:

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-13 Thread 'Axel Wagner' via golang-nuts
Hi,

one way to solve your problem is to wrap the body into an io.Reader that
strips off everything after the first `=` it finds. That can then be fed to
base64.RawStdEncoding. This approach requires no extra buffering or copying
and is easy to implement: https://go.dev/play/p/CwcVz7oietI

The downside is that this will not verify that the body is *either*
correctly padded Base64 *or* unpadded Base64. So, it will not report an
error if fed something like "AAA=garbage".
That can be remedied by buffering up to four bytes and, when encountering
an EOF, check that there are at most three trailing `=` and that the total
length of the stream is divisible by four. It's more finicky to implement,
but it should also be possible without any extra copies and only requires a
very small extra buffer.
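A sketch of the stricter variant, under a few assumptions of mine: it counts characters instead of buffering the last four bytes, it skips '\r' and '\n', and it caps the padding at two '=' because well-formed base64 never ends in more than two. The names strictStripper, errBadPadding and decodeStrict are hypothetical and this is not the code behind the playground link.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"strings"
)

var errBadPadding = errors.New("malformed base64 padding")

// strictStripper (hypothetical name) removes trailing '=' so the stream can
// be fed to base64.RawStdEncoding, but reports an error if anything other
// than '=' or a line break follows the first '=', or if a padded stream does
// not add up to a multiple of four characters.
type strictStripper struct {
	r    io.Reader
	data int   // non-padding characters seen
	pad  int   // '=' characters seen
	err  error // sticky error, returned on every later call
}

func (s *strictStripper) Read(p []byte) (int, error) {
	if s.err != nil {
		return 0, s.err
	}
	for {
		n, err := s.r.Read(p)
		kept := 0
		for _, c := range p[:n] {
			switch {
			case c == '\r' || c == '\n':
				// line breaks are dropped and not counted
			case c == '=':
				s.pad++
			case s.pad > 0:
				s.err = errBadPadding // data after padding, e.g. "AAA=garbage"
				return kept, s.err
			default:
				p[kept] = c
				kept++
				s.data++
			}
		}
		if err == io.EOF && s.pad > 0 && (s.pad > 2 || (s.data+s.pad)%4 != 0) {
			err = errBadPadding
		}
		if kept > 0 || err != nil || len(p) == 0 {
			s.err = err
			return kept, err
		}
	}
}

// decodeStrict decodes correctly padded or unpadded standard base64.
func decodeStrict(s string) (string, error) {
	dec := base64.NewDecoder(base64.RawStdEncoding, &strictStripper{r: strings.NewReader(s)})
	out, err := io.ReadAll(dec)
	return string(out), err
}

func main() {
	for _, in := range []string{"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", "AAA=garbage", "AA="} {
		out, err := decodeStrict(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```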

On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange 
wrote:

> Thanks very much for the links, pointers and possible solution.
>
> Trying to read base64 standard (padded) encoded data with
> base64.RawStdEncoding can produce an error such as
>
> illegal base64 data at input byte 
>
> Reading base64 raw (unpadded) encoded data produces the EOF error.
>
> I'll go with trying to read the standard encoded data up to maybe 1MB and
> then switch to base64.RawStdEncoding if I hit the "illegal base64 data"
> problem, maybe with reference to bufio.Reader which has most of the methods
> suggested below.
>
> Yes, the use of a "Rewind" method would be crucial. I guess this would
> need to:
> 1. error if more than one buffer of data has been read
> 2. else re-read from byte 0
>
> Thanks again very much for these suggestions.
>
> Rory
>
> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> > Also, see this
> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
> as I expected the error should be reported earlier than the end of stream
> if the chosen format is wrong.
> >
> > > On Jan 12, 2025, at 2:57 PM, robert engels 
> wrote:
> > >
> > > Also, this is what Gemini provided which looks basically correct - but
> I think encapsulating it with a Rewind() method would be easier to
> understand.
> > >
> > >
> > >
> > > While Go doesn't have a built-in PushbackReader like some other
> languages (e.g., Java), you can implement similar functionality using a
> custom struct and a buffer.
> > >
> > > Here's an example implementation:
> > >
> > > package main
> > >
> > > import (
> > > "bytes"
> > > "io"
> > > )
> > >
> > > type PushbackReader struct {
> > > reader io.Reader
> > > buffer *bytes.Buffer
> > > }
> > >
> > > func NewPushbackReader(r io.Reader) *PushbackReader {
> > > return &PushbackReader{
> > > reader: r,
> > > buffer: new(bytes.Buffer),
> > > }
> > > }
> > >
> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
> > > if p.buffer.Len() > 0 {
> > > return p.buffer.Read(b)
> > > }
> > > return p.reader.Read(b)
> > > }
> > >
> > > func (p *PushbackReader) UnreadByte() error {
> > > if p.buffer.Len() == 0 {
> > > return io.EOF
> > > }
> > > lastByte := p.buffer.Bytes()[p.buffer.Len()-1]
> > > p.buffer.Truncate(p.buffer.Len() - 1)
> > > p.buffer.WriteByte(lastByte)
> > > return nil
> > > }
> > >
> > > func (p *PushbackReader) Unread(buf []byte) error {
> > > if p.buffer.Len() == 0 {
> > > return io.EOF
> > > }
> > > p.buffer.Write(buf)
> > > return nil
> > > }
> > >
> > > func main() {
> > > // Example usage
> > > r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
> > > buf := make([]byte, 5)
> > > r.Read(buf)
> > > r.UnreadByte()
> > > r.Read(buf)
> > > }
> > >
> > > Explanation:
> > > PushbackReader struct: This struct holds the underlying io.Reader and
> a buffer to store the pushed-back bytes.
> > > NewPushbackReader: This function creates a new PushbackReader from an
> existing io.Reader.
> > > Read method: This method reads bytes from either the buffer (if it
> contains data) or the underlying reader.
> > > UnreadByte method: This method pushes back a single byte into the
> buffer.
> > > Unread method: This method pushes back a slice of bytes into the
> buffer.
> > > Important Considerations:
> > > The buffer size is not managed automatically. You may need to adjust
> the buffer size based on your use case.
> > > This implementation does not handle pushing back beyond the initially
> read data. If you need to support arbitrary pushback, you'll need a more
> complex solution.
> > >
> > > Generative AI is experimental.
> > >
> > >> On Jan 12, 2025, at 2:53 PM, Robert Engels 
> wrote:
> > >>
> > >> You can see the two pass reader here
> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
> > >>
> > >> But yea, the basic premise is that you buffer the data so you can
> rewind if needed
> > >>
> > >> Are you certain it is reading to the end to return EOF? It m

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread Robert Engels
No worries - happy to help. One last thing: base64 coding is fairly trivial. A
cursory look shows that the padded version uses = signs. I suspect you could
write a decoder that handles either during the decoding.
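For data that already fits in memory (so not the large email parts discussed in this thread), a minimal sketch of handling either form is to trim trailing '=' and line breaks and always decode with the unpadded encoding. decodeAny is a made-up name, and this does no validation of the padding itself.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeAny (hypothetical helper) decodes standard base64 whether or not it
// carries trailing '=' padding.
func decodeAny(s string) ([]byte, error) {
	return base64.RawStdEncoding.DecodeString(strings.TrimRight(s, "=\r\n"))
}

func main() {
	for _, in := range []string{"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg"} {
		b, err := decodeAny(in)
		fmt.Printf("%s %v\n", b, err)
	}
}
```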

> On Jan 12, 2025, at 3:29 PM, Rory Campbell-Lange  
> wrote:
> 
> Thanks very much for the links, pointers and possible solution.
> 
> Trying to read base64 standard (padded) encoded data with 
> base64.RawStdEncoding can produce an error such as
> 
>illegal base64 data at input byte 
> 
> Reading base64 raw (unpadded) encoded data produces the EOF error.
> 
> I'll go with trying to read the standard encoded data up to maybe 1MB and 
> then switch to base64.RawStdEncoding if I hit the "illegal base64 data" 
> problem, maybe with reference to bufio.Reader which has most of the methods 
> suggested below.
> 
> Yes, the use of a "Rewind" method would be crucial. I guess this would need 
> to:
> 1. error if more than one buffer of data has been read
> 2. else re-read from byte 0
> 
> Thanks again very much for these suggestions.
> 
> Rory
> 
>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>> Also, see this 
>> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>>  as I expected the error should be reported earlier than the end of stream 
>> if the chosen format is wrong.
>> 
>>>> On Jan 12, 2025, at 2:57 PM, robert engels wrote:
>>> 
>>> Also, this is what Gemini provided which looks basically correct - but I 
>>> think encapsulating it with a Rewind() method would be easier to understand.
>>> 
>>>> On Jan 12, 2025, at 2:53 PM, Robert Engels wrote:
>>>> 
>>>> You can see the two pass reader here 
>>>> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
>>>> 
>>>> But yea, the basic premise is that you buffer the data so you can rewind 
>>>> if needed
>>>> 
>>>> Are you certain it is reading to the end to return EOF? It may be 
>>>> returning eof once the parsing fails.
>>>> 
>>>> Otherwise I would expect this is being decoded wrong - eg the mime type or 
>>>> encoding type should tell you the correct format before you start decoding.
>>>> 
>>>>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange 
>>>>> wrote:
>>>>> 
>>>>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>>>>> 
>>>>> My google fu must be deserting me. I can find PushbackReader 
>>>>> implementations in Java, but the only similar thing for Go I could find 
>>>>> was https://gitlab.com/osaki-lab/iowrapper. If you have a specific 
>>>>> recommendation for a ReadSeeker wrapper to an io.Reader that would be 
>>>>> great to know.
>>>>> 
>>>>> Si

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread Rory Campbell-Lange
Thanks very much for the links, pointers and possible solution.

Trying to read base64 standard (padded) encoded data with base64.RawStdEncoding 
can produce an error such as

illegal base64 data at input byte 

Reading base64 raw (unpadded) encoded data produces the EOF error.

I'll go with trying to read the standard encoded data up to maybe 1MB and then 
switch to base64.RawStdEncoding if I hit the "illegal base64 data" problem, 
maybe with reference to bufio.Reader which has most of the methods suggested 
below.

Yes, the use of a "Rewind" method would be crucial. I guess this would need to:
1. error if more than one buffer of data has been read
2. else re-read from byte 0

Thanks again very much for these suggestions.

Rory

On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> Also, see this 
> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>  as I expected the error should be reported earlier than the end of stream if 
> the chosen format is wrong.
> 
> > On Jan 12, 2025, at 2:57 PM, robert engels  wrote:
> > 
> > Also, this is what Gemini provided which looks basically correct - but I 
> > think encapsulating it with a Rewind() method would be easier to understand.
> > 
> >> On Jan 12, 2025, at 2:53 PM, Robert Engels  wrote:
> >> 
> >> You can see the two pass reader here 
> >> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
> >> 
> >> But yea, the basic premise is that you buffer the data so you can rewind 
> >> if needed 
> >> 
> >> Are you certain it is reading to the end to return EOF? It may be 
> >> returning eof once the parsing fails. 
> >> 
> >> Otherwise I would expect this is being decoded wrong - eg the mime type or 
> >> encoding type should tell you the correct format before you start 
> >> decoding. 
> >> 
> >>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange 
> >>>  wrote:
> >>> 
> >>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
> >>> 
> >>> My google fu must be deserting me. I can find PushbackReader 
> >>> implementations in Java, but the only similar thing for Go I could find 
> >>> was https://gitlab.com/osaki-lab/iowrapper. If you have a specific 
> >>> recommendation for a ReadSeeker wrapper to an io.Reader that would be 
> >>> great to know.
> >>> 
> >>> Since the base64 decoding error I'm looking for is an EOF, I guess the 
> >>> wrapper approach will not work when the EOF byte position is > than the 
> >>> io.ReadSeeker buffer size.
> >>> 
> >>> Rory
> >>> 
> >>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>  creat

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread robert engels
Also, see this 
https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
 - as I expected, the error should be reported earlier than the end of the 
stream if the chosen format is wrong.

> On Jan 12, 2025, at 2:57 PM, robert engels  wrote:
> 
> Also, this is what Gemini provided which looks basically correct - but I 
> think encapsulating it with a Rewind() method would be easier to understand.
> 
>> On Jan 12, 2025, at 2:53 PM, Robert Engels  wrote:
>> 
>> You can see the two pass reader here 
>> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
>> 
>> But yea, the basic premise is that you buffer the data so you can rewind if 
>> needed 
>> 
>> Are you certain it is reading to the end to return EOF? It may be returning 
>> eof once the parsing fails. 
>> 
>> Otherwise I would expect this is being decoded wrong - eg the mime type or 
>> encoding type should tell you the correct format before you start decoding. 
>> 
>>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange  
>>> wrote:
>>> 
>>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>>> 
>>> My google fu must be deserting me. I can find PushbackReader 
>>> implementations in Java, but the only similar thing for Go I could find was 
>>> https://gitlab.com/osaki-lab/iowrapper. If you have a specific 
>>> recommendation for a ReadSeeker wrapper to an io.Reader that would be great 
>>> to know.
>>> 
>>> Since the base64 decoding error I'm looking for is an EOF, I guess the 
>>> wrapper approach will not work when the EOF byte position is > than the 
>>> io.ReadSeeker buffer size.
>>> 
>>> Rory
>>> 
>>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>>>> create a ReadSeeker that wraps the Reader providing the buffering (mark & 
>>>> reset) - normally the buffer only needs to be large enough to detect the 
>>>> format contained in the Reader.
>>>> 
>>>> You can search Google for PushbackReader in Go and you’ll get a basic 
>>>> implementation.
>>>> 
>>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange 
>>>>> wrote:
>>> ...
>>>>> I'm attempting to rationalise the process [of avoiding reading email 
>>>>> parts into byte slices] by simply wrapping the provided io.Reader with 
>>>>> the necessary decoders to reduce memory usage and unnecessary processing.
>>>>> 
>>>>> The wrapping strategy seems to work ok. However there is a particular 
>>>>> issue in detecting base64.StdEncoding versus base64.RawStdEncoding, which 
>>>>> requires draining the io.Reader using base64.StdEncoding and (based on 
>>>>> the current implementation) switching to base64.RawStdEncoding if an 
>>>>> io.ErrUnexpectedEOF is found.
>>>>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Goog

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread robert engels
Also, this is what Gemini provided which looks basically correct - but I think 
encapsulating it with a Rewind() method would be easier to understand.



While Go doesn't have a built-in PushbackReader like some other languages 
(e.g., Java), you can implement similar functionality using a custom struct and 
a buffer. 

Here's an example implementation: 

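package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// PushbackReader wraps an io.Reader and lets the caller push bytes back
// so that they are returned again by later calls to Read.
type PushbackReader struct {
	reader io.Reader
	buffer *bytes.Buffer // pushed-back bytes, served before reader
	last   int           // last byte returned by Read, or -1 if none
}

func NewPushbackReader(r io.Reader) *PushbackReader {
	return &PushbackReader{
		reader: r,
		buffer: new(bytes.Buffer),
		last:   -1,
	}
}

// Read drains any pushed-back bytes first, then the underlying reader.
func (p *PushbackReader) Read(b []byte) (n int, err error) {
	if p.buffer.Len() > 0 {
		n, err = p.buffer.Read(b)
	} else {
		n, err = p.reader.Read(b)
	}
	if n > 0 {
		p.last = int(b[n-1])
	}
	return n, err
}

// UnreadByte pushes back the last byte returned by Read.
func (p *PushbackReader) UnreadByte() error {
	if p.last < 0 {
		return errors.New("pushback: no byte to unread")
	}
	err := p.Unread([]byte{byte(p.last)})
	p.last = -1
	return err
}

// Unread pushes buf back so that it is read before any pending bytes.
func (p *PushbackReader) Unread(buf []byte) error {
	pending := append([]byte(nil), p.buffer.Bytes()...)
	p.buffer.Reset()
	p.buffer.Write(buf)
	p.buffer.Write(pending)
	return nil
}

func main() {
	// Example usage
	r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
	buf := make([]byte, 5)
	n, _ := r.Read(buf)
	fmt.Printf("%q\n", buf[:n]) // "Hello"
	r.UnreadByte()
	n, _ = r.Read(buf)
	fmt.Printf("%q\n", buf[:n]) // "o"
}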

Explanation: 
PushbackReader struct: This struct holds the underlying io.Reader and a buffer 
to store the pushed-back bytes. 
NewPushbackReader: This function creates a new PushbackReader from an existing 
io.Reader. 
Read method: This method reads bytes from either the buffer (if it contains 
data) or the underlying reader. 
UnreadByte method: This method pushes back a single byte into the buffer. 
Unread method: This method pushes back a slice of bytes into the buffer. 
Important Considerations: 
The buffer size is not managed automatically. You may need to adjust the buffer 
size based on your use case. 
This implementation does not handle pushing back beyond the initially read 
data. If you need to support arbitrary pushback, you'll need a more complex 
solution. 

Generative AI is experimental.

> On Jan 12, 2025, at 2:53 PM, Robert Engels  wrote:
> 
> You can see the two pass reader here 
> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
> 
> But yea, the basic premise is that you buffer the data so you can rewind if 
> needed 
> 
> Are you certain it is reading to the end to return EOF? It may be returning 
> eof once the parsing fails. 
> 
> Otherwise I would expect this is being decoded wrong - eg the mime type or 
> encoding type should tell you the correct format before you start decoding. 
> 
>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange  
>> wrote:
>> 
>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>> 
>> My google fu must be deserting me. I can find PushbackReader implementations 
>> in Java, but the only similar thing for Go I could find was 
>> https://gitlab.com/osaki-lab/iowrapper. If you have a specific 
>> recommendation for a ReadSeeker wrapper to an io.Reader that would be great 
>> to know.
>> 
>> Since the base64 decoding error I'm looking for is an EOF, I guess the 
>> wrapper approach will not work when the EOF byte position is > than the 
>> io.ReadSeeker buffer size.
>> 
>> Rory
>> 
>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>>> create a ReadSeeker that wraps the Reader providing the buffering (mark & 
>>> reset) - normally the buffer only needs to be large enough to detect the 
>>> format contained in the Reader.
>>> 
>>> You can search Google for PushbackReader in Go and you’ll get a basic 
>>> implementation.
>>> 
>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange 
>>>> wrote:
>> ...
>>>> I'm attempting to rationalise the process [of avoiding reading email parts 
>>>> into byte slices] by simply wrapping the provided io.Reader with the 
>>>> necessary decoders to reduce memory usage and unnecessary processing.
>>>> 
>>>> The wrapping strategy seems to work ok. However there is a particular 
>>>> issue in detecting base64.StdEncoding versus base64.RawStdEncoding, which 
>>>> requires draining the io.Reader using base64.StdEncoding and (based on the 
>>>> current implementation) switching to base64.RawStdEncoding if an 
>>>> io.ErrUnexpectedEOF is found.
>>>> 

-- 
You received this messag

Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread Robert Engels
You can see the two pass reader here 
https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go

But yes, the basic premise is that you buffer the data so you can rewind if 
needed.

Are you certain it is reading to the end to return EOF? It may be returning EOF 
once the parsing fails.

Otherwise I would expect this is being decoded wrong - e.g. the MIME type or 
encoding type should tell you the correct format before you start decoding.

> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange  
> wrote:
> 
> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
> 
> My google fu must be deserting me. I can find PushbackReader implementations 
> in Java, but the only similar thing for Go I could find was 
> https://gitlab.com/osaki-lab/iowrapper. If you have a specific recommendation 
> for a ReadSeeker wrapper to an io.Reader that would be great to know.
> 
> Since the base64 decoding error I'm looking for is an EOF, I guess the 
> wrapper approach will not work when the EOF byte position is > than the 
> io.ReadSeeker buffer size.
> 
> Rory
> 
>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>> create a ReadSeeker that wraps the Reader providing the buffering (mark & 
>> reset) - normally the buffer only needs to be large enough to detect the 
>> format contained in the Reader.
>> 
>> You can search Google for PushbackReader in Go and you’ll get a basic 
>> implementation.
>> 
>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange  
>>> wrote:
> ...
>>> I'm attempting to rationalise the process [of avoiding reading email parts 
>>> into byte slices] by simply wrapping the provided io.Reader with the 
>>> necessary decoders to reduce memory usage and unnecessary processing.
>>> 
>>> The wrapping strategy seems to work ok. However there is a particular issue 
>>> in detecting base64.StdEncoding versus base64.RawStdEncoding, which 
>>> requires draining the io.Reader using base64.StdEncoding and (based on the 
>>> current implementation) switching to base64.RawStdEncoding if an 
>>> io.ErrUnexpectedEOF is found.
>>> 

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com.


Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread Rory Campbell-Lange
Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.

My google fu must be deserting me. I can find PushbackReader implementations in 
Java, but the only similar thing for Go I could find was 
https://gitlab.com/osaki-lab/iowrapper. If you have a specific recommendation 
for a ReadSeeker wrapper to an io.Reader that would be great to know.

Since the base64 decoding error I'm looking for is an EOF, I guess the wrapper 
approach will not work when the EOF byte position is greater than the 
io.ReadSeeker buffer size.

Rory

On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> create a ReadSeeker that wraps the Reader providing the buffering (mark & 
> reset) - normally the buffer only needs to be large enough to detect the 
> format contained in the Reader.
> 
> You can search Google for PushbackReader in Go and you’ll get a basic 
> implementation.
> 
> > On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange  
> > wrote:
...
> > I'm attempting to rationalise the process [of avoiding reading email parts 
> > into byte slices] by simply wrapping the provided io.Reader with the 
> > necessary decoders to reduce memory usage and unnecessary processing.
> > 
> > The wrapping strategy seems to work ok. However there is a particular issue 
> > in detecting base64.StdEncoding versus base64.RawStdEncoding, which 
> > requires draining the io.Reader using base64.StdEncoding and (based on the 
> > current implementation) switching to base64.RawStdEncoding if an 
> > io.ErrUnexpectedEOF is found.
> > 

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/Z4Qp1dPZGKKGU-ua%40campbell-lange.net.


Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

2025-01-12 Thread robert engels
Create a ReadSeeker that wraps the Reader and provides the buffering (mark & 
reset) - normally the buffer only needs to be large enough to detect the format 
contained in the Reader.

You can search Google for PushbackReader in Go and you’ll get a basic 
implementation.

> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange  
> wrote:
> 
> I'm looking to develop an alternative to an existing piece of code that reads 
> email parts into byte slices and then returns these after decoding.
> 
> As library users may not wish to use these email parts and because there are 
> multiple byte slice copies being used, I'm attempting to rationalise the 
> process by simply wrapping the provided io.Reader with the necessary decoders 
> to reduce memory usage and unnecessary processing.
> 
> The wrapping strategy seems to work ok. However there is a particular issue 
> in detecting base64.StdEncoding versus base64.RawStdEncoding, which requires 
> draining the io.Reader using base64.StdEncoding and (based on the current 
> implementation) switching to base64.RawStdEncoding if an io.ErrUnexpectedEOF 
> is found.
> 
> I'd be grateful for any thoughts on the most efficient way of dealing with 
> this type of issue, avoiding the need for lots of in-memory copies of -- say 
> -- a 50MB email attachment. Unfortunately neither net/mail.Message.Body nor 
> mime/multipart.Part, which provide the input to this func, provide 
> ReadSeekers.
> 
> Code snippet below.
> 
> Thanks!
> Rory
> 
> 
> // decodeContent wraps the content io.Reader in either a base64 or
> // quoted printable decoder if applicable. It further wraps the reader
> // in a transform character decoder if an encoding is supplied.
> func decodeContent(content io.Reader, e encoding.Encoding, cte ContentTransferEncoding) io.Reader {
> 
> 	var contentReader io.Reader
> 
> 	switch cte {
> 	case cteBase64:
> 
> 		contentReader = base64.NewDecoder(base64.StdEncoding, content)
> 		// ideally check for errors.Is(err, io.ErrUnexpectedEOF); switch decoder to
> 		// contentReader = base64.NewDecoder(base64.RawStdEncoding, content)
> 
> 	case cteQuotedPrintable:
> 		contentReader = quotedprintable.NewReader(content)
> 	default:
> 		contentReader = content
> 	}
> 
> 	if e == nil {
> 		return contentReader
> 	}
> 	return transform.NewReader(contentReader, e.NewDecoder())
> }
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion visit 
> https://groups.google.com/d/msgid/golang-nuts/Z4QPbTZ4gemg9kwV%40campbell-lange.net.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/644FC184-CC66-4838-8B35-7F0D926AB52D%40ix.netcom.com.