Re: base64 encoded content

Jason Robertson 12 Feb 2001 20:51:54 -0000

Stupid Outlook Express, now where was I ....

Jeffrey, the issue with the base64 stuff is that when the size of the
decoded output is computed, possible pad characters are not accounted
for, and for some reason 1 is always added to the size.


If I had to guess, the +1 is a remnant from a day when the decode returned a
NUL-terminated string; perhaps this code was a direct port from some C code?
It should simply be removed.

The other problem requires a peek at the end of the incoming buffer to see
if the last, or the second-to-last, character is the pad character.
Here's how you could do this in the Xerces code:

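    // Count trailing pad characters. Index from the end of the input:
    // numberQuadruple is assumed to be base64Data.length / 4, a count
    // of 4-character groups, so it cannot be used as a position.
    int numPadChars = 0;
    int len = base64Data.length;

    if ( len >= 2 && base64Data[len-2] == PAD )
    {
        numPadChars = 2;
    }
    else if ( len >= 1 && base64Data[len-1] == PAD )
    {
        numPadChars = 1;
    }

    // No "+1": each quadruple yields 3 bytes, minus one byte per pad.
    decodedData = new byte[ numberQuadruple*3 - numPadChars ];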
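
As a sanity check on the arithmetic, here is a minimal standalone sketch of
that size calculation. It is not the Xerces code: the class and method names
are made up for illustration, and it assumes well-formed input (no
whitespace, length a multiple of 4, '=' as the pad character).

```java
import java.nio.charset.StandardCharsets;

class DecodedSize {
    private static final byte PAD = (byte) '=';

    // Exact decoded length for well-formed base64 input.
    // Assumption: no whitespace or stray characters, length % 4 == 0.
    static int decodedLength(byte[] base64Data) {
        int len = base64Data.length;
        if (len == 0) {
            return 0;
        }
        if (len % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        int numPadChars = 0;
        if (base64Data[len - 2] == PAD) {
            numPadChars = 2;          // "xx==" encodes a single byte
        } else if (base64Data[len - 1] == PAD) {
            numPadChars = 1;          // "xxx=" encodes two bytes
        }
        return (len / 4) * 3 - numPadChars;
    }

    public static void main(String[] args) {
        System.out.println(decodedLength("TWFu".getBytes(StandardCharsets.US_ASCII))); // 3
        System.out.println(decodedLength("TWE=".getBytes(StandardCharsets.US_ASCII))); // 2
        System.out.println(decodedLength("TQ==".getBytes(StandardCharsets.US_ASCII))); // 1
    }
}
```

For Holger's case, 44736 encoded characters with no padding give
44736 / 4 * 3 = 33552 bytes, which matches the original array.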

Also, to be picky :), there is a comment there that says "Throw away
anything not in base64Data" and this isn't done. The variables b1, b2, b3
and b4 are all obtained by getting a value from the base64Alphabet array,
and this value could be "-1" indicating a bad value. This is never checked.

Checking does become a pain, however, because bad characters are to be
ignored, and if you ignore a bad character then your output length changes.
So, it might be proper to note how many characters were skipped, including
pad characters, and adjust at the end, copying the decoded data into a
properly sized array, but all that copying makes my efficiency nerve twitch.

I say just remove the comment. ;)
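
That said, if skipping bad characters were ever wanted, the copy could be
avoided by scanning the input twice: once to count the characters that carry
data, once to decode into an array of exactly the right size. The following
is only a sketch of that idea, not the Xerces implementation; the class name,
the lookup table and the lenient treatment of pad characters (skipped like
any other non-alphabet byte) are all assumptions of mine.

```java
import java.util.Arrays;

class SkippingBase64 {
    // Lookup table: -1 marks a byte outside the base64 alphabet.
    private static final byte[] ALPHABET = new byte[256];
    static {
        Arrays.fill(ALPHABET, (byte) -1);
        String chars =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < chars.length(); i++) {
            ALPHABET[chars.charAt(i)] = (byte) i;
        }
    }

    static byte[] decode(byte[] in) {
        // Pass 1: count the characters that carry data. Pad characters
        // map to -1 too, so they are skipped like any other stray byte.
        int valid = 0;
        for (byte b : in) {
            if (ALPHABET[b & 0xFF] >= 0) {
                valid++;
            }
        }
        // Each valid character carries 6 bits; only whole bytes are kept.
        // The long cast avoids int overflow on very large inputs.
        byte[] out = new byte[(int) ((long) valid * 6 / 8)];

        // Pass 2: buffer 6 bits per character and emit a byte whenever
        // at least 8 bits are available.
        int bits = 0;
        int bitCount = 0;
        int pos = 0;
        for (byte b : in) {
            int v = ALPHABET[b & 0xFF];
            if (v < 0) {
                continue;                       // ignore bad characters
            }
            bits = ((bits << 6) | v) & 0xFFF;   // never need more than 12 bits
            bitCount += 6;
            if (bitCount >= 8) {
                bitCount -= 8;
                out[pos++] = (byte) (bits >> bitCount);
            }
        }
        return out;
    }
}
```

The trade-off is a second pass over the input instead of a second
allocation and copy of the output; which is cheaper would need measuring.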

Jason

----- Original Message -----
From: "Jeffrey Rodriguez" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, February 09, 2001 12:32 PM
Subject: Re: base64 encoded content


> Hi Holger,
> This is normal SAX behavior, look at the SAX specification. It says clearly
> that the character content may be returned through multiple calls.
>
> From the DocumentHandler API Doc:
>
> "The Parser will call this method to report each chunk of character data.
> SAX parsers may return all contiguous character data in a single chunk, or
> they may split it into several chunks; however, all of the characters in any
> single event must come from the same external entity, so that the Locator
> provides useful information."
>
> For more info, tutorials, etc., I would recommend looking at Megginson's
> page (the author of the SAX API).
>
> at:  http://www.megginson.com/SAX/SAX1/index.html
>
> Good luck,
>
>                    Jeffrey Rodriguez
>                    IBM SVL - Pervasive Technologies
>
> PS. I'll take a look over the Base64 stuff this weekend since it is the only
> time when I can work on this.
>
> >From: [EMAIL PROTECTED]
> >Reply-To: [EMAIL PROTECTED]
> >To: [EMAIL PROTECTED]
> >Subject: Re: base64 encoded content
> >Date: Fri, 9 Feb 2001 12:27:32 +0100
> >
> >
> >
> >Thanks Jeffrey,
> >
> >but do you or anyone else have any idea about my second question? I still
> >have no idea why the parser splits my content into three character parts...
> >sorry if this is a stupid question, but I have no explanation for it.
> >
> >Holger
> >
> >
> >Hi Holger,
> >I will take a look at this, it is quite possibly a bug.
> >Thanks for reporting this,
> >
> >             Jeffrey Rodriguez
> >             IBM Silicon Valley Lab
> >             PDA
> >
> >
> > >From: [EMAIL PROTECTED]
> > >Reply-To: [EMAIL PROTECTED]
> > >To: [EMAIL PROTECTED]
> > >Subject: base64 encoded content
> > >Date: Thu, 8 Feb 2001 16:24:41 +0100
> > >
> > >
> > >
> > >Hi,
> > >
> > >I'm using the Base64 class to encode and decode binary data in xml files
> > >and encountered some strange things I hope someone can explain.
> > >
> > >If I'm encoding a byte array with a size of 33552 bytes I get an array of
> > >the length 44736, which seems correct to me. Decoding it returns an array
> > >with 33553 bytes - one more than the original byte array. This looks like a
> > >bug, am I right? If I'm right, has anyone fixed it already?
> > >
> > >The second strange effect occurs while parsing the xml document with the
> > >encoded data. The document looks like this:
> > >
> > >      <content>/KJjndlkh/lhjnlkj...</content>
> > >
> > >The SAX parser splits my content into three parts, invoking the
> > >"characters" method three times. Am I doing something wrong?
> > >
> > >Thanks for any idea!
> > >Holger
> > >
> > >
> > >
> > >---------------------------------------------------------------------
> > >To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >For additional commands, e-mail: [EMAIL PROTECTED]
> > >