Re: [mime4j] Please Review Cursor API

Robert Burrell Donkin Wed, 25 Jul 2007 10:33:22 -0700

On 7/25/07, Stefano Bagnara <[EMAIL PROTECTED]> wrote:

Robert Burrell Donkin wrote:
> On 7/25/07, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
>> I'd go for an exception. But I don't know the code enough to understand
>> how likely this will happen and how likely this is a programmer error or
>> something else.
>
> AFAICT it would be an implementation error
>
> state is maintained in both the pull parser and the cursor
>
> the cursor needs to understand whether or not it is within a part in a
> mime message, since the input stream reads only within a part.
> the pull parser also records this information.
>
> would probably be cleaner to maintain this in one place. ideas welcomed.


what about adding a Cursor.isInMimePart() or something similar?


not sure it would be so simple as that. the cursor would probably need
to become a first pass parser.

the cursor would need to perform basic parsing of the email to find
the appropriate mime headers and so the appropriate boundary. it would
be possible to model the API so that the cursor performed basic
non-recursive pull parsing (header lines, parts but not part headers).

>> > 3. the API uses a string to represent the MIME boundary. i'm not sure
>> > that this is right. AIUI (hopefully people will correct me if i'm
>> > wrong) this can only be 8 bit ASCII characters. in general, passing a
>> > string should mean worrying about encoding. realistically, the string
>> > will just be stripped to its low-order bytes.
>> >
>> > - robert
>>
>> Why 8 bit ASCII ? Shouldn't it be 7 bit ASCII? The first 7 bit of the
>> US-ASCII should be present in every encoding, right?
>
> sorry: forgot that 7-bit, 8-bit has special meaning in the email context
>
> AIUI the boundary consists of ASCII characters, each encoded as one
> 8-bit byte with one clean bit. java strings (and chars) are UNICODE.
> this is usually encoded as two 8-bit bytes (no clean bits), one 16-bit
> unit (no clean bits) or variable (one, two or three) 8-bit bytes.
>
> accepting a string might require a byte in the input to be decoded to
> a char then encoded to a byte to be used to compare the boundary.
>
> an alternative strategy would be to push enough intelligence into the
> cursor for it to be able to work out MIME and header boundaries for
> itself.
>
> - robert

Not sure I understand the problem. Can't we ignore the encoding issue,
at all? The important thing is that the API uses a string and a string
always can contain a 7bit sequence in a lossless way. If you write such
string to bytes using the US-ASCII charset the result will be unchanged,
right?


if the string contains only US-ASCII then yes, the transformation will
be lossless
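
A minimal sketch of the round trip being discussed, assuming a modern JDK where java.nio.charset.StandardCharsets is available. The class and method names (AsciiRoundTrip, isLossless) are made up for illustration and are not part of mime4j.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mime4j.
final class AsciiRoundTrip {

    /**
     * Encodes the string as US-ASCII, decodes it again and reports
     * whether the result is identical to the input.
     */
    static boolean isLossless(String boundary) {
        // Characters outside US-ASCII are replaced during encoding,
        // so the decoded string differs from the original.
        byte[] encoded = boundary.getBytes(StandardCharsets.US_ASCII);
        String decoded = new String(encoded, StandardCharsets.US_ASCII);
        return decoded.equals(boundary);
    }
}
```

A boundary made only of US-ASCII characters survives the round trip unchanged; one containing, say, an accented letter does not.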

my point is that by including a string in the API the caller is forced
to decode the natural representation (bytes) to a string which will
then be encoded to bytes by the cursor implementation. this approach
seems wrong to me.

(if you had non US-ASCII they will be instead converted to "?").


that depends on the way the encoding is done

String.getBytes() is JVM and charset dependent

using the more flexible nio encoders, then bad characters can be
reported, ignored or replaced
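
The three behaviours can be sketched with the standard java.nio.charset API, where CodingErrorAction.REPORT, IGNORE and REPLACE select what happens to a bad character. BoundaryEncoder and encode are hypothetical names used only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mime4j.
final class BoundaryEncoder {

    /**
     * Encodes a boundary as US-ASCII. The action decides whether a bad
     * character is reported (exception), ignored (dropped) or replaced.
     */
    static byte[] encode(String boundary, CodingErrorAction onBadChar)
            throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(onBadChar)
                .onUnmappableCharacter(onBadChar);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(boundary));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }
}
```

With REPORT the caller gets a CharacterCodingException for a non US-ASCII character, which is the case where the failure is explicit rather than silently turned into a replacement byte.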

The only problems are when we try to use non US-ASCII chars as a
boundary, but this should not be allowed as it is an illegal argument:
maybe we may want to check this in
public void boundary(String boundary) throws IOException. Maybe throwing
a new IllegalArgumentException on a boundary including non US-ASCII
chars is enough (maybe a check for "?" presence is enough).


throwing an exception does seem reasonable

i prefer to offer subclasses for cases such as this so that they can
be caught and (perhaps) dealt with

i generally prefer checked to runtime exceptions but perhaps an
IOException may be wrong here
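
One way this could look, purely as a sketch: a dedicated subclass that callers can catch, thrown by an argument check. IllegalBoundaryException and BoundaryCheck are hypothetical names, and making the subclass unchecked (extending IllegalArgumentException) is an assumption taken from the suggestion above, not a settled choice. The check looks at the characters themselves rather than searching for "?", since "?" is itself among the characters RFC 2046 permits in a boundary.

```java
// Hypothetical exception type, not part of mime4j.
final class IllegalBoundaryException extends IllegalArgumentException {

    private final int index;

    IllegalBoundaryException(int index) {
        super("Boundary contains a non US-ASCII character at index " + index);
        this.index = index;
    }

    /** Position of the first offending character in the boundary. */
    int getIndex() {
        return index;
    }
}

// Hypothetical helper, not part of mime4j.
final class BoundaryCheck {

    /** Returns the boundary unchanged if every character is 7-bit US-ASCII. */
    static String requireAscii(String boundary) {
        for (int i = 0; i < boundary.length(); i++) {
            if (boundary.charAt(i) > 0x7F) {
                throw new IllegalBoundaryException(i);
            }
        }
        return boundary;
    }
}
```

Callers that do not care can let it propagate as an ordinary IllegalArgumentException; callers that do can catch the subclass and inspect the index.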

Passing byte
sequences IMHO would not solve the issue as you would have to check the
8th bit anyway.


true but the check is much quicker and the failure more precise

there are various ways that an encoding might fail and there would be
effort involved in determining the exact cause
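
For comparison, a sketch of the byte-level check: it is a single mask test per byte and it pinpoints the offending position. ByteBoundaryCheck and firstNonAsciiIndex are hypothetical names for illustration only.

```java
// Hypothetical helper, not part of mime4j.
final class ByteBoundaryCheck {

    /**
     * Returns the index of the first byte with its 8th (high) bit set,
     * or -1 when every byte is 7-bit clean.
     */
    static int firstNonAsciiIndex(byte[] boundary) {
        for (int i = 0; i < boundary.length; i++) {
            if ((boundary[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }
}
```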

The details depend mainly on the usage of the boundary by the
underlying system: if the system works with bytes then maybe it is ok to
use bytes also for the boundary method, otherwise IMHO it's safe to keep
using the String (and maybe add the argument check).


MIME works with 8-bit bytes not 16-bit UNICODE so bytes are the
natural way of representing boundaries in java
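
To illustrate, a rough sketch of comparing a boundary held as bytes directly against the raw input, with no decoding step in between. BoundaryMatcher and matchesAt are hypothetical names; how the cursor would actually scan its buffer is not specified in this thread.

```java
// Hypothetical helper, not part of mime4j.
final class BoundaryMatcher {

    /**
     * Returns true when the boundary bytes occur in the buffer starting
     * at the given offset. No charset conversion is involved.
     */
    static boolean matchesAt(byte[] buffer, int offset, byte[] boundary) {
        if (offset < 0 || offset + boundary.length > buffer.length) {
            return false;
        }
        for (int i = 0; i < boundary.length; i++) {
            if (buffer[offset + i] != boundary[i]) {
                return false;
            }
        }
        return true;
    }
}
```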

- robert
