Re: [mime4j] Please Review Cursor API

Stefano Bagnara Wed, 25 Jul 2007 06:45:08 -0700

Robert Burrell Donkin wrote:
> On 7/25/07, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
>> I'd go for an exception. But I don't know the code enough to understand
>> how likely this will happen and how likely this is a programmer error or
>> something else.
> 
> AFAICT it would be an implementation error
> 
> state is maintained in both the pull parser and the cursor
> 
> the cursor needs to understand whether it is within a part in a mime
> message or not, since the input stream reads only within a part.
> the pull parser also records this information.
> 
> would probably be cleaner to maintain this in one place. ideas welcomed.


What about adding a Cursor.isInMimePart() or something similar?

>> > 3. the API uses a string to represent the MIME boundary. i'm not sure
>> > that this is right. AIUI (hopefully people will correct me if i'm
>> > wrong) this can only be 8 bit ASCII characters. in general, passing a
>> > string should mean worrying about encoding. realistically, the string
>> > will just be stripped to its low-order bytes.
>> >
>> > - robert
>>
>> Why 8-bit ASCII? Shouldn't it be 7-bit ASCII? The first 7 bits of
>> US-ASCII should be present in every encoding, right?
> 
> sorry: forgot that 7-bit, 8-bit has special meaning in the email context
> 
> AIUI the boundary consists of ASCII characters, each encoded as one 8-bit byte
> with one clean bit. java strings (and chars) are UNICODE. this is
> usually encoded as two 8-bit bytes (no clean bits), one 16-bit byte
> (no clean bits) or variable (one, two or three) 8-bit bytes.
> 
> accepting a string might require a byte in the input to be decoded to
> a char then encoded to a byte to be used to compare the boundary.
> 
> an alternative strategy would be to push enough intelligence into the
> cursor for it to be able to work out MIME and header boundaries for
> itself.
> 
> - robert

Not sure I understand the problem. Can't we ignore the encoding issue
altogether? The important thing is that the API uses a String, and a
String can always hold a 7-bit sequence losslessly. If you convert such
a string to bytes using the US-ASCII charset, the result will be
unchanged, right? (Any non-US-ASCII chars would instead be converted to "?".)
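To make the round-trip claim concrete, here is a minimal sketch. The class and method names are made up for illustration and are not part of mime4j; it also assumes a JDK that has java.nio.charset.StandardCharsets (on older JDKs the charset name "US-ASCII" would be passed instead).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not mime4j code.
final class BoundaryEncodingDemo {

    // Encode a String to bytes using US-ASCII. Characters that cannot be
    // mapped are replaced with the charset's replacement byte, '?'.
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Encode to US-ASCII bytes and decode back to a String.
    static String roundTrip(String s) {
        return new String(toAscii(s), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Pure 7-bit input survives unchanged.
        System.out.println(roundTrip("----=_Part_42"));
        // The non-ASCII 'e with acute accent' comes back as '?'.
        System.out.println(roundTrip("caf\u00e9"));
    }
}
```

So a pure 7-bit boundary is one byte per char and comes back identical, while anything else is silently turned into "?".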

The only problem arises when someone tries to use non-US-ASCII chars as
a boundary, but this should not be allowed, as it is an illegal argument.
Maybe we want to check this in
public void boundary(String boundary) throws IOException. Throwing
an IllegalArgumentException for a boundary that includes non-US-ASCII
chars may be enough (maybe a check for the presence of "?" is enough).
Passing byte sequences IMHO would not solve the issue, as you would have
to check the 8th bit anyway.
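A rough sketch of the argument check, under stated assumptions: BoundaryCheck and requireUsAscii are hypothetical names, not existing mime4j API. It tests each char directly instead of looking for "?" after encoding, because RFC 2046 lists "?" among the characters a boundary may legally contain, so a "?" test would also reject some valid boundaries.

```java
// Hypothetical helper, not part of mime4j.
final class BoundaryCheck {

    private BoundaryCheck() {}

    // Returns the boundary unchanged if every char fits in 7 bits,
    // otherwise throws IllegalArgumentException.
    static String requireUsAscii(String boundary) {
        if (boundary == null) {
            throw new IllegalArgumentException("boundary must not be null");
        }
        for (int i = 0; i < boundary.length(); i++) {
            char c = boundary.charAt(i);
            if (c > 0x7F) { // anything above 0x7F is outside US-ASCII
                throw new IllegalArgumentException(
                        "non US-ASCII character at index " + i + " in boundary");
            }
        }
        return boundary;
    }
}
```

The boundary(String) method could call this first, before handing the value to whatever does the matching.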

The details depend mainly on how the underlying system uses the
boundary: if the system works with bytes, then maybe it is OK to use
bytes for the boundary method too; otherwise IMHO it's safe to keep
using the String (and maybe add the argument check).
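If the underlying system does work with bytes, one possible arrangement (only a sketch; BoundaryBytes, encode and matchesAt are invented names, and I have not checked how the cursor actually scans its input) is to keep the String in the API, encode it once, and compare raw bytes afterwards, so no input byte has to be decoded to a char during matching.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mime4j.
final class BoundaryBytes {

    private BoundaryBytes() {}

    // Encode the boundary once, up front. Assumes the caller has already
    // verified that it contains only US-ASCII chars.
    static byte[] encode(String boundary) {
        return boundary.getBytes(StandardCharsets.US_ASCII);
    }

    // True if the encoded boundary occurs in buffer starting at offset.
    static boolean matchesAt(byte[] buffer, int offset, byte[] boundary) {
        if (offset < 0 || offset + boundary.length > buffer.length) {
            return false; // not enough bytes left to hold the boundary
        }
        for (int i = 0; i < boundary.length; i++) {
            if (buffer[offset + i] != boundary[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape the String versus bytes question only affects the signature of the boundary method, not the comparison loop.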

Stefano

