On 7/25/07, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
> I understand this is a long message and I asked many questions, so if
> this does not help your research just ignore it and go ahead with your
> ideas. I'll review the code ;-)

it's good to talk things through (anyone who isn't interested will
probably have stopped reading this thread by now)

> >> Not sure I understand the problem. Can't we ignore the encoding issue
> >> altogether? The important thing is that the API uses a string, and a string
> >> can always hold a 7-bit sequence losslessly. If you write such a
> >> string to bytes using the US-ASCII charset, the result will be unchanged,
> >> right?
> >
> > if the string contains only US-ASCII then yes, the transformation will
> > be lossless
>
> Well, the String object is only a "container" large enough for our
> purpose. In OOP we often use an Integer to pass data that should be a
> subset of an integer. What matters is whether the meaning of the
> data we want to transfer is preserved.
> That's why we can use a String and simply do a parameter check to see
> whether it really is a US-ASCII-only sequence, or we can use anything else.
> IMHO the choice does not depend on the charset support of the String
> object but on ease of use. You are developing the API, so you are better
> placed to decide whether a byte[] is better than a String.

designing good APIs is too hard to be left to one developer
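A minimal sketch of the String-plus-parameter-check approach, assuming Java; BoundaryCheck and requireAscii are hypothetical names, not mime4j API. It only enforces the 7-bit rule discussed in this thread (RFC 2046 restricts boundaries further, to 1-70 characters from a limited set). Checking the chars directly also avoids relying on "?" as a failure marker, since "?" is itself a legal boundary character under RFC 2046.

```java
/** Hypothetical helper for a String-based API: rejects non US-ASCII boundaries. */
final class BoundaryCheck {
    private BoundaryCheck() {}

    /** Returns the boundary unchanged if every char is in the 7-bit range 0x00-0x7F. */
    static String requireAscii(String boundary) {
        if (boundary == null) {
            throw new IllegalArgumentException("boundary must not be null");
        }
        for (int i = 0; i < boundary.length(); i++) {
            char c = boundary.charAt(i);
            if (c > 0x7F) {
                // Report the offending code unit and its position.
                throw new IllegalArgumentException(String.format(
                        "non US-ASCII character U+%04X at index %d", (int) c, i));
            }
        }
        return boundary;
    }
}
```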

> > my point is that by including a string in the API the caller is forced
> > to decode the natural representation (bytes) to a string which will
> > then be encoded to bytes by the cursor implementation. this approach
> > seems wrong to me.
>
> Well, bytes are the natural representation for every piece of information
> we manage in IT ;-)
>
> My point is that Strings have very convenient methods and are really
> well optimized in the JVM, so String handling is often not much worse
> than manual byte handling, and Strings are more usable than byte arrays.

depends on how the caller has the data

> FWIW you could also introduce a "Boundary" object so that the
> implementation can be optimized without altering the API.

or introduce a helper method for CharSequence
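A sketch combining the Boundary object idea with a CharSequence helper, assuming Java 7+; Boundary and both of(...) factories are hypothetical. The encoded bytes are computed once at construction, so an implementation could match them against raw input without re-encoding, and the internals could change without touching callers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical immutable boundary that stores its 7-bit encoded form. */
final class Boundary {
    private final byte[] bytes;

    private Boundary(byte[] bytes) {
        this.bytes = bytes;
    }

    /** CharSequence helper: accepts String, StringBuilder, CharBuffer, etc. */
    static Boundary of(CharSequence text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("non US-ASCII char at index " + i);
            }
            out[i] = (byte) c; // safe: c fits in 7 bits
        }
        return new Boundary(out);
    }

    /** For callers that already hold bytes: no decode/encode round trip. */
    static Boundary of(byte[] raw) {
        for (int i = 0; i < raw.length; i++) {
            if ((raw[i] & 0x80) != 0) { // 8th bit set
                throw new IllegalArgumentException("8-bit byte at index " + i);
            }
        }
        return new Boundary(raw.clone());
    }

    /** Returns a copy so callers cannot mutate the internal array. */
    byte[] toBytes() {
        return bytes.clone();
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Boundary && Arrays.equals(bytes, ((Boundary) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Callers holding text use Boundary.of(CharSequence); callers that already have bytes use Boundary.of(byte[]) and skip the decode/encode step entirely, which addresses the "depends on how the caller has the data" point.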

> >> (if you had non-US-ASCII characters, they would instead be converted to "?").
> >
> > that depends on the way the encoding is done
> >
> > String.getBytes() is JVM and charset dependent
>
> shouldn't getBytes("US-ASCII") always work for a String containing only
> 7-bit chars, and use "?" for chars outside the 7-bit range?

no - the javadocs say the behaviour is unspecified when the string
cannot be encoded in the given charset

for MIME boundaries, IMHO the right behaviour would be to throw an
exception (rather than converting), which means using the more
reliable nio charset encoders
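A minimal sketch of the strict approach, assuming Java 7+ for StandardCharsets; StrictBoundaryEncoder is a hypothetical name. With CodingErrorAction.REPORT, CharsetEncoder.encode(CharBuffer) throws UnmappableCharacterException instead of substituting. That class extends CharacterCodingException, which is itself an IOException, so the failure stays a checked exception.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: encodes a MIME boundary as US-ASCII or fails loudly. */
final class StrictBoundaryEncoder {
    private StrictBoundaryEncoder() {}

    static byte[] encode(String boundary) throws CharacterCodingException {
        // A fresh encoder per call, since CharsetEncoder instances are not thread-safe.
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Throws UnmappableCharacterException (a CharacterCodingException)
        // for any char outside US-ASCII, rather than writing '?'.
        ByteBuffer encoded = encoder.encode(CharBuffer.wrap(boundary));
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        return bytes;
    }
}
```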

> > with the more flexible nio encoders, bad characters can be
> > reported, ignored or replaced
>
> Not sure I understand this point: do we need to recognize/ignore/replace
> bad chars in the Boundary wrt that API call?

the more flexible nio charset encoders allow the conversion behaviour
to be set programmatically
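A small sketch of those settings applied to one US-ASCII encoder, assuming Java 7+; EncoderActions is a hypothetical demo class. The same input is encoded under each CodingErrorAction for unmappable characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one encoder configuration per CodingErrorAction. */
final class EncoderActions {
    private EncoderActions() {}

    /** Encodes text to US-ASCII using the given action for unmappable chars. */
    static byte[] encode(String text, CodingErrorAction action)
            throws CharacterCodingException {
        ByteBuffer out = StandardCharsets.US_ASCII.newEncoder()
                .onUnmappableCharacter(action) // REPORT, IGNORE or REPLACE
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }
}
```

For the input "a\u00e9b", REPORT throws UnmappableCharacterException, IGNORE yields "ab", and REPLACE yields "a?b", because a newly created encoder's default replacement is a single '?' byte. So the "?" substitution expected from getBytes("US-ASCII") is something you can ask for explicitly rather than rely on.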

> >> The only problems arise when we try to use non-US-ASCII chars in a
> >> boundary, but this should not be allowed as it is an illegal argument:
> >> maybe we want to check this in
> >> public void boundary(String boundary) throws IOException. Maybe throwing
> >> a new IllegalArgumentException on a boundary including non-US-ASCII
> >> chars is enough (maybe even a check for the presence of "?" is enough).
> >
> > throwing an exception does seem reasonable
> >
> > i prefer to offer subclasses for cases such as this so that they can
> > be caught and (perhaps) dealt with
> >
> > i generally prefer checked to runtime exceptions but perhaps an
> > IOException is wrong here
>
> IMHO the specific check is an argument validity check, and an
> IllegalArgumentException fits better. I see IOException as related
> to IO problems, not to content/arguments.
> Btw I'm also fine with IOException, and as you are the one with
> dirty hands now, you should decide, IMHO ;-)

throwing runtime exceptions has downsides when running in many containers
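One way to get both a checked exception and a catchable subclass, assuming Java 7+: a hypothetical InvalidBoundaryException extends IOException, so existing IOException handlers still work while callers that care can catch it specifically. The setter mirrors the boundary(String) signature quoted earlier in the thread, but BoundarySetter is invented for illustration. It uses the low-level CharsetEncoder.encode(in, out, endOfInput) call, which returns a CoderResult instead of throwing and leaves the input buffer positioned at the offending character, so the error can say exactly where the problem is.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/** Hypothetical checked exception for a boundary that cannot be encoded as US-ASCII. */
class InvalidBoundaryException extends IOException {
    private static final long serialVersionUID = 1L;

    private final int index;

    InvalidBoundaryException(String boundary, int index) {
        super("non US-ASCII character at index " + index + " in boundary \"" + boundary + "\"");
        this.index = index;
    }

    /** Position of the first character that could not be encoded. */
    int getIndex() {
        return index;
    }
}

/** Hypothetical holder for a boundary(String) setter like the one discussed. */
final class BoundarySetter {
    private byte[] boundaryBytes = new byte[0];

    void boundary(String boundary) throws IOException {
        // A new encoder reports unmappable characters by default (CodingErrorAction.REPORT).
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        CharBuffer in = CharBuffer.wrap(boundary);
        // US-ASCII needs at most one byte per char, so this buffer cannot overflow.
        ByteBuffer out = ByteBuffer.allocate(boundary.length());
        CoderResult result = encoder.encode(in, out, true);
        if (result.isError()) {
            // On error the input buffer is left positioned at the offending char.
            throw new InvalidBoundaryException(boundary, in.position());
        }
        encoder.flush(out);
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        boundaryBytes = bytes;
    }

    /** Copy of the most recently accepted boundary, as US-ASCII bytes. */
    byte[] current() {
        return boundaryBytes.clone();
    }
}
```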

> >> Passing byte
> >> sequences IMHO would not solve the issue as you would have to check the
> >> 8th bit anyway.
> >
> > true but the check is much quicker and the failure more precise
>
> I agree. It is a tradeoff of ease of use vs speed/precision. In my
> understanding we don't need *that* much speed and precision for the
> boundary, but I don't know exactly what code you're talking about, so
> I'm fine with the low-level operations too.

there are existing performance worries about mime4j and i'd like to
try to avoid baking any more into the API (if possible)

- robert
