Neil Hodgson wrote:

Robert Roessler:

Working with its "sibling" function SCI_ENCODEDFROMUTF8, is it "safe"
(i.e., CAN NOT fail) to allocate the output buf @ the SAME size as the
input utf8 buf?


   The only encoding I know of that could be a problem is EUC-JP, where
some characters have 3-byte representations, and some of those may be
only 2 bytes in UTF-8. Scintilla prefers ShiftJIS but EUC-JP may be
supported if the locale is set up for it. I am not aware of all the
tricks that can be played with locales in an application so have not
tried to tightly define what expansion could appear here. For SciTE
I'm happy with expecting 1:1 and handling fault reports if they occur.
For a reasonable degree of safety you could allocate the same
3*length+1 bytes as for SCI_TARGETASUTF8.

http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml

Thanks, Neil - since I have already accepted doing "double allocates" (allocate conservatively, then allocate exactly and copy once you know the length), this should not be particularly obnoxious... and hey, memory management is free in the Caml runtime, so why worry? :)
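
For illustration, the "double allocate" pattern might look roughly like the C sketch below. The names encode_fn, identity_encode, conservative_capacity and encode_exact are all hypothetical, and identity_encode is only a stand-in for whatever actually wraps SCI_ENCODEDFROMUTF8; the 3*length+1 figure is the conservative guess discussed above, not a guaranteed bound.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter type: writes the encoded bytes into `out` and
   returns how many bytes it wrote. A real binding would wrap
   SCI_ENCODEDFROMUTF8 here. */
typedef size_t (*encode_fn)(const char *utf8, size_t len, char *out);

/* Stand-in converter for this sketch only: copies the bytes unchanged. */
static size_t identity_encode(const char *utf8, size_t len, char *out) {
    memcpy(out, utf8, len);
    return len;
}

/* Conservative capacity: 3 bytes per input byte plus one for a terminator. */
static size_t conservative_capacity(size_t utf8_len) {
    return 3 * utf8_len + 1;
}

/* Allocate conservatively, convert, then copy into an exact-size buffer.
   Returns a NUL-terminated malloc'd string (caller frees), or NULL if an
   allocation fails. */
static char *encode_exact(encode_fn encode, const char *utf8, size_t len,
                          size_t *out_len) {
    size_t cap = conservative_capacity(len);
    char *scratch = malloc(cap);
    if (scratch == NULL) return NULL;

    size_t n = encode(utf8, len, scratch);
    /* After-the-fact sanity check only: if this fires, the converter has
       already written past the conservative buffer. */
    assert(n < cap);

    char *exact = malloc(n + 1);
    if (exact != NULL) {
        memcpy(exact, scratch, n);
        exact[n] = '\0';
        *out_len = n;
    }
    free(scratch);
    return exact;
}
```

A caller would use something like encode_exact(identity_encode, text, strlen(text), &n) and free the result when done.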

but do not have access to
any value that has been set by SCI_SETLENGTHFORENCODE


   The intent was that SCI_SETLENGTHFORENCODE+SCI_ENCODEDFROMUTF8 is
really a single call: the break up into two is caused by the Scintilla
interface only allowing two parameters.

That is believable - I am looking at it from the standpoint of my Caml Scintilla binding - where I really do not want to stash state of my own (i.e., remember that a) there *was* a SCI_SETLENGTHFORENCODE and b) what value it set). :)
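
One way a binding could avoid stashing that state is to expose only a single combined call, sketched below in C. The helper encoded_from_utf8 and the send_fn typedef are hypothetical; send_fn merely mirrors the (instance, message, wParam, lParam) shape of Scintilla's direct function. The sketch assumes SCI_ENCODEDFROMUTF8 returns the number of bytes written. fake_send is a stand-in for Scintilla so the example compiles on its own, and real code should take the message constants from Scintilla.h rather than redefining them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* In real code these come from Scintilla.h; repeated here only so the
   sketch is self-contained. */
#define SCI_SETLENGTHFORENCODE 2448
#define SCI_ENCODEDFROMUTF8 2449

/* Hypothetical typedef with the same shape as Scintilla's direct function. */
typedef intptr_t (*send_fn)(intptr_t ptr, unsigned int msg,
                            uintptr_t wparam, intptr_t lparam);

/* Hypothetical helper: sets the length and converts in one step, so the
   caller never has to remember what SCI_SETLENGTHFORENCODE was given.
   A len of -1 means "stop at the first NUL byte". */
static intptr_t encoded_from_utf8(send_fn send, intptr_t sci,
                                  const char *utf8, intptr_t len,
                                  char *encoded) {
    assert(len >= -1);
    send(sci, SCI_SETLENGTHFORENCODE, (uintptr_t)len, 0);
    return send(sci, SCI_ENCODEDFROMUTF8, (uintptr_t)utf8, (intptr_t)encoded);
}

/* Stand-in for Scintilla, for demonstration only: remembers the length
   and copies the bytes unchanged. */
static intptr_t fake_length = -1;

static intptr_t fake_send(intptr_t ptr, unsigned int msg,
                          uintptr_t wparam, intptr_t lparam) {
    (void)ptr;
    if (msg == SCI_SETLENGTHFORENCODE) {
        fake_length = (intptr_t)wparam;
        return 0;
    }
    if (msg == SCI_ENCODEDFROMUTF8) {
        const char *src = (const char *)wparam;
        char *dst = (char *)lparam;
        size_t n = fake_length < 0 ? strlen(src) : (size_t)fake_length;
        memcpy(dst, src, n);
        return (intptr_t)n;
    }
    return 0;
}
```

With this shape, the length travels with every conversion request, which is closer to the "really a single call" intent described above.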

Robert Roessler
[EMAIL PROTECTED]
http://www.rftp.com
_______________________________________________
Scintilla-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
