Philip> On Thu, 26 Oct 2000, Mark Leisher wrote:
>> Following the first page will be all the other pages, each in the same
>> format as the first: one number identifying the page followed by 256
>> double-byte Unicode (UCS-2) characters. If a character in the encoding
>> maps
Philip Newton <[EMAIL PROTECTED]> writes:
>On Thu, 26 Oct 2000, Mark Leisher wrote:
>
>> Following the first page will be all the other pages, each in the same
>> format as the first: one number identifying the page followed by 256
>> double-byte Unicode (UCS-2) characters. If a character i
On Thu, 26 Oct 2000, Mark Leisher wrote:
> Following the first page will be all the other pages, each in the same
> format as the first: one number identifying the page followed by 256
> double-byte Unicode (UCS-2) characters. If a character in the encoding maps
> to the Unicode characte
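As a rough illustration of the page layout described above, here is a small Perl sketch that turns one such page into an array of 256 code points. It assumes each of the 256 UCS-2 values is written as four hex digits and that the page-identifying number sits on its own line; any layout detail beyond what the quoted description states is an assumption.

    # Sketch: parse one page of a Tcl-style .enc file into 256 code points.
    # Assumes $page_text holds exactly one page: the page number line,
    # then 256 four-hex-digit values with arbitrary whitespace between them.
    use strict;
    use warnings;

    sub parse_enc_page {
        my ($page_text) = @_;
        my ($page_line, $body) = split /\n/, $page_text, 2;
        my ($page_no) = $page_line =~ /([0-9A-Fa-f]+)/;   # page-identifying number
        my @cells     = $body =~ /([0-9A-Fa-f]{4})/g;     # 256 four-hex-digit values
        die "expected 256 values, got " . scalar(@cells) unless @cells == 256;
        return (hex $page_no, [ map { hex } @cells ]);
    }

    # e.g. my ($page_no, $map) = parse_enc_page($one_page_of_text);
    #      printf "byte 0x41 maps to U+%04X\n", $map->[0x41];
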
On Thu, 26 Oct 2000, Peter Prymmer wrote:
> According to Nick's translated doc the first character on the third line
> of the .enc file is the one to be displayed if the Encode module cannot
> figure out what to do with a given character. In iso8859-1.enc we
> see:
>
> # Encoding file: iso8859-
Peter Prymmer <[EMAIL PROTECTED]> writes:
>
>Or one could use the source text "" to *really* indicate a "hex" value
>that does not map to a character (I am being more than a little facetious
>here ;-).
I quite like that idea.
>
>How firmly established is the Tcl scheme?
Has not changed in
Mark Leisher <[EMAIL PROTECTED]> writes:
>Peter> Mark Leisher then replied:
>
>>> If the converted string contains 0x, it will be pretty clear the
>>> source text had bogus characters the moment you display it.
>
>Peter> According to Nick's translated doc the first character on
Peter Prymmer <[EMAIL PROTECTED]> writes:
>
>So shall I go ahead with a cp1047.enc plus cp37.enc and posix-bc.enc
>patch and perhaps some additions to t/lib/encode.t ?
Yes please - even if/when we replace the Tcl stuff with something faster
and more rigorous, it should be easy enough to massage the t
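For the t/lib/encode.t side, a hypothetical round-trip check might look like the sketch below. The encode()/decode() calls follow the later Encode API; whether the prototype being discussed exposes the same names is an assumption.

    # Hypothetical round-trip test: every byte of an EBCDIC code page
    # should survive decode() followed by encode(). Old-style ok/not ok
    # output, as test scripts of that era used.
    use strict;
    use warnings;
    use Encode qw(encode decode);

    my $enc = 'cp1047';                     # the code page under discussion
    print "1..256\n";
    for my $byte (0 .. 255) {
        my $octet = chr $byte;
        my $uni   = decode($enc, $octet);   # bytes -> Unicode string
        my $back  = encode($enc, $uni);     # Unicode string -> bytes
        print "not " unless $back eq $octet;
        print "ok ", $byte + 1, "\n";
    }
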
Mark Leisher <[EMAIL PROTECTED]> writes:
>Nick> Following the first page will be all the other pages, each in the
>Nick> same format as the first: one number identifying the page followed
>Nick> by 256 double-byte Unicode characters. If a character in the
>Nick> encoding maps to t
Peter> Uncomfortable to say the least. Could a surrogate scalar encoding
Peter> be done as an escaped encoding where the high and low pairs are put
Peter> into the .enc files as where both H and L =~ /[0-9A-F]/?
Peter> hence necessitating a shift to reading 8 characters
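The arithmetic behind that escaped-surrogate idea is straightforward; a small sketch follows, where writing the pair as HHHHLLLL (hence eight hex characters per cell) is an assumption about how an .enc file would store it.

    # Sketch: split a supplementary-plane code point into its UTF-16
    # high/low surrogate pair and format it as eight hex digits.
    use strict;
    use warnings;

    sub surrogate_cell {
        my ($cp) = @_;
        die "not a supplementary code point" if $cp < 0x10000 || $cp > 0x10FFFF;
        my $v    = $cp - 0x10000;
        my $high = 0xD800 + ($v >> 10);     # high (leading) surrogate
        my $low  = 0xDC00 + ($v & 0x3FF);   # low (trailing) surrogate
        return sprintf "%04X%04X", $high, $low;
    }

    print surrogate_cell(0x10400), "\n";    # prints D801DC00
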
On Thu, 26 Oct 2000, Mark Leisher wrote:
[fine suggestions snipped]
> Again, UCS-2 is implied by the restriction to 256 two-byte values and should
> be stated as such.
Uncomfortable to say the least. Could a surrogate scalar encoding be
done as an escaped encoding where the high and low pai
Out of the documentation that Nick sent, the following three paragraphs need
changing (reasons below each paragraph):
The third line of the file is three numbers. The first number is the
fallback character (in base 16) to use when converting from Unicode to this
encoding. The second numbe
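Taking just what that paragraph states, the first of the three numbers could be read off as below; the meaning of the other two fields is cut off above, so they are left opaque here, and the sample line is illustrative rather than copied from a real .enc file.

    # Sketch: read the third header line of an .enc file, assuming the
    # three numbers are separated by whitespace. Only the first field,
    # the base-16 fallback character, is interpreted.
    use strict;
    use warnings;

    my $line3 = "003F 0 1";                 # illustrative values only
    my ($fallback_hex, $field2, $field3) = split ' ', $line3;
    my $fallback = hex $fallback_hex;       # used when Unicode -> this
                                            # encoding has no mapping
    printf "fallback character is U+%04X\n", $fallback;
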
Peter> Mark Leisher then replied:
>> If the converted string contains 0x, it will be pretty clear the
>> source text had bogus characters the moment you display it.
Peter> According to Nick's translated doc the first character on the third
Peter> line of the .enc file is
On Thu, 26 Oct 2000, Philip Newton wrote:
> On Wed, 25 Oct 2000, Mark Leisher wrote:
>
> > There may some day be a use for the Unicode codepoint 0x. It might be
> > better to make this 0x, which is a guaranteed non-character in Unicode and
> > probably in ISO10646.
>
> Isn't that the
Philip> On Wed, 25 Oct 2000, Mark Leisher wrote:
>> There may some day be a use for the Unicode codepoint 0x. It might
>> be better to make this 0x, which is a guaranteed non-character in
>> Unicode and probably in ISO10646.
Philip> Isn't that the natural character t
On Wed, 25 Oct 2000, Mark Leisher wrote:
> There may some day be a use for the Unicode codepoint 0x. It might be
> better to make this 0x, which is a guaranteed non-character in Unicode and
> probably in ISO10646.
Isn't that the natural character to use for null-terminated strings? For
On Wed, 25 Oct 2000, Philip Newton wrote:
> I didn't read up on the format, but I would guess that this maps from
> EBCDIC position to Unicode in this way: take the EBCDIC code point and
> treat it as an index into an array of four-character Unicode code points.
> In which case, your table looks
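Philip's reading of the table amounts to a plain array lookup; a small sketch, with the two sample entries chosen for illustration rather than taken from a real cp1047 table:

    # Sketch: treat the EBCDIC byte as an index into a 256-entry array
    # of four-hex-digit Unicode code points.
    use strict;
    use warnings;

    my @table = ('FFFD') x 256;            # placeholder entries
    $table[0x40] = '0020';                 # EBCDIC space -> U+0020 (illustrative)
    $table[0xC1] = '0041';                 # EBCDIC 'A'   -> U+0041 (illustrative)

    sub ebcdic_to_unicode {
        my ($byte) = @_;
        return hex $table[$byte];          # look up the cell, then hex -> number
    }

    printf "0xC1 maps to U+%04X\n", ebcdic_to_unicode(0xC1);
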
Nick> Following the first page will be all the other pages, each in the
Nick> same format as the first: one number identifying the page followed
Nick> by 256 double-byte Unicode characters. If a character in the
Nick> encoding maps to the Unicode character , it means that the
Philip Newton <[EMAIL PROTECTED]> writes:
>On 25 Oct 2000, at 15:32, Nick Ing-Simmons wrote:
>
>> The next line
>> identifies the type of encoding file. It can be one of the following
>> letters:
>>
>> =item "[1]
>>
>> =item "[2]
>>
>> =item "[3]
>>
>> =item "[4]
>
>You seem to have dropped t
On 25 Oct 2000, at 15:32, Nick Ing-Simmons wrote:
> The next line
> identifies the type of encoding file. It can be one of the following
> letters:
>
> =item "[1]
>
> =item "[2]
>
> =item "[3]
>
> =item "[4]
You seem to have dropped the letters in the nroff-to-POD transcoding,
which is bad
Nick> I would be delighted if people start fixing or improving the
Nick> prototype - but we really want to prove that the API is "suitable"
Nick> for actual use (by XS modules like Tk, PerlIO, EBCDIC, ...).
I have been itching to implement this myself for quite a while now. But like
Mark Leisher <[EMAIL PROTECTED]> writes:
>Peter> Also: since the .enc files seem to have adopted the four hex digit
>Peter> per code point format how is the Encode module going to handle
>Peter> UTF16 surrogates?
>
>I haven't looked into the format for .enc files, but another thing tha
Peter> Also: since the .enc files seem to have adopted the four hex digit
Peter> per code point format how is the Encode module going to handle
Peter> UTF16 surrogates?
I haven't looked into the format for .enc files, but another thing that
happens, for example, is more than a single
Peter Prymmer <[EMAIL PROTECTED]> writes:
>Hi,
>
>I've finally been looking at the Encode module and I am
>somewhat perplexed by the stuff at the head of the Encode/*.enc
>files.
The Tcl documentation needs PODifying or some such.
Attached is a first stab at this, generated by hacking at the Tcl n
On Tue, 24 Oct 2000, Peter Prymmer wrote:
> I am curious about the viability of an EBCDIC based .enc file so
> I took the Encode/iso8859-1.enc and came up with one that I
> might call Encode/cp1047.enc. Would this be the correct form/format?
> If so I can prepare this and a cp37.enc and a posix-
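For building such a file by hand, the page layout described earlier can be emitted mechanically. A sketch follows, where the identity mapping is only a placeholder (a real cp1047.enc needs the actual EBCDIC-to-Unicode table) and the sixteen-values-per-line layout is an assumption, not something stated in the thread.

    # Sketch: emit one page in the discussed format, i.e. a page-number
    # line followed by 256 four-hex-digit Unicode values.
    use strict;
    use warnings;

    my @map = (0x00 .. 0xFF);              # placeholder: identity mapping
    print "00\n";                          # page number
    for my $row (0 .. 15) {
        print map { sprintf "%04X", $map[ $row * 16 + $_ ] } 0 .. 15;
        print "\n";
    }
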