Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-27 Thread Mark Leisher
Philip> On Thu, 26 Oct 2000, Mark Leisher wrote: >> Following the first page will be all the other pages, each in the same >> format as the first: one number identifying the page followed by 256 >> double-byte Unicode (UCS-2) characters. If a character in the encoding >> maps

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-27 Thread Nick Ing-Simmons
Philip Newton <[EMAIL PROTECTED]> writes: >On Thu, 26 Oct 2000, Mark Leisher wrote: > >> Following the first page will be all the other pages, each in the same >> format as the first: one number identifying the page followed by 256 >> double-byte Unicode (UCS-2) characters. If a character i

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-27 Thread Philip Newton
On Thu, 26 Oct 2000, Mark Leisher wrote: > Following the first page will be all the other pages, each in the same > format as the first: one number identifying the page followed by 256 > double-byte Unicode (UCS-2) characters. If a character in the encoding maps > to the Unicode characte

Re: Encode's .enc files and a question

2000-10-27 Thread Philip Newton
On Thu, 26 Oct 2000, Peter Prymmer wrote: > According to Nick's translated doc the first character on the third line > of the .enc file is the one to be displayed if the Encode module cannot > figure out what to do with a given character. In iso8859-1.enc we > see: > > # Encoding file: iso8859-

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 Thread Nick Ing-Simmons
Peter Prymmer <[EMAIL PROTECTED]> writes: > >Or one could use the source text "" to *really* indicate a "hex" value >that does not map to a character (I am being more than a little facetious >here ;-). I quite like that idea. > >How firmly established is the Tcl scheme? Has not changed in

Re: Encode's .enc files and a question

2000-10-26 Thread Nick Ing-Simmons
Mark Leisher <[EMAIL PROTECTED]> writes: >Peter> Mark Leisher then replied: > >>> If the converted string contains 0x, it will be pretty clear the >>> source text had bogus characters the moment you display it. > >Peter> According to Nick's translated doc the first character on

Re: Encode's .enc files and a question

2000-10-26 Thread Nick Ing-Simmons
Peter Prymmer <[EMAIL PROTECTED]> writes: > >So shall I go ahead with a cp1047.enc plus cp37.enc and posix-bc.enc >patch and perhaps some additions to t/lib/encode.t ? Yes please - even if/when we replace the Tcl stuff with something faster, more rigorous it should be easy enough to massage the t

Re: Encode's .enc files and a question

2000-10-26 Thread Nick Ing-Simmons
Mark Leisher <[EMAIL PROTECTED]> writes: >Nick> Following the first page will be all the other pages, each in the >Nick> same format as the first: one number identifying the page followed >Nick> by 256 double-byte Unicode characters. If a character in the >Nick> encoding maps to t

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 Thread Mark Leisher
Peter> Uncomfortable to say the least. Could a surrogate scalar encoding Peter> be done as an escaped encoding where the high and low pairs are put Peter> into the .enc files as where both H and L =~ /[0-9A-F]/? Peter> hence necessitating a shift to reading 8 characters

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 Thread Peter Prymmer
On Thu, 26 Oct 2000, Mark Leisher wrote: [fine suggestions snipped] > Again, UCS-2 is implicit by the restriction of 256 two-byte values and should > be stated as such. Uncomfortable to say the least. Could a surrogate scalar encoding be done as an escaped encoding where the high and low pai

.enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 Thread Mark Leisher
Out of the documentation that Nick sent, the following three paragraphs need changing (reasons below each paragraph): The third line of the file is three numbers. The first number is the fallback character (in base 16) to use when converting from Unicode to this encoding. The second numbe

Re: Encode's .enc files and a question

2000-10-26 Thread Mark Leisher
Peter> Mark Leisher then replied: >> If the converted string contains 0x, it will be pretty clear the >> source text had bogus characters the moment you display it. Peter> According to Nick's translated doc the first character on the third Peter> line of the .enc file is

Re: Encode's .enc files and a question

2000-10-26 Thread Peter Prymmer
On Thu, 26 Oct 2000, Philip Newton wrote: > On Wed, 25 Oct 2000, Mark Leisher wrote: > > > There may some day be a use for the Unicode codepoint 0x. It might be > > better to make this 0x, which is a guaranteed non-character in Unicode and > > probably in ISO10646. > > Isn't that the

Re: Encode's .enc files and a question

2000-10-26 Thread Mark Leisher
Philip> On Wed, 25 Oct 2000, Mark Leisher wrote: >> There may some day be a use for the Unicode codepoint 0x. It might >> be better to make this 0x, which is a guaranteed non-character in >> Unicode and probably in ISO10646. Philip> Isn't that the natural character t

Re: Encode's .enc files and a question

2000-10-26 Thread Philip Newton
On Wed, 25 Oct 2000, Mark Leisher wrote: > There may some day be a use for the Unicode codepoint 0x. It might be > better to make this 0x, which is a guaranteed non-character in Unicode and > probably in ISO10646. Isn't that the natural character to use for null-terminated strings? For

Re: Encode's .enc files and a question

2000-10-25 Thread Peter Prymmer
On Wed, 25 Oct 2000, Philip Newton wrote: > I didn't read up on the format, but I would guess that this maps from > EBCDIC position to Unicode in this way: take the EBCDIC code point and > treat it as an index into an array of four-character Unicode code points. > In which case, your table looks
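
The lookup Philip guesses at in the quoted text (treat the source byte as an index into an array of four-hex-digit Unicode code points) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names parse_page and decode_byte are hypothetical, the page data is synthetic rather than taken from any real .enc file, and it is not the Encode module's own implementation.

```python
from __future__ import annotations


def parse_page(hex_text: str) -> list[int]:
    """Split a run of four-hex-digit groups into 256 code points."""
    digits = "".join(hex_text.split())  # ignore line breaks and spaces
    if len(digits) != 256 * 4:
        raise ValueError(f"expected 1024 hex digits, got {len(digits)}")
    return [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]


def decode_byte(page: list[int], byte: int) -> str:
    """Use the source byte as an index into the page of code points."""
    return chr(page[byte])


# Synthetic identity page (byte N maps to U+00NN), 16 entries per row.
# Illustrative data only, not the contents of a real .enc file.
identity_rows = [
    "".join(f"{row * 16 + col:04X}" for col in range(16))
    for row in range(16)
]
identity_page = parse_page("\n".join(identity_rows))

# Hypothetical EBCDIC-style page: only position 0xC1 is changed so that
# it holds U+0041, since EBCDIC places the letter "A" at 0xC1.
ebcdic_like = list(identity_page)
ebcdic_like[0xC1] = 0x0041
```

With these definitions, decode_byte(identity_page, 0x41) and decode_byte(ebcdic_like, 0xC1) both give "A", which is the sense in which an EBCDIC table would be the same array with its entries rearranged.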

Re: Encode's .enc files and a question

2000-10-25 Thread Mark Leisher
Nick> Following the first page will be all the other pages, each in the Nick> same format as the first: one number identifying the page followed Nick> by 256 double-byte Unicode characters. If a character in the Nick> encoding maps to the Unicode character , it means that the

Re: Encode's .enc files and a question

2000-10-25 Thread Nick Ing-Simmons
Philip Newton <[EMAIL PROTECTED]> writes: >On 25 Oct 2000, at 15:32, Nick Ing-Simmons wrote: > >> The next line >> identifies the type of encoding file. It can be one of the following >> letters: >> >> =item "[1] >> >> =item "[2] >> >> =item "[3] >> >> =item "[4] > >You seem to have dropped t

Re: Encode's .enc files and a question

2000-10-25 Thread Philip Newton
On 25 Oct 2000, at 15:32, Nick Ing-Simmons wrote: > The next line > identifies the type of encoding file. It can be one of the following > letters: > > =item "[1] > > =item "[2] > > =item "[3] > > =item "[4] You seem to have dropped the letters in the transcoding nroff-to-pod, which is bad

Re: Encode's .enc files and a question

2000-10-25 Thread Mark Leisher
Nick> I would be delighted if people start fixing or improving the Nick> prototype - but we really want to prove that the API is "suitable" Nick> for actual use (by XS modules like Tk, PerlIO, EBCDIC, ...). I have been itching to implement this myself for quite a while now. But like

Re: Encode's .enc files and a question

2000-10-25 Thread Nick Ing-Simmons
Mark Leisher <[EMAIL PROTECTED]> writes: >Peter> Also: since the .enc files seem to have adopted the four hex digit >Peter> per code point format how is the Encode module going to handle >Peter> UTF16 surrogates? > >I haven't looked into the format for .enc files, but another thing tha

Re: Encode's .enc files and a question

2000-10-25 Thread Mark Leisher
Peter> Also: since the .enc files seem to have adopted the four hex digit Peter> per code point format how is the Encode module going to handle Peter> UTF16 surrogates? I haven't looked into the format for .enc files, but another thing that happens for example, is more than a single

Re: Encode's .enc files and a question

2000-10-25 Thread Nick Ing-Simmons
Peter Prymmer <[EMAIL PROTECTED]> writes: >Hi, > >I've finally been looking at the Encode module and I am >somewhat perplexed by the stuff at the head of the Encode/*.enc >files. The Tcl documentation needs PODifying or some such. Attached is a 1st stab at this generated by hacking at the Tcl n

Re: Encode's .enc files and a question

2000-10-25 Thread Philip Newton
On Tue, 24 Oct 2000, Peter Prymmer wrote: > I am curious about the viability of an EBCDIC based .enc file so > I took the Encode/iso8859-1.enc and came up with one that I > might call Encode/cp1047.enc. Would this be the correct form/format? > If so I can prepare this and a cp37.enc and a posix-