On 25/01/14 01:22, Peter W A Wood wrote:
Richmond

It is almost impossible to determine the encoding of text from its contents 
alone. You can make educated guesses, but even with just four candidate 
encodings that is tricky.

You can get an idea of the complexity by taking a quick look at the encoding? 
function in this REBOL script. (You should be able to find the function, as 
there is a big banner with encoding? at the top of it.) The script counts 
characters that are likely to appear in one encoding but not in another. For 
instance, the presence of characters 129, 141, 144 and 157 hints that the 
text is MacRoman encoded.
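
To make the counting idea concrete, here is a minimal sketch in Python. It is 
not the REBOL script's logic, only an illustration of the same kind of 
heuristic. The four byte values are the ones named above; the function names, 
the UTF-8 check and the threshold are assumptions added for the example.

```python
# Byte values named in the message above as hints for MacRoman.
MACROMAN_HINT_BYTES = frozenset({129, 141, 144, 157})


def count_macroman_hints(data: bytes) -> int:
    """Count the bytes in data that belong to the hint set."""
    return sum(1 for b in data if b in MACROMAN_HINT_BYTES)


def looks_like_macroman(data: bytes, threshold: int = 1) -> bool:
    """Crude guess (assumption): not valid UTF-8 and enough hint bytes."""
    try:
        data.decode("utf-8")
        return False  # decodes cleanly as UTF-8, so do not guess MacRoman
    except UnicodeDecodeError:
        pass
    return count_macroman_hints(data) >= threshold


if __name__ == "__main__":
    sample = "garçon".encode("mac_roman")  # the c-cedilla becomes byte 141
    print(count_macroman_hints(sample), looks_like_macroman(sample))
```

A count like this is only a hint: the same bytes can turn up in other 
encodings, which is why a real detector has to weigh several such counts 
against each other.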

Um? Where is the REBOL script?

Richmond.


Regards

Peter


On 25 Jan 2014, at 02:54, Richmond wrote:

On 22/01/14 20:41, Graham Samuel wrote:
Richmond, thanks for inching my problem towards a solution. I downloaded your 
test.
Clever, in fact too clever for me.
Possibly, but NOT clever enough . . .

I would like an easy way to know what character encoding is being used in a 
textField:

NOT just whether it is Unicode or Not:

There are all sorts of variables, such as

fontLanguage  [I have never quite worked out how that jibes with Unicode],

MacCyrillic,

and so on, ad nauseam.

------

For the sake of argument, and at the risk of repeating myself:

I managed to resurrect a 120 page 'thing' of my wife's, written in mixed 
English and Bulgarian on
Mac OS 9 when Mac OS 9 was all the rage.

In the end . . . after a lot of blood, sweat, tears and incredibly coarse 
remarks, I managed to turn it into
a PDF with an embedded text layer . . . allowing, at least, the English to be 
directly transferred into an ODT
document.

However my wife will still have the "joy" of having to retype all the Bulgarian 
and all the other bits of text
in various other languages, because they were initially typed on Mac OS 9 in the 
"funny ways" Mac did
things then which are not the same as the "funny ways" (a.k.a. Unicode) we do 
things now.

Had I had a stack that allowed me to import the document, or copy-paste the 
text, and that could then tell
me the encodings of the various bits (chunks) so I could have run them through 
some merry little algorithms,
life would have been considerably more refreshing.

------------

Now, I know the argument about Livecode not being a jollified word-processor 
that was trotted out when I made a few
comments about Supercard having ways of doing paragraphing and so on.

And, Livecode may NOT be a jollified word-processor; but if it is meant to be a 
computer programming language
rather than a simplified subset of one, it should have the wherewithal for 
programmers to build a word-processor
without recourse to outside resources. That means (quite apart from paragraph 
breaks, which can be easily arranged in Livecode)
the ability to recognise and tell the programmer all sorts of text-encoding 
standards.

-----------

Now Graham's "Clever" is jolly gratifying, but, frankly, comparing 2 textFields 
is not very clever,
and, while that can differentiate between ASCII text and Unicode text, that is 
as far as it goes.

---------

My latest riff is to have a command of the sort:

put textEncoding

and something of the sort 'plainText', 'RTFtext', 'htmlText', 'unicodeText' 
will be output as a result.

And then, for those who really go a bundle on this kind of thing, we might 
extend that to 'UTF8', 'UTF16', 'UTF32' and so forth.
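
As a rough illustration of what such a command might report, here is a 
minimal sketch in Python rather than Livecode. The function name 
text_encoding and the detection rules are assumptions made for the example: 
it only looks for a byte-order mark or an RTF header at the start of the 
data, and calls everything else 'plainText'. The labels are the ones 
suggested above.

```python
import codecs

# Checked in order: the UTF-32 LE mark begins with the UTF-16 LE mark,
# so UTF-32 has to be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF32"),
    (codecs.BOM_UTF32_BE, "UTF32"),
    (codecs.BOM_UTF8, "UTF8"),
    (codecs.BOM_UTF16_LE, "UTF16"),
    (codecs.BOM_UTF16_BE, "UTF16"),
]


def text_encoding(data: bytes) -> str:
    """Hypothetical stand-in for 'put textEncoding': label raw bytes."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    if data.startswith(b"{\\rtf"):
        return "RTFtext"
    # No byte-order mark and no RTF header: fall back to the generic label.
    return "plainText"


if __name__ == "__main__":
    print(text_encoding("hello".encode("utf-16")))
    print(text_encoding(b"{\\rtf1\\ansi hello}"))
```

Text saved without a byte-order mark falls through to 'plainText' here, so 
this covers only the easy cases; the older single-byte Mac encodings would 
still need guesswork.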

Richmond.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
