Re: Decoding "quoted-printable" -- Help needed -- Reopened - Solved 2nd

2019-11-14 Thread R.H. via use-livecode
I am sorry for flooding this list; I keep answering my own questions.

The function needs to work on bytes. I found this while looking at some
similar C# code:

# Code snippet from C#
# Source: https://stackoverflow.com/questions/32083334/consecutive-control-characters-in-quoted-printable-not-decoding-correctly
---
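string sHex = input;
sHex = sHex.Substring(i + 1, 2);      // the two hex digits after the "=" at index i
int hex = Convert.ToInt32(sHex, 16);  // parse them as a base-16 number
byte b = Convert.ToByte(hex);         // the value is a single byte (0-255)
output.Add(b);                        // collect the raw byte
i += 3;                               // skip "=" plus the two hex digits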
---
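The C# loop above walks the input and, whenever it meets "=", reads the next two characters as one hex byte. A rough LiveCode equivalent could look like the minimal sketch below. The function name qpDecodeUTF8 is hypothetical; it assumes well-formed input (every "=" followed by two uppercase hex digits), copies all other characters through as single bytes (they are ASCII in quoted-printable text), and does not handle soft line breaks ("=" at the end of a line).

```livecode
-- Hypothetical helper: decode a quoted-printable string whose bytes are UTF-8.
-- Assumes every "=" is followed by two hex digits (no soft line breaks).
function qpDecodeUTF8 pText
   local tBytes, tPos, tHex
   put 1 into tPos
   repeat while tPos <= the number of chars of pText
      if char tPos of pText is "=" then
         -- as in the C# snippet: the two chars after "=" are one byte in hex
         put char (tPos + 1) to (tPos + 2) of pText into tHex
         put numToByte(baseConvert(tHex, 16, 10)) after tBytes
         add 3 to tPos
      else
         -- literal ASCII character: its UTF-8 encoding is that same single byte
         put textEncode(char tPos of pText, "UTF-8") after tBytes
         add 1 to tPos
      end if
   end repeat
   -- only now interpret the collected bytes as UTF-8 text
   return textDecode(tBytes, "UTF-8")
end qpDecodeUTF8
```

Usage: `put qpDecodeUTF8("F=C3=BCr") into msg` should show "für". The key design point is the same as in the C# code: collect bytes first and decode them as UTF-8 only at the end, so multi-byte characters are reassembled correctly.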

I overlooked that each value must be a byte value. All of this is new to
me.
So, the correct and tested way to convert UTF-8 text to and from
"quoted-printable" in LiveCode 7 and later is:

---
local tChar
local tByte
local tItem
local tCodedChar
local tHex
local tEncoded
local tDecoded

set the itemdelimiter to "="

// ENCODE EXAMPLE
put "€" into tChar
put textEncode ( tChar , "UTF-8" ) into tCodedChar -- binary data, 3 bytes for "€"
// iterate over the bytes (not codepoints) of the UTF-8 data
repeat for each byte tByte in tCodedChar
  put baseConvert ( byteToNum ( tByte ) , 10 , 16 ) into tHex
  // quoted-printable needs exactly two hex digits per byte
  if the number of chars of tHex < 2 then put "0" before tHex
  put "=" & tHex after tEncoded
end repeat
put tEncoded into msg --> "=E2=82=AC", the quoted-printable UTF-8 encoding of the Euro symbol "€"

// DECODE EXAMPLE
put "=E2=82=AC" into tEncoded
delete char 1 of tEncoded -- drop the leading "=" so each item is one hex pair
repeat for each item tItem in tEncoded
  put numToByte ( baseConvert ( tItem , 16 , 10 ) ) after tDecoded
end repeat
put textDecode ( tDecoded , "UTF-8" ) into msg --> the Euro symbol "€"
---
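For whole strings, the encoding loop above can be wrapped into a function that leaves printable ASCII as it is and only encodes the other bytes, which is how quoted-printable text usually looks (for example "f=C3=BCr"). This is a minimal sketch under simplifying assumptions: the name qpEncodeUTF8 is hypothetical, and it ignores the 76-character line limit, soft line breaks, and the special treatment of line endings and trailing spaces that real e-mail encoders need.

```livecode
-- Hypothetical helper: quoted-printable encode text as UTF-8 bytes.
-- Simplified: no 76-character line limit, no soft line breaks,
-- and line endings are encoded like any other control byte.
function qpEncodeUTF8 pText
   local tBytes, tByte, tNum, tHex, tResult
   put textEncode(pText, "UTF-8") into tBytes
   repeat for each byte tByte in tBytes
      put byteToNum(tByte) into tNum
      if tNum >= 32 and tNum <= 126 and tNum is not 61 then
         -- printable ASCII (including space) is kept; 61 is "=" itself
         put numToCodepoint(tNum) after tResult
      else
         put baseConvert(tNum, 10, 16) into tHex
         -- always two hex digits: byte 10 becomes "=0A", not "=A"
         if the number of chars of tHex < 2 then put "0" before tHex
         put "=" & tHex after tResult
      end if
   end repeat
   return tResult
end qpEncodeUTF8
```

Usage: `put qpEncodeUTF8("für 5 €") into msg` should give "f=C3=BCr 5 =E2=82=AC". The zero-padding does not matter for the Euro symbol (all its bytes are above 0x0F) but would for control characters.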

Thanks to all.

When I find the time, I will post a solution for UTF-8 quoted-printable
encoded e-mail text blocks in the forum.

Roland


---

On Thu, 14 Nov 2019 at 20:41, R.H. wrote:
> Oh, sorry, I was too quick declaring a solution.
> [...]
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Decoding "quoted-printable" -- Help needed -- Reopened

2019-11-14 Thread R.H. via use-livecode
Oh, sorry, I was too quick declaring a solution.

Even though the function runs and its result converts back correctly,
"quoted-printable" UTF-8 expects each value to be written as exactly two
hex digits (two ASCII characters).

For example, the Euro symbol "€" is encoded as three bytes.
The function below converts it to "=E2=201A=AC", while it must be "=E2=82=AC".
The "=" sign just marks each encoded value in quoted-printable.

I do not see what is wrong in my thinking, since I am not getting the
expected result.
(The result is correct for other characters such as "ü".)

EXAMPLE:

put "€" into tChar
// First encode to UTF-8:
put textEncode(tChar,"UTF-8") into tCodedChar
// Repeat for each codepoint in the UTF-8 char
repeat for each codePoint tCodePoint in tCodedChar
   // Encode each codepoint to its integer expression and convert to Hex value:
   put "="& BaseConvert ( codePointToNum (tCodePoint) , 10 , 16 ) after tEncoded
end repeat
put tEncoded into field "Show Codepoints" -- Expected ASCII representing Hex numbers
-- Result: "=E2=201A=AC" instead of "=E2=82=AC", but it still converts back.

The actual "correct" UTF-8 result can be tested here:
http://www.endmemo.com/unicode/unicodeconverter.php

What am I missing?
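A likely explanation, offered as a hedged guess: `repeat for each codePoint` over the binary result of textEncode() makes LiveCode treat each byte as a character in a single-byte text encoding. The result "=E2=201A=AC" matches Windows-1252, where byte 0x82 is the character "‚" (U+201A), while 0xE2 and 0xAC map to U+00E2 and U+00AC, which keep the same numbers. The bytes of "ü" (C3 BC) map to U+00C3 and U+00BC, which would explain why that case looked correct. The sketch below checks this, assuming textDecode() accepts "CP1252" as an encoding name, and shows that iterating over bytes keeps the raw values.

```livecode
-- Check (assumption: textDecode() accepts "CP1252" as an encoding name):
-- which character does byte 0x82 (decimal 130) become as Windows-1252 text?
local tAsText, tUtf8, tByte, tHexList
put textDecode(numToByte(130), "CP1252") into tAsText
put baseConvert(codepointToNum(tAsText), 10, 16) into msg -- "201A" if the assumption holds

-- Iterating over bytes instead of codepoints keeps the raw UTF-8 values
put textEncode("€", "UTF-8") into tUtf8
repeat for each byte tByte in tUtf8
   put baseConvert(byteToNum(tByte), 10, 16) & "," after tHexList
end repeat
delete last char of tHexList
put return & tHexList after msg -- E2,82,AC
```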

Thanks a lot
Roland


Decoding "quoted-printable" -- Help needed -- Solved

2019-11-14 Thread R.H. via use-livecode
For those interested:

With the help of a privately received message hinting at a solution used
prior to LC7, I was able to construct the required functions for LC 7 and
above.

I must admit that I am not aware of the many functions LiveCode offers; I
did not even know about baseConvert() before doing a lot of research. I
guess each of us has to go through the commands and functions LC provides
and study them. Things are hard to find when you do not know what to search
for. I also had to learn what codepoints are.

Here I am not using the actual quoted-printable format, in which each value
is written as two hex digits prefixed with an equal sign "=". That format is
easy to parse or construct with LiveCode chunk expressions. Instead I am
using comma-delimited items.
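Switching between the comma-delimited items and real "=XX" notation is indeed a matter of a couple of chunk operations. A minimal sketch; the function names itemsToQP and qpToItems are hypothetical, and both assume the string contains only hex pairs (no literal characters).

```livecode
-- Hypothetical helpers: convert between "E2,82,AC" and "=E2=82=AC".
function itemsToQP pItems
   replace "," with "=" in pItems
   return "=" & pItems
end itemsToQP

function qpToItems pQP
   delete char 1 of pQP -- drop the leading "="
   replace "=" with "," in pQP
   return pQP
end qpToItems
```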

// The encoding prior to LC7 according to Mark (still works even today)
-- put unidecode(uniencode("e","english"),"UTF8") into x
-- put chartonum(char 1 of x) && chartonum(char 2 of x) into y

// Encoding and decoding UTF-8 for quoted-printable chars (as they may appear in certain e-mail parts)
set the itemdelimiter to ","
put "€" into tChar -- Using the Euro symbol, which UTF-8 encodes as 3 bytes (UTF-8 uses up to 4 bytes per character).

// Encode a UTF-8 character to a quoted-printable ASCII encoding
put textEncode( tChar ,"UTF-8") into tCodedChar
repeat for each codePoint tCodePoint in tCodedChar
   put BaseConvert ( codePointToNum (tCodePoint) , 10 , 16 ) &"," after tEncoded
end repeat
delete last char of tEncoded
put tEncoded  into msg -- just for testing

// Decode a quoted-printable ASCII string to UTF-8
put empty into tDecoded
repeat for each item tItem in tEncoded
   put numToCodePoint ( BaseConvert ( tItem , 16, 10 ) ) after tDecoded
end repeat
put textDecode (tDecoded , "UTF-8") after msg -- just for testing

// Result in message box
-- E2,201A,AC -- In actual quoted-printable that would be "=E2=201A=AC", and our items must be converted accordingly
-- €

Roland


Decoding "quoted-printable" -- Help needed

2019-11-12 Thread R.H. via use-livecode
Even after a lot of research and comparing functions in C# and JavaScript, I
do not understand it yet.

In e-mail bodies, the content parts are often base64-encoded, which is no
problem, but there is also another encoding called "quoted-printable". In
my case, this text needs to be converted to UTF-8.

Here, all characters that are not pure ASCII are marked with an equal sign
"=" (similar to the "%" in a URL-encoded string), and the following two
characters give the byte value in hex notation. A character encoded in
UTF-8 takes one to four bytes, so several such "=XX" groups can belong to a
single character.

Example: "F=C3=BCr". This contains the German umlaut and renders as the
string "für". The "ü" is not part of pure ASCII and is therefore encoded
this way; here the encoded bytes are its UTF-8 representation.

As you can see, "ü" is not represented by just one byte "=C3". There are
actually two bytes, "=C3=BC": "C3" and "BC" in hex, or 195 and 188 in
decimal. If you URL-decode the single bytes using "%" instead of "=", each
gives its own Latin-1 character: "%C3" gives "Ã" and "%BC" gives "¼". So
this does not help. I somehow have to look at the two bytes together.

Converting text to hex gives the correct result in other programs:
-- Link: https://www.rapidtables.com/convert/number/ascii-to-hex.html
-- Enter: "ü"
-- Result: "C3,BC" -- what we are looking for when encoding: two separate byte representations.
-- But it only works when the character encoding is UTF-8.
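The same "text to hex" result can be produced inside LiveCode by encoding the text as UTF-8 and dumping the bytes as hex. This sketch uses binaryDecode() with the "H*" format instead of a byte loop; it assumes that "H*" converts all bytes to hex digits, high nibble first, and the function name hexBytesOf is hypothetical.

```livecode
-- Hypothetical helper: "ü" -> "C3,BC" (UTF-8 bytes as comma-separated hex)
function hexBytesOf pText
   local tHex, tResult, i
   -- assumption: "H*" yields two hex digits per byte, high nibble first
   get binaryDecode("H*", textEncode(pText, "UTF-8"), tHex)
   put toUpper(tHex) into tHex
   -- insert a comma after each pair of hex digits
   repeat with i = 1 to the number of chars of tHex step 2
      put char i to (i + 1) of tHex & "," after tResult
   end repeat
   delete last char of tResult
   return tResult
end hexBytesOf
```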

How do I get from "=C3=BC" to the codepoint of "ü", which is 252? What do I
need to calculate?
How do we decode such a "quoted-printable" encoded string to UTF-8?
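The calculation follows from the UTF-8 bit layout: a two-byte sequence has the form 110xxxxx 10yyyyyy and a three-byte sequence 1110xxxx 10yyyyyy 10zzzzzz. Removing the marker bits (subtracting C0, E0 or 80 in hex) and concatenating the payload bits gives the codepoint. Worked out for the two characters in this thread; in practice textDecode(..., "UTF-8") does this for you once the hex pairs are turned back into bytes.

```latex
\begin{aligned}
\text{ü}:\quad & (\mathtt{C3}_{16}-\mathtt{C0}_{16})\cdot 2^{6} + (\mathtt{BC}_{16}-\mathtt{80}_{16})
  = 3\cdot 64 + 60 = 252 = \text{U+00FC}\\
\text{€}:\quad & (\mathtt{E2}_{16}-\mathtt{E0}_{16})\cdot 2^{12} + (\mathtt{82}_{16}-\mathtt{80}_{16})\cdot 2^{6} + (\mathtt{AC}_{16}-\mathtt{80}_{16})
  = 2\cdot 4096 + 2\cdot 64 + 44 = 8364 = \text{U+20AC}
\end{aligned}
```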

Thanks in advance :)
Roland