I extracted an example. The main issue is curly quotes. The text came from FileMaker in UTF8, which I textDecode to UTF16. You can assume that all text is LC native throughout the app.

Here is the template I use for merge:
<p leftindent="10" spacebelow="20"><span metadata="[[tMETADATA]]"><font size="16" color="#C77C02">[[tSECTION]]&#11;</font>[[tCONCEPT]]</span></p>

In the field, this text is displayed accurately with curly quotes:
The New Testament Scholar <soft break> “Dare to reason!”

Here is a result of the merge:
<p leftindent="10" spacebelow="20" bgcolor="#FFDD71"><span metadata="The New Testament Scholar&#9;EN07_&#55231;&#57298;Dare to reason!&#55231;&#57299;"><font size="16" color="#C77C02">The New Testament Scholar&#11;</font>&ldquo;Dare to reason!&rdquo;</span></p>

Notice that the displayed text uses entity names (&ldquo;, &rdquo;), while the metadata, which was created from the same text block as the field text, has changed each quote to a pair of numeric references in the 55,000 to 57,000 range (&#55231;&#57298; and &#55231;&#57299;) that differ only in the final digit. I was unable to paste the actual text here, as my mail client refused to render it, but each pair of numeric references appears as a single pictograph in LC's variable watcher and does not match the card path I need, which in this case is:
EN07_The New Testament Scholar<tab>“Dare to reason!”

Maybe you can make sense of this? I've written an ugly workaround that pieces together the reference I need, but it would be better if I could just use the metadata. The metadata works fine as long as there are no quotes.

On 9/9/19 11:35 PM, dsc--- via use-livecode wrote:

I think I'm doing this wrong. This seems to work, too.

on mouseup
    put empty into field 1
    put numToCodepoint(0x2200) into x
    put numToCodepoint(0x1040F) & "V-" into y
    put merge(" é{ [[x]] }é [[y]]") into field 1
end mouseup


On Sep 9, 2019, at 10:25 PM, dsc--- via use-livecode 
<use-livecode@lists.runrev.com> wrote:

And this, too, looks OK to me.

on mouseup
   put empty into field 1
   put "A" into field 1
   get numToCodepoint(0x2200) & numToCodepoint(0x1040F) & "V-"
   set the metadata of char 1 of field 1 to it
   put the metadata of char 1 of field 1 after field 1
end mouseup

I guess the problem is in the merge as you thought.

I did notice in the dictionary that setting the metadata of a line is not the 
same as setting the metadata of all of the characters of the line.
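
If direct assignment keeps the characters intact, as the test above suggests, one possible workaround is to leave the path out of the merged htmlText and attach it afterwards through the metadata property. This is a minimal sketch, assuming a field 1 on the card; the sample strings come from Jacqueline's example and the variable names are made up for illustration.

```livecode
on mouseUp
   local tSection, tConcept, tPath
   put "The New Testament Scholar" into tSection
   -- curly quotes are built from their code points so the script stays ASCII
   put numToCodepoint(0x201C) & "Dare to reason!" & numToCodepoint(0x201D) into tConcept
   -- assumed path layout, copied from the card path quoted in the thread
   put "EN07_" & tSection & tab & tConcept into tPath
   -- show only the visible text; the path never passes through merge or htmlText
   put tSection && tConcept into field 1
   -- attach the path to the characters of the entry (not to the line itself)
   set the metadata of char 1 to -1 of line 1 of field 1 to tPath
   -- report whether the stored metadata still matches the original path
   put (the metadata of char 1 of field 1 is tPath)
end mouseUp
```

It has not been tried against the real FileMaker data, so treat it as a starting point rather than a confirmed fix.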

Dar Scott


On Sep 9, 2019, at 8:58 PM, Dar Scott Consulting via use-livecode 
<use-livecode@lists.runrev.com> wrote:

This quick check seems to work for me.

on mouseup
   put "A" into field 1
   set the metadata of char 1 of field 1 to "é"
   put the metadata of char 1 of field 1 after field 1
end mouseup


On Sep 9, 2019, at 8:32 PM, J. Landman Gay via use-livecode 
<use-livecode@lists.runrev.com> wrote:

Well, I've made some changes to the code since I started urlEncoding the text 
before merging so I'll check that again. Paul is right that unicode in htmltext 
needs to be in hex, but the numbers I'm getting back are very high (8,000+) and 
render in the field as strange pictographs. Elsewhere where there is no merge, 
curly quotes translate to the named quote or apostrophe entities and are 
correct.

By metadata I mean the LC term (see the dictionary) that allows you to attach 
some text to a field text chunk. The metadata isn't displayed in the field but 
you can use it for anything you want. In my case the field is a list of 
clickable entries in a table of contents, each with its own metadata attached 
that provides a path to the stack and card the entry needs to open.
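
As a rough illustration of that arrangement, the sketch below attaches a path to each entry and reads one back. It assumes a field 1 on the card; the entry names and paths are invented for the example and are not the ones used in the app.

```livecode
on mouseUp
   -- two visible entries in a hypothetical table-of-contents field
   put "Chapter One" & return & "Chapter Two" into field 1
   -- attach a made-up stack/card path to the characters of each entry
   set the metadata of char 1 to -1 of line 1 of field 1 to "StackA" & tab & "Card 1"
   set the metadata of char 1 to -1 of line 2 of field 1 to "StackA" & tab & "Card 2"
   -- the displayed text is unchanged; the path is read back from the chunk
   put the metadata of char 1 of line 2 of field 1
end mouseUp
```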

When I use normal LC text as metadata, diacriticals aren't rendered correctly 
(curly quotes become question marks), so the path is incorrect and the 
click goes nowhere.

Since LC is supposed to be unicode throughout, I'd expect metadata to be 
compatible. The same text appears correctly when not used as metadata.
--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com

On September 9, 2019 7:25:28 PM Dar Scott Consulting via use-livecode 
<use-livecode@lists.runrev.com> wrote:

I think you are trying to think too much about the LC implementation of text. 
Maybe.

Text in LC is an abstraction of a sequence of code points. Whether it is UTF16 
or not is (mostly) hidden from me.

So,

get textDecode( binaryFromServer, "UTF-8" )

should put that into the correct form, if it is really UTF-8.

Data (binary bytes) is interpreted as native encoding if one tries to use it 
as text. I recommend against this. I try to always textDecode() everything 
coming in, but I make exceptions at times for ASCII.
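
One way to check that incoming text really was decoded is to look at its code points afterwards. A minimal sketch, with the server data replaced by bytes built locally through textEncode (an assumption made only so the example is self-contained; the variable names are illustrative):

```livecode
on mouseUp
   local tBytes, tText
   -- build sample UTF-8 bytes locally, standing in for data from a server
   put textEncode(numToCodepoint(0x201C) & "Dare to reason!" & numToCodepoint(0x201D), "UTF-8") into tBytes
   -- decode the bytes into LiveCode's internal text form
   put textDecode(tBytes, "UTF-8") into tText
   -- inspect the first and last code points; curly quotes are U+201C and U+201D
   put codepointToNum(codepoint 1 of tText) && codepointToNum(codepoint -1 of tText)
end mouseUp
```

If the two numbers shown are not the decimal forms of U+201C and U+201D, the text was probably mangled before it ever reached merge or the metadata.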

I'm not sure what you mean by metadata. Are you referring to HTTP content-type?

Sorry, if I am off on a bunny trail...

Dar

On Sep 9, 2019, at 4:38 PM, J. Landman Gay via use-livecode 
<use-livecode@lists.runrev.com> wrote:

It's UTF8 text from a server, which I textDecode to UTF16. When I use the UTF16 
text in a merge, diacriticals and/or curly quotes get mangled. (Same with 
setting metadata on field text too.)

On 9/9/19 4:16 PM, Dar Scott Consulting via use-livecode wrote:

I'm not sure I understand.

Do you mean "encoded to UTF-16"? In that case you should decode that to convert 
it to internal text. And then try merge. (Which still might have problems, I suppose.)

On Sep 9, 2019, at 12:08 PM, J. Landman Gay via use-livecode 
<use-livecode@lists.runrev.com> wrote:


It seems that the merge function doesn't respect unicode. Does anyone have a 
workaround? The text I'm inserting is already decoded to UTF16.


--
Jacqueline Landman Gay         |     jac...@hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com



