Re: [NTG-context] issue with scite module

2022-06-04 Thread Max Chernoff via ntg-context

> Could anyone confirm the issue or explain what I am missing?


Confirmed on Win64 with the same version.

But I did find a workaround: if I convert your example from NFC 
(composed) to NFD (decomposed), it compiles fine.


$ xxd xml.tex
0000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
0010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
0020: 6172 7458 4d4c 20c3 b15c 7374 6f70 584d  artXML ..\stopXM
0030: 4c0a 5c73 746f 7074 6578 740a            L.\stoptext.

$ context xml
[...]
ConTeXt  ver: 2022.05.11 11:36 LMTX  fmt: 2022.6.2
[...]
The file ended when scanning an argument.
[...]
mtx-context | fatal error: return code: 1

$ uconv -x any-nfd xml.tex | sponge xml.tex

$ xxd xml.tex
0000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
0010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
0020: 6172 7458 4d4c 206e cc83 5c73 746f 7058  artXML n..\stopX
0030: 4d4c 0a5c 7374 6f70 7465 7874 0a         ML.\stoptext.

$ context xml
[success]

This also gives us a hint as to what the problem is:

$ echo -n 'ñ' | xxd
0000: c3b1                                     ..

$ echo -n 'ñ' | uconv -x any-nfd | xxd
0000: 6ecc 83                                  n..

$ xxd xml.tex
0020: 6172 7458 4d4c 20c3 b15c 7374 6f70 584d  artXML ..\stopXM
                       ^^ ^^

$ xxd xml.log
0570: 5c73 6c78 6465 6661 756c 747b c37d 7d5c  \slxdefault{.}}\
                                    ^^

The character "ñ" in UTF-8 NFC is "0xC3, 0xB1". The "0xC3" is the lead 
byte of a 2-byte sequence, while "0xB1" is its continuation byte. In the 
error message from the log, we have "0xC3, 0x7D": a 2-byte lead byte 
followed by an ASCII character ("}") instead of a continuation byte, 
which is invalid UTF-8.
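
The byte analysis above can be cross-checked with a strict UTF-8 decoder. 
The following is only a minimal sketch in Python (not something used in 
this thread); the helper name is_valid_utf8 is made up for illustration.

```python
# Illustration only: decode the two byte pairs discussed above.
nfc = bytes([0xC3, 0xB1])     # the pair found in xml.tex
broken = bytes([0xC3, 0x7D])  # the pair found in xml.log: lead byte + "}"

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(nfc))     # pair from the source file
print(is_valid_utf8(broken))  # pair from the log
```

The first pair decodes to U+00F1, while the second is rejected because 
0x7D cannot follow a lead byte.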


I'm guessing that what's happening is the module code is just grabbing 
one character at a time, which works for ASCII, but can lead to orphaned 
characters in Unicode. The NFD form fixes this since the first byte of 
the line is the plain ASCII "n", which can freely be treated as a single 
byte.


This NFD workaround should hopefully "fix" things for basic Latin 
characters with accents, but it probably won't help with non-Latin 
characters since there isn't an ASCII character to decompose them into.
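
To see which characters the NFD workaround can reach, here is a small 
Python sketch (an illustration under my own assumptions, not part of the 
module). It normalizes a few sample characters and checks whether the 
decomposed form starts with an ASCII byte. The helper names are 
hypothetical.

```python
import unicodedata

def nfd_bytes(ch: str) -> bytes:
    """UTF-8 bytes of ch after NFD normalization."""
    return unicodedata.normalize("NFD", ch).encode("utf-8")

def starts_with_ascii_after_nfd(ch: str) -> bool:
    """True if the decomposed form begins with an ASCII byte."""
    return nfd_bytes(ch)[0] < 0x80

# Samples: n with tilde, a with acute, sharp s, Cyrillic ya
for ch in ("\u00f1", "\u00e1", "\u00df", "\u044f"):
    print(f"U+{ord(ch):04X}", nfd_bytes(ch).hex(" "),
          starts_with_ascii_after_nfd(ch))
```

U+00F1 and U+00E1 decompose into an ASCII letter plus a combining mark, 
whereas U+00DF and U+044F have no decomposition and keep their non-ASCII 
lead byte.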


-- Max
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___


[NTG-context] issue with scite module

2022-06-04 Thread Pablo Rodriguez via ntg-context
Dear list,

I have this minimal sample (with current latest from 2022.05.11 11:36 on
Linux64):

  \usemodule[scite]
  \starttext
  \startXML ñ\stopXML
  \stoptext

Commenting out the first line avoids the compilation error.

Replacing ñ with n also allows compilation.

I think there may be an error in m-scite.mkiv.

The inclusion of non-ASCII characters in XML code seems to leave an
unclosed argument.

Could anyone confirm the issue or explain what I am missing?

Many thanks for your help,

Pablo


Re: [NTG-context] issue with scite module

2022-06-04 Thread Pablo Rodriguez via ntg-context
On 6/3/22 00:52, Max Chernoff via ntg-context wrote:
>> For the sake of consistency (with buff-imp-xml.lua), I think the patch
>> should read
>  > [...]
>> +local alsoname = lpatterns.utf8two + lpatterns.utf8three +
>> lpatterns.utf8four
>
> I think that that pattern is a little too broad, since it will match any
> non-ASCII Unicode character. Things like U+202E (xkcd.com/1137), U+00A0
> (no-break space), etc are valid UTF-8 characters, but not valid XML tag
> names. Neither of these two characters are matched by the TeX catcode
> check. This doesn't make any real difference for a syntax highlighter
> though.

Hi Max,

many thanks for your reply.

At best, the patch is only a suggestion, and Hans will merge whatever
code he sees fit.

>> +local name = (R("az","AZ","09") + S("_-.") + + alsoname)^1
>
> There's a doubled plus in the middle there. The patch works when I
> remove it.

I noticed it too just after sending the message to the list, but I had
to solve the issue with my installation first.

>> But I’m afraid I cannot make it work on my computer (Linux64).
>>
>> On another Win64 computer, both patches worked perfectly fine.
>
> Hmm, that's really weird. Both patches work for me on my main Win64
> computer (after I fixed the extra plus).

It was a stupid mistake on my side. The patch I sent before points to
the error:

--- scite-context-lexer-xml.lua 2022-06-01 17:24:38.625976000 +0200
+++
context/tex/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
2022-06-02 16:37:30.112824947 +0200

I was compiling the sample file in the directory that contained the
unmodified version of "scite-context-lexer-xml.lua".

ConTeXt was reading the unmodified file and not the modified one, but
that was all my fault.

Now I have to find an MWE for issues I’m experiencing with XML sources
and using the scite module.

Many thanks for your help,

Pablo


Re: [NTG-context] issue with scite module

2022-06-02 Thread Max Chernoff via ntg-context

> For the sake of consistency (with buff-imp-xml.lua), I think the patch
> should read
>
> [...]
>
> +local alsoname = lpatterns.utf8two + lpatterns.utf8three +
> lpatterns.utf8four


I think that that pattern is a little too broad, since it will match any 
non-ASCII Unicode character. Things like U+202E (xkcd.com/1137), U+00A0 
(no-break space), etc are valid UTF-8 characters, but not valid XML tag 
names. Neither of these two characters are matched by the TeX catcode 
check. This doesn't make any real difference for a syntax highlighter 
though.
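
As a rough illustration of the point about breadth, the Python sketch 
below compares two predicates. It assumes that utf8two + utf8three + 
utf8four behaves like "any character that encodes to two or more UTF-8 
bytes" (my reading, not verified against the lpeg.patterns source) and 
uses the code point ranges of the XML 1.0 NameStartChar and NameChar 
productions. All names in the code are hypothetical.

```python
# Code point ranges from the XML 1.0 NameStartChar production.
NAME_START = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional ranges allowed by NameChar: "-", ".", digits, U+00B7, etc.
NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]

def is_xml_name_char(ch: str) -> bool:
    """True if ch falls inside the XML 1.0 NameChar ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START + NAME_EXTRA)

def is_multibyte_utf8(ch: str) -> bool:
    """Assumed stand-in for utf8two + utf8three + utf8four."""
    return len(ch.encode("utf-8")) >= 2

for ch in ("\u00f1", "\u00a0", "\u202e"):
    print(f"U+{ord(ch):04X}", is_multibyte_utf8(ch), is_xml_name_char(ch))
```

Under these assumptions U+00F1 passes both checks, while U+00A0 and 
U+202E pass only the multibyte check.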



> +local name = (R("az","AZ","09") + S("_-.") + + alsoname)^1


There's a doubled plus in the middle there. The patch works when I 
remove it.



> But I’m afraid I cannot make it work on my computer (Linux64).
>
> On another Win64 computer, both patches worked perfectly fine.


Hmm, that's really weird. Both patches work for me on my main Win64 
computer (after I fixed the extra plus). I also pulled the 
"contextgarden/context:lmtx" Docker image (Debian sid), and both patches 
worked there too. I get this from inside the container:


root@e8d29a32595c:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux bookworm/sid"
NAME="Debian GNU/Linux"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

root@e8d29a32595c:~# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

root@e8d29a32595c:~# xxd test.tex
0000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
0010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
0020: 6172 7454 4558 7061 6765 5b6f 6666 7365  artTEXpage[offse
0030: 743d 3165 785d 0a5c 7479 7065 5b6f 7074  t=1ex].\type[opt
0040: 696f 6e3d 786d 6c5d 7b3c 616e 732f 3e7d  ion=xml]{<ans/>}
0050: 0a5c 7479 7065 5b6f 7074 696f 6e3d 786d  .\type[option=xm
0060: 6c5d 7b3c c3a1 c3b1 c39f 2f3e 7d0a 5c73  l]{<....../>}.\s
0070: 746f 7054 4558 7061 6765 0a5c 7374 6f70  topTEXpage.\stop
0080: 7465 7874 0a                             text.

root@e8d29a32595c:~# context --version
mtx-context | ConTeXt Process Management 1.04
mtx-context |
mtx-context | main context file: [snip]
mtx-context | current version: 2022.05.11 11:36
mtx-context | main context file: [snip]
mtx-context | current version: 2022.05.11 11:36

ldd "$(type -p luametatex)"
linux-vdso.so.1 (0x7ffdbe9a5000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f4b034d4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f4b034b3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f4b0336f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f4b03196000)
/lib64/ld-linux-x86-64.so.2 (0x7f4b03a55000)

Is this perhaps a weird locale or encoding issue? Maybe try compiling with:

LC_ALL=C.UTF-8 LANG=C.UTF-8 context test.tex

or

LC_ALL=POSIX LANG=POSIX context test.tex

I'm surprised Linux is the one not working here, since it's usually 
Windows that has text encoding issues with its weird hybrid of DOS 
codepages and UTF-16+BOM.


The only other thing that I can think of is a weird library issue with 
your distro, but LuaMetaTeX is statically linked. Not sure what else to 
check here.


-- Max


Re: [NTG-context] issue with scite module

2022-06-02 Thread Pablo Rodriguez via ntg-context
On 6/2/22 17:36, Pablo Rodriguez via ntg-context wrote:
> On 6/1/22 23:58, Max Chernoff via ntg-context wrote:
>>
>> local name = (R("az","AZ","09") + S("_-.") + 
>> lpeg.utfchartabletopattern(characters.csletters))^1
>
> I’m afraid I cannot make your proposed fix work.

Even with a brand new install, neither patch works for me.

I don’t know what I may be missing in my installation.

Do you have any hint about what I am doing wrong?

Many thanks for your help,

Pablo


Re: [NTG-context] issue with scite module

2022-06-02 Thread Pablo Rodriguez via ntg-context
On 6/1/22 23:58, Max Chernoff via ntg-context wrote:
>> Now, I still don’t understand LPEG and don’t know if there’s a general
>> “character” class that doesn’t need a list...

Many thanks for your reply, Hraban.

> The easiest way that I found was to just cheat and use everything with
> a TeX catcode 11 ("letters"):
>
>   local name = (R("az","AZ","09") + S("_-.") + 
> lpeg.utfchartabletopattern(characters.csletters))^1

Many thanks for your reply, Max,

I’m afraid I cannot make your proposed fix work.

For the sake of consistency (with buff-imp-xml.lua), I think the patch
should read (also attached to the message to avoid wrong line breaking):

--- scite-context-lexer-xml.lua 2022-06-01 17:24:38.625976000 +0200
+++
context/tex/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
2022-06-02 16:37:30.112824947 +0200
@@ -13,7 +13,7 @@
 -- todo: parse entities in attributes

 local global, string, table, lpeg = _G, string, table, lpeg
-local P, R, S, C, Cmt, Cp = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cmt,
lpeg.Cp
+local P, R, S, C, Cmt, Cp, lpatterns = lpeg.P, lpeg.R, lpeg.S, lpeg.C,
lpeg.Cmt, lpeg.Cp, lpeg.patterns
 local type = type
 local match, find = string.match, string.find

@@ -41,7 +41,8 @@
 local equal= P("=")
 local ampersand= P("&")

-local name = (R("az","AZ","09") + S("_-."))^1
+local alsoname = lpatterns.utf8two + lpatterns.utf8three +
lpatterns.utf8four
+local name = (R("az","AZ","09") + S("_-.") + + alsoname)^1
 local openbegin= P("<")
 local openend  = P("</")
 local closebegin = P("/>") + P(">")

But I’m afraid I cannot make it work on my computer (Linux64).

On another Win64 computer, both patches worked perfectly fine.

Both machines run LMTX current latest. So I have an issue on my
installation that I have to fix first.

Many thanks for your help,

Pablo
--- scite-context-lexer-xml.lua	2022-06-01 17:24:38.625976000 +0200
+++ context/tex/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua	2022-06-02 16:37:30.112824947 +0200
@@ -13,7 +13,7 @@
 -- todo: parse entities in attributes
 
 local global, string, table, lpeg = _G, string, table, lpeg
-local P, R, S, C, Cmt, Cp = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cmt, lpeg.Cp
+local P, R, S, C, Cmt, Cp, lpatterns = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cmt, lpeg.Cp, lpeg.patterns
 local type = type
 local match, find = string.match, string.find
 
@@ -41,7 +41,8 @@
 local equal= P("=")
 local ampersand= P("&")
 
-local name = (R("az","AZ","09") + S("_-."))^1
+local alsoname = lpatterns.utf8two + lpatterns.utf8three + lpatterns.utf8four
+local name = (R("az","AZ","09") + S("_-.") + + alsoname)^1
 local openbegin= P("<")
 local openend  = P("</")
 local closebegin = P("/>") + P(">")


Re: [NTG-context] issue with scite module

2022-06-01 Thread Max Chernoff via ntg-context

> Now, I still don’t understand LPEG and don’t know if there’s a general
> “character” class that doesn’t need a list...


Well, looking through the XML spec

https://www.w3.org/TR/REC-xml/#NT-NameChar

you'd think that we'd want a pattern like this:

local name = (R("az","AZ","09", "\u{C0}\u{D6}", "\u{D8}\u{F6}", "\u{F8}\u{2FF}",
    "\u{370}\u{37D}", "\u{37F}\u{1FFF}", "\u{200C}\u{200D}", "\u{2070}\u{218F}",
    "\u{2C00}\u{2FEF}", "\u{3001}\u{D7FF}", "\u{F900}\u{FDCF}", "\u{FDF0}\u{FFFD}",
    "\u{10000}\u{EFFFF}", "\u{0300}\u{036F}", "\u{203F}\u{2040}") +
    S("_-.\u{B7}"))^1

But that doesn't work, since


> The same is true for lpeg.R, although the latter will display an error
> message if used with multibyte characters. Therefore lpeg.R('aä') results
> in the message bad argument #1 to 'R' (range must have two characters),
> since to lpeg, ä is two ’characters’ (bytes), so aä totals three.
> (https://texdoc.org/serve/luatex/0##680)
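
The byte counts behind that error message are easy to check. This is 
only a small Python sketch (lpeg itself is not involved, and utf8_length 
is a made-up helper) that measures the UTF-8 length of the strings that 
would be handed to lpeg.R.

```python
def utf8_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))

print(utf8_length("az"))            # plain ASCII range
print(utf8_length("a\u00e4"))       # "a" followed by a with diaeresis
print(utf8_length("\u00c0\u00d6"))  # the first non-ASCII range in the pattern above
```

The ASCII range is two bytes long, the mixed one three, and a range made 
of two 2-byte characters four, so only the first satisfies a byte-based 
"two characters" check.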


The easiest way that I found was to just cheat and use everything with
a TeX catcode 11 ("letters"):

local name = (R("az","AZ","09") + S("_-.") + 
lpeg.utfchartabletopattern(characters.csletters))^1

This isn't strictly speaking correct, but I think that it's close
enough. It seems to work correctly for Pablo's initial example,
but it may break something else.

-- Max

diff --git a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
index e635d40..97de3fd 100644
--- a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original
+++ b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
@@ -41,7 +41,7 @@ local semicolon= P(";")
 local equal= P("=")
 local ampersand= P("&")
 
-local name = (R("az","AZ","09") + S("_-."))^1
+local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1
 local openbegin= P("<")
 local openend  = P("</")
 local closebegin = P("/>") + P(">")






Re: [NTG-context] issue with scite module

2022-06-01 Thread Henning Hraban Ramm via ntg-context

On 01.06.22 at 19:45, Pablo Rodriguez via ntg-context wrote:

> But I don’t know which file deals with it (so I could try to submit a
> patch).


That would be texmf-context/tex/context/modules/mkiv/m-scite.mkiv
and 
texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua

and there probably

local name = (R("az","AZ","09") + S("_-."))^1

Now, I still don’t understand LPEG and don’t know if there’s a general 
“character” class that doesn’t need a list...


Hraban


Re: [NTG-context] issue with scite module

2022-06-01 Thread Pablo Rodriguez via ntg-context
On 6/1/22 18:58, Henning Hraban Ramm via ntg-context wrote:
> On 01.06.22 at 18:47, Pablo Rodriguez via ntg-context wrote:
>> [...]
>> Could anyone confirm the issue?
>
> Hi Pablo,
>
> with LMTX version 2022.05.11, both elements are displayed, but the first
> in blue, the second in red. Apparently the scite highlighter doesn’t
> like non-ASCII characters in elements.

Hi Hraban,

this is exactly what I’m experiencing (and sorry, I forgot to mention
that I was using current latest).

I experienced that without scite and Hans fixed it (in buff-imp-xml.lua).

I mentioned both Geany and Notepad++, because I think it may not be an
issue outside ConTeXt.

But I don’t know which file deals with it (so I could try to submit a
patch).

Many thanks for your help,

Pablo


Re: [NTG-context] issue with scite module

2022-06-01 Thread Henning Hraban Ramm via ntg-context

On 01.06.22 at 18:47, Pablo Rodriguez via ntg-context wrote:

> Dear list,
>
> I have the following sample:
>
>    \usemodule[scite]
>    \starttext
>    \startTEXpage[offset=1ex]
>    \type[option=xml]{<ans/>}
>    \type[option=xml]{<áñß/>}
>    \stopTEXpage
>    \stoptext
>
> Using scite, I don’t get the second element right.
>
> Without scite, both elements are displayed right.
>
> In both Geany and Notepad++ (which use Scintilla internally), the two
> elements are displayed right.
>
> Could anyone confirm the issue?


Hi Pablo,

with LMTX version 2022.05.11, both elements are displayed, but the first 
in blue, the second in red. Apparently the scite highlighter doesn’t 
like non-ASCII characters in elements.


Hraban


[NTG-context] issue with scite module

2022-06-01 Thread Pablo Rodriguez via ntg-context
Dear list,

I have the following sample:

  \usemodule[scite]
  \starttext
  \startTEXpage[offset=1ex]
  \type[option=xml]{<ans/>}
  \type[option=xml]{<áñß/>}
  \stopTEXpage
  \stoptext

Using scite, I don’t get the second element right.

Without scite, both elements are displayed right.

In both Geany and Notepad++ (which use Scintilla internally), the two
elements are displayed right.

Could anyone confirm the issue?

Many thanks for your help,

Pablo