Re: [dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE

2021-02-13 Thread Hans Hagen

Hi,


> +v = v - 0x10000


ah, i hadn't noted that line (btw, in the file there is a remark where i
add the 0x10000 that it is inconsistent so i should have looked into it
then, sigh)



> My other suggestion, which does the subtraction only for one surrogate
> is below.


btw, performance wise the separate step is the same as doing it in the
one liner (lua does all via the stack so in general using intermediate
steps assigning to (here v) is often quite ok)
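
A minimal sketch, in plain Lua, of why the separate step and the one
liner can be swapped freely as far as the output goes. The names
`two_steps` and `one_liner` are made up for this illustration, and
`math.floor(v / 1024)` stands in for `rshift(v,10)` so that the snippet
runs outside ConTeXt. It only compares results and says nothing about
timing.

```lua
local format, floor = string.format, math.floor

-- Variant 1: separate subtraction step, as in the first patch.
local function two_steps(v)
    v = v - 0x10000
    return format("%04x%04x", floor(v / 1024) + 0xD800, v % 1024 + 0xDC00)
end

-- Variant 2: subtraction folded into the high surrogate only, as in the second patch.
local function one_liner(v)
    return format("%04x%04x", floor((v - 0x10000) / 1024) + 0xD800, v % 1024 + 0xDC00)
end

-- 0x10000 is a multiple of 1024, so v % 1024 is the same with or without the offset.
local same = true
for v = 0x10000, 0x10FFFF, 0x101 do -- sample the supplementary range
    if two_steps(v) ~= one_liner(v) then
        same = false
        break
    end
end

print(same) -- true
```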


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-
___
dev-context mailing list
dev-context@ntg.nl
https://mailman.ntg.nl/mailman/listinfo/dev-context


Re: [dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE

2021-02-12 Thread Michal Vlasák
On Tue Feb 9, 2021 at 7:49 PM CET, Hans Hagen wrote:
> On 2/9/2021 6:57 PM, Michal Vlasák wrote:
> > Hello,
> > 
> > conversion to UTF-16BE PDF strings used for example in bookmarks / PDF
> > outlines is not right.
> > 
> > Take the following example:
> > 
> > ```
> > \starttext
> > \setupinteraction[state=start]
> > \placebookmarks[section][number=no]
> > 
> > \section[bookmark=𝕄]
> > 
> > \stoptext
> > ```
> > 
> > Produces:  for 𝕄 (U+1D544), instead of the correct
> > .
> > 
> > 
> > The relevant function is `lpdf.tosixteen()` (from lpdf-ini.lmt), and its
> > `cache`. (Although the same function is also in lpdf-aux.lmt, and in
> > MkIV equivalents).
> > 
> > My proposal (also enclosed as a file attachment):
> > 
> > ```
> > --- a/lpdf-ini.lmt
> > +++ b/lpdf-ini.lmt
> > @@ -178,7 +178,8 @@
> >   if v < 0x10000 then
> >   v = format("%04x",v)
> >   else
> > -v = format("%04x%04x",rshift(v,10),v%1024+0xDC00)
> > +v = v - 0x10000
> > +v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
> >   end
> >   t[k] = v
> >   return v
> > ```
> > 
> > (Note the similarity to the existing function `big()` in l-unicode.lua.)
> > 
> > I found this by chance, but I am not really a ConTeXt user, so I hope
> > I didn't miss anything.
>
> Thanks for noticing (btw, the aux file is used in some scripts, not in
> context itself).
>
> Hans

Unfortunately the version in the latest LMTX is still not right. The
subtraction of 0x10000 is really needed, at least for the high
surrogate. (Note how the number is added back in the inverse function
`lpdf.fromsixteen()`.)

My other suggestion, which does the subtraction only for one surrogate
is below.

(Although I prefer my first suggestion, quoted above, which seems
clearer: from a number in the range 0x10000 - 0x10FFFF subtract 0x10000,
which makes it a number in the range 0x0 - 0xFFFFF, a 20 bit number. The
higher 10 bits are encoded into the high surrogate (16 bits) by adding
0xD800 (so its high 6 bits are 110110), and the lower 10 bits are
encoded into the low surrogate by adding 0xDC00 (high 6 bits are
110111).)

Michal

--- a/lpdf-ini.lmt
+++ b/lpdf-ini.lmt
@@ -176,7 +176,7 @@
 if v < 0x10000 then
 v = format("%04x",v)
 else
-v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
+v = format("%04x%04x",rshift(v-0x10000,10)+0xD800,v%1024+0xDC00)
 end
 t[k] = v
 return v
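
The point about the inverse function can be checked with a small round
trip in plain Lua. This is only a sketch under assumptions: `encode`
mirrors the arithmetic of the first patch, while `decode` is a
hypothetical single-character inverse written for this example and is
not the actual `lpdf.fromsixteen()` code. `math.floor(v / 1024)`
replaces `rshift(v,10)` so that it runs without ConTeXt.

```lua
local format, floor = string.format, math.floor

-- Encode one code point as UTF-16BE hex digits (arithmetic of the first patch).
local function encode(v)
    if v < 0x10000 then
        return format("%04x", v)
    end
    v = v - 0x10000
    return format("%04x%04x", floor(v / 1024) + 0xD800, v % 1024 + 0xDC00)
end

-- Hypothetical inverse for a single character: strips the surrogate
-- markers and adds the 0x10000 offset back.
local function decode(s)
    local hi = tonumber(s:sub(1, 4), 16)
    if #s == 4 then
        return hi
    end
    local lo = tonumber(s:sub(5, 8), 16)
    return (hi - 0xD800) * 1024 + (lo - 0xDC00) + 0x10000
end

for _, v in ipairs { 0x41, 0xFFFD, 0x10000, 0x1D544, 0x10FFFF } do
    assert(decode(encode(v)) == v)
end
```

Without the subtraction in `encode`, the added 0x10000 in `decode` would
shift every supplementary code point by one plane.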


Re: [dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE

2021-02-09 Thread Hans Hagen

On 2/9/2021 6:57 PM, Michal Vlasák wrote:

> Hello,
>
> conversion to UTF-16BE PDF strings used for example in bookmarks / PDF
> outlines is not right.
>
> Take the following example:
>
> ```
> \starttext
> \setupinteraction[state=start]
> \placebookmarks[section][number=no]
>
> \section[bookmark=𝕄]
>
> \stoptext
> ```
>
> Produces:  for 𝕄 (U+1D544), instead of the correct
> .
>
>
> The relevant function is `lpdf.tosixteen()` (from lpdf-ini.lmt), and its
> `cache`. (Although the same function is also in lpdf-aux.lmt, and in
> MkIV equivalents).
>
> My proposal (also enclosed as a file attachment):
>
> ```
> --- a/lpdf-ini.lmt
> +++ b/lpdf-ini.lmt
> @@ -178,7 +178,8 @@
>   if v < 0x10000 then
>   v = format("%04x",v)
>   else
> -v = format("%04x%04x",rshift(v,10),v%1024+0xDC00)
> +v = v - 0x10000
> +v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
>   end
>   t[k] = v
>   return v
> ```
>
> (Note the similarity to the existing function `big()` in l-unicode.lua.)
>
> I found this by chance, but I am not really a ConTeXt user, so I hope
> I didn't miss anything.

Thanks for noticing (btw, the aux file is used in some scripts, not in
context itself).


Hans



[dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE

2021-02-09 Thread Michal Vlasák
Hello,

conversion to UTF-16BE PDF strings used for example in bookmarks / PDF
outlines is not right.

Take the following example:

```
\starttext
\setupinteraction[state=start]
\placebookmarks[section][number=no]

\section[bookmark=𝕄]

\stoptext
```

Produces:  for 𝕄 (U+1D544), instead of the correct
.
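
The difference for this one character can be worked out with a few
lines of plain Lua. This is a sketch, not the ConTeXt code itself:
`tosixteen_old` and `tosixteen_new` are names invented here,
`math.floor(v / 1024)` stands in for `rshift(v,10)`, and only the hex
digits for the single character are produced (whatever the real
function wraps around them is left out).

```lua
local format, floor = string.format, math.floor

-- Behaviour before the patch: no 0x10000 offset, no 0xD800 on the high half.
local function tosixteen_old(v)
    if v < 0x10000 then
        return format("%04x", v)
    end
    return format("%04x%04x", floor(v / 1024), v % 1024 + 0xDC00)
end

-- Behaviour with the proposed patch: a proper UTF-16 surrogate pair.
local function tosixteen_new(v)
    if v < 0x10000 then
        return format("%04x", v)
    end
    v = v - 0x10000
    return format("%04x%04x", floor(v / 1024) + 0xD800, v % 1024 + 0xDC00)
end

print(tosixteen_old(0x1D544)) -- 0075dd44
print(tosixteen_new(0x1D544)) -- d835dd44
```

The low half (dd44) is already right in the old version; only the high
half differs, 0075 versus the high surrogate d835.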


The relevant function is `lpdf.tosixteen()` (from lpdf-ini.lmt), and its
`cache`. (Although the same function is also in lpdf-aux.lmt, and in
MkIV equivalents).

My proposal (also enclosed as a file attachment):

```
--- a/lpdf-ini.lmt
+++ b/lpdf-ini.lmt
@@ -178,7 +178,8 @@
 if v < 0x10000 then
 v = format("%04x",v)
 else
-v = format("%04x%04x",rshift(v,10),v%1024+0xDC00)
+v = v - 0x10000
+v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
 end
 t[k] = v
 return v
```
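
For readers who do not know the `t[k] = v` / `return v` idiom in the
hunk above: it is the usual way to fill a cache on demand from an
`__index` function. A minimal sketch in plain Lua follows, assuming the
cache is keyed by code point (the real `cache` in lpdf-ini.lmt may be
keyed differently) and again using `math.floor(v / 1024)` in place of
`rshift(v,10)`.

```lua
local format, floor = string.format, math.floor

-- Hypothetical stand-in for the cache: a missing key is computed once by
-- __index and stored, so later lookups are plain table reads.
local cache = setmetatable({ }, { __index = function(t, k)
    local v = k
    if v < 0x10000 then
        v = format("%04x", v)
    else
        v = v - 0x10000
        v = format("%04x%04x", floor(v / 1024) + 0xD800, v % 1024 + 0xDC00)
    end
    t[k] = v
    return v
end })

print(cache[0x1D544])                -- d835dd44
print(rawget(cache, 0x1D544) ~= nil) -- true, the entry is now stored
```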

(Note the similarity to the existing function `big()` in l-unicode.lua.)

I found this by chance, but I am not really a ConTeXt user, so I hope
I didn't miss anything.

Regards,
Michal Vlasák
if not modules then modules = { } end modules ['lpdf-ini'] = {
    version   = 1.001,
    optimize  = true,
    comment   = "companion to lpdf-ini.mkiv",
    author    = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
    copyright = "PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files"
}

-- beware of "too many locals" here

local setmetatable, getmetatable, type, next, tostring, tonumber, rawset = setmetatable, getmetatable, type, next, tostring, tonumber, rawset
local char, byte, format, gsub, concat, match, sub, gmatch = string.char, string.byte, string.format, string.gsub, table.concat, string.match, string.sub, string.gmatch
local utfchar, utfbyte, utfvalues = utf.char, utf.byte, utf.values
local sind, cosd, max, min = math.sind, math.cosd, math.max, math.min
local sort, sortedhash = table.sort, table.sortedhash
local P, C, R, S, Cc, Cs, V = lpeg.P, lpeg.C, lpeg.R, lpeg.S, lpeg.Cc, lpeg.Cs, lpeg.V
local lpegmatch, lpegpatterns = lpeg.match, lpeg.patterns
local formatters = string.formatters
local isboolean = string.is_boolean
local rshift = bit32.rshift

local report_objects    = logs.reporter("backend","objects")
local report_finalizing = logs.reporter("backend","finalizing")
local report_blocked    = logs.reporter("backend","blocked")

local implement = interfaces.implement

local context   = context

-- In ConTeXt MkIV we use utf8 exclusively so all strings get mapped onto a hex
-- encoded utf16 string type between <>. We could probably save some bytes by using
-- strings between () but then we end up with escaped ()\ too.

pdf = type(pdf) == "table" and pdf or { }
local factor         = number.dimenfactors.bp

local codeinjections = { }
local nodeinjections = { }

local backends       = backends

local pdfbackend     = {
    comment        = "backend for directly generating pdf output",
    nodeinjections = nodeinjections,
    codeinjections = codeinjections,
    registrations  = { },
    tables         = { },
}

backends.pdf = pdfbackend

lpdf   = lpdf or { }
local lpdf = lpdf
lpdf.flags = lpdf.flags or { } -- will be filled later

table.setmetatableindex(lpdf, function(t,k)
    report_blocked("function %a is not accessible",k)
    os.exit()
end)

local trace_finalizers = false  trackers.register("backend.finalizers", function(v) trace_finalizers = v end)
local trace_resources  = false  trackers.register("backend.resources",  function(v) trace_resources  = v end)

local pdfreserveobject
local pdfimmediateobject

updaters.register("backend.update.lpdf",function()
    pdfreserveobject   = lpdf.reserveobject
    pdfimmediateobject = lpdf.immediateobject
end)

do

    updaters.register("backend.update.lpdf",function()
        job.positions.registerhandlers {
            getpos  = drivers.getpos,
            getrpos = drivers.getrpos,
            gethpos = drivers.gethpos,
            getvpos = drivers.getvpos,
        }
        lpdf.getpos = drivers.getpos
    end)

    local pdfgetmatrix, pdfhasmatrix, pdfgetpos

    updaters.register("backend.update.lpdf",function()
        pdfgetmatrix = lpdf.getmatrix
        pdfhasmatrix = lpdf.hasmatrix
        pdfgetpos    = lpdf.getpos
    end)

-- local function transform(llx,lly,urx,ury,rx,sx,sy,ry)
-- local x1 = llx * rx + lly * sy
-- local y1 = llx * sx + lly * ry
-- local x2 = llx * rx + ury * sy
-- local y2 = llx * sx + ury * ry
-- local x3 = urx * rx + lly * sy
-- local y3 = urx * sx + lly * ry
-- local x4 = urx * rx + ury * sy
-- local y4 = urx * sx + ury * ry
-- llx = min(x1,x2,x3,x4);
-- lly = min(y1,y2,y3,y4);
-- urx = max(x1,x2,x3,x4);
-- ury = max(y1,y2,y3,y4);
-- return llx, lly, urx, ury
-- end
--
-- function lpdf.transform(llx,lly,urx,ury) -- not yet used so unchecked
-- if