Re: [dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE
Hi,

> +v = v - 0x10000

ah, i hadn't noticed that line (btw, in the file there is a remark, at the
place where i add the 0x10000, that it is inconsistent, so i should have
looked into it then, sigh)

> My other suggestion, which does the subtraction only for one surrogate
> is below.

btw, performance-wise the separate step is the same as doing it in the
one-liner (lua does everything via the stack, so in general using
intermediate steps that assign to a variable, here v, is often quite ok)

Hans

-
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-
___
dev-context mailing list
dev-context@ntg.nl
https://mailman.ntg.nl/mailman/listinfo/dev-context
Re: [dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE
On Tue Feb 9, 2021 at 7:49 PM CET, Hans Hagen wrote:
> On 2/9/2021 6:57 PM, Michal Vlasák wrote:
> > Hello,
> >
> > conversion to UTF-16BE PDF strings used for example in bookmarks / PDF
> > outlines is not right.
> >
> > Take the following example:
> >
> > ```
> > \starttext
> > \setupinteraction[state=start]
> > \placebookmarks[section][number=no]
> >
> > \section[bookmark=𝕄]
> >
> > \stoptext
> > ```
> >
> > Produces: <feff0075dd44> for 𝕄 (U+1D544), instead of the correct
> > <feffd835dd44>.
> >
> >
> > The relevant function is `lpdf.tosixteen()` (from lpdf-ini.lmt), and its
> > `cache`. (Although the same function is also in lpdf-aux.lmt, and in
> > MkIV equivalents).
> >
> > My proposal (also enclosed as a file attachment):
> >
> > ```
> > --- a/lpdf-ini.lmt
> > +++ b/lpdf-ini.lmt
> > @@ -178,7 +178,8 @@
> >      if v < 0x10000 then
> >          v = format("%04x",v)
> >      else
> > -        v = format("%04x%04x",rshift(v,10),v%1024+0xDC00)
> > +        v = v - 0x10000
> > +        v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
> >      end
> >      t[k] = v
> >      return v
> > ```
> >
> > (Note the similarity to the existing function `big()` in l-unicode.lua.)
> >
> > I found this by chance, but I am not really a ConTeXt user, so I hope
> > I didn't miss anything.
>
> Thanks for noticing (btw, the aux file is used in some scripts, not in
> context itself).
>
> Hans

Unfortunately the version in the latest LMTX is still not right. The
subtraction of 0x10000 is really needed, at least for the high surrogate.
(Note how the number is added back in the inverse function
`lpdf.fromsixteen()`.)

My other suggestion, which does the subtraction only for one surrogate,
is below.

(Although I prefer my first suggestion, quoted above, which seems clearer:
from a number in the range 0x10000 - 0x10FFFF subtract 0x10000, which
makes it a number in the range 0x00000 - 0xFFFFF, a 20-bit number. The
higher 10 bits are encoded into the high surrogate (16 bits) by adding
0xD800 (so its high 6 bits are 110110), and the lower 10 bits are encoded
into the low surrogate by adding 0xDC00 (high 6 bits are 110111).)

Michal

--- a/lpdf-ini.lmt
+++ b/lpdf-ini.lmt
@@ -176,7 +176,7 @@
     if v < 0x10000 then
         v = format("%04x",v)
     else
-        v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
+        v = format("%04x%04x",rshift(v-0x10000,10)+0xD800,v%1024+0xDC00)
     end
     t[k] = v
     return v
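
The bit layout described in the parenthetical remark (subtract 0x10000, put the upper 10 bits behind 0xD800 and the lower 10 bits behind 0xDC00) can be sketched in plain Lua as below. This is only an illustration, not the code from lpdf-ini.lmt: the names `encode_two_step`, `encode_one_liner` and `decode` are made up for this sketch, and `math.floor` with a division stands in for `bit32.rshift` so that it runs in a stock Lua interpreter. It checks that both patch variants agree and that adding 0x10000 back recovers the code point.

```lua
local format = string.format

-- First suggestion: subtract 0x10000 once, then split the 20-bit value.
local function encode_two_step(v)
    v = v - 0x10000
    local high = math.floor(v / 1024) + 0xD800 -- upper 10 bits, prefix 110110
    local low  = v % 1024 + 0xDC00             -- lower 10 bits, prefix 110111
    return format("%04x%04x", high, low)
end

-- Second suggestion: subtract only for the high surrogate. The low surrogate
-- is unaffected because 0x10000 is a multiple of 1024.
local function encode_one_liner(v)
    return format("%04x%04x", math.floor((v - 0x10000) / 1024) + 0xD800, v % 1024 + 0xDC00)
end

-- Inverse direction: strip both prefixes, recombine, add 0x10000 back.
local function decode(s)
    local high = tonumber(s:sub(1, 4), 16) - 0xD800
    local low  = tonumber(s:sub(5, 8), 16) - 0xDC00
    return high * 1024 + low + 0x10000
end

-- Compare both variants at the range boundaries and for U+1D544.
for _, v in ipairs { 0x10000, 0x1D544, 0x10FFFF } do
    assert(encode_two_step(v) == encode_one_liner(v))
    assert(decode(encode_two_step(v)) == v)
end

print(encode_two_step(0x1D544))
```

The two variants differ only in whether the subtraction is a separate statement, so choosing between them is a matter of readability.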
Re: [dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE
On 2/9/2021 6:57 PM, Michal Vlasák wrote:
> Hello,
>
> conversion to UTF-16BE PDF strings used for example in bookmarks / PDF
> outlines is not right.
>
> Take the following example:
>
> ```
> \starttext
> \setupinteraction[state=start]
> \placebookmarks[section][number=no]
>
> \section[bookmark=𝕄]
>
> \stoptext
> ```
>
> Produces: <feff0075dd44> for 𝕄 (U+1D544), instead of the correct
> <feffd835dd44>.
>
>
> The relevant function is `lpdf.tosixteen()` (from lpdf-ini.lmt), and its
> `cache`. (Although the same function is also in lpdf-aux.lmt, and in
> MkIV equivalents).
>
> My proposal (also enclosed as a file attachment):
>
> ```
> --- a/lpdf-ini.lmt
> +++ b/lpdf-ini.lmt
> @@ -178,7 +178,8 @@
>      if v < 0x10000 then
>          v = format("%04x",v)
>      else
> -        v = format("%04x%04x",rshift(v,10),v%1024+0xDC00)
> +        v = v - 0x10000
> +        v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
>      end
>      t[k] = v
>      return v
> ```
>
> (Note the similarity to the existing function `big()` in l-unicode.lua.)
>
> I found this by chance, but I am not really a ConTeXt user, so I hope
> I didn't miss anything.

Thanks for noticing (btw, the aux file is used in some scripts, not in
context itself).

Hans
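
The arithmetic behind the quoted report can be checked with a few lines of plain Lua. This is a sketch rather than the ConTeXt code: the function names `old_pair`, `patched_pair` and `starts_with_high_surrogate` are hypothetical, and `math.floor` with a division replaces `bit32.rshift` so that it runs outside LuaMetaTeX. It only shows that the expression on the removed line of the diff does not yield a high surrogate for a code point at or above 0x10000, while the patched expression does.

```lua
local format = string.format

-- Expression from the removed line of the diff: no 0x10000 subtraction and
-- no 0xD800 offset on the first code unit.
local function old_pair(v)
    return format("%04x%04x", math.floor(v / 1024), v % 1024 + 0xDC00)
end

-- Expression from the added lines of the diff.
local function patched_pair(v)
    v = v - 0x10000
    return format("%04x%04x", math.floor(v / 1024) + 0xD800, v % 1024 + 0xDC00)
end

-- A valid high surrogate lies in the range 0xD800 to 0xDBFF.
local function starts_with_high_surrogate(s)
    local unit = tonumber(s:sub(1, 4), 16)
    return unit >= 0xD800 and unit <= 0xDBFF
end

local v = 0x1D544
print(old_pair(v), starts_with_high_surrogate(old_pair(v)))
print(patched_pair(v), starts_with_high_surrogate(patched_pair(v)))
```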
[dev-context] lpdf-ini.lmt: lpdf.tosixteen(), wrong conversion to UTF-16BE
Hello,

conversion to UTF-16BE PDF strings used for example in bookmarks / PDF
outlines is not right.

Take the following example:

```
\starttext
\setupinteraction[state=start]
\placebookmarks[section][number=no]

\section[bookmark=𝕄]

\stoptext
```

Produces: <feff0075dd44> for 𝕄 (U+1D544), instead of the correct
<feffd835dd44>.


The relevant function is `lpdf.tosixteen()` (from lpdf-ini.lmt), and its
`cache`. (Although the same function is also in lpdf-aux.lmt, and in
MkIV equivalents).

My proposal (also enclosed as a file attachment):

```
--- a/lpdf-ini.lmt
+++ b/lpdf-ini.lmt
@@ -178,7 +178,8 @@
     if v < 0x10000 then
         v = format("%04x",v)
     else
-        v = format("%04x%04x",rshift(v,10),v%1024+0xDC00)
+        v = v - 0x10000
+        v = format("%04x%04x",rshift(v,10)+0xD800,v%1024+0xDC00)
     end
     t[k] = v
     return v
```

(Note the similarity to the existing function `big()` in l-unicode.lua.)

I found this by chance, but I am not really a ConTeXt user, so I hope
I didn't miss anything.

Regards,
Michal Vlasák

if not modules then modules = { } end modules ['lpdf-ini'] = {
    version   = 1.001,
    optimize  = true,
    comment   = "companion to lpdf-ini.mkiv",
    author    = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
    copyright = "PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files"
}

-- beware of "too many locals" here

local setmetatable, getmetatable, type, next, tostring, tonumber, rawset = setmetatable, getmetatable, type, next, tostring, tonumber, rawset
local char, byte, format, gsub, concat, match, sub, gmatch = string.char, string.byte, string.format, string.gsub, table.concat, string.match, string.sub, string.gmatch
local utfchar, utfbyte, utfvalues = utf.char, utf.byte, utf.values
local sind, cosd, max, min = math.sind, math.cosd, math.max, math.min
local sort, sortedhash = table.sort, table.sortedhash
local P, C, R, S, Cc, Cs, V = lpeg.P, lpeg.C, lpeg.R, lpeg.S, lpeg.Cc, lpeg.Cs, lpeg.V
local lpegmatch, lpegpatterns = lpeg.match, lpeg.patterns
local formatters = string.formatters
local isboolean  = string.is_boolean
local rshift     = bit32.rshift

local report_objects    = logs.reporter("backend","objects")
local report_finalizing = logs.reporter("backend","finalizing")
local report_blocked    = logs.reporter("backend","blocked")

local implement = interfaces.implement

local context = context

-- In ConTeXt MkIV we use utf8 exclusively so all strings get mapped onto a hex
-- encoded utf16 string type between <>. We could probably save some bytes by using
-- strings between () but then we end up with escaped ()\ too.

pdf = type(pdf) == "table" and pdf or { }

local factor = number.dimenfactors.bp

local codeinjections = { }
local nodeinjections = { }

local backends = backends

local pdfbackend = {
    comment        = "backend for directly generating pdf output",
    nodeinjections = nodeinjections,
    codeinjections = codeinjections,
    registrations  = { },
    tables         = { },
}

backends.pdf = pdfbackend

lpdf       = lpdf or { }
local lpdf = lpdf

lpdf.flags = lpdf.flags or { } -- will be filled later

table.setmetatableindex(lpdf, function(t,k)
    report_blocked("function %a is not accessible",k)
    os.exit()
end)

local trace_finalizers = false  trackers.register("backend.finalizers", function(v) trace_finalizers = v end)
local trace_resources  = false  trackers.register("backend.resources",  function(v) trace_resources  = v end)

local pdfreserveobject
local pdfimmediateobject

updaters.register("backend.update.lpdf",function()
    pdfreserveobject   = lpdf.reserveobject
    pdfimmediateobject = lpdf.immediateobject
end)

do

    updaters.register("backend.update.lpdf",function()
        job.positions.registerhandlers {
            getpos  = drivers.getpos,
            getrpos = drivers.getrpos,
            gethpos = drivers.gethpos,
            getvpos = drivers.getvpos,
        }
        lpdf.getpos = drivers.getpos
    end)

    local pdfgetmatrix, pdfhasmatrix, pdfgetpos

    updaters.register("backend.update.lpdf",function()
        pdfgetmatrix = lpdf.getmatrix
        pdfhasmatrix = lpdf.hasmatrix
        pdfgetpos    = lpdf.getpos
    end)

    -- local function transform(llx,lly,urx,ury,rx,sx,sy,ry)
    --     local x1 = llx * rx + lly * sy
    --     local y1 = llx * sx + lly * ry
    --     local x2 = llx * rx + ury * sy
    --     local y2 = llx * sx + ury * ry
    --     local x3 = urx * rx + lly * sy
    --     local y3 = urx * sx + lly * ry
    --     local x4 = urx * rx + ury * sy
    --     local y4 = urx * sx + ury * ry
    --     llx = min(x1,x2,x3,x4);
    --     lly = min(y1,y2,y3,y4);
    --     urx = max(x1,x2,x3,x4);
    --     ury = max(y1,y2,y3,y4);
    --     return llx, lly, urx, ury
    -- end
    --
    -- function lpdf.transform(llx,lly,urx,ury) -- not yet used so unchecked
    --     if