Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Wolfgang Schuster
On 07.12.2009 at 08:52, Taco Hoekwater wrote: Is /ActualText supposed to be in PDFDoc Encoding? No, you could also use Unicode Encoding. Wolfgang

Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Hans Hagen
Taco Hoekwater wrote: Wolfgang Schuster wrote: Hi Hans, you showed a while ago how the actualtext function of pdf works and i have a module where i would use it but letters outside of ascii appear wrong when i copy the text \starttext text \pdfliteral{/Span /ActualText (Müller)

Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Wolfgang Schuster
On 07.12.2009 at 09:46, Hans Hagen wrote: \def\pdfactualtext#1#2% {\pdfliteral direct{/Span /ActualText \ctxlua{tex.write(lpdf.tosixteen(#2))} BDC}#1\pdfliteral direct{EMC}} \starttext text \pdfactualtext{Meier}{Müller} text \stoptext Perfect, will this end in the core? Regards,
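Not everyone has ConTeXt's Lua code at hand, so here is what `lpdf.tosixteen` is for in the macro above. A PDF text string such as an /ActualText value is read as PDFDocEncoding unless it starts with the byte-order mark FE FF, in which case it is read as UTF-16BE. The Python sketch below shows the kind of hex string that results. It illustrates that rule and is not a transcription of the ConTeXt function. The name `to_sixteen` is hypothetical.

```python
def to_sixteen(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE with a leading BOM.

    Hypothetical helper illustrating the kind of value an /ActualText
    entry needs; it is not the source of ConTeXt's lpdf.tosixteen.
    """
    # U+FEFF encoded as UTF-16BE gives the bytes FE FF, which tells the
    # reader that the rest of the string is UTF-16BE, not PDFDocEncoding.
    # Characters outside the BMP become surrogate pairs automatically.
    data = ("\ufeff" + text).encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


if __name__ == "__main__":
    print(to_sixteen("Müller"))  # <FEFF004D00FC006C006C00650072>
```

With this, the Müller example corresponds to an ActualText value of `<FEFF004D00FC006C006C00650072>`.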

Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Hans Hagen
Wolfgang Schuster wrote: On 07.12.2009 at 09:46, Hans Hagen wrote: \def\pdfactualtext#1#2% {\pdfliteral direct{/Span /ActualText \ctxlua{tex.write(lpdf.tosixteen(#2))} BDC}#1\pdfliteral direct{EMC}} \starttext text \pdfactualtext{Meier}{Müller} text \stoptext Perfect, will this end

Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Wolfgang Schuster
On 07.12.2009 at 10:11, Hans Hagen wrote: hm, doesn't that kind of functionality demand a bit more 'thinking'? what exactly is needed? how does it relate to linebreaks? other content? etc .. actually, such a mechanism should be implemented a bit differently (maybe attributes and delayed

Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Hans Hagen
Wolfgang Schuster wrote: On 07.12.2009 at 10:11, Hans Hagen wrote: hm, doesn't that kind of functionality demand a bit more 'thinking'? what exactly is needed? how does it relate to linebreaks? other content? etc .. actually, such a mechanism should be implemented a bit differently (maybe

Re: [NTG-context] actualtext and encoding

2009-12-07 Thread Wolfgang Schuster
On 07.12.2009 at 11:21, Hans Hagen wrote: detail ... \def\ruby#1#2% {\dontleavehmode\bgroup \setbox\scratchboxone\hbox{#1}% \setbox\scratchboxtwo\hbox{#2}% \scratchdimen\wd\ifdim\wd\scratchboxone\wd\scratchboxtwo\scratchboxone\else\scratchboxtwo\fi \setbox\scratchbox\vbox

[NTG-context] actualtext and encoding

2009-12-06 Thread Wolfgang Schuster
Hi Hans, you showed a while ago how the actualtext function of pdf works and i have a module where i would use it but letters outside of ascii appear wrong when i copy the text \starttext text \pdfliteral{/Span /ActualText (Müller) BDC}Meier\pdfliteral{EMC} text \stoptext becomes text Müller
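The garbled copy described in this report can be reproduced outside TeX. Assuming the source file is UTF-8, as is usual with MkIV, the literal `(Müller)` ends up in the page stream as UTF-8 bytes. A viewer reads a text string without a byte-order mark as PDFDocEncoding, one byte per character. In the Python sketch below, the `latin-1` codec stands in for PDFDocEncoding. That substitution holds for the two bytes of `ü` (0xC3, 0xBC) but not for every byte value.

```python
def misread_as_single_byte(text: str) -> str:
    """Decode the UTF-8 bytes of text one byte per character.

    latin-1 stands in for PDFDocEncoding; the two agree on 0xC3 and 0xBC,
    the UTF-8 bytes of 'ü', but not on every byte value.
    """
    raw = text.encode("utf-8")    # what ends up inside (...) in the stream
    return raw.decode("latin-1")  # how a single-byte reader interprets it


if __name__ == "__main__":
    print(misread_as_single_byte("Müller"))  # MÃ¼ller
```

Plain ASCII such as `Meier` passes through unchanged, which is why only the letters outside ASCII come out wrong.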

Re: [NTG-context] actualtext and encoding

2009-12-06 Thread Taco Hoekwater
Wolfgang Schuster wrote: Hi Hans, you showed a while ago how the actualtext function of pdf works and i have a module where i would use it but letters outside of ascii appear wrong when i copy the text \starttext text \pdfliteral{/Span /ActualText (Müller) BDC}Meier\pdfliteral{EMC}

Re: [NTG-context] ActualText

2009-09-19 Thread Hans Hagen
Barry Schwartz wrote: Please tell me this isn't in a FAQ. :) Is there support for ActualText tags so that searching and extraction will work with OpenType fonts and Unicode? If so, do discretionary hyphens get treated as 00AD instead of 002D? can you explain in more detail what you mean with

Re: [NTG-context] ActualText

2009-09-19 Thread Arthur Reutenauer
can you explain in more detail what you mean with 'actual text tags' ? He means ActualText tags :-) See the PDF spec section 14.9.4, page 623. It's a more generic way to support searching than ToUnicode vectors: you just specify the actual string of underlying Unicode characters. The PDF
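Here is a minimal sketch of the construct Arthur describes, written in Python rather than TeX. The glyphs are painted as usual, and a marked-content span around them carries the Unicode text that search and copy should use. In the PDF specification, BDC takes a tag and a properties dictionary, so ActualText belongs inside `<< ... >>`. Some of the literals quoted in this thread pass it without the dictionary brackets. The function names are hypothetical. The shown text is assumed to be plain ASCII in a simple font encoding. The BT/ET and font-selection operators that a real content stream needs are left out.

```python
def escape_literal(s: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def actual_text_span(shown: str, actual: str) -> str:
    """Wrap a Tj operator in a /Span marked-content sequence with ActualText.

    shown  -- what the glyphs paint (assumed ASCII, simple font encoding)
    actual -- the Unicode text that search and copy should return instead
    """
    # Text string as UTF-16BE with BOM, written as a hex string.
    hex_actual = ("\ufeff" + actual).encode("utf-16-be").hex().upper()
    # BDC takes two operands: a tag (/Span) and a properties dictionary.
    return (f"/Span << /ActualText <{hex_actual}> >> BDC "
            f"({escape_literal(shown)}) Tj EMC")


if __name__ == "__main__":
    # The Meier/Müller example from the thread: paint one, copy the other.
    print(actual_text_span("Meier", "Müller"))
```

This keeps the painted glyphs and the extracted text independent, which is what makes ActualText more general than a per-glyph ToUnicode entry.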

Re: [NTG-context] ActualText

2009-09-19 Thread Wolfgang Schuster
On 19.09.2009 at 19:10, Arthur Reutenauer wrote: Anyway, this needs support at the engine level and I don't think there is; actually it would be nice to add that to LuaTeX. Heiko Oberdiek wrote the accsupp package to use ActualText in LaTeX, why shouldn't it then be possible to use it in

Re: [NTG-context] ActualText

2009-09-19 Thread Barry Schwartz
Arthur Reutenauer arthur.reutena...@normalesup.org wrote: He means ActualText tags :-) See the PDF spec section 14.9.4, page 623. It's a more generic way to support searching than ToUnicode vectors: you just specify the actual string of underlying Unicode characters. The PDF spec uses

Re: [NTG-context] ActualText

2009-09-19 Thread Arthur Reutenauer
Heiko Oberdiek wrote the accsupp package to use ActualText in LaTeX, why shouldn't it be then possible to use it in LuaTeX (and ConTeXt)? Right, you don't need additional engine support, you can use \pdfliteral in pdfTeX, and in LuaTeX as well. Heiko's package should be quite easy to port to

Re: [NTG-context] ActualText

2009-09-19 Thread Hans Hagen
Arthur Reutenauer wrote: can you explain in more detail what you mean with 'actual text tags' ? He means ActualText tags :-) See the PDF spec section 14.9.4, page 623. It's a more generic way to support searching than ToUnicode vectors: you just specify the actual string of underlying

Re: [NTG-context] ActualText

2009-09-19 Thread Hans Hagen
Barry Schwartz wrote: Also, I noticed when playing around with the examples from the Th ligature discussion that searching and extraction didn't work with small caps, though it did work with the ligature. With ActualText tags hm, mkiv has an analyser for names-unicode and afaik small caps

Re: [NTG-context] ActualText

2009-09-19 Thread Barry Schwartz
Hans Hagen pra...@wxs.nl wrote: put an ActualText tag on anything that happens not to match what you would get from the ToUnicode mapping. hm, if one knows the character (say c) then why not adapt the tounicode vector The same glyph could correspond to different Unicode in the source.
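Barry's rule fits in a few lines. A ToUnicode CMap assigns one fixed string to each glyph. When the same glyph stands for different source text, only the occurrences that disagree with the CMap need their own ActualText. Examples are a small-cap glyph reached from both `a` and `A`, or one hyphen glyph used for both a hard hyphen and a discretionary. The Python sketch below assumes the shaper reports, for each glyph, the source characters it came from. The glyph names are made up for the example.

```python
from typing import Dict, List, Optional, Tuple


def actual_text_needed(
    glyphs: List[Tuple[str, str]], to_unicode: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each glyph with the ActualText it needs, or None.

    glyphs     -- (glyph name, source text) for each shaped glyph
    to_unicode -- the single string the ToUnicode CMap gives each glyph
    """
    result: List[Tuple[str, Optional[str]]] = []
    for name, source in glyphs:
        mapped = to_unicode.get(name)  # None if the glyph is unmapped
        # ToUnicode is enough only when it reproduces this occurrence exactly.
        result.append((name, None if mapped == source else source))
    return result


if __name__ == "__main__":
    cmap = {"a.sc": "a", "hyphen": "-"}  # made-up glyph names
    shaped = [("a.sc", "a"), ("a.sc", "A"), ("hyphen", "-"), ("hyphen", "\u00ad")]
    for glyph, actual in actual_text_needed(shaped, cmap):
        print(glyph, repr(actual))
```

Adapting the ToUnicode vector alone cannot cover the second and fourth glyphs here, because the CMap can hold only one string per glyph.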

Re: [NTG-context] ActualText

2009-09-19 Thread Barry Schwartz
Barry Schwartz chemoelect...@chemoelectric.org wrote: In practice what I see with my method is that discretionary hyphens always get an ActualText, and if the font is older and has names like Asmall or ffl (which I don't bother handling specially) then the substituted stuff gets an

[NTG-context] ActualText

2009-09-18 Thread Barry Schwartz
Please tell me this isn't in a FAQ. :) Is there support for ActualText tags so that searching and extraction will work with OpenType fonts and Unicode? If so, do discretionary hyphens get treated as 00AD instead of 002D?
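On the question of 00AD versus 002D: a soft hyphen marks a break as incidental, so a consumer can drop it and rejoin the word. A hard hyphen is real text and has to be kept. The Python sketch below models a hypothetical hyphen-aware extractor to show the difference. Actual viewers vary in how they treat either character.

```python
from typing import List

SOFT_HYPHEN = "\u00ad"


def join_lines(lines: List[str]) -> str:
    """Rejoin extracted lines the way a hyphen-aware reader might.

    Hypothetical extractor logic, not any particular viewer's behaviour:
    - a line ending in U+00AD was broken inside a word: drop the hyphen
      and glue the next line on directly;
    - a line ending in U+002D keeps its hyphen, also without a space;
    - any other line break becomes a single space.
    """
    out = ""
    for i, line in enumerate(lines):
        if line.endswith(SOFT_HYPHEN):
            out += line[:-1]
        elif line.endswith("-"):
            out += line
        else:
            out += line
            if i < len(lines) - 1:
                out += " "
    return out


if __name__ == "__main__":
    print(join_lines(["hyphen" + SOFT_HYPHEN, "ation"]))  # hyphenation
    print(join_lines(["hyphen-", "ation"]))               # hyphen-ation
```

If the discretionary is reported as 002D, the broken word survives extraction as `hyphen-ation`. That is why it matters which of the two code points the hyphen maps to.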