Note that Skim puts the data in a separate data structure, while other viewers embed the notes in the PDF data. So I guess other readers implicitly normalize the Unicode when they write notes to the PDF. We don't do anything to the strings, so they stay exactly the way they were entered.
Christiaan

> On 12 Apr 2024, at 14:16, Mark Roberts <[email protected]> wrote:
>
> Hi,
>
> Thanks for your message. Let me try to ask this question a little differently.
>
> Regarding Unicode, it appears that Skim behaves unlike every other PDF reader
> I've got, e.g., Acrobat, PDF Expert, Foxit Reader, and Preview.
>
> If I copy some text from a PDF file that uses Unicode, all of these PDF
> readers will perform Unicode normalization, while Skim does not.
>
> For example, I copy the string "shū" from a PDF using any of the other reader
> apps, and the clipboard contains 7368c5ab.
>
> If I try this with Skim, though, the clipboard contains 736875cc84. You can
> verify this using the command line, e.g., echo -n "shū" | xxd -p
>
> Similarly, the output via skimnotes is not being normalized. So, it seems
> that other PDF reader apps are doing normalization, but Skim does not.
>
> When you say "any normalization should happen before the data was created",
> in fact this is not how all other PDF reader apps that I've got seem to work.
>
> Would you consider adding an option for skimnotes to behave like other PDF
> apps w.r.t. Unicode?
>
> Thanks,
>
> M.
>
> On Fri, Apr 12, 2024 at 6:21 PM Christiaan Hofman <[email protected]> wrote:
> That would be useless. SkimNotes does not process the data, it just copies it
> between different locations. Also passing the file through conversion does
> not work, as none of the formats involved are unicode text files. The point
> is that the strings included as part of the data may represent strings in
> some encoding in some form. So any normalization should happen before the
> data was created (which is not an option), or by parsing the plist,
> normalizing all strings in it, and reassembling the plist.
>
> Christiaan
>
>> On 12 Apr 2024, at 04:48, Mark Roberts <[email protected]> wrote:
>>
>> Hi,
>>
>> Thanks for clarifying.
>>
>> I looked for a tool to do this, but I haven't found anything.
>>
>> Some people suggest running a text file through oconv, but that seems to be
>> just a brute force approach to patch specific characters.
>>
>> What would you think about an option to skimnotes that invokes
>> precomposedStringWithCanonicalMapping() or whatever the appropriate function
>> is?
>>
>> Thanks!
>>
>> On Thu, Apr 11, 2024 at 11:18 PM Christiaan Hofman <[email protected]> wrote:
>>
>>> On 11 Apr 2024, at 13:31, Mark Roberts <[email protected]> wrote:
>>>
>>> I've been using the skimnotes command line app to "get" skim notes as a
>>> plist, and to then convert them to XML with plutil.
>>>
>>> One thing I've discovered is that notes in Unicode may not be normalized.
>>>
>>> Some apps can handle this, but some cannot.
>>>
>>> Question: is there a way to get skimnotes to normalize the Unicode, or
>>> could you suggest an app I can use in a pipe with plutil... ?
>>>
>>> Thanks!
>>
>> I don’t think there exists a tool to normalize unicode strings in a binary
>> plist. I am pretty sure that any tool that may exist to convert a binary
>> plist to an XML plist goes through the same system code from Apple, so it
>> will all do the same thing. Perhaps there is a tool to post-process the XML
>> to normalize any strings as Unicode, but I could not help you there either.
>>
>> Christiaan
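
The "parse the plist, normalize all strings in it, and reassemble the plist" route mentioned in the thread could be sketched roughly as follows. This is only an illustration, not anything skimnotes or Skim provides: it uses Python's standard plistlib and unicodedata modules, the function names normalize_strings and normalize_plist are made up for this example, and it assumes the notes have been exported as a property list (binary or XML) that plistlib can read. NFC is the form that precomposedStringWithCanonicalMapping corresponds to.

```python
import plistlib
import unicodedata


def normalize_strings(obj, form="NFC"):
    """Return a copy of obj with every str value normalized (hypothetical helper).

    Lists and dicts are walked recursively. Dictionary keys and all other
    value types (bytes, numbers, dates, booleans) are passed through untouched.
    """
    if isinstance(obj, str):
        return unicodedata.normalize(form, obj)
    if isinstance(obj, list):
        return [normalize_strings(item, form) for item in obj]
    if isinstance(obj, dict):
        return {key: normalize_strings(value, form) for key, value in obj.items()}
    return obj


def normalize_plist(data, form="NFC"):
    """Parse plist bytes, normalize the strings, and re-serialize as an XML plist."""
    parsed = plistlib.loads(data)  # auto-detects binary vs. XML input
    return plistlib.dumps(normalize_strings(parsed, form), fmt=plistlib.FMT_XML)


# The example from the thread: "sh" + "u" + U+0304 (combining macron).
decomposed = "shu\u0304"
print(decomposed.encode("utf-8").hex())                     # 736875cc84
print(normalize_strings(decomposed).encode("utf-8").hex())  # 7368c5ab
```

To use something like this in a pipe, one could read the plist from sys.stdin.buffer and write the result of normalize_plist to sys.stdout.buffer. Strings that live inside data values (for example rich text stored as bytes) would not be touched by this sketch.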
_______________________________________________ Skim-app-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/skim-app-users
