Note that Skim puts the data in a separate data structure, while other viewers 
embed the notes in the PDF data. So I guess other readers implicitly normalize 
the Unicode when they write notes to the PDF. We don't do anything to the 
strings, so they stay the way they were entered.

Christiaan

> On 12 Apr 2024, at 14:16, Mark Roberts <[email protected]> wrote:
> 
> Hi,
> 
> Thanks for your message. Let me try to ask this question a little differently.
> 
> Regarding Unicode, it appears that Skim behaves unlike every other PDF reader 
> I've got, e.g., Acrobat, PDF Expert, Foxit Reader, and Preview.
> 
> If I copy some text from a PDF file that uses Unicode, all of these PDF 
> readers will perform Unicode normalization, while Skim does not.
> 
> For example, I copy the string "shū" from a PDF using any of the other reader 
> apps, and the clipboard contains 7368c5ab.
> 
> If I try this with Skim, though, the clipboard contains 736875cc84. You can 
> verify this using the command line, e.g., echo -n "shū" | xxd -p
> 
> Similarly, the output via skimnotes is not being normalized. So, it seems 
> that other PDF reader apps are doing normalization, but Skim does not.
> 
> When you say "any normalization should happen before the data was created" — 
> in fact this is not how all other PDF reader apps that I've got seem to work.
> 
> Would you consider adding an option for skimnotes to behave like other PDF 
> apps w.r.t. Unicode?
> 
> Thanks,
> 
> M.
> 
> On Fri, Apr 12, 2024 at 6:21 PM Christiaan Hofman <[email protected]> wrote:
> That would be useless. SkimNotes does not process the data, it just copies it 
> between different locations. Also passing the file through conversion does 
> not work, as none of the formats involved are unicode text files. The point 
> is that the strings included as part of the data may represent strings in 
> some encoding in some form. So any normalization should happen before the 
> data was created (which is not an option), or by parsing the plist, 
> normalizing all strings in it, and reassembling the plist.
> 
> Christiaan
> 
>> On 12 Apr 2024, at 04:48, Mark Roberts <[email protected]> wrote:
>> 
>> Hi,
>> 
>> Thanks for clarifying.
>> 
>> I looked for a tool to do this, but I haven't found anything.
>> 
>> Some people suggest running a text file through oconv, but that seems to be 
>> just a brute force approach to patch specific characters.
>> 
>> What would you think about an option to skimnotes that invokes 
>> precomposedStringWithCanonicalMapping() or whatever the appropriate function 
>> is?
>> 
>> Thanks !
>> 
>> On Thu, Apr 11, 2024 at 11:18 PM Christiaan Hofman <[email protected]> wrote:
>> 
>> 
>>> On 11 Apr 2024, at 13:31, Mark Roberts <[email protected]> wrote:
>>> 
>>> I've been using the skimnotes command line app to "get" skim notes as a 
>>> plist, and to then convert them to XML with plutil.
>>> 
>>> One thing I've discovered is that notes in Unicode may not be normalized.
>>> 
>>> Some apps can handle this, but some cannot.
>>> 
>>> Question: is there a way to get skimnotes to normalize the Unicode, or 
>>> could you suggest an app I can use in a pipe with plutil... ?
>>> 
>>> Thanks !
>> 
>> I don’t think there exists a tool to normalize unicode strings in a binary 
>> plist. I am pretty sure that any tool that may exist to convert a binary 
>> plist to an XML plist goes through the same system code from Apple, so it 
>> will all do the same thing. Perhaps there is a tool to post-process the XML 
>> to normalize any strings as Unicode, but I could not help you there either.
>> 
>> Christiaan
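
The approach described in the thread (parse the plist, normalize every string 
in it, and reassemble the plist) could be sketched roughly as follows. This is 
only an illustration using Python's standard plistlib and unicodedata modules, 
not something Skim or skimnotes provides. The function names and the "string" 
key in the sample are hypothetical, and it assumes the notes have already been 
exported as a regular (binary or XML) plist.

```python
import plistlib
import unicodedata


def normalize_strings(value, form="NFC"):
    """Return a copy of a parsed plist value with every string value normalized."""
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    if isinstance(value, list):
        return [normalize_strings(item, form) for item in value]
    if isinstance(value, dict):
        # Dictionary keys are left as they are; only the values are normalized.
        return {key: normalize_strings(item, form) for key, item in value.items()}
    # Numbers, booleans, dates, and raw data (bytes) pass through untouched.
    return value


def normalize_plist(data, form="NFC"):
    """Parse plist bytes (binary or XML) and return a normalized XML plist."""
    parsed = plistlib.loads(data)
    return plistlib.dumps(normalize_strings(parsed, form), fmt=plistlib.FMT_XML)


# Small self-check with a made-up note: "shu" plus a combining macron
# (the decomposed form, bytes 736875cc84) becomes the precomposed form.
sample = plistlib.dumps([{"string": "shu\u0304"}], fmt=plistlib.FMT_BINARY)
result = plistlib.loads(normalize_plist(sample))
print(result[0]["string"].encode("utf-8").hex())  # 7368c5ab
```

Passing form="NFD" instead would produce the decomposed form. Data and date 
values are deliberately left alone, since only the strings carry text.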

_______________________________________________
Skim-app-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/skim-app-users
