Re: [Skim-app-users] Skimnotes app and Unicode

Mark Roberts Fri, 12 Apr 2024 19:29:55 -0700

Thanks for your messages.

Carsten, there's a public API in NSString for this:


https://developer.apple.com/documentation/foundation/nsstring/1412645-precomposedstringwithcanonicalma#normalizing-strings

For all other PDF readers, regardless of the engine they use, I would
imagine it's just a simple matter of using this API to normalize the string
from the PDF file, before sending it to the clipboard.

Looks like it can be done with a single function call. Nothing undocumented
or magic here.
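
To make the byte values from my earlier message concrete, here is a minimal sketch in Python rather than Objective-C. It assumes that Python's unicodedata.normalize("NFC", ...) is an acceptable stand-in for precomposedStringWithCanonicalMapping, since both apply Unicode Normalization Form C. The variable names are only for illustration.

```python
import unicodedata

# "shū" in decomposed form: 'u' followed by U+0304 COMBINING MACRON
decomposed = "shu\u0304"

# NFC composes the base letter and the combining mark into one code point (U+016B)
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8").hex())  # 736875cc84
print(composed.encode("utf-8").hex())    # 7368c5ab

# The two strings render identically but do not compare equal
print(decomposed == composed)            # False
```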

Christiaan, it would be very helpful if there were an option in skimnotes
to use this API in NSString, when saving the notes file (plist, xml, etc.).

I looked at Acrobat Preflight, but it doesn't provide any way to normalize
the source PDF. :/

If you google this, you'll find many people dealing with the headache of
non-normalized Unicode (e.g., string comparisons fail even though the text
looks identical).

Adding a way to normalize the output of skimnotes would make the headache
go away :)
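
As a stopgap, the post-processing route Christiaan described (parse the plist, normalize every string, reassemble the plist) could look roughly like the sketch below. It is only a sketch under assumptions: it presumes the notes have already been exported as a property list that Python's plistlib can read, and the function names and the sample keys ("type", "contents") are hypothetical, not taken from the skimnotes format. Strings stored inside binary data blobs would not be touched.

```python
import plistlib
import unicodedata


def normalize_strings(value, form="NFC"):
    """Recursively return a copy of a plist value with every str normalized."""
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    if isinstance(value, list):
        return [normalize_strings(item, form) for item in value]
    if isinstance(value, dict):
        return {
            normalize_strings(key, form): normalize_strings(item, form)
            for key, item in value.items()
        }
    # Numbers, booleans, dates and raw bytes pass through unchanged
    return value


def normalize_plist(data: bytes) -> bytes:
    """Parse a binary or XML plist and re-serialize it as XML with NFC strings."""
    parsed = plistlib.loads(data)
    return plistlib.dumps(normalize_strings(parsed), fmt=plistlib.FMT_XML)


# Hypothetical sample data standing in for an exported notes plist
sample = plistlib.dumps([{"type": "Note", "contents": "shu\u0304"}])
result = plistlib.loads(normalize_plist(sample))
print(result[0]["contents"] == "sh\u016b")  # True
```

Because plistlib.loads accepts both binary and XML property lists, a script built on this could replace the plutil step in the pipe, or run after it.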

Thanks again,

M.

On Fri, Apr 12, 2024 at 11:05 PM Christiaan Hofman <[email protected]>
wrote:

> Note that Skim puts the data in a separate data structure, while other
> viewers embed the notes in the PDF data. So I guess other readers
> implicitly normalize the Unicode when they write notes to the PDF. We don't
> do anything to the strings, so they stay the way they were entered.
>
> Christiaan
>
> On 12 Apr 2024, at 14:16, Mark Roberts <[email protected]> wrote:
>
> Hi,
>
> Thanks for your message. Let me try to ask this question a little
> differently.
>
> Regarding Unicode, it appears that Skim behaves unlike every other PDF
> reader I've got, e.g., Acrobat, PDF Expert, Foxit Reader, and Preview.
>
> If I copy some text from a PDF file that uses Unicode, all of these PDF
> readers will perform Unicode normalization, while Skim does not.
>
> For example, I copy the string "shū" from a PDF using any of the other
> reader apps, and the clipboard contains 7368c5ab.
>
> If I try this with Skim, though, the clipboard contains 736875cc84. You
> can verify this using the command line, e.g., *echo -n "shū" | xxd -p*
>
> Similarly, the output via skimnotes is not being normalized. So, it seems
> that other PDF reader apps are doing normalization, but Skim does not.
>
> When you say "any normalization should happen before the data was created"
> — in fact this is not how all other PDF reader apps that I've got seem to
> work.
>
> Would you consider adding an option for *skimnotes* to behave like other
> PDF apps w.r.t. Unicode?
>
> Thanks,
>
> M.
>
> On Fri, Apr 12, 2024 at 6:21 PM Christiaan Hofman <[email protected]>
> wrote:
>
>> That would be useless. SkimNotes does not process the data, it just
>> copies it between different locations. Also passing the file through
>> conversion does not work, as none of the formats involved are unicode text
>> files. The point is that the strings included as part of the data may
>> represent strings in some encoding in some form. So any normalization
>> should happen before the data was created (which is not an option), or by
>> parsing the plist, normalizing all strings in it, and reassembling the plist.
>>
>> Christiaan
>>
>> On 12 Apr 2024, at 04:48, Mark Roberts <[email protected]> wrote:
>>
>> Hi,
>>
>> Thanks for clarifying.
>>
>> I looked for a tool to do this, but I haven't found anything.
>>
>> Some people suggest running a text file through *oconv*, but that seems
>> to be just a brute force approach to patch specific characters.
>>
>> What would you think about an option to *skimnotes* that invokes
>> precomposedStringWithCanonicalMapping() or whatever the appropriate
>> function is?
>>
>> Thanks !
>>
>> On Thu, Apr 11, 2024 at 11:18 PM Christiaan Hofman <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On 11 Apr 2024, at 13:31, Mark Roberts <[email protected]> wrote:
>>>
>>> I've been using the *skimnotes* command line app to "get" skim notes as
>>> a plist, and to then convert them to XML with plutil.
>>>
>>> One thing I've discovered is that notes in Unicode may not be normalized.
>>>
>>> Some apps can handle this, but some cannot.
>>>
>>> Question: is there a way to get skimnotes to normalize the Unicode, or
>>> could you suggest an app I can use in a pipe with plutil... ?
>>>
>>> Thanks !
>>>
>>>
>>> I don’t think there exists a tool to normalize unicode strings in a
>>> binary plist. I am pretty sure that any tool that may exist to convert a
>>> binary plist to an XML plist goes through the same system code from Apple,
>>> so it will all do the same thing. Perhaps there is a tool to post-process
>>> the XML to normalize any strings as Unicode, but I could not help you there
>>> either.
>>>
>>> Christiaan
>>>
>>
> _______________________________________________
> Skim-app-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/skim-app-users
>

