Re: [Skim-app-users] Question about accessing Skim Notes

Mark Roberts Tue, 15 Mar 2022 00:56:16 -0700

Thanks for clarifying.

I guess my question remains: how can I fix up these hyphenated lines in my
notes? I can parse and process the XML output from skimnotes, but it seems
there isn't enough data to identify lines.
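
For the parsing step, here is a minimal sketch using Python's standard plistlib, which reads both the binary and the XML plist variants. It assumes the plist is a top-level array of per-note dictionaries and that the note text sits under a "contents" key; both are assumptions to check against real skimnotes output. "quadrilateralPoints" is the key name mentioned further down in this thread, and summarize_notes is a hypothetical helper name.

```python
import plistlib


def summarize_notes(plist_bytes, text_key="contents"):
    """Return one (type, page index, text, quad count) tuple per note.

    Assumes a top-level array of dicts. text_key is an assumption,
    not a documented guarantee; adjust it to match the real data.
    """
    notes = plistlib.loads(plist_bytes)  # auto-detects XML vs binary
    rows = []
    for note in notes:
        quads = note.get("quadrilateralPoints", [])
        rows.append((note.get("type"), note.get("pageIndex"),
                     note.get(text_key, ""), len(quads)))
    return rows


# Made-up sample data standing in for real skimnotes output.
sample = plistlib.dumps([
    {"type": "Highlight", "pageIndex": 0,
     "contents": "an exam- ple of hyphen- ation",
     "quadrilateralPoints": ["{0, 0}", "{10, 0}", "{0, 5}", "{10, 5}"]},
])
rows = summarize_notes(sample)
print(rows)
```

Real input would come from the 'skimnotes get -format plist PDF_FILE -' command quoted below, for example through a shell pipe or a temporary file.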


The issue is that full-text search of the notes won't work if words are
broken up with hyphens.

Whatever Skim is doing to handle line breaks isn't working for me — I still
see words broken up by hyphens everywhere.
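
Since the note text carries no line geometry, any clean-up has to work on the text alone. Below is a rough heuristic sketch, assuming the leftover breaks look like a hyphen followed by whitespace between two lowercase letters (for example "exam- ple"). That pattern is an assumption about what the notes contain, and dehyphenate is a hypothetical name.

```python
import re

# A hyphen between lowercase letters that is followed by whitespace.
# The negative lookahead keeps constructions like "pre- and post-".
_BROKEN = re.compile(r"(?<=[a-z])-\s+(?!(?:and|or)\b)(?=[a-z])")


def dehyphenate(text):
    """Join words split by a line-break hyphen (heuristic, text only)."""
    return _BROKEN.sub("", text)


print(dehyphenate("an exam- ple of hyphen-\nation"))
```

The heuristic cannot tell a line-break hyphen from a real one, so a compound such as "well- known" would come out as "wellknown", and a break that left no whitespace ("exam-ple") is left untouched. Checking joined candidates against a word list would be one way to tighten it.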

Any ideas?
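
One possible shape for the write-back step, sketched under the same assumptions as any plist-based approach here: a top-level array of note dictionaries, with the text under a "contents" key (an assumption, so it is a parameter). rewrite_notes is a hypothetical helper that applies any text-cleaning function to each note and returns a binary plist.

```python
import plistlib


def rewrite_notes(plist_bytes, transform, text_key="contents"):
    """Apply transform to each note's text and return a binary plist.

    text_key is an assumption about the Skim notes plist layout.
    """
    notes = plistlib.loads(plist_bytes)  # accepts XML or binary input
    for note in notes:
        text = note.get(text_key)
        if isinstance(text, str):
            note[text_key] = transform(text)
    return plistlib.dumps(notes, fmt=plistlib.FMT_BINARY)


# Made-up sample data and a deliberately crude transform.
original = plistlib.dumps([{"type": "Highlight", "contents": "exam- ple"}])
updated = rewrite_notes(original, lambda s: s.replace("- ", ""))
print(plistlib.loads(updated))
```

The returned bytes could then be piped to the 'skimnotes set PDF_FILE -' command quoted below. Trying it on a copy of the PDF first seems prudent.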

Thanks,

M.

On Mon, Mar 14, 2022 at 11:53 PM Christiaan Hofman <cmhof...@gmail.com>
wrote:

> You should realize that the text of the note is a completely separate data
> element from the highlighted text. The highlighted text is not part of the
> note, it is just the text that happens to lie behind the highlight in the
> PDF. We just set the text of the note to the text you highlight by default,
> and we already do some cleaning, including trying to handle line-breaks,
> before we set the text. And you can set it to whatever you want. So there
> is no way to relate the geometry of the highlight in any way to the text,
> as there does not exist a relation.
>
> Christiaan
>
> On 14 Mar 2022, at 12:27, Mark Roberts <mroberts1...@gmail.com> wrote:
>
> Hi,
>
> This is very helpful, thanks!!
>
> I just tried your suggestion and got an XML file as expected. I more or
> less understand all the elements of the XML, but it seems the entire note
> is in a <string> element, while the quadrilateralPoints for the
> highlighting boxes are separate.
>
> What I was hoping to do is somehow get each line of my note and then look
> for a hyphen at the end of each line, and then trim that hyphen, as
> necessary. The objective is to try and clean up the skim note to eliminate
> line-break hyphens in the source text.
>
> Any ideas about how I could do this?
>
> Thanks again,
>
> M.
>
> On Mon, Mar 14, 2022 at 7:27 PM Christiaan Hofman <cmhof...@gmail.com>
> wrote:
>
>>
>>
>> On 14 Mar 2022, at 11:13, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>
>>
>>
>> On 14 Mar 2022, at 10:56, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>
>>
>>
>> On 14 Mar 2022, at 10:50, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>
>>
>>
>> On 14 Mar 2022, at 04:49, Mark Roberts <mroberts1...@gmail.com> wrote:
>>
>> Is there some way to get more detailed information about skim notes,
>> i.e., other than the code framework?
>>
>> I have tried the skimnotes command line tool (e.g., the 'get' and
>> 'format' commands), but it seems to only output the basic information about
>> notes, such as the note type, page number, and note text.
>>
>> Perhaps(?) there's another mode for the skimnotes tool, but I couldn't
>> find it from reading the documentation.
>>
>> I'd like to get more complete data on each note, such as a timestamp, the
>> coordinates of the boxes that are highlighted in the PDF file, the
>> highlight color, and the text contained in each box.
>>
>> I assume(?) this data is in the notes file, but the skimnotes app ignores
>> it for now.
>>
>> I'm wondering about this because if possible I'd like to make a script
>> that gathers my notes for a PDF file, and tries to fix words that were
>> broken by hyphenation in the original PDF. If I can get the highlight boxes
>> in the notes file, and the text in each box, then it should be possible to
>> check for a hyphen character at the end of each line, and then stitch
>> together the words that were split across lines.
>>
>> Any suggestions?
>>
>> Thanks in advance,
>>
>> M.
>>
>>
>> The skimnotes tool is not a tool that can interpret the data. It only
>> copies the data around to various locations that are supported (such as
>> between extended attributes, .skim files, or within a .pdfd bundle). There
>> is no tool to interpret the data. The Wiki has information about how the
>> data is formatted. You could try to build your own tool to unarchive the
>> data from that, but that would be quite a bit of work.
>>
>> Christiaan
>>
>>
>> I can also note that in the near future the skim notes will be saved in a
>> plist format, which can be read by various tools and apps, including
>> AppleScript. You can already have Skim do that by activating a hidden
>> preference, see the Wiki for details.
>>
>> Christiaan
>>
>>
>> I just remembered that the skimnotes tool *can* convert to the plist
>> format, which you may be able to read, using the 'skimnotes format'
>> command: 'skimnotes format plist SKIM_FILE' can do that. The help for
>> skimnotes does not say so, but you can also get the skim notes plist
>> format directly from the skimnotes tool as follows:
>>
>> skimnotes get plist PDF_FILE SKIM_FILE
>>
>> This will get you a plist file in SKIM_FILE. Perhaps for other tools to
>> read it you have to change the extension to .plist. You could also then
>> pass it through plutil to convert the binary plist to xml plist (plutil
>> -convert xml1 PLIST_FILE), which would even be human readable. You could
>> combine that to get the skimnotes in xml format as follows:
>>
>> skimnotes get plist PDF_FILE - | plutil -format xml1 -o PLIST_FILE -
>>
>> Christiaan
>>
>>
>> Small correction: I messed up the ‘-format’ arguments to the commands. In
>> skimnotes the -format option has to be added, and in plutil it is -convert:
>>
>> skimnotes get -format plist PDF_FILE SKIM_FILE
>>
>> plutil -convert xml1 PLIST_FILE
>>
>> skimnotes get -format plist PDF_FILE - | plutil -convert xml1 -o PLIST_FILE -
>>
>> If you want to go in reverse, and write the xml plist data as skim notes,
>> you could do:
>>
>> plutil -convert binary1 -o - PLIST_FILE | skimnotes set PDF_FILE -
>>
>> Christiaan
>>
>
> _______________________________________________
> Skim-app-users mailing list
> Skim-app-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/skim-app-users
>

