Thanks for clarifying. I guess my question remains: how can I fix up these hyphenated lines in my notes? I can parse and process the XML output from skimnotes, but it seems there isn't enough data to identify lines.
The issue is that full-text search of the notes won't work if words are broken up with hyphens. Whatever Skim is doing to handle line breaks isn't working for me: I still see words broken up by hyphens everywhere.

Any ideas?

Thanks,

M.

On Mon, Mar 14, 2022 at 11:53 PM Christiaan Hofman <cmhof...@gmail.com> wrote:

> You should realize that the text of the note is a completely separate data
> element from the highlighted text. The highlighted text is not part of the
> note; it is just the text that happens to lie behind the highlight in the
> PDF. We just set the text of the note to the text you highlight by default,
> and we already do some cleaning, including trying to handle line breaks,
> before we set the text. And you can set it to whatever you want. So there
> is no way to relate the geometry of the highlight in any way to the text,
> as no such relation exists.
>
> Christiaan
>
> On 14 Mar 2022, at 12:27, Mark Roberts <mroberts1...@gmail.com> wrote:
>
> > Hi,
> >
> > This is very helpful, thanks!!
> >
> > I just tried your suggestion and got an XML file as expected. I more or
> > less understand all the elements of the XML, but it seems the entire note
> > is in a <string> element, while the quadrilateralPoints for the
> > highlighting boxes are separate.
> >
> > What I was hoping to do is somehow get each line of my note, look for a
> > hyphen at the end of each line, and then trim that hyphen as necessary.
> > The objective is to clean up the skim note to eliminate line-break
> > hyphens in the source text.
> >
> > Any ideas about how I could do this?
> >
> > Thanks again,
> >
> > M.
> >
> > On Mon, Mar 14, 2022 at 7:27 PM Christiaan Hofman <cmhof...@gmail.com>
> > wrote:
> >
> >> On 14 Mar 2022, at 11:13, Christiaan Hofman <cmhof...@gmail.com> wrote:
> >>
> >> On 14 Mar 2022, at 10:56, Christiaan Hofman <cmhof...@gmail.com> wrote:
> >>
> >> On 14 Mar 2022, at 10:50, Christiaan Hofman <cmhof...@gmail.com> wrote:
> >>
> >> On 14 Mar 2022, at 04:49, Mark Roberts <mroberts1...@gmail.com> wrote:
> >>
> >> Is there some way to get more detailed information about skim notes,
> >> i.e., other than through the code framework?
> >>
> >> I have tried the skimnotes command line tool (e.g., the 'get' and
> >> 'format' commands), but it seems to output only the basic information
> >> about notes, such as the note type, page number, and note text.
> >>
> >> Perhaps(?) there's another mode for the skimnotes tool, but I couldn't
> >> find it from reading the documentation.
> >>
> >> I'd like to get more complete data on each note, such as a timestamp,
> >> the coordinates of the boxes that are highlighted in the PDF file, the
> >> highlight color, and the text contained in each box.
> >>
> >> I assume(?) this data is in the notes file, but the skimnotes app
> >> ignores it for now.
> >>
> >> I'm wondering about this because, if possible, I'd like to make a script
> >> that gathers my notes for a PDF file and tries to fix words that were
> >> broken by hyphenation in the original PDF. If I can get the highlight
> >> boxes in the notes file, and the text in each box, then it should be
> >> possible to check for a hyphen character at the end of each line, and
> >> then stitch together the words that were split across lines.
> >>
> >> Any suggestions?
> >>
> >> Thanks in advance,
> >>
> >> M.
> >>
> >> The skimnotes tool is not a tool that can interpret the data. It only
> >> copies the data around to the various supported locations (such as
> >> between extended attributes, .skim files, or within a .pdfd bundle).
> >> There is no tool to interpret the data. The Wiki has information about
> >> how the data is formatted. You could try to build your own tool to
> >> unarchive the data from that, but that would be quite a bit of work.
> >>
> >> Christiaan
> >>
> >> I can also note that in the near future the skim notes will be saved in
> >> a plist format, which can be read by various tools and apps, including
> >> AppleScript. You can already have Skim do that by activating a hidden
> >> preference; see the Wiki for details.
> >>
> >> Christiaan
> >>
> >> I just remembered that the skimnotes tool *can* convert to the plist
> >> format, which you may be able to read, using the 'skimnotes format'
> >> command: 'skimnotes format plist SKIM_FILE' can do that. The help for
> >> skimnotes does not say so, but you can also get the skim notes directly
> >> in plist format from the skimnotes tool as follows:
> >>
> >>   skimnotes get plist PDF_FILE SKIM_FILE
> >>
> >> This will get you a plist file in SKIM_FILE. For other tools to read it,
> >> you may have to change the extension to .plist. You could then also pass
> >> it through plutil to convert the binary plist to an XML plist
> >> (plutil -convert xml1 PLIST_FILE), which is even human readable. You
> >> could combine these to get the skim notes in XML format as follows:
> >>
> >>   skimnotes get plist PDF_FILE - | plutil -format xml1 -o PLIST_FILE -
> >>
> >> Christiaan
> >>
> >> Small correction: I messed up the '-format' arguments to the commands.
> >> In skimnotes it should be added as '-format', and in plutil it is
> >> '-convert':
> >>
> >>   skimnotes get -format plist PDF_FILE SKIM_FILE
> >>
> >>   plutil -convert xml1 PLIST_FILE
> >>
> >>   skimnotes get -format plist PDF_FILE - | plutil -convert xml1 -o PLIST_FILE -
> >>
> >> If you want to go in the reverse direction, and write the XML plist data
> >> back as skim notes, you could do:
> >>
> >>   plutil -convert binary1 -o - PLIST_FILE | skimnotes set PDF_FILE -
> >>
> >> Christiaan
> >
> _______________________________________________
> Skim-app-users mailing list
> Skim-app-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/skim-app-users
>
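A minimal Python sketch of reading the plist from 'skimnotes get -format plist' directly. The standard-library plistlib module accepts binary plists as well as XML ones, so the plutil conversion step is optional when a script does the reading. Several things here are assumptions rather than documented Skim facts: that the plist root is an array of note dictionaries, and the key names in FIELDS (only quadrilateralPoints is mentioned in the thread; check the others against your own XML plist or the Skim Wiki). The script name summarize_notes.py is hypothetical.

```python
import plistlib
import sys

# Assumed key names in each note dictionary. Only "quadrilateralPoints" is
# mentioned in the thread; verify the rest against `plutil -convert xml1` output.
FIELDS = ("type", "pageIndex", "contents", "quadrilateralPoints")


def summarize_notes(data: bytes) -> list:
    """Return one dict per note holding just the FIELDS of interest.

    plistlib.loads accepts both binary and XML plists, so the raw output of
    `skimnotes get -format plist PDF_FILE -` can be read without plutil.
    Missing keys come back as None instead of raising KeyError.
    """
    notes = plistlib.loads(data)
    if not isinstance(notes, list):
        # Assumption: the root of the notes plist is an array of dictionaries.
        raise ValueError("expected the plist root to be an array of notes")
    return [{field: note.get(field) for field in FIELDS} for note in notes]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 summarize_notes.py NOTES_PLIST   ("-" reads stdin)
    path = sys.argv[1]
    if path == "-":
        raw = sys.stdin.buffer.read()
    else:
        with open(path, "rb") as handle:
            raw = handle.read()
    for summary in summarize_notes(raw):
        print(summary)
```

For example: skimnotes get -format plist PDF_FILE - | python3 summarize_notes.py -
Because missing keys print as None rather than failing, a wrong guess about a key name shows up immediately in the output.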
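Since the note text carries no line information (it is separate from the highlight geometry), any hyphen removal has to work heuristically on the text alone. The following minimal Python sketch shows one such heuristic. It assumes the note text is stored under a "contents" key, which is an unconfirmed guess to check in your XML plist. dehyphenate() joins a hyphen that is followed by whitespace and a lowercase letter ("hyph- enation", or a hyphen before a newline). If it is given a set of lowercase words, it instead joins only when the result is a known word, which also catches "exam-ple". The script name dehyphenate_notes.py, the function names, and the vocabulary parameter are all illustrative.

```python
import plistlib
import re
import sys

# A word fragment, a hyphen, optional whitespace (e.g. the space or newline
# left where a line break was), and a continuation starting with an ASCII
# lowercase letter. Capitalized continuations ("Smith- Jones") are left alone.
_SPLIT = re.compile(r"\b(\w+)-(\s*)([a-z]\w*)")

# Assumption: the note text lives under this key; verify it in the XML plist.
TEXT_KEY = "contents"


def dehyphenate(text, vocabulary=None):
    """Heuristically rejoin words split by line-break hyphens."""
    def join(match):
        head, gap, tail = match.groups()
        joined = head + tail
        if vocabulary is not None:
            # With a word list, join only if the result is a known word.
            return joined if joined.lower() in vocabulary else match.group(0)
        # Without one, join only when whitespace followed the hyphen, so
        # ordinary compounds such as "well-known" are kept.
        return joined if gap else match.group(0)

    return _SPLIT.sub(join, text)


def clean_notes(data, vocabulary=None):
    """Dehyphenate the text of every note; return an XML plist as bytes."""
    notes = plistlib.loads(data)  # accepts binary or XML plists
    if not isinstance(notes, list):
        # Assumption: the root of the notes plist is an array of dictionaries.
        raise ValueError("expected the plist root to be an array of notes")
    for note in notes:
        text = note.get(TEXT_KEY)
        if isinstance(text, str):
            note[TEXT_KEY] = dehyphenate(text, vocabulary)
    return plistlib.dumps(notes, fmt=plistlib.FMT_XML)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 dehyphenate_notes.py IN_PLIST OUT_PLIST
    with open(sys.argv[1], "rb") as src:
        cleaned = clean_notes(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(cleaned)
```

The output is an XML plist so you can inspect it before writing it back with Christiaan's reverse command (plutil -convert binary1 -o - OUT_PLIST | skimnotes set PDF_FILE -). Without a word list the heuristic makes mistakes: "pre- and post-war" would become "preand post-war". Passing a vocabulary is therefore the safer mode, at the cost of missing words the list doesn't contain.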