You should realize that the text of the note is a completely separate data 
element from the highlighted text. The highlighted text is not part of the 
note; it is just the text that happens to lie behind the highlight in the PDF. 
By default we just set the text of the note to the text you highlight, and we 
already do some cleaning, including trying to handle line breaks, before we set 
the text. You can also set it to whatever you want. So there is no way to relate 
the geometry of the highlight to the text, because no such relation exists 
in the data.

Christiaan
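
Since the note text is an independent string, any hyphen cleanup has to work on that string alone. Below is a minimal Python sketch of that idea, not part of Skim or the skimnotes tool. It assumes the plist holds a top-level array of note dictionaries, and that the note text is stored under a key you pass in (the key name "contents" used in the comment is an assumption; check it against your own XML output). The function names are hypothetical. It only joins a word when a hyphen is followed by an actual line break and a lowercase ASCII letter, so it will also join genuine compounds that happened to break at a line end, and it does nothing if the line breaks have already been replaced by spaces.

```python
import plistlib
import re

# A hyphen that directly follows a letter, sits at the end of a line,
# and is followed by a lowercase ASCII letter on the next line.
_LINE_BREAK_HYPHEN = re.compile(r"(?<=[^\W\d_])-[ \t]*\r?\n[ \t]*(?=[a-z])")


def dehyphenate(text):
    """Join words that were split by a hyphen at a line break."""
    return _LINE_BREAK_HYPHEN.sub("", text)


def dehyphenate_notes(notes, text_key):
    """Clean the text stored under text_key in each note dictionary, in place.

    text_key is whatever key holds the note text in your plist
    (for example "contents", which is an assumption to verify).
    """
    for note in notes:
        value = note.get(text_key)
        if isinstance(value, str):
            note[text_key] = dehyphenate(value)
    return notes


def clean_plist_bytes(data, text_key):
    """Parse plist bytes (binary or XML), clean the notes, return XML plist bytes."""
    notes = plistlib.loads(data)
    dehyphenate_notes(notes, text_key)
    return plistlib.dumps(notes, fmt=plistlib.FMT_XML)
```

The input bytes could be read from the plist file produced by the skimnotes commands quoted below, and the result written back to a file before handing it to skimnotes again. All other keys in each note, such as the highlight geometry, are passed through unchanged.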

> On 14 Mar 2022, at 12:27, Mark Roberts <mroberts1...@gmail.com> wrote:
> 
> Hi,
> 
> This is very helpful — thanks !!
> 
> I just tried your suggestion and got an XML file as expected. I more or less 
> understand all the elements of the XML, but it seems the entire note is in a 
> <string> element, while the quadrilateralPoints for the highlighting boxes 
> are separate.
> 
> What I was hoping to do is somehow get each line of my note and then look for 
> a hyphen at the end of each line, and then trim that hyphen, as necessary. 
> The objective is to try and clean up the skim note to eliminate line-break 
> hyphens in the source text.
> 
> Any ideas about how I could do this?
> 
> Thanks again,
> 
> M.
> 
> On Mon, Mar 14, 2022 at 7:27 PM Christiaan Hofman <cmhof...@gmail.com> wrote:
> 
> 
>> On 14 Mar 2022, at 11:13, Christiaan Hofman <cmhof...@gmail.com> wrote:
>> 
>> 
>> 
>>> On 14 Mar 2022, at 10:56, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>> 
>>> 
>>> 
>>>> On 14 Mar 2022, at 10:50, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On 14 Mar 2022, at 04:49, Mark Roberts <mroberts1...@gmail.com> wrote:
>>>>> 
>>>>> Is there some way to get more detailed information about skim notes, 
>>>>> i.e., other than the code framework?
>>>>> 
>>>>> I have tried the skimnotes command line tool (e.g., the 'get' and 
>>>>> 'format' commands), but it seems to only output the basic information 
>>>>> about notes, such as the note type, page number, and note text.
>>>>> 
>>>>> Perhaps(?) there's another mode for the skimnotes tool, but I couldn't 
>>>>> find it from reading the documentation.
>>>>> 
>>>>> I'd like to get more complete data on each note, such as a timestamp, the 
>>>>> coordinates of the boxes that are highlighted in the PDF file, the 
>>>>> highlight color, and the text contained in each box.
>>>>> 
>>>>> I assume(?) this data is in the notes file, but the skimnotes app ignores 
>>>>> it for now.
>>>>> 
>>>>> I'm wondering about this because if possible I'd like to make a script 
>>>>> that gathers my notes for a PDF file, and tries to fix words that were 
>>>>> broken by hyphenation in the original PDF. If I can get the highlight 
>>>>> boxes in the notes file, and the text in each box, then it should be 
>>>>> possible to check for a hyphen character at the end of each line, and 
>>>>> then stitch together the words that were split across lines.
>>>>> 
>>>>> Any suggestions?
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> M.
>>>> 
>>>> The skimnotes tool cannot interpret the data. It only 
>>>> copies the data between the supported locations (such as 
>>>> extended attributes, .skim files, or a .pdfd bundle). There 
>>>> is no tool to interpret the data. The Wiki has information about how the 
>>>> data is formatted. You could try to build your own tool to unarchive the 
>>>> data from that, but that would be quite a bit of work.
>>>> 
>>>> Christiaan
>>>> 
>>> 
>>> 
>>> I can also note that in the near future the skim notes will be saved in a 
>>> plist format, which can be read by various tools and apps, including 
>>> AppleScript. You can already have Skim do that by activating a hidden 
>>> preference, see the Wiki for details. 
>>> 
>>> Christiaan
>>> 
>> 
>> 
>> I just remembered that the skimnotes tool *can* convert to the plist format, 
>> which you may be able to read, using the 'skimnotes format' command: 
>> 'skimnotes format plist SKIM_FILE' can do that. The help for skimnotes does 
>> not say so, but you can also get the skim notes in plist format directly 
>> from the skimnotes tool as follows:
>> 
>> skimnotes get plist PDF_FILE SKIM_FILE
>> 
>> This will get you a plist file in SKIM_FILE. Perhaps for other tools to read 
>> it you have to change the extension to .plist. You could also then pass it 
>> through plutil to convert the binary plist to xml plist (plutil -convert 
>> xml1 PLIST_FILE), which would even be human readable. You could combine that 
>> to get the skimnotes in xml format as follows:
>> 
>> skimnotes get plist PDF_FILE - | plutil -format xml1 -o PLIST_FILE -
>> 
>> Christiaan
>> 
> 
> 
> Small correction: I messed up the ‘-format’ arguments to the commands. It should 
> be added in skimnotes, and in plutil it is -convert:
> 
> skimnotes get -format plist PDF_FILE SKIM_FILE
> 
> plutil -convert xml1 PLIST_FILE
> 
> skimnotes get -format plist PDF_FILE - | plutil -convert xml1 -o PLIST_FILE -
> 
> If you want to go in the reverse direction, and write the xml plist data as skim notes, 
> you could do:
> 
> plutil -convert binary1 -o - PLIST_FILE | skimnotes set PDF_FILE -
> 
> Christiaan

_______________________________________________
Skim-app-users mailing list
Skim-app-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/skim-app-users
