On 25 Dec 2008, at 3:39 PM, Hydro Meteor wrote:

Hello all --

I noticed that Skim, besides saving PDF documents, saves annotations in Mac OS X file system metadata (extended attributes). This clearly shows up in Leopard (not sure about Tiger).

Here's an "untouched" PDF before annotations were added with Skim:

-rw-r--r--   1 hydro  staff    24733 Dec 25 12:06 control.pdf

and after annotations were added:

-rw-r--r--@  1 hydro  staff    24733 Dec 25 12:10 control.pdf

the xattr command-line tool (Leopard only?) reveals the extended attributes in what appears to be structured "chunks" where each chunk has a header of some sort:

$ xattr -l control.pdf

com.apple.FinderInfo:
0000 50 44 46 20 00 00 00 00 00 00 00 00 00 00 00 00 PDF ............
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

...

net_sourceforge_skim-app_699F2F81-6C76-4909-A962-97BE5ABF2E8E-29358-0001A72CDDAB061A-0:
0000 42 5A 68 35 31 41 59 26 53 59 48 43 37 D6 00 0B BZh51AY&SYHC7...

...

net_sourceforge_skim-app_699F2F81-6C76-4909-A962-97BE5ABF2E8E-29358-0001A72CDDAB061A-1:
0000 C2 3E 18 94 A0 60 AA 78 9F 3C 5A 17 DC B0 B3 A0 .>...`.x.<Z.....

and so on ...
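
The first chunk in the listing above begins with the bytes "BZh5" followed by "1AY&SY", which is the signature of a bzip2 stream, and the chunk names differ only in a trailing -0, -1, ... index. A minimal Python sketch of putting such chunks back together follows. It assumes, without confirmation from the Skim documentation, that concatenating the chunks in index order yields one complete bzip2 stream. The function names are hypothetical, and the attribute values are taken as a plain dictionary rather than read from the file system.

```python
import bz2
import re
from collections import defaultdict

BZIP2_MAGIC = b"BZh"  # the -0 chunk in the listing starts with "BZh5"


def group_fragments(attrs):
    """Group attribute values whose names end in -<index>, ordered by index.

    `attrs` maps attribute names to raw bytes. Names without a trailing
    numeric index (such as com.apple.FinderInfo) are ignored.
    """
    groups = defaultdict(list)
    for name, value in attrs.items():
        match = re.fullmatch(r"(.+)-(\d+)", name)
        if match:
            groups[match.group(1)].append((int(match.group(2)), value))
    return {
        prefix: [value for _, value in sorted(parts, key=lambda part: part[0])]
        for prefix, parts in groups.items()
    }


def reassemble(fragments):
    """Join chunks in order and decompress them if they form a bzip2 stream.

    Assumption: the chunks are consecutive slices of a single stream.
    """
    data = b"".join(fragments)
    if data.startswith(BZIP2_MAGIC):
        return bz2.decompress(data)
    return data
```

What the decompressed bytes contain is a matter for the Skim notes format description; this sketch only covers the chunking visible in the xattr output.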

From the perspective of long-term archival of PDFs and their annotations: from what I've read about Apple's implementation of Extended Attributes on the file system, Apple supposedly conforms to POSIX.1e ACL, per http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/acl.3.html but if one reads the Description section of this page carefully, one sees that there are differences from pure POSIX.1e:

This implementation of the POSIX.1e library differs from the standard in a number of non-portable ways in order to support the MacOS/Darwin ACL semantic. Where possible, these differences are implemented using the mechanisms provided in the standard for such extensions. Where routines are non-standard, they are suffixed with _np to indicate that they are not portable. POSIX.1e describes a set of ACL manipulation routines to manage the contents of ACLs, as well as their relationships with files; almost all of these support routines are implemented.

The "almost all" and "differs from the standard in a number of non-portable ways ..." are concerning to me from the perspective of long-term archiving of PDF documents with annotations, since such documents need to be preserved in perpetuity. One reason for concern is that open source network backup solutions such as Bacula < http://www.bacula.org/en/ > do not yet fully handle ACLs in Mac OS X (I have tried and it's just not there yet, although it may be in the future). So there is the possibility of losing annotations when backing up and restoring.


EAs and ACLs are not the same thing. For example, they are often handled differently by copy/backup tools (some preserve both, some neither, some one but not the other). For Skim notes, the real question is just how EAs are handled. What's relevant is not the API that Apple provides but whether the backup tool preserves the data, that's all. It very much depends on the tool. The FAQ on the Skim Wiki has a discussion about Skim notes and backup tools, including a link to a test page of various tools and what they preserve. If you want to be sure you won't lose the notes, you can save them in the data of a separate .skim file (you can do this manually, or choose to always do this automatically). You can also convert to a PDF bundle, which is just a file package containing the (original) PDF and the notes in separate files.
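
Whether a given tool preserves the EAs can be checked by comparing the attributes of a file before backup with those of the restored copy. The Python sketch below shows only the comparison step. The function name is hypothetical, and how the two name-to-bytes dictionaries are obtained (for example from the xattr tool) is deliberately left open.

```python
def compare_attributes(before, after):
    """Report which extended attributes did not survive a copy or restore.

    `before` and `after` map attribute names to raw bytes for the original
    file and the restored file. Returns two sorted lists of names:
    attributes that are missing afterwards, and attributes that are still
    present but whose contents differ.
    """
    missing = sorted(name for name in before if name not in after)
    changed = sorted(
        name for name in before if name in after and after[name] != before[name]
    )
    return missing, changed
```

Two empty lists mean every attribute of the original, Skim notes included, came back byte for byte.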

I think one can be pretty sure that Apple's implementation of EAs will be compatible with any future changes.

Skim's ability to separate PDF from annotations is excellent. This allows for archival preservation of the original PDF document in a library system, with the annotations recombined with the original as a separate process at a later time, for example. While export options for annotations exist in the form of text, RTF and RTFD (which is greatly appreciated), the only format for round-tripping annotations in and out of a PDF document is FDF (i.e., Skim only parses FDF (Forms Data Format)). I can't quite discern if FDF is an open format / ISO standard.

FDF is actually just a simplified form of PDF (in fact, Skim reads FDF by replacing the first "F" in "FDF" by "P" and reading it as PDF), so it's just as open as PDF (though I'm not 100% sure if it's an ISO standard).
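
The header substitution described above can be illustrated in a few lines of Python. This is a sketch of the idea only, not Skim's actual code: the function name is hypothetical, and it assumes the data starts with the "%FDF-" header that the PDF Reference specifies for FDF files.

```python
def fdf_as_pdf(data: bytes) -> bytes:
    """Return FDF data with its header marker rewritten to the PDF marker.

    Only the first "F" of the leading "%FDF-" is replaced, so "%FDF-1.2"
    becomes "%PDF-1.2"; every other byte is left untouched.
    """
    if not data.startswith(b"%FDF-"):
        raise ValueError("data does not start with an %FDF- header")
    return b"%PDF-" + data[len(b"%FDF-"):]
```

The result can then be handed to whatever PDF reader is available, which is what makes FDF as accessible as PDF itself.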

Be careful about using FDF for backup though, because it may lead to data loss, as there's not a complete 1-to-1 mapping between Skim notes and PDF/FDF annotations (in particular, anchored notes do not exist in PDF). Only the Skim Notes export type is completely data preserving (as it's the same data as what's saved in the EAs).

The Skim notes format is a format specific to Skim. But it is completely open: Skim is OSS, and the format for Skim notes is completely described on the Wiki. Moreover, a library to read and write them, including the source code, is available from the site. It uses only standard Cocoa and the BSD library for EAs. So it will always be possible, as a minimum, to convert Skim notes to whatever you want (including PDF annotations).

According to these references, PDF, like OpenDocument (ODF), is only very recently an ISO standard:

http://en.wikipedia.org/wiki/Pdf
PDF is an open standard that was officially published on July 1, 2008 by the ISO as ISO 32000-1:2008.
http://www.theinquirer.net/inquirer/news/411/1030411/pdf-approved-iso-32000
THE ISO BALLOT to approve Adobe's PDF 1.7 as the ISO 32000 standard passed by an overwhelming vote.


I have been able to download Adobe's PDF 1.7 Reference Sixth Edition (dated November 2006) document here < http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf > which includes sections about FDF structure, but the ISO wants to charge 370 Swiss Francs for the actual 32000-1:2008 document! I suppose for now we can take Adobe's word for it that their Sixth Edition Reference document is the same as the ISO standard. That being said, when I've tried exporting from Skim to various incarnations of PDF, I see the version (if looked at in a text editor) is 1.3 rather than 1.7, such as:

%PDF-1.3
Reading the Adobe Reference 1.7 Sixth Edition on PDF, the FDF section suggests that PDF 1.3 is when FDF was first made available for use with annotations.
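
The version seen in a text editor comes from the header at the very start of the file, so it can also be read programmatically. A minimal Python sketch follows, with a hypothetical function name. Note that from PDF 1.4 onwards a document may override the header version in its catalog, which this sketch does not look at.

```python
import re


def pdf_header_version(data: bytes):
    """Return the (major, minor) version from a leading %PDF-x.y header."""
    match = re.match(rb"%PDF-(\d+)\.(\d+)", data)
    if match is None:
        raise ValueError("no %PDF-x.y header at the start of the data")
    return int(match.group(1)), int(match.group(2))
```

Because the result is a tuple, versions compare naturally, e.g. pdf_header_version(data) <= (1, 7).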

I think it would behoove Skim to have the ability to Save As and Export PDF documents as PDF 1.7 compliant. In doing so, the outputs from Skim would be ISO 32000-1:2008 global standards compliant, akin to OpenDocument, which is also a world standard per ISO. This would include annotations as FDF, which is a subset of PDF 1.7 as well. From an archival standpoint, we could decades from now be confident that at least PDF version 1.7 with subset FDF could be relied upon regardless of how the world may change (as optimistic as we may sometimes want to be about the future, the world can sometimes change quickly -- companies can come and go, economies can fluctuate, corporate culture can shift, etc.). Adobe could change (for better or worse) -- I think there is in fact a compelling argument that Skim was created out of need and not wanting to wait around for Adobe's development cycle, such as to bring Acrobat to OS X natively based on Cocoa. Even Preview is quite nice but has its limitations. Apple (and its APIs) could change as well. But as long as we have some global ISO standards to federate to, we can be confident of having readers and writers independent of corporations!

Using the normal Save (or Export as PDF), Skim does not touch the original PDF data at all. The same is true for export without notes. For export with embedded notes, Skim fully relies on Apple's PDFKit, so there's absolutely no control over it.

Actually, PDFKit does a very bad job exporting PDF annotations (I'm talking about Leopard, on Tiger it's not even possible). The saved notes are actually changed. This is one more reason why Skim doesn't use it. You may see this in Preview, though it may not be immediately obvious because Preview does not support the types of annotations for which this is the worst (such as lines and freehand notes), while it uses workarounds for other types of notes (like highlights). To see the problems, try exporting a file containing various types of notes as PDF With Embedded Notes, then reopen that file in Skim and choose File > Convert Notes. Try editing the notes afterwards (especially line notes and multi-line underlines). (Convert Notes will fix this in the next release.)

I think PDFKit currently supports PDF 1.3 features only. The fact that this is lower than 1.7 is not a problem though, quite to the contrary. PDF 1.3 is a strict subset of PDF 1.7; PDF 1.7 just adds new features (such as interactive features). The essence is that it ALLOWS more, it does not REQUIRE more. In other words, PDF 1.3 is always valid as PDF 1.7.

When it comes to document preservation and annotations of those documents, this is the type of stuff that archivists worry about, and correctly so (decades may seem far away, but they will be here sooner rather than later)!


If you hold on to a version of Skim, you will always be able to read Skim notes. Even if Skim in the future uses PDF rather than .skim notes (which I think won't happen), there will be some old version available that can convert the notes; in fact, there should then be a simple conversion tool available, perhaps embedded in the skimnotes tool.

Skim is off to a great start as an open source app, with the really great features it already implements as a native OS X app (which will only get better with time, as the underpinnings of OS X are slated to improve with Snow Leopard)!

Any follow up thoughts?

Cheers,

Hydro

Christiaan

------------------------------------------------------------------------------
_______________________________________________
Skim-app-users mailing list
https://lists.sourceforge.net/lists/listinfo/skim-app-users