On 25 Dec 2008, at 3:39 PM, Hydro Meteor wrote:
Hello all --
I noticed that Skim, besides saving PDF documents, saves annotations
in Mac OS X file system metadata (extended attributes). This clearly
shows up in Leopard (not sure about Tiger).
Here's an "untouched" PDF before annotations were added with Skim:
-rw-r--r-- 1 hydro staff 24733 Dec 25 12:06 control.pdf
and after annotations were added:
-rw-r--r--@ 1 hydro staff 24733 Dec 25 12:10 control.pdf
the xattr command-line tool (Leopard only?) reveals the extended
attributes in what appears to be structured "chunks" where each
chunk has a header of some sort:
$ xattr -l control.pdf
com.apple.FinderInfo:
0000   50 44 46 20 00 00 00 00 00 00 00 00 00 00 00 00    PDF ............
0010   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
...
net_sourceforge_skim-app_699F2F81-6C76-4909-A962-97BE5ABF2E8E-29358-0001A72CDDAB061A-0:
0000   42 5A 68 35 31 41 59 26 53 59 48 43 37 D6 00 0B    BZh51AY&SYHC7...
...
net_sourceforge_skim-app_699F2F81-6C76-4909-A962-97BE5ABF2E8E-29358-0001A72CDDAB061A-1:
0000   C2 3E 18 94 A0 60 AA 78 9F 3C 5A 17 DC B0 B3 A0    .>...`.x.<Z.....
and so on ...
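As an aside on the "header of some sort": the leading bytes of the
first fragment (42 5A 68 35 31 41 59 26 53 59) match the bzip2 stream
signature, that is "BZh", a block-size digit, and the block magic
0x314159265359. Below is a minimal Python sketch that checks for that
signature. The function name is hypothetical, and whether the numbered
fragments concatenate into a single bzip2 stream is an assumption that
the dump alone does not confirm.

```python
# Block magic that follows the 4-byte "BZh<level>" header in a bzip2 stream.
BZIP2_BLOCK_MAGIC = bytes.fromhex("314159265359")


def looks_like_bzip2(data: bytes) -> bool:
    """Return True if data starts with a bzip2 stream header and block magic."""
    return (
        len(data) >= 10
        and data[:3] == b"BZh"
        and data[3:4] in b"123456789"  # compression level digit
        and data[4:10] == BZIP2_BLOCK_MAGIC
    )


# First 16 bytes of the "-0" and "-1" fragments, copied from the listing above.
first_fragment = bytes.fromhex("425A6835314159265359484337D6000B")
second_fragment = bytes.fromhex("C23E1894A060AA789F3C5A17DCB0B3A0")

print(looks_like_bzip2(first_fragment))   # True: "BZh5" plus block magic
print(looks_like_bzip2(second_fragment))  # False: no header on this fragment
```

Only the first fragment carries the signature, which would be
consistent with the later fragments being continuations of the same
data.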
From the perspective of long-term archival of PDFs and their
annotations, from what I've read about Apple's implementation of
Extended Attributes on the file system, Apple supposedly conforms to
POSIX.1e ACL, per http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/acl.3.html
but if one reads the Description section of this page carefully,
you'll see that there are differences from pure POSIX.1e:
This implementation of the POSIX.1e library differs from the
standard in a number of non-portable ways in order to support the
MacOS/Darwin ACL semantic. Where possible, these differences are
implemented using the mechanisms provided in the standard for such
extensions. Where routines are non-standard, they are suffixed with
_np to indicate that they are not portable.
POSIX.1e describes a set of ACL manipulation routines to manage the
contents of ACLs, as well as their relationships with files; almost
all of these support routines are implemented.
The "almost all" and "differs from the standard in a number of
non-portable ways ..." wording concerns me from the perspective of
long-term archiving of PDF documents with annotations, which need to
be preserved in perpetuity. One reason for concern is that open
source network backup solutions such as Bacula
< http://www.bacula.org/en/ > do not yet fully handle ACLs in Mac OS
X (I have tried and it's just not there yet, although it may be in
the future). So there is the possibility of losing annotations when
backing up and restoring.
EAs and ACLs are not the same thing. E.g. they are often handled
differently by copy/backup tools (some preserve both, some none, some
one but not the other). For Skim notes, the real question is just how
EAs are handled. What's relevant is not the API that Apple provides
but whether the backup tool preserves the data, that's all. It very
much depends on the tool. The FAQ on the Skim Wiki has a discussion
about Skim notes and backup tools, including a link to a test page of
various tools and what they preserve. If you want to be sure you won't
lose the notes you can save them in the data of a separate .skim file
(you can do this manually, or choose to always do this automatically).
You can also convert to a PDF bundle, which is just a file package
containing the (original) PDF and the notes in separate files.
I think one can be pretty sure that Apple's implementation of EAs will
be compatible with any future changes.
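To test a particular backup tool yourself, you can compare the
attribute names on a file before and after a backup/restore cycle.
Here is a minimal Python sketch, assuming the `xattr` command-line
tool mentioned above is on the PATH; the function names are
hypothetical, and the prefix is simply the one visible in the listing
earlier in this thread.

```python
import subprocess

# Prefix of the Skim attribute names seen in the `xattr -l` listing.
SKIM_PREFIX = "net_sourceforge_skim-app_"


def list_xattr_names(path):
    """Return the set of extended attribute names reported by `xattr`."""
    result = subprocess.run(
        ["xattr", path], capture_output=True, text=True, check=True
    )
    return {line for line in result.stdout.splitlines() if line}


def lost_skim_attributes(names_before, names_after):
    """Return Skim attribute names present before but missing afterwards."""
    missing = set(names_before) - set(names_after)
    return sorted(name for name in missing if name.startswith(SKIM_PREFIX))
```

Call `list_xattr_names` on the original and on the restored copy, then
pass both sets to `lost_skim_attributes`; an empty list means no Skim
attribute names went missing. This compares names only, not the
attribute data itself.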
Skim's ability to separate PDF from annotations is excellent. This
allows for archival preservation of the original PDF document in a
library system, with the annotations recombined with the original
as a separate process at a later time, for example. While it is
greatly appreciated that export options exist for
annotations in the form of text, RTF, RTFD, the only format for
round tripping annotations in and out of a PDF document is FDF
(i.e., Skim only parses FDF (Forms Data Format)). I can't quite
discern if FDF is an open format / ISO standard.
FDF is actually just a simplified form of PDF (in fact, Skim reads FDF
by replacing the first "F" in "FDF" by "P" and reading it as PDF), so
it's just as open as PDF (though I'm not 100% sure if it's an ISO
standard).
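The header trick described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not Skim's actual
code; the function name is hypothetical, and it assumes the data
starts with the usual "%FDF-" header.

```python
def fdf_header_to_pdf(data: bytes) -> bytes:
    """Rewrite a leading "%FDF-" header as "%PDF-", leaving the rest as is."""
    if not data.startswith(b"%FDF-"):
        raise ValueError("data does not start with an FDF header")
    # Replace only the first "F" of "FDF"; the version digits are kept.
    return b"%PDF-" + data[5:]


sample = b"%FDF-1.2\n1 0 obj\n<< /FDF << >> >>\nendobj\n"
converted = fdf_header_to_pdf(sample)
print(converted[:8])  # b'%PDF-1.2'
```

Whether a given PDF parser then accepts the result depends on the
parser; the sketch only shows the header rewrite.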
Be careful about using FDF for backup, though, because it may lead to
data loss, as there's no complete 1-to-1 mapping between Skim notes and
PDF/FDF annotations (in particular, anchored notes do not exist in PDF).
Only the Skim Notes export type is completely data preserving (as it's
the same data as what's saved in the EAs).
The Skim notes format is a proprietary format from Skim. But it is
completely open, Skim is OSS, and the format for Skim notes is
completely described on the Wiki. Moreover, a library to read and
write them including the source code is available from the site. It
uses only standard Cocoa and the BSD library for EAs. So it will
always be possible, at a minimum, to convert Skim notes to
whatever you want (including PDF annotations).
According to these references, PDF, like OpenDocument (ODF), is only
very recently an ISO standard:
http://en.wikipedia.org/wiki/Pdf
PDF is an open standard that was officially published on July 1,
2008 by the ISO as ISO 32000-1:2008.
http://www.theinquirer.net/inquirer/news/411/1030411/pdf-approved-iso-32000
THE ISO BALLOT to approve Adobe's PDF 1.7 as the ISO 32000 standard
passed by an overwhelming vote.
I have been able to download Adobe's PDF 1.7 Reference Sixth Edition
(dated November 2006) document here < http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
> which includes sections about FDF structure, but the ISO wants
to charge 370 Swiss Francs for the actual 32000-1:2008 document! I
suppose for now we can take Adobe's word for it that their Sixth
Edition Reference document is the same as the ISO standard. That being
said, when I've tried exporting from Skim to various incarnations of
PDF, I see the version (if looked at in a text editor) is 1.3 rather
than 1.7 such as:
%PDF-1.3
Reading the Adobe Reference 1.7 Sixth Edition on PDF, the FDF
section suggests that beginning with PDF 1.3 is when FDF was first
made available for use with annotations.
I think it would behoove Skim to have the ability to Save As and
Export PDF documents as PDF 1.7 compliant. In doing so, the outputs
from Skim would be ISO 32000-1:2008 global standards compliant, akin
to OpenDocument which is also a world standard per ISO. This would
include annotations as FDF which is a subset of PDF 1.7 as well. From
an archival standpoint, we could decades from now be confident that
at least PDF version 1.7 with subset FDF could be relied upon
regardless of how the world may change (as optimistic as we may
sometimes want to be about the future, the world can sometimes
change quickly -- companies can come and go, economies can
fluctuate, corporate culture can shift, etc.). Adobe could change
(for better or worse) -- I think there is in fact a compelling
argument that Skim was created out of need and not wanting to wait
around for Adobe's development cycle, such as bringing Acrobat to OS
X natively based on Cocoa. Preview is quite nice too but has its
limitations. Apple (and its APIs) could change as well. But as long
as we have some global ISO standards to federate to, we can be
confident of having readers and writers independent of corporations!
Using the normal Save (or Export as PDF), Skim does not touch the
original PDF data at all. The same is true for export without notes.
For export with embedded notes, Skim fully relies on Apple's PDFKit,
so there's absolutely no control over it.
Actually, PDFKit does a very bad job exporting PDF annotations (I'm
talking about Leopard, on Tiger it's not even possible). The saved
notes are actually changed. This is one more reason why Skim doesn't
use it. You may see this in Preview, though it may not be immediately
obvious because Preview does not support the types of annotations for
which this is the worst (such as lines and freehand notes), while it
uses workarounds for other types of notes (like highlights). To see
the problems, try exporting a file containing various types of notes
as PDF With Embedded Notes, then reopen that file in Skim and choose
File > Convert Notes. Try editing the notes afterwards (especially
line notes and multi-line underlines). (Convert Notes will fix this in
the next release.)
I think PDFKit currently supports PDF 1.3 features only. The fact that
this is lower than 1.7 is not a problem though, quite to the contrary.
PDF 1.3 is a strict subset of PDF 1.7; PDF 1.7 just adds new features
(such as interactive features). The essence is that it ALLOWS more, it
does not REQUIRE more. In other words, PDF 1.3 is always valid as PDF
1.7.
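If you want to check which version a file declares, the header is easy
to read. Here is a minimal Python sketch with hypothetical function
names; note it only looks at the declared version in the "%PDF-x.y"
header, it does not verify which features the file actually uses.

```python
import re

# Matches a header such as b"%PDF-1.3" at the very start of the data.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes):
    """Return the declared PDF version as a (major, minor) tuple."""
    match = HEADER_RE.match(data)
    if match is None:
        raise ValueError("no %PDF-x.y header at the start of the data")
    return int(match.group(1)), int(match.group(2))


def declared_within(data: bytes, ceiling=(1, 7)) -> bool:
    """Return True if the declared version does not exceed the ceiling."""
    return pdf_header_version(data) <= ceiling


print(pdf_header_version(b"%PDF-1.3\n"))  # (1, 3)
print(declared_within(b"%PDF-1.3\n"))     # True
```

Comparing versions as tuples rather than as strings or floats keeps a
hypothetical "1.10" from sorting below "1.7".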
When it comes to document preservation and annotations of those
documents, this is the type of stuff that archivists worry about,
and correctly so (decades may seem far away, but they will be here
sooner rather than later)!
If you hold on to a version of Skim, you will always be able to read
Skim notes. Even if Skim in the future uses PDF rather than .skim
notes (which I think won't happen), there will be some old version
available that can convert the notes; in fact there should then be a
simple conversion tool available, perhaps embedded in the skimnotes
tool.
Skim is off to a great start as an open source app, with the really
great features it already implements as a native OS X app (which will
only get better with time, as the underpinnings of OS X are slated to
improve with Snow Leopard)!
Any follow up thoughts?
Cheers,
Hydro
Christiaan
------------------------------------------------------------------------------
_______________________________________________
Skim-app-users mailing list
https://lists.sourceforge.net/lists/listinfo/skim-app-users