On 26 Dec 2008, at 7:59 PM, Hydro Meteor wrote:
On Thu, Dec 25, 2008 at 7:30 AM, Christiaan Hofman
<[email protected]> wrote:
On 25 Dec 2008, at 3:39 PM, Hydro Meteor wrote:
[ SNIP ]
From the perspective of long-term archival of PDFs and its
annotations, from what I've read about Apple's implementation of
Extended Attributes on the file system, Apple supposedly conforms
to POSIX.1e ACL, per http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/acl.3.html
but if one reads the Description section of this page carefully,
you'll see that there are differences from pure POSIX.1e:
This implementation of the POSIX.1e library differs from the
standard in a number of non-portable ways in order to support the
MacOS/Darwin ACL semantic. Where possible, these differences are
implemented using the mechanisms provided in the standard for such
extensions. Where routines are non-standard, they are suffixed
with _np to indicate that they are not portable.
POSIX.1e describes a set of ACL manipulation routines to manage the
contents of ACLs, as well as their relationships with files; almost
all of these support routines are implemented.
The "almost all" and "differs from the standard in a number of
non-portable ways ..." are concerning to me from the perspective of
long-term archiving of PDF documents with annotations, which need to
be preserved in perpetuity. One reason for concern is that open
source network backup solutions such as Bacula
<http://www.bacula.org/en/> do not yet fully handle ACLs in Mac
OS X (I have tried and it's just not there yet, although it may be in
the future). So there is the possibility of losing annotations when
backing up and restoring.
EAs and ACLs are not the same thing. E.g. they are often handled
differently by copy/backup tools (some preserve both, some none,
some one but not the other).
Thanks. I dug a little deeper for clarification and found some of my
sys admin notes. ACLs are a form of EAs but not all EAs are ACLs,
according to Amit Singh's Mac OS X Internals: A Systems Approach
<http://osxbook.com/> (page 134), where Singh writes:
ACLs – File system ACLs are supported for finer-grained and flexible
admission control when using on-disk information. Per-file ACLs are
implemented as extended attributes in the file system.
For Skim notes, the real question is just how EAs are handled.
What's relevant is not the API that Apple provides but whether the
backup tool preserves the data, that's all. It very much depends on
the tool.
Great point. My Bacula notes also suggest that some EAs for files
can be backed up and restored. When it comes to quiescently creating
HFS+ or HFSX disk snapshots (using Apple's command-line tools such
as hdiutil and asr), you'll be able to capture everything on the
filesystem because it's possible to use these tools at the device
(block) level.
ACLs seem to be implemented as a kind of special EA that is
somewhat hidden from the user (at least by the xattr tool, perhaps
even by the BSD library). Maybe that's why Bacula doesn't copy them.
However, Skim uses ordinary EAs.
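Since Skim uses ordinary EAs, one way to check whether a given backup tool kept the notes is to compare a file's EAs before backing up and after restoring. The sketch below is a minimal illustration, assuming you have already collected each EA's name and raw value (for example with the `xattr` command-line tool) into Python dictionaries. The `net_sourceforge_skim-app` name prefix is an assumption about how Skim names its EAs and should be checked against the format description on the Skim wiki.

```python
from typing import Dict, List, Tuple

# Assumed prefix of the EA names Skim writes; verify before relying on it.
SKIM_EA_PREFIX = "net_sourceforge_skim-app"


def compare_eas(before: Dict[str, bytes], after: Dict[str, bytes],
                prefix: str = SKIM_EA_PREFIX) -> Tuple[List[str], List[str]]:
    """Return (missing, changed) names among the EAs that start with prefix."""
    missing: List[str] = []
    changed: List[str] = []
    for name in sorted(before):
        if not name.startswith(prefix):
            continue  # ignore EAs that are not Skim's (e.g. Finder info)
        if name not in after:
            missing.append(name)  # the backup/restore dropped this EA
        elif after[name] != before[name]:
            changed.append(name)  # the EA survived but its data differs
    return missing, changed
```

An empty result for both lists means every Skim EA came back byte-for-byte identical.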
The FAQ on the Skim Wiki has a discussion about skim notes and
backup tools, including a link to a test page of various tools and
what they preserve.
Appreciate the links to the pages. Can I add a link to the Wiki page
to the open source Bacula project?
Not at the moment, as the wiki is currently uneditable.
I am probably a wee bit biased because Bacula is a very solid
industrial strength backup and recovery tool that happens to be just
about the only one out there that is totally FOSS and also multi
platform (runs on OS X as well as Linux). Bacula is known, however,
for not being able to back up and recover ACLs on OS X (so it may not
fit everyone's needs). I'll for sure have to test it out with
Skim-generated EA metadata for PDFs.
If you want to be sure you won't lose the notes you can save them in
the data of a separate .skim file (you can do this manually, or
choose to always do this automatically). You can also convert to a
PDF bundle, which is just a file package containing the (original)
PDF and the notes in separate files.
I had not checked out Skim's Preferences previously, but I see the
ability to automatically save Skim notes backups. That is a great
feature, thanks for including it!
I think one can be pretty sure that Apple's implementation of EAs
will be compatible with any future changes.
That's probably a reasonable assumption considering that starting
with Leopard, Mac OS X is "an Open Brand UNIX 03 Registered Product,
conforming to the SUSv3 and POSIX 1003.1 specifications for the C
API, Shell Utilities, and Threads" <http://www.apple.com/macosx/technology/unix.html>
It will be interesting to see what if anything changes on this score
once Snow Leopard is released :-)
Skim's ability to separate PDF from annotations is excellent. This
allows for archival preservation of the original PDF document in a
library system, with the annotations recombined with the original
as a separate process at a later time, for example. While export
options for annotations in the form of text, RTF, and RTFD exist and
are greatly appreciated, the only format for round-tripping
annotations in and out of a PDF document is FDF (i.e., Skim only
parses FDF (Forms Data Format)). I can't quite discern whether FDF
is an open format / ISO standard.
FDF is actually just a simplified form of PDF (in fact, Skim reads
FDF by replacing the first "F" in "FDF" by "P" and reading it as
PDF), so it's just as open as PDF (though I'm not 100% sure whether
it's an ISO standard).
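The header trick described above can be sketched in a few lines. This is a minimal illustration in Python, not Skim's actual code; the function name `fdf_as_pdf` is hypothetical, and it only rewrites the leading `%FDF-` marker, leaving the rest of the file for an ordinary PDF parser to read.

```python
def fdf_as_pdf(data: bytes) -> bytes:
    """Turn a '%FDF-x.y' header into '%PDF-x.y' so a PDF parser accepts it.

    Only the first 'F' of 'FDF' is replaced; all other bytes are unchanged.
    """
    if not data.startswith(b"%FDF-"):
        raise ValueError("not an FDF file: missing %FDF- header")
    return b"%P" + data[2:]  # keep '%', swap 'F' for 'P', keep 'DF-...'
```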
I'll see if I can find out (not sure how to get a hold of the ISO
32000 document but I'll try). My guess is that FDF is part of the
standard because the Adobe 1.7 Reference document includes an entire
section (8.6.6) on Forms Data Format. Here's a copy of the brief
introduction about FDF:
8.6.6 Forms Data Format
This section describes Forms Data Format (FDF), the file format used
for interactive form data (PDF 1.2). FDF is used when submitting form
data to a server, receiving the response, and incorporating it into
the interactive form. It can also be used to export form data to
stand-alone files that can be stored, transmitted electronically, and
imported back into the corresponding PDF interactive form.
In addition, beginning in PDF 1.3, FDF can be used to define a
container for annotations that are separate from the PDF document to
which they apply.
FDF is based on PDF; it uses the same syntax (see Section 3.1,
"Lexical Conventions") and basic object types (Section 3.2,
"Objects"), and has essentially the same file structure (Section 3.4,
"File Structure"). However, it differs from PDF in the following ways:
• The cross-reference table (Section 3.4.3, "Cross-Reference Table")
  is optional.
• FDF files cannot be updated (see Section 3.4.5, "Incremental
  Updates"). Objects can only be of generation 0, and no two objects
  can have the same object number.
• The document structure is much simpler than PDF, since the body of
  an FDF document consists of only one required object.
• The length of a stream may not be specified by an indirect object.
FDF uses the MIME content type application/vnd.fdf. On the Windows
and UNIX platforms, FDF files have the extension .fdf; on Mac OS, they
have file type 'FDF '.
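Two of the listed restrictions (generation 0 only, no repeated object numbers) are easy to check mechanically. The sketch below is a rough, hypothetical checker, not a real FDF validator: it scans for `N G obj` headers with a regular expression instead of parsing the file, so text inside streams or strings that happens to look like an object header could fool it.

```python
import re
from typing import List

# Matches indirect-object headers such as "3 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def fdf_object_problems(data: bytes) -> List[str]:
    """List violations of the generation-0 and unique-object-number rules."""
    problems: List[str] = []
    seen = set()
    for match in OBJ_HEADER.finditer(data):
        num, gen = int(match.group(1)), int(match.group(2))
        if gen != 0:
            problems.append(f"object {num} has generation {gen}, expected 0")
        if num in seen:
            problems.append(f"object number {num} is used more than once")
        seen.add(num)
    return problems
```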
Be careful, though, about using FDF for backup, because it may lead to
data loss, as there's not a complete 1-to-1 mapping between Skim notes
and PDF/FDF annotations (in particular, anchored notes do not exist in
PDF). Only the Skim Notes export type is completely data preserving
(as it's the same data as what's saved in the EAs).
Thanks for the heads up. Apparently FDF is not capable of handling
anchored notes.
Anchored notes are our invention, and they predate our use of PDF
annotations.
I noticed that Apple's Preview app (on Leopard) provides the ability
to add anchored notes to PDFs.
Those are not anchored notes but notes of type Text. Anchored notes
are implemented as a subclass of Text notes, and the difference is
that anchored notes can have an image and an additional rich text
property (apart from the plain text string).
Does PDFKit provide some API hooks for adding anchored notes, which
Skim is using, and which Preview is probably also using? In other
words, are anchored notes pretty much specific to PDFKit? I wonder how
Adobe Acrobat (assuming Acrobat also provides anchored notes)
structures and saves them (if not as FDF)?
As we invented anchored notes, neither Adobe nor PDFKit knows about
them. When Skim saves with embedded notes or exports to FDF, the
anchored notes are saved as Text annotations, which are the closest
PDF/FDF have to offer. That's why it loses information. Another thing
that's lost is transparency in colors, because PDF/FDF doesn't support
that. Moreover, the font of (Skim's) text notes may get lost.
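To make this information loss concrete, here is a hypothetical sketch of such a downgrade. The `Note` class and the kind-to-subtype mapping are simplified stand-ins invented for illustration, not Skim's real data model; the dropped properties follow the list above (the image and rich text of anchored notes, color transparency, and the font of text notes, which the text says *may* be lost).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed mapping from the hypothetical note kinds to PDF annotation subtypes.
SUBTYPES = {"anchored": "Text", "text": "FreeText"}


@dataclass
class Note:
    """Hypothetical, simplified stand-in for a Skim note."""
    kind: str                        # "anchored" or "text" in this sketch
    contents: str                    # plain text string
    alpha: float = 1.0               # color opacity; 1.0 means fully opaque
    image: Optional[bytes] = None    # anchored notes only
    rich_text: Optional[str] = None  # anchored notes only
    font: Optional[str] = None       # text notes only


def to_pdf_annotation(note: Note) -> Tuple[dict, List[str]]:
    """Downgrade a note to a PDF-style annotation dict.

    Returns the annotation and the names of the properties it cannot keep.
    """
    annotation = {"Subtype": SUBTYPES[note.kind], "Contents": note.contents}
    lost: List[str] = []
    if note.image is not None:
        lost.append("image")
    if note.rich_text is not None:
        lost.append("rich text")
    if note.alpha < 1.0:
        lost.append("transparency")
    if note.font is not None:
        lost.append("font")  # the thread says the font *may* get lost
    return annotation, lost
```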
The Skim notes format is a proprietary format from Skim. But it is
completely open, Skim is OSS, and the format for Skim notes is
completely described on the Wiki. Moreover, a library to read and
write them including the source code is available from the site. It
uses only standard Cocoa and the BSD library for EAs. So, at a
minimum, it would always be possible to convert Skim notes to
whatever you want (including PDF annotations).
[SNIP]
Using the normal Save (or Export as PDF), Skim does not touch the
original PDF data at all. The same is true for export without notes.
For export with embedded notes, Skim fully relies on Apple's PDFKit,
so there's absolutely no control over it.
Actually, PDFKit does a very bad job exporting PDF annotations (I'm
talking about Leopard, on Tiger it's not even possible). The saved
notes are actually changed. This is one more reason why Skim doesn't
use it.
That seems quite reasonable. So "standard" Cocoa excludes PDFKit?
Yes, it's just Foundation and AppKit. Note that that's also just
what's available in the cross-platform GNUStep.
Given what you've written about PDFKit, I can understand all the
more reason for Skim notes format! My curiosity is probably unusual
in that I'm looking at preservation of annotations from the
perspective of an archivist (such that these notes could be looked
at, potentially, 100 years from now). Most people have shorter term
horizons!
100 years is extremely long for computing, and I don't think anything
on my computer will last that long. In fact, I wouldn't even know how
to recover my files on floppy disks from a mere 15 years ago!
You may see this in Preview, though it may not be immediately
obvious because Preview does not support the types of annotations
for which this is the worst (such as lines and freehand notes),
while it uses workarounds for other types of notes (like
highlights). To see the problems, try exporting a file containing
various types of notes as PDF With Embedded Notes, then reopen that
file in Skim and choose File > Convert Notes. Try editing the notes
afterwards (especially line notes and multi-line underlines).
(Convert Notes will fix this in the next release.)
I tried what you suggested. Yuck regarding PDFKit.
Development of PDFKit is pretty slow. I was very much disappointed by
Leopard's improvements, I'd expected a lot more. I got the impression
they've got just one guy working on it, at least he's complaining
about Apple giving too few man hours.
Skim notes all the way. When you came up with the Skim format, had
you looked at XML-based SVG by any chance? SVG hasn't really taken
off (several years ago I thought it would), and I'm not sure why.
Any thoughts as to why? Ironically, Adobe was behind SVG as a member
of the W3C SVG committee years ago if I'm not mistaken.
I don't really think SVG is appropriate for these notes; it certainly
cannot capture everything, e.g. anchored notes and colors.
I think PDFKit currently supports PDF 1.3 features only. The fact
that this is lower than 1.7 is not a problem though, quite to the
contrary. PDF 1.3 is a strict subset of PDF 1.7; PDF 1.7 just adds
new features (such as interactive features). The essence is that it
ALLOWS more, it does not REQUIRE more. In other words, PDF 1.3 is
always valid as PDF 1.7.
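The "allows more, does not require more" point can be illustrated with a simple version comparison: a reader that handles PDF 1.7 should accept any file declaring a lower 1.x version. This is a minimal sketch that only looks at the `%PDF-x.y` header; from PDF 1.4 on, a file can also override the header version with a `/Version` entry in its document catalog, which this sketch ignores.

```python
import re
from typing import Tuple

HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes) -> Tuple[int, int]:
    """Parse the version declared in the %PDF-x.y header line."""
    match = HEADER.match(data)
    if match is None:
        raise ValueError("missing %PDF-x.y header")
    return int(match.group(1)), int(match.group(2))


def readable_by(data: bytes, reader_max: Tuple[int, int] = (1, 7)) -> bool:
    """True if the declared version is no newer than what the reader supports.

    Tuples compare element by element, so (1, 3) <= (1, 7) is True.
    """
    return pdf_version(data) <= reader_max
```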
Great! Unusually minded archivists and curators like me can now rest
more easily at night :-)
When it comes to document preservation and annotations of those
documents, this is the type of stuff that archivists worry about,
and correctly so (decades may seem far away, but they will be here
sooner than later)!
If you hold on to a version of Skim, you will always be able to read
Skim notes. Even if Skim in the future uses PDF rather than .skim
notes (which I think won't happen), there will be some old version
available that can convert the notes; in fact, there should then be a
simple conversion tool available, perhaps embedded in the skimnotes
tool.
Even if Apple were to disappear decades from now, there will always
be a museum somewhere that will have a Mac with Tiger or Leopard on
it along with the XCode tools with the standard Cocoa libraries to
link to and compile with gcc ;-)
And moreover, Apple's implementation is not the only one, there's also
GNUStep's.
By the way, thank you (and the other contributors) for such a great
app in Skim. From a reading standpoint, 1.) the Automatic Resize is
fantastic, such that the content of a PDF page is resized to the
viewing pane that it's in, and 2.) Skim is the best I've ever seen
when it comes to Full Screen mode -- I can take a page in Skim, go
Full Screen and then Command + will literally scale it up to the
full physical display size (sans vertical scroll bar on the right
hand side). And then on top of it all, I can still annotate this
fully scaled page and I can even use the Magnify bubble. That is
just sweet! My Grandfather passed away at age 90 several years ago
and in his older age we got him a large CRT (an old TV set) which
was equipped with a video input from a camera that would aim down
and a light shining on it (so he could take a sheet of paper and
have it magnified on the CRT to help him with his vision). If Skim
had been around then, we could have gotten him a 30 inch Cinema
Display and he would have loved it!
It's a joy to have discovered Skim a few days ago (I can only imagine
things will get more interesting as Apple brings multi-touch to more
of the Mac world, and perhaps they'll finally bring out a tablet as
light as the Air for way cool annotation capability?).
Cheers,
Hydro
[SNIP]
You're welcome.
Christiaan
------------------------------------------------------------------------------
_______________________________________________
Skim-app-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/skim-app-users