On 27 Dec 2008, at 8:40 PM, Hydro Meteor wrote:
On Fri, Dec 26, 2008 at 1:21 PM, Christiaan Hofman
<[email protected]> wrote:
On 26 Dec 2008, at 7:59 PM, Hydro Meteor wrote:
On Thu, Dec 25, 2008 at 7:30 AM, Christiaan Hofman <[email protected]
> wrote:
On 25 Dec 2008, at 3:39 PM, Hydro Meteor wrote:
[ SNIP ]
From the perspective of long-term archival of PDFs and their
annotations, from what I've read about Apple's implementation of
Extended Attributes on the file system, Apple supposedly conforms
to POSIX.1e ACL, per http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/acl.3.html
but if one reads the Description section of this page carefully,
you'll see that there are differences from pure POSIX.1e:
This implementation of the POSIX.1e library differs from the
standard in a number of non-portable ways in order to support the
MacOS/Darwin ACL semantic. Where possible, these differences are
implemented using the mechanisms provided in the standard for such
extensions. Where routines are non-standard, they are suffixed
with _np to indicate that they are not portable.
POSIX.1e describes a set of ACL manipulation routines to manage
the contents of ACLs, as well as their relationships with files;
almost all of these support routines are implemented.
The "almost all" and "differs from the standard in a number of
non-portable ways ..." are concerning to me from the perspective of
long-term archiving of PDF documents with annotations, which need
to be preserved in perpetuity. One reason
for concern is that open source network backup solutions such as
Bacula < http://www.bacula.org/en/ > do not yet fully handle ACLs
in Mac OS X (I have tried and it's just not there yet, although it
may be in the future). So there is the possibility of losing
annotations when backing up and restoring.
EAs and ACLs are not the same thing. E.g. they are often handled
differently by copy/backup tools (some preserve both, some none,
some one but not the other).
Thanks. I dug a little deeper for clarification and found some of
my sys admin notes. ACLs are a form of EAs but not all EAs are
ACLs, according to Amit Singh's Mac OS X Internals: A Systems
Approach < http://osxbook.com/ > (page 134) where Singh writes:
ACLs – File system ACLs are supported for finer-grained and
flexible admission control when using on-disk information. Per-file
ACLs are implemented as extended attributes in the file system.
For Skim notes, the real question is just how EAs are handled.
What's relevant is not the API that Apple provides but whether the
backup tool preserves the data, that's all. It very much depends on
the tool.
Great point. My Bacula notes also suggest that some EAs for files
can be backed up and restored. When it comes to quiescently
creating HFS+ or HFSX disk snapshots (using Apple's command-line
tools such as hdiutil and asr), you'll be able to capture
everything on the filesystem because it's possible to use these
tools at the device (block) level.
ACLs seem to be implemented as some kind of special EAs, that are
somewhat hidden from the user (at least by the xattr tool, perhaps
even by the BSD library). Maybe that's why Bacula doesn't copy them.
However Skim uses ordinary EAs.
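
One way to check whether a particular copy or backup tool preserves ordinary EAs is to record a file's attributes before the round trip and compare them with the restored copy. The following is a minimal Python sketch of that idea, not a description of how Skim or Bacula work. It assumes a platform where Python's os.listxattr and os.getxattr exist (the Python documentation lists them as Linux-only; on Mac OS X the attributes would have to be collected some other way, e.g. with the xattr command-line tool). The attribute name in the usage comment is a made-up placeholder.

```python
import os


def read_xattrs(path):
    """Return {name: value} for the extended attributes of a file.

    Assumption: os.listxattr/os.getxattr are available (Linux).
    On other platforms this raises NotImplementedError.
    """
    if not hasattr(os, "listxattr"):
        raise NotImplementedError("os.listxattr is not available here")
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}


def compare_xattrs(before, after):
    """Report which attributes were dropped or altered by a tool.

    'before' and 'after' are {name: value} dicts, e.g. from read_xattrs()
    on the original file and on the restored copy.
    """
    lost = sorted(name for name in before if name not in after)
    changed = sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )
    return {"lost": lost, "changed": changed}


# Usage (hypothetical paths and attribute name):
#   original = read_xattrs("paper.pdf")
#   restored = read_xattrs("restored/paper.pdf")
#   print(compare_xattrs(original, restored))
# A tool that drops EAs would show up as e.g.
#   {"lost": ["user.example_notes"], "changed": []}
```

Keeping the comparison separate from the reading step means the same check works no matter how the attribute dictionaries were obtained.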
At some point sooner rather than later I will get around to testing Skim
EAs with Bacula. I will report my observations to this mailing list
and perhaps cross-post to the Bacula mailing list, as it might be
useful for people there to learn some heuristics about the various
EA flavors.
The FAQ on the Skim Wiki has a discussion about skim notes and
backup tools, including a link to a test page of various tools and
what they preserve.
Appreciate the links to the pages. Can I add a link on the Wiki
page to the open source Bacula project?
Not ATM, as the wiki is currently uneditable.
Would you please add it? I think Bacula deserves to be listed among
the resources (I've worked with it quite extensively on Mac OS X
although I'm still learning about Bacula). I am not a Bacula cult
member, it's just that I find it to be quite amazing and I'd like for
more people in the Mac OS X community (especially those who
appreciate FOSS < http://en.wikipedia.org/wiki/FOSS >) to be aware
of it.
As I said, the Wiki is uneditable ATM. And I don't have the permissions
(or a clue) to fix it.
[SNIP]
Thanks for the heads up. Apparently FDF is not capable of handling
anchored notes.
Anchored notes are our invention, and they predate our use of PDF
annotations.
I noticed that Apple's Preview app (on Leopard) provides the
ability to add anchored notes to PDFs.
Those are not anchored notes but notes of type Text. Anchored notes
are implemented as a subclass of Text notes, and the difference is
that anchored notes can have an image and an additional rich text
property (apart from the plain text string).
Does PDFKit provide some API hooks into adding anchored notes,
which Skim is availing of, which is probably what Preview is also
availing of? In other words, are anchored notes pretty much
specific to PDFKit? I wonder how Adobe Acrobat (assuming Acrobat
also provides anchored notes?) structures and saves them (if not as
FDF)?
As we invented anchored notes, neither Adobe nor PDFKit knows about
them. When Skim saves with embedded notes or exports to FDF, the
anchored notes are saved as Text annotations, which are the closest
PDF/FDF have to offer. That's why it loses information. Another
thing that's lost is transparency in colors, because PDF/FDF doesn't
support that. Moreover, the font of (Skim's) text notes may get lost.
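
The information loss described above can be pictured as flattening a richer note record into the fields a plain Text annotation can hold. The following Python sketch is purely illustrative: the dictionary keys ("rich_text", "image", and so on) and the "Anchored" type name are hypothetical and do not reflect the actual Skim notes format, which is described on the Wiki.

```python
def to_text_annotation(note):
    """Flatten a note dict into what a plain Text annotation could carry.

    All key names here are hypothetical placeholders, not Skim's format.
    Returns (flattened_note, list_of_lost_properties).
    """
    kept_keys = ("type", "bounds", "contents", "color")
    flat = {key: note[key] for key in kept_keys if key in note}
    flat["type"] = "Text"  # closest type that PDF/FDF has to offer

    # Everything outside the kept keys (e.g. image, rich text) is lost.
    lost = sorted(key for key in note if key not in kept_keys)

    # Assumption for this sketch: colors are (r, g, b, alpha) tuples and
    # the alpha (transparency) component cannot be kept.
    color = flat.get("color")
    if color is not None and len(color) == 4:
        flat["color"] = tuple(color[:3])
        lost.append("color alpha")

    return flat, lost


# Usage with a made-up anchored note:
#   note = {"type": "Anchored", "bounds": (10, 20, 16, 16),
#           "contents": "See eq. 3", "color": (1.0, 1.0, 0.0, 0.5),
#           "rich_text": "longer formatted text", "image": b"..."}
#   flat, lost = to_text_annotation(note)
#   lost is then ["image", "rich_text", "color alpha"]
```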
Appreciated that anchored notes were invented by the Skim team! This is
quite educational and a fresh reminder that big corporations don't
always invent great things.
The Skim notes format is a proprietary format from Skim. But it is
completely open, Skim is OSS, and the format for Skim notes is
completely described on the Wiki. Moreover, a library to read and
write them including the source code is available from the site. It
uses only standard Cocoa and the BSD library for EAs. So it would
always be possible, at a minimum, to convert Skim notes to
whatever you want (including PDF annotations).
[SNIP]
Using the normal Save (or Export as PDF), Skim does not touch the
original PDF data at all. The same is true for export without
notes. For export with embedded notes, Skim fully relies on Apple's
PDFKit, so there's absolutely no control over it.
Actually, PDFKit does a very bad job exporting PDF annotations (I'm
talking about Leopard, on Tiger it's not even possible). The saved
notes are actually changed. This is one more reason why Skim
doesn't use it.
That seems quite reasonable. So "standard" Cocoa excludes PDFKit?
Yes, it's just Foundation and AppKit. Note that that's also just
what's available in the cross-platform GNUStep.
Great. So is it possible, in theory and in practice, to compile and
run Skim on GNUStep? Has anyone tried? With today's rich world of
virtual machine technology, it might be worth giving it a shot in
VMWare or Parallels vm?
Not Skim, which uses many more frameworks (such as PDFKit). Only the
SkimNotesBase framework, which is the core library for reading and
writing Skim notes to EAs or file. I haven't tried it, as I've never
even tried to compile GNUStep. But it should be possible.
Given what you've written about PDFKit, I can understand all the
more reason for Skim notes format! My curiosity is probably unusual
in that I'm looking at preservation of annotations from the
perspective of an archivist (such that these notes could be looked
at, potentially, 100 years from now). Most people have shorter term
horizons!
100 years is extremely long for computing, and I don't think
anything on my computer will last that long. In fact, I wouldn't
even know how to recover my files on floppy disks from a mere 15
years ago!
Physical media preservation is one thing, but file formats and
software executables are another. I agree with you about floppies,
but think about how there are still to this day programs running in
Cobol and Fortran (and operating on data that may be very old
historically). In fact, I would argue that data archival and recovery
are going to become ever more important given the latest
"financial crisis" the world is in (which has been attributed in
part to using 200 year old statistical models in economics -- the
Alan Greenspans and the Milton Friedmans of the world were trained
in the 60s and 70s to think that financial markets were mostly
Gaussian and didn't experience kurtosis à la "fat tails", etc.). This
may seem odd to you given your mathematics background (as it does to
me given my background in atmospheric science), but economists don't
generally take into account chaos theory / perturbations /
probabilistic outcomes. For a great article, see:
"Economics needs a scientific revolution"
http://www.nature.com/nature/journal/v455/n7217/full/4551181a.html
Financial engineers have put too much faith in untested axioms and
faulty models, says Jean-Philippe Bouchaud. To prevent economic
havoc, that needs to change.
Compared with physics, it seems fair to say that the quantitative
success of the economic sciences has been disappointing. Rockets fly
to the Moon; energy is extracted from minute changes of atomic mass.
Besides economics being likely to change (and thus the importance of
archiving), also don't forget that 100 years is child's play in
terms of the scale of climatology (as our friends up north at UND in
Grand Forks at the Center for Aerospace Sciences can attest to).
I may know more about finance and economics than you may think ;-) But
I won't indulge.
Thus, if an organization (such as a university or a research lab)
has built up years and years worth of PDFs and important annotations
of those PDFs, archiving both the PDFs and the annotations well into
the future (beyond a human lifetime or two) is really important.
Imagine a university professor who may have made an incredibly
insightful annotation on a PDF document today written about, say,
the financial crisis or about global climate change, and what if
that professor passes away unexpectedly -- a student years from now
should be able to discover that professor's annotations for use in
his or her research which might lead to further revelations and
insights!
This is why I approached Skim and PDF from the ISO standard
viewpoint -- as crazy as it may have seemed at first. Also this is
why I happen to be adamant about open source (such that the source
code can always be compiled independent of platform, independent of
corporation) and why I love OpenDocument (to be free of the shackles
of Microsoft with regard to Word processing once and for all; see
also < http://www.boston.com/business/technology/articles/2005/09/02/state_may_drop_office_software/
>)!
I agree completely with you, especially in regards to the importance
of OSS.
You may see this in Preview, though it may not be immediately
obvious because Preview does not support the types of annotations
for which this is the worst (such as lines and freehand notes),
while it uses workarounds for other types of notes (like
highlights). To see the problems, try exporting a file containing
various types of notes as PDF With Embedded Notes, then reopen that
file in Skim and choose File > Convert Notes. Try editing the notes
afterwards (especially line notes and multi-line underlines).
(Convert Notes will fix this in the next release.)
I tried what you suggested. Yuck regarding PDFKit.
Development of PDFKit is pretty slow. I was very much disappointed
by Leopard's improvements, I'd expected a lot more. I got the
impression they've got just one guy working on it, at least he's
complaining about Apple giving too few man hours.
That's very interesting to know. Thanks for sharing. Perhaps not too
surprising. I recall Leopard was delayed from its original intended
release date because Apple needed to borrow some engineers and human
power for the iPhone. This is another good example of why it's tricky
to count on corporations -- they will ebb and flow to do what's in
their best interest, which is fine (not a judgment), but it's why it's
great if we can compile against FOSS (e.g., GNUstep).
Skim notes all the way. When you came up with the Skim format, had
you looked at XML-based SVG by any chance? SVG hasn't really taken
off (several years ago I thought it would), and I'm not sure why.
Any thoughts as to why? Ironically, Adobe was behind SVG as a
member of the W3C SVG committee years ago if I'm not mistaken.
I don't really think SVG is appropriate for these notes, it
certainly cannot capture everything, e.g. anchored notes and colors.
Perhaps XML (the "for everything" mantra) pushed by the W3C got carried
away (SOAP, SVG, XUL, SMIL, OWL, RDF, etc, etc.) ... committees are
not known to always chalk up efficiency.
I think PDFKit currently supports PDF 1.3 features only. The fact
that this is lower than 1.7 is not a problem though, quite to the
contrary. PDF 1.3 is a strict subset of PDF 1.7; PDF 1.7 just adds
new features (such as interactive features). The essence is that it
ALLOWS more, it does not REQUIRE more. In other words, PDF 1.3 is
always valid as PDF 1.7.
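
The "allows more, does not require more" point can be illustrated by reading the version from a file's header line (for example %PDF-1.3) and comparing it with the version a reader implements. This is a minimal Python sketch under simplifying assumptions: it only looks at the header at the very start of the data and ignores the fact that newer PDFs may also declare a version elsewhere in the file. The function names are made up for this example.

```python
import re


def pdf_header_version(data):
    """Parse (major, minor) from a PDF header such as b'%PDF-1.3'."""
    match = re.match(rb"%PDF-(\d+)\.(\d+)", data)
    if match is None:
        raise ValueError("no PDF header found at start of data")
    return int(match.group(1)), int(match.group(2))


def within_reader_spec(file_version, reader_version):
    """True if the file's declared version is not newer than the reader's.

    A 1.3 file is within the spec of a 1.7 reader; the reverse is not
    guaranteed, since the file may use features the reader does not know.
    """
    return file_version <= reader_version


# Usage (hypothetical file name):
#   with open("paper.pdf", "rb") as handle:
#       version = pdf_header_version(handle.read(16))
#   print(within_reader_spec(version, (1, 7)))
```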
Great! Unusually minded archivists and curators like me can now rest
more easily at night :-)
When it comes to document preservation and annotations of those
documents, this is the type of stuff that archivists worry about,
and correctly so (decades may seem far away, but they will be here
sooner than later)!
If you hold on to a version of Skim, you will always be able to
read Skim notes. Even if Skim in the future uses PDF rather
than .skim notes (which I think won't happen), there will be some
old version available that can convert the notes. In fact, there
should then be a simple conversion tool available, perhaps embedded
in the skimnotes tool.
Even if Apple were to disappear decades from now, there will always
be a museum somewhere that will have a Mac with Tiger or Leopard on
it along with the XCode tools with the standard Cocoa libraries to
link to and compile with gcc ;-)
And moreover, Apple's implementation is not the only one, there's
also GNUStep's.
As aforementioned, Skim should be able to compile under GNUStep?
No, but the basic Skim notes read/write component should.
Christiaan
[SNIP]
You're welcome.
Christiaan
Cheers,
Hydro
------------------------------------------------------------------------------
_______________________________________________
Skim-app-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/skim-app-users