On 26 Dec 2008, at 7:59 PM, Hydro Meteor wrote:



On Thu, Dec 25, 2008 at 7:30 AM, Christiaan Hofman <[email protected]> wrote:

On 25 Dec 2008, at 3:39 PM, Hydro Meteor wrote:



[ SNIP ]

From the perspective of long-term archiving of PDFs and their annotations: from what I've read about Apple's implementation of Extended Attributes on the file system, Apple supposedly conforms to POSIX.1e ACLs, per http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/acl.3.html. However, if you read the Description section of this page carefully, you'll see that there are differences from pure POSIX.1e:

This implementation of the POSIX.1e library differs from the standard in a number of non-portable ways in order to support the MacOS/Darwin ACL semantic. Where possible, these differences are implemented using the mechanisms provided in the standard for such extensions. Where routines are non-standard, they are suffixed with _np to indicate that they are not portable. POSIX.1e describes a set of ACL manipulation routines to manage the contents of ACLs, as well as their relationships with files; almost all of these support routines are implemented.

The "almost all" and "differs from the standard in a number of non-portable ways ..." concern me from the perspective of long-term archiving of PDF documents with annotations, which need to be preserved in perpetuity. One reason for concern is that open source network backup solutions such as Bacula < http://www.bacula.org/en/ > do not yet fully handle ACLs on Mac OS X (I have tried it, and it's just not there yet, although it may be in the future). So there is a possibility of losing annotations when backing up and restoring.


EAs and ACLs are not the same thing. E.g. they are often handled differently by copy/backup tools (some preserve both, some none, some one but not the other).

Thanks. I dug a little deeper for clarification and found some of my sysadmin notes. ACLs are a form of EAs, but not all EAs are ACLs, according to Amit Singh's Mac OS X Internals: A Systems Approach < http://osxbook.com/ > (page 134), where Singh writes: "ACLs – File system ACLs are supported for finer-grained and flexible admission control when using on-disk information. Per-file ACLs are implemented as extended attributes in the file system."

For Skim notes, the real question is just how EAs are handled. What's relevant is not the API that Apple provides but whether the backup tool preserves the data, that's all. It very much depends on the tool.
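Since it comes down to whether the tool preserves the data, one way to check a particular backup tool is to compare a file's EAs before backup and after restore. The sketch below assumes Python's os.listxattr/os.getxattr, which the standard library provides on Linux only (on Mac OS X the same data is reachable through the xattr command-line tool or the BSD getxattr/listxattr calls). The function names and the attribute names in the example are hypothetical.

```python
import os
from typing import Dict, List


def read_xattrs(path: str) -> Dict[str, bytes]:
    """Return every extended attribute of `path` as a name -> value mapping.

    Python only exposes os.listxattr/os.getxattr on Linux; on Mac OS X use
    the `xattr` tool or the BSD getxattr(2)/listxattr(2) calls instead.
    """
    if not hasattr(os, "listxattr"):
        raise NotImplementedError("os.listxattr is not available on this platform")
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}


def lost_or_changed(original: Dict[str, bytes], restored: Dict[str, bytes]) -> List[str]:
    """Names of attributes that are missing or altered after backup/restore."""
    return sorted(name for name, value in original.items() if restored.get(name) != value)


# Example with hypothetical attribute names (in practice: read_xattrs(path) before and after).
before = {"user.example.notes": b"<notes data>", "user.example.tag": b"red"}
after = {"user.example.tag": b"red"}
print(lost_or_changed(before, after))  # ['user.example.notes']
```

Comparing values, not just names, also catches a tool that keeps an attribute but truncates or rewrites it.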

Great point. My Bacula notes also suggest that some EAs can be backed up and restored. When it comes to quiescently creating HFS+ or HFSX disk snapshots (using Apple's command-line tools such as hdiutil and asr), you'll be able to capture everything on the file system because it's possible to use these tools at the device (block) level.


ACLs seem to be implemented as a special kind of EA that is somewhat hidden from the user (at least by the xattr tool, perhaps even by the BSD library). Maybe that's why Bacula doesn't copy them. Skim, however, uses ordinary EAs.

The FAQ on the Skim Wiki has a discussion about skim notes and backup tools, including a link to a test page of various tools and what they preserve.

Appreciate the links to the pages. Can I add a link to the open source Bacula project to the Wiki page?


Not at the moment, as the wiki is currently uneditable.

I am probably a wee bit biased because Bacula is a very solid, industrial-strength backup and recovery tool that happens to be just about the only one out there that is totally FOSS and also multi-platform (it runs on OS X as well as Linux). Bacula is known, however, for not being able to back up and recover ACLs on OS X (so it may not fit everyone's needs). I'll for sure have to test it with Skim-generated EA metadata for PDFs.

If you want to be sure you won't lose the notes you can save them in the data of a separate .skim file (you can do this manually, or choose to always do this automatically). You can also convert to a PDF bundle, which is just a file package containing the (original) PDF and the notes in separate files.
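For an archive it can be worth checking that every PDF actually has its separate .skim file. A minimal sketch, assuming (hypothetically) that the .skim file sits next to the PDF with the same base name, e.g. paper.pdf and paper.skim; that naming convention and the function names are assumptions, not Skim's documented behavior.

```python
import os
from pathlib import PurePath
from typing import Iterable, List


def pdfs_without_sidecar(names: Iterable[str]) -> List[str]:
    """Return the PDF names that have no matching .skim file among `names`.

    Assumes the hypothetical convention paper.pdf -> paper.skim.
    """
    present = set(names)
    return sorted(
        name
        for name in present
        if name.lower().endswith(".pdf")
        and str(PurePath(name).with_suffix(".skim")) not in present
    )


def scan_folder(folder: str) -> List[str]:
    """Apply the check to the entries of one directory (non-recursive)."""
    return pdfs_without_sidecar(os.listdir(folder))


print(pdfs_without_sidecar(["a.pdf", "a.skim", "b.pdf", "readme.txt"]))  # ['b.pdf']
```

Keeping the check as a pure function over file names makes it easy to run against a directory listing, a backup manifest, or a tape catalog alike.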

I hadn't checked out Skim's Preferences before, but I see the ability to automatically save Skim notes backups. That is a great feature, thanks for including it!


I think one can be pretty sure that Apple's implementation of EAs will be compatible with any future changes.

That's probably a reasonable assumption considering that starting with Leopard, Mac OS X is "an Open Brand UNIX 03 Registered Product, conforming to the SUSv3 and POSIX 1003.1 specifications for the C API, Shell Utilities, and Threads" < http://www.apple.com/macosx/technology/unix.html >

It will be interesting to see what if anything changes on this score once Snow Leopard is released :-)

Skim's ability to separate the PDF from its annotations is excellent. It allows, for example, the original PDF document to be preserved in a library system and the annotations to be recombined with it later as a separate process. While it is possible (and greatly appreciated) to export annotations as text, RTF, or RTFD, the only format for round-tripping annotations into and out of a PDF document is FDF (i.e., of these formats, Skim only parses FDF (Forms Data Format)). I can't quite discern whether FDF is an open format / ISO standard.

FDF is actually just a simplified form of PDF (in fact, Skim reads FDF by replacing the first "F" in "FDF" with "P" and reading it as PDF), so it's just as open as PDF (though I'm not 100% sure whether it's an ISO standard).
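The header trick can be sketched in a few lines. This is a Python illustration of the idea, not Skim's actual (Objective-C) code, and it assumes the "%FDF-" header sits at the very start of the file; the function name is hypothetical.

```python
def fdf_as_pdf(data: bytes) -> bytes:
    """Turn FDF bytes into something a PDF parser accepts by patching the header.

    Only the first "F" of the "%FDF-" header becomes "P"; the rest of the
    file is left untouched, since FDF shares PDF's syntax and object types.
    """
    header = b"%FDF-"
    if not data.startswith(header):
        raise ValueError("not an FDF file: missing %FDF- header at offset 0")
    return b"%PDF-" + data[len(header):]


print(fdf_as_pdf(b"%FDF-1.2\n1 0 obj\n<< /FDF << >> >>\nendobj\n")[:8])  # b'%PDF-1.2'
```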

I'll see if I can find out (not sure how to get a hold of the ISO 32000 document but I'll try). My guess is that FDF is part of the standard because the Adobe 1.7 Reference document includes an entire section (8.6.6) on Forms Data Format. Here's a copy of the brief introduction about FDF:

8.6.6 Forms Data Format

This section describes Forms Data Format (FDF), the file format used for interactive form data (PDF 1.2). FDF is used when submitting form data to a server, receiving the response, and incorporating it into the interactive form. It can also be used to export form data to stand-alone files that can be stored, transmitted electronically, and imported back into the corresponding PDF interactive form. In addition, beginning in PDF 1.3, FDF can be used to define a container for annotations that are separate from the PDF document to which they apply.

FDF is based on PDF; it uses the same syntax (see Section 3.1, "Lexical Conventions") and basic object types (Section 3.2, "Objects"), and has essentially the same file structure (Section 3.4, "File Structure"). However, it differs from PDF in the following ways:

• The cross-reference table (Section 3.4.3, "Cross-Reference Table") is optional.

• FDF files cannot be updated (see Section 3.4.5, "Incremental Updates"). Objects can only be of generation 0, and no two objects can have the same object number.

• The document structure is much simpler than PDF, since the body of an FDF document consists of only one required object.

• The length of a stream may not be specified by an indirect object.

FDF uses the MIME content type application/vnd.fdf. On the Windows and UNIX platforms, FDF files have the extension .fdf; on Mac OS, they have file type 'FDF '.
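To make these differences concrete, here is a sketch that writes a minimal FDF carrying Text annotations: one required catalog object (1 0 obj, generation 0), no cross-reference table, no streams, and a trailer pointing at the catalog, plus a naive check of the object-numbering rules. The function names are hypothetical and the output has not been validated against Acrobat or Skim. Per the spec, annotations inside an FDF also carry a /Page entry giving the zero-based page they belong to.

```python
import re
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]


def pdf_text(text: str) -> bytes:
    """Encode a text string as a PDF string object."""
    if text.isascii():
        # Literal string: backslash and parentheses must be escaped.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return b"(" + escaped.encode("ascii") + b")"
    # Non-ASCII: hex string holding UTF-16BE text with a byte-order mark.
    return b"<FEFF" + text.encode("utf-16-be").hex().upper().encode("ascii") + b">"


def minimal_fdf(pdf_name: str, notes: Sequence[Tuple[int, Rect, str]]) -> bytes:
    """Build an FDF whose body is a single catalog object holding Text annotations.

    `notes` holds (zero-based page index, rect, contents) triples.
    """
    annots = []
    for page, rect, contents in notes:
        rect_str = " ".join(f"{v:g}" for v in rect).encode("ascii")
        annots.append(
            b"<< /Type /Annot /Subtype /Text /Page " + str(page).encode("ascii")
            + b" /Rect [" + rect_str + b"] /Contents " + pdf_text(contents) + b" >>"
        )
    catalog = (
        b"1 0 obj\n<< /FDF << /F " + pdf_text(pdf_name)
        + b" /Annots [" + b" ".join(annots) + b"] >> >>\nendobj\n"
    )
    # No xref table: it is optional in FDF. The trailer just names the root object.
    return b"%FDF-1.2\n" + catalog + b"trailer\n<< /Root 1 0 R >>\n%%EOF\n"


def check_object_numbers(data: bytes) -> None:
    """Naive check: every 'N G obj' must use generation 0 and a unique N."""
    seen = set()
    for num, gen in re.findall(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b", data):
        if int(gen) != 0:
            raise ValueError(f"object {int(num)} has generation {int(gen)}, expected 0")
        if num in seen:
            raise ValueError(f"object number {int(num)} is used twice")
        seen.add(num)


fdf = minimal_fdf("paper.pdf", [(0, (72, 700, 92, 720), "Check (this) claim")])
check_object_numbers(fdf)  # passes: the only object is "1 0 obj"
print(fdf.decode("ascii"))
```

The regex check is deliberately simple and could be fooled by "obj" appearing inside a string; a real validator would tokenize the file.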


Be careful about using FDF for backup, though, because it may lead to data loss, as there's no complete one-to-one mapping between Skim notes and PDF/FDF annotations (in particular, anchored notes do not exist in PDF). Only the Skim Notes export type is completely data-preserving (as it's the same data as what's saved in the EAs).

Thanks for the heads up. Apparently FDF is not capable of handling anchored notes.

Anchored notes are our invention, and they predate our use of PDF annotations.

I noticed that Apple's Preview app (on Leopard) provides the ability to add anchored notes to PDFs.

Those are not anchored notes but notes of type Text. Anchored notes are implemented as a subclass of Text notes, and the difference is that anchored notes can have an image and an additional rich text property (apart from the plain text string).

Does PDFKit provide API hooks for adding anchored notes, which Skim uses and which Preview probably also uses? In other words, are anchored notes pretty much specific to PDFKit? I wonder how Adobe Acrobat (assuming Acrobat also provides anchored notes) structures and saves them (if not as FDF).


As we invented anchored notes, neither Adobe nor PDFKit knows about them. When Skim saves with embedded notes or exports to FDF, the anchored notes are saved as Text annotations, which are the closest PDF/FDF have to offer. That's why it loses information. Another thing that's lost is transparency in colors, because PDF/FDF doesn't support that. Moreover, the font of (Skim's) text notes may get lost.
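The kind of information loss described here can be made explicit in a small model. This is a hypothetical sketch: the class and function names are invented for illustration and do not reflect Skim's classes or the Skim notes format. It maps an anchored note onto the closest PDF equivalent, a Text annotation, and reports what could not be carried over: the image, the extra rich text, and any alpha (transparency) value of the color, since the annotation's /C color entry has no alpha component.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

RGBA = Tuple[float, float, float, float]


@dataclass
class TextAnnotationNote:
    """Hypothetical note that maps directly onto a PDF Text annotation."""
    contents: str
    color: RGBA  # red, green, blue, alpha, each in 0..1


@dataclass
class AnchoredNote(TextAnnotationNote):
    """Hypothetical anchored note: a Text note plus an image and extra rich text."""
    image: Optional[bytes] = None
    rich_text: Optional[str] = None


def to_pdf_text_annotation(note: TextAnnotationNote) -> Tuple[Dict[str, object], List[str]]:
    """Return (annotation dictionary, names of properties that were dropped)."""
    r, g, b, a = note.color
    # /C holds color components only, so the alpha value cannot be stored here.
    annot: Dict[str, object] = {"Subtype": "Text", "Contents": note.contents, "C": [r, g, b]}
    lost: List[str] = []
    if a < 1.0:
        lost.append("alpha")
    if isinstance(note, AnchoredNote):
        if note.image is not None:
            lost.append("image")
        if note.rich_text is not None:
            lost.append("rich_text")
    return annot, lost


annot, lost = to_pdf_text_annotation(
    AnchoredNote("Summary", (1.0, 1.0, 0.0, 0.5), image=b"...", rich_text="{\\rtf1 ...}")
)
print(annot["Subtype"], lost)  # Text ['alpha', 'image', 'rich_text']
```

Returning the list of dropped properties, rather than silently discarding them, is what an archivist would want from any export path that is not data-preserving.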


The Skim notes format is a proprietary format from Skim. But it is completely open: Skim is OSS, and the format for Skim notes is fully described on the Wiki. Moreover, a library to read and write them, including the source code, is available from the site. It uses only standard Cocoa and the BSD library for EAs. So, at a minimum, it will always be possible to convert Skim notes to whatever you want (including PDF annotations).

[SNIP]

Using the normal Save (or Export as PDF), Skim does not touch the original PDF data at all. The same is true for export without notes. For export with embedded notes, Skim fully relies on Apple's PDFKit, so there's absolutely no control over it.

Actually, PDFKit does a very bad job exporting PDF annotations (I'm talking about Leopard, on Tiger it's not even possible). The saved notes are actually changed. This is one more reason why Skim doesn't use it.

That seems quite reasonable. So "standard" Cocoa excludes PDFKit?

Yes, it's just Foundation and AppKit. Note that this is also exactly what the cross-platform GNUstep provides.

Given what you've written about PDFKit, I understand all the more the reason for the Skim notes format! My curiosity is probably unusual in that I'm looking at the preservation of annotations from an archivist's perspective (such that these notes could potentially be looked at 100 years from now). Most people have shorter-term horizons!


100 years is extremely long for computing, and I don't think anything on my computer will last that long. In fact, I wouldn't even know how to recover my files on floppy disks from a mere 15 years ago!

You may see this in Preview, though it may not be immediately obvious because Preview does not support the types of annotations for which this is the worst (such as lines and freehand notes), while it uses workarounds for other types of notes (like highlights). To see the problems, try exporting a file containing various types of notes as PDF With Embedded Notes, then reopen that file in Skim and choose File > Convert Notes. Try editing the notes afterwards (especially line notes and multi-line underlines). (Convert Notes will fix this in the next release.)

I tried what you suggested. Yuck regarding PDFKit.

Development of PDFKit is pretty slow. I was very much disappointed by Leopard's improvements; I'd expected a lot more. I got the impression they've got just one guy working on it; at least, he's complaining about Apple giving it too few man-hours.

Skim notes all the way. When you came up with the Skim format, had you looked at the XML-based SVG by any chance? SVG hasn't really taken off (several years ago I thought it would), and I'm not sure why. Any thoughts as to why? Ironically, Adobe was behind SVG as a member of the W3C SVG committee years ago, if I'm not mistaken.

I don't really think SVG is appropriate for these notes; it certainly cannot capture everything, e.g. anchored notes and colors.


I think PDFKit currently supports PDF 1.3 features only. The fact that this is lower than 1.7 is not a problem though, quite to the contrary. PDF 1.3 is a strict subset of PDF 1.7; PDF 1.7 just adds new features (such as interactive features). The essence is that it ALLOWS more, it does not REQUIRE more. In other words, PDF 1.3 is always valid as PDF 1.7.
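Since a newer version only allows more, a reader built for PDF 1.7 should accept any file that declares an older version. A rough sketch of that comparison, with hypothetical function names. It only looks at the %PDF-x.y header; from PDF 1.4 on, the document catalog may also carry a /Version entry that overrides the header, which a real check would have to consider.

```python
import re
from typing import Tuple


def header_version(data: bytes) -> Tuple[int, int]:
    """Parse the (major, minor) version from a '%PDF-x.y' header at offset 0."""
    match = re.match(rb"%PDF-(\d+)\.(\d+)", data)
    if match is None:
        raise ValueError("missing %PDF-x.y header")
    return int(match.group(1)), int(match.group(2))


def within_reader_version(data: bytes, reader_max: Tuple[int, int] = (1, 7)) -> bool:
    """True if the declared version is no newer than what the reader supports.

    Tuple comparison orders versions correctly: (1, 3) < (1, 7) < (2, 0).
    """
    return header_version(data) <= reader_max


print(within_reader_version(b"%PDF-1.3\n"))  # True: 1.3 features are a subset of 1.7
```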

Great! Unusually minded archivists and curators like me can now rest more easily at night :-)
When it comes to preserving documents and their annotations, this is the type of stuff that archivists worry about, and rightly so (decades may seem far away, but they will be here sooner than you think)!

If you hold on to a version of Skim, you will always be able to read Skim notes. Even if Skim were to use PDF rather than .skim notes in the future (which I don't think will happen), there will be some old version available that can convert the notes; in fact, there should then be a simple conversion tool available, perhaps built into the skimnotes tool.

Even if Apple were to disappear decades from now, there will always be a museum somewhere with a Mac running Tiger or Leopard, along with the Xcode tools and the standard Cocoa libraries to link against and compile with gcc ;-)

And moreover, Apple's implementation is not the only one; there's also GNUstep's.


By the way, thank you (and the other contributors) for such a great app in Skim. From a reading standpoint, (1) the Automatic Resize is fantastic, so that the content of a PDF page is resized to the viewing pane it's in, and (2) Skim is the best I've ever seen when it comes to Full Screen mode: I can take a page in Skim, go Full Screen, and then Command-+ will literally scale it up to the full physical display size (sans vertical scroll bar on the right-hand side). And on top of it all, I can still annotate this fully scaled page and even use the Magnify bubble. That is just sweet! My grandfather passed away at age 90 several years ago, and in his older age we got him a large CRT (an old TV set) equipped with a video input from a camera that aimed down, with a light shining on it (so he could place a sheet of paper under it and have it magnified on the CRT to help with his vision). If Skim had been around then, we could have gotten him a 30-inch Cinema Display and he would have loved it!

It's a joy to have discovered Skim a few days ago (I can only imagine things will get more interesting as Apple brings multi-touch to more of the Mac world, and perhaps they'll finally bring out a tablet as light as the Air for way-cool annotation capability?).

Cheers,

Hydro


[SNIP]

You're welcome.

Christiaan

------------------------------------------------------------------------------
_______________________________________________
Skim-app-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/skim-app-users
