Are those PDFs supposed to be searchable internally? For archival purposes, PDFs are stored in their final form, and search is performed by building a database of descriptive metadata. Whenever someone needs the formal details, they have to read the original the way it was presented (many PDFs are just scanned facsimiles of old documents that did not originally exist as digital plain text; they were printed or typewritten, and they frequently include graphics, handwritten signatures, stamped seals...).
Being able to search plain text inside a PDF is not the main objective (and not the priority). Archival, however, is a top priority (and there is no money to finance digitization and no staff available to redo this old work; if needed, other contributors will recreate a plain-text version, possibly with rich-text features, e.g. on Wikisource for old documents that are in the public domain).

PDF/A-1a is meant only for creating new documents from an original plain-text or rich-text document created with a modern word-processing application. But this specification will frequently have to be broken when there is a need to include handwritten or supplementary elements (signatures, seals...) whose source is not the original electronic document but the printed paper on which the annotations were made: it is this paper document, not the electronic one, that is the official final source. (We have some important legal papers whose originals bear other marks, including traces of beer or coffee, or are partly burnt; the paper itself has several alterations, but it is the original "as is", and for legal purposes the only acceptable archival form as a PDF must ignore all the PDF/A-1a constraints, which are not meant to represent originals accurately.)

2016-03-20 20:52 GMT+01:00 Tom Gewecke <[email protected]>:

> > On Mar 20, 2016, at 12:24 PM, Asmus Freytag (t) <[email protected]> wrote:
> >
> > Usually, the archive feature pertains only to the fact that you can
> > reproduce the final form, not to being able to get at the correct source
> > (plain text backbone) for the document.
>
> My understanding is that PDF/A-1a is supposed to be searchable.

