On Fri, Feb 03, 2006 at 01:38:13PM +0100, PLinnell wrote:
> On Friday 03 February 2006 06:40, Bart Alberti wrote:
> > The author of pdftk has a book PDF HACKS published by O'reilly
> > which is mostly a collection of recipes for dealing with pdf files.
> > However, there is a scheme for using a plug in to vim or gvim which
> > uncompresses the pdf and put it in the text editor. Since vim is
> > scriptable (I don't know personally how to) this may be a clue to
> > long sought import and/or edit pdf.
> > Bart Alberti
> > _______________________________________________
> > Scribus mailing list
> > Scribus at nashi.altmuehlnet.de
> > http://nashi.altmuehlnet.de/mailman/listinfo/scribus
>
> PDF editing is vastly more complex than editing the raw PDF source in
> vi. One of the major stumbling blocks is the non-linear nature of
> PDF.
>
> It will take a powerful parser to be able to edit PDF natively.

I'm beginning to believe that a traditional "parser" is the best way to
absolutely muck up PDF editing and import. You really need an
implementation of the PDF document and file structure that follows the
non-linear PDF model: one that reads the xref table, knows how to find
indirect objects, etc. Alongside that you need a (probably reasonably
simple) parser that can read PDF objects and traverse nested arrays and
dictionaries. You also need decoders for the PDF stream filter
algorithms. A final component would be a PDF content stream parser, but
it might well be very simplistic and dumb depending on your needs.

By recognising that PDF is a series of related data formats in a
container, your life should be made a LOT easier. I'm working on this
for PDF output at the moment and I think I can in time adapt my design
for input/processing/editing of PDF as well.

I shudder at the thought of trying to write a comprehensive grammar for
PDF as a whole using a traditional parser generator. Nasty, inefficient,
error-prone, and generally not a nice prospect, I suspect.

--
Craig Ringer
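
To make the "reasonably simple" object parser and the stream filter
decoder above a bit more concrete, here is a rough Python sketch. It is
an illustration only, not the design mentioned in this message: the
names TOKEN, Ref, tokenize, parse_object, parse and decode_stream are
all hypothetical, it covers only a small subset of PDF object syntax (no
hex strings, no nested parentheses in strings, no comments), and the
only filter it handles is FlateDecode without predictors. Looking up an
indirect object through the xref table is left out entirely.

```python
import re
import zlib
from collections import namedtuple

# Hypothetical stand-in for an indirect reference such as "12 0 R".
Ref = namedtuple("Ref", "num gen")

# Tokens for a small, incomplete subset of PDF object syntax.
TOKEN = re.compile(rb"""
    (?P<ref>\d+\s+\d+\s+R\b)
  | (?P<num>[+-]?(?:\d+\.\d*|\.\d+|\d+))
  | (?P<name>/[^\s()<>\[\]{}/%]*)
  | (?P<str>\((?:[^()\\]|\\.)*\))
  | (?P<open_dict><<)
  | (?P<close_dict>>>)
  | (?P<open_arr>\[)
  | (?P<close_arr>\])
  | (?P<kw>true|false|null)
""", re.VERBOSE)

# Sentinels returned when a closing delimiter is read.
_END_ARR, _END_DICT = object(), object()


def tokenize(data):
    """Yield (kind, bytes) pairs, skipping whitespace between tokens."""
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1].isspace():
            pos += 1
            continue
        m = TOKEN.match(data, pos)
        if not m:
            raise ValueError(f"unexpected byte at offset {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()


def parse_object(tokens):
    """Read one object, recursing into nested arrays and dictionaries."""
    kind, text = next(tokens)
    if kind == "ref":
        num, gen, _ = text.split()
        return Ref(int(num), int(gen))
    if kind == "num":
        return float(text) if b"." in text else int(text)
    if kind == "name":
        return text.decode("latin-1")
    if kind == "str":
        return text[1:-1]  # raw bytes, escapes left undecoded
    if kind == "kw":
        return {b"true": True, b"false": False, b"null": None}[text]
    if kind == "close_arr":
        return _END_ARR
    if kind == "close_dict":
        return _END_DICT
    if kind == "open_arr":
        items = []
        while (item := parse_object(tokens)) is not _END_ARR:
            items.append(item)
        return items
    # Remaining case: "open_dict". Keys and values simply alternate.
    result = {}
    while (key := parse_object(tokens)) is not _END_DICT:
        result[key] = parse_object(tokens)
    return result


def parse(data):
    """Parse a single PDF object from a bytes string."""
    return parse_object(tokenize(data))


def decode_stream(stream_dict, raw):
    """Decode stream data; only an unfiltered or FlateDecode stream is handled."""
    flt = stream_dict.get("/Filter")
    if flt is None:
        return raw
    if flt == "/FlateDecode":
        return zlib.decompress(raw)
    raise NotImplementedError(f"filter {flt} not covered by this sketch")


if __name__ == "__main__":
    page = parse(b"<< /Type /Page /MediaBox [0 0 595.28 841.89] /Contents 12 0 R >>")
    print(page)
```

The point of the sketch is the split: the object reader knows nothing
about where objects live in the file, and the stream decoder knows
nothing about object syntax. A Ref is just handed back to whatever layer
owns the xref table.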
