PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com _____________________________________________________________

Recent versions of Think121's pdfExpress provide a feature relevant to this discussion. pdfExpress supports a regular-expression-like search-and-replace function that operates on PDF content streams. The matching syntax understands the structure of PDF: it works on whole operators and operands rather than on raw characters.

? - matches any single PDF operand
$ - matches any single PDF operator
?$ - matches a single PDF operator, including its associated operands
?, $ and ?$ support * and + for matching zero or more, or one or more, occurrences
?$* - matches zero or more occurrences of any operator and its operands
(Jones) 12.45 Tj rg ... - constant values (operands and operators) that match exactly
_sX .. _eX - allows you to group matches, like ( ) in ordinary regex
${_sX} - the value of a group, like \1 in regex
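To make the "operator plus its operands" unit concrete, here is a rough Python sketch of how a content stream could be split into the units that ?$ matches. This is not how pdfExpress is implemented (I am only illustrating the idea); the names TOKEN, is_operand and split_units are made up for this example, and the tokenizer is simplified: it assumes no nested parentheses inside strings and no inline images (BI ... ID ... EI).

```python
import re

# One token: a literal string "(...)" (escapes allowed, no nested parentheses),
# a dictionary delimiter, a hex string, an array bracket, or a run of
# non-delimiter characters (a number, a name, or an operator).
TOKEN = re.compile(
    r"\((?:\\.|[^\\()])*\)|<<|>>|<[0-9A-Fa-f\s]*>|[\[\]]|[^\s()\[\]<>]+"
)

def is_operand(tok):
    """Treat strings, names, numbers, brackets and a few keywords as operands."""
    if tok[0] in "(/<>[]":
        return True
    if tok in ("true", "false", "null"):
        return True
    try:
        float(tok)
        return True
    except ValueError:
        return False

def split_units(stream):
    """Group a content stream into (operands, operator) pairs.

    Operands are collected until an operator arrives; trailing operands
    with no operator after them are dropped.
    """
    units, operands = [], []
    for tok in TOKEN.findall(stream):
        if is_operand(tok):
            operands.append(tok)
        else:
            units.append((operands, tok))
            operands = []
    return units

for operands, operator in split_units("BT /F1 12 Tf 72 700 Td (%%name%%) Tj ET"):
    print(operands, operator)
```

Each printed line is one ?$ unit: an empty operand list for BT and ET, two operands each for Tf and Td, and the single string operand for Tj.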


So you can say something like:

        find:  _s1 ?$* _e1 (%%name%%) Tj
        replace: ${_s1} (BOB SMITH) Tj

When applied against a PDF content stream, this finds the first instance of a Tj operator applied to the string (%%name%%) and replaces its operand with (BOB SMITH). The _s1 .. _e1 pair groups all the operators and operands that occur before the Tj; ${_s1} then writes them back unchanged in the replacement, ahead of the new string and its Tj operator.
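For anyone who wants to see the effect without pdfExpress, the following Python sketch mimics what that find/replace pair does. It is only an illustration under my own assumptions, not pdfExpress code: the stream is written out by hand as a list of (operands, operator) pairs, replace_first_tj and serialize are hypothetical helpers, and I assume that anything after the matched Tj is left untouched.

```python
def replace_first_tj(units, old, new):
    """Replace the operand of the first `old Tj` unit; leave the rest alone."""
    for i, (operands, operator) in enumerate(units):
        if operator == "Tj" and operands == [old]:
            prefix = units[:i]  # plays the role of the _s1 .. _e1 group
            return prefix + [([new], "Tj")] + units[i + 1:]
    return units  # no match: the stream is returned unchanged

def serialize(units):
    """Write the units back out as content-stream text."""
    return " ".join(" ".join(operands + [operator]) for operands, operator in units)

# Hand-written stand-in for a parsed content stream (hypothetical data).
units = [
    ([], "BT"),
    (["/F1", "12"], "Tf"),
    (["72", "700"], "Td"),
    (["(%%name%%)"], "Tj"),
    ([], "ET"),
]

print(serialize(replace_first_tj(units, "(%%name%%)", "(BOB SMITH)")))
# prints: BT /F1 12 Tf 72 700 Td (BOB SMITH) Tj ET
```

The BT, Tf and Td units in front of the Tj come through unchanged, which is the job ${_s1} does in the replacement string.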

This is used commercially where PDF files act as templates (the text to find is always the same), or where later workflow steps use the result of the find (which can be written to an output file) to perform other operations, e.g., locating TAB pages in a PDF by content and then replacing them with a tray pull command of some sort.

Obviously, this requires detailed knowledge about what you are trying to do and about the structure of the PDFs you are working with.

Todd



To change your subscription: http://www.pdfzone.com/discussions/lists-pdfdev.html


