At 11:31 AM 8/13/2006, Jauco Noordzij wrote:
> It parses the pages, aggregating text blocks in much the same way as
> the current TextOutputDev. It then chunks the page into a tree of
> nested 'splits', i.e. the page is split in two, then the two parts are
> split in two, and so on. This tree is then turned into blocks and paragraphs.
> The process is a bit hard to explain, but works quite well.
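
For readers following the thread, here is a minimal sketch of the kind of recursive splitting described above. It is not Jauco's code: the Box and Split types, the buildTree and countLeaves functions, and the "cut at the widest whitespace gap" rule are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these exist in poppler.
struct Box { double x0, y0, x1, y1; };        // bounding box of one text block

struct Split {
    std::vector<Box> boxes;                   // filled only for leaves
    std::unique_ptr<Split> first, second;     // both null for a leaf
};

// Look for the widest empty band along one axis (y if alongY, else x).
// Returns true and sets `cut` to the middle of that band if it is wider
// than minGap. Assumes each box has x0 <= x1 and y0 <= y1.
static bool widestGap(std::vector<Box> boxes, bool alongY, double minGap, double &cut) {
    assert(!boxes.empty());
    auto lo = [alongY](const Box &b) { return alongY ? b.y0 : b.x0; };
    auto hi = [alongY](const Box &b) { return alongY ? b.y1 : b.x1; };
    std::sort(boxes.begin(), boxes.end(),
              [&](const Box &a, const Box &b) { return lo(a) < lo(b); });
    bool found = false;
    double best = minGap;
    double reach = hi(boxes.front());         // furthest extent covered so far
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        double gap = lo(boxes[i]) - reach;
        if (gap > best) {
            best = gap;
            cut = reach + gap / 2;
            found = true;
        }
        reach = std::max(reach, hi(boxes[i]));
    }
    return found;
}

// Split the region in two at the widest gap, then split each half again,
// until no gap wider than minGap (which must be >= 0) remains.
std::unique_ptr<Split> buildTree(std::vector<Box> boxes, double minGap) {
    assert(!boxes.empty() && minGap >= 0);
    auto node = std::make_unique<Split>();
    double cut = 0;
    bool alongY = true;
    if (!widestGap(boxes, true, minGap, cut)) {
        alongY = false;
        if (!widestGap(boxes, false, minGap, cut)) {
            node->boxes = std::move(boxes);   // nothing left to split: leaf
            return node;
        }
    }
    std::vector<Box> before, after;
    for (const Box &b : boxes) {
        if ((alongY ? b.y0 : b.x0) < cut) before.push_back(b);
        else after.push_back(b);
    }
    node->first = buildTree(std::move(before), minGap);
    node->second = buildTree(std::move(after), minGap);
    return node;
}

// Count the leaves of the tree, i.e. the regions that were not split further.
std::size_t countLeaves(const Split &s) {
    if (!s.first) return 1;
    return countLeaves(*s.first) + countLeaves(*s.second);
}
```

The leaves of such a tree would be the candidates for blocks and paragraphs. Trying the y axis before the x axis is an arbitrary choice in this sketch.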
Sounds good.
One thing I noticed from the blog is that it doesn't (yet)
support styling information. You should be able to easily carry this
along using a similar method to the PdfWord used by the current TextOutputDev.
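
A rough sketch of what carrying style along with each word could look like. The Style, StyledWord, and StyledRun types and the mergeRuns function are hypothetical names invented for this example and do not mirror the actual word class in TextOutputDev.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration; not poppler's own classes.
struct Style {
    std::string font;
    double size;
    bool operator==(const Style &o) const { return font == o.font && size == o.size; }
};

struct StyledWord { std::string text; Style style; };
struct StyledRun  { std::string text; Style style; };

// Join consecutive words that share a style into one run, so the style
// survives whatever block or paragraph grouping happens afterwards.
std::vector<StyledRun> mergeRuns(const std::vector<StyledWord> &words) {
    std::vector<StyledRun> runs;
    for (const StyledWord &w : words) {
        if (!runs.empty() && runs.back().style == w.style) {
            runs.back().text += ' ';
            runs.back().text += w.text;
        } else {
            runs.push_back({w.text, w.style});
        }
    }
    return runs;
}
```

The point is only that the style travels with the word from the moment it is aggregated, so later stages never have to look it up again.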
> I also have a quick question: Is there a callback function for
> OutputDevs that gets called at the end of processing the PDF, like the
> one that's called at the end of the page?
No, because in an interactive application, no such thing
really exists. The closest you can come is the destructor for the
OutputDev. I've put such things there before.
Another option is a new method for your callers to use...
Leonard
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:[EMAIL PROTECTED]>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc. 215-938-7080 (voice)
215-938-0880 (fax)