Re: [poppler] pdf to xml update

Leonard Rosenthol Sun, 13 Aug 2006 16:37:37 -0700

At 11:31 AM 8/13/2006, Jauco Noordzij wrote:

It parses the pages, aggregating textblocks in much the same way as
the current textoutputdev. It then chunks the page into a tree of
nested 'splits', i.e. the page is split in two, then the two parts are
split in two, etc. This tree is then turned into blocks and paragraphs.
The process is a bit hard to explain, but works quite well.
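
For readers following the thread, the splitting idea can be sketched roughly as below. This is a minimal illustration, not the code from the blog post: the Box and SplitNode types, the buildSplitTree and countLeaves names, and the "cut at the widest whitespace gap" rule with a minGap threshold are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical text box in page coordinates; not a poppler type.
struct Box { double x0, y0, x1, y1; };

// One node of the split tree: either a leaf holding boxes, or two children.
struct SplitNode {
    std::vector<Box> boxes;  // filled only for leaves
    std::unique_ptr<SplitNode> first, second;
    bool isLeaf() const { return !first; }
};

// Widest whitespace gap along one axis (assumed criterion). Sets `cut` to the
// middle of that gap and returns its width, or 0 if the boxes overlap throughout.
static double widestGap(std::vector<Box> boxes, bool horizontalCut, double &cut) {
    auto lo = [&](const Box &b) { return horizontalCut ? b.y0 : b.x0; };
    auto hi = [&](const Box &b) { return horizontalCut ? b.y1 : b.x1; };
    std::sort(boxes.begin(), boxes.end(),
              [&](const Box &a, const Box &b) { return lo(a) < lo(b); });
    double best = 0.0;
    double reach = hi(boxes.front());  // furthest extent covered so far
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        double gap = lo(boxes[i]) - reach;
        if (gap > best) { best = gap; cut = reach + gap / 2; }
        reach = std::max(reach, hi(boxes[i]));
    }
    return best;
}

// Split the boxes in two at the widest gap, then split each part again,
// until no gap of at least minGap remains.
std::unique_ptr<SplitNode> buildSplitTree(const std::vector<Box> &boxes, double minGap) {
    auto node = std::make_unique<SplitNode>();
    if (boxes.size() >= 2) {
        double cutH = 0.0, cutV = 0.0;
        double gapH = widestGap(boxes, true, cutH);
        double gapV = widestGap(boxes, false, cutV);
        bool horizontal = gapH >= gapV;
        double gap = horizontal ? gapH : gapV;
        double cut = horizontal ? cutH : cutV;
        if (gap > 0.0 && gap >= minGap) {
            std::vector<Box> a, b;
            for (const Box &bx : boxes)
                ((horizontal ? bx.y0 : bx.x0) < cut ? a : b).push_back(bx);
            node->first = buildSplitTree(a, minGap);
            node->second = buildSplitTree(b, minGap);
            return node;
        }
    }
    node->boxes = boxes;  // nothing left to split: treat as one block
    return node;
}

// Number of leaf blocks in the tree.
int countLeaves(const SplitNode &n) {
    return n.isLeaf() ? 1 : countLeaves(*n.first) + countLeaves(*n.second);
}
```

In this sketch each leaf would correspond to a candidate block; how leaves are then grouped into paragraphs is not covered here.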


        Sounds good.

One thing I noticed from the blog is that it doesn't (yet) support styling information. You should be able to easily carry this along using a similar method to the PdfWord used by the current TextOutputDev.

I also have a quick question: Is there a callback function for
outputdevs that gets called at the end of processing the pdf, like the
one that's called at the end of the page?

No, because in an interactive application, no such thing really exists. The closest you can come is the destructor for the OutputDev. I've put such things there before.


        Another option is a new method for your callers to use...
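
        Roughly, the two options could look like the sketch below. The class is a hypothetical stand-in (XmlOutputSketch, endPage and finish are made-up names) and deliberately does not derive from poppler's OutputDev, so that it compiles on its own; treat it as an illustration of the pattern, not as poppler API.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an OutputDev subclass that writes XML.
class XmlOutputSketch {
public:
    explicit XmlOutputSketch(std::ostream &out) : out_(out) {
        out_ << "<document>\n";
    }

    // Option 1: the destructor acts as the "end of document" hook.
    ~XmlOutputSketch() { finish(); }

    // Called once per page by whatever drives the device (assumed).
    void endPage() {
        out_ << "  <page number=\"" << ++pages_ << "\"/>\n";
    }

    // Option 2: an explicit method for callers. Safe to call more than
    // once, so the destructor does not emit the closing tag twice.
    void finish() {
        if (finished_) return;
        finished_ = true;
        out_ << "</document>\n";
    }

private:
    std::ostream &out_;
    int pages_ = 0;
    bool finished_ = false;
};
```

        The explicit method lets a caller report errors or control timing; the destructor is just the fallback when nobody calls it.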


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
