siegfried wrote:
Are there any tools that will accept a PDF and produce XML? Might this
be a feature of FOP someday?
Thanks,
Siegfried
That's highly improbable, because PDF is a non-structured format and
going from non-structured to structured is a daunting (and often
theoretically and practically impossible) task.
There are tools that extract the text from PDF and there are tools that
extract the images from PDF. Some also create Word (iirc) and/or RTF with
layout. Going from RTF to XSL-FO is then rather easy (RTF is text
based), but the result will be extremely bloated (check out the RTF when
you have all options set: it gets huge even for a couple of pages!).
Much of this has to do with the precise positioning inside PDF.
Still, many objects or properties cannot be extracted at all (borders,
backgrounds, alpha channels, overlays, partially embedded fonts).
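To make the positioning problem concrete, here is a rough sketch in Python. It assumes an extractor has already handed us text fragments with page coordinates; the sample data, the function names and the 2-point tolerance are all made up for illustration and do not come from any particular tool. Lines are guessed purely from the y coordinates and then wrapped in XML:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments as a PDF text extractor might report them:
# (x, y, text), with y measured from the bottom of the page, in points.
fragments = [
    (72.0, 700.0, "Chapter"),
    (120.5, 700.0, "1"),
    (72.0, 680.2, "PDF"),
    (95.0, 680.0, "stores"),
    (130.0, 679.9, "glyph"),
    (165.0, 680.1, "positions."),
]


def group_into_lines(frags, y_tolerance=2.0):
    """Guess text lines: fragments with nearly equal y belong together."""
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    # Walk the page top to bottom (largest y first).
    for x, y, text in sorted(frags, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each guessed line, read left to right (sort by x).
    return [" ".join(t for _, t in sorted(words)) for _, words in lines]


def to_xml(frags):
    """Wrap each guessed line in a <line> element under a <page> root."""
    root = ET.Element("page")
    for line in group_into_lines(frags):
        ET.SubElement(root, "line").text = line
    return ET.tostring(root, encoding="unicode")


print(to_xml(fragments))
```

The output is XML, but only in the weakest sense: nothing in the coordinates says that "Chapter 1" is a heading and the next line a paragraph. That part would have to be guessed as well, which is where it gets hard.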
I don't see a reason why FOP would do such a thing: if PDF can be
treated as input, then Word, RTF, TIFF, BMP etc. should also be
considered, I guess, which makes it next to impossible. It is such a
specialized task (compare OCR) that other tools are better suited.
Hope this answers your question,
Cheers,
-- Abel Braaksma