RE: Tools for reverse FOP?

2007-11-01 Thread Dudley, Mark
A Google search for PDF to XML will yield both commercial and
open-source tools that do this. Another approach, which has already been
mentioned, is to convert the PDF to Word and then output WordML from Word
2003 or OOXML from Word 2007. Don't expect great XML markup from this
process, or from any other for that matter, because of limitations in
PDF, but it is a good start.
 
Mark



From: siegfried [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 31, 2007 6:59 PM
To: fop-users@xmlgraphics.apache.org
Subject: Tools for reverse FOP?



Are there any tools that will accept a PDF and produce XML? Might this
be a feature of FOP someday?

Thanks,

Siegfried



Re: Tools for reverse FOP?

2007-11-01 Thread Abel Braaksma

siegfried wrote:


Are there any tools that will accept a PDF and produce XML? Might this 
be a feature of FOP someday?


Thanks,

Siegfried



That's highly improbable, because PDF is a non-structured format and 
going from non-structured to structured is a daunting (and often 
theoretically and practically impossible) task.


There are tools that extract the text from PDF and there are tools that 
extract the images from PDF. And some (iirc) create Word and/or RTF 
output that preserves the layout. Going from RTF to XSL-FO is then rather 
easy (RTF is text-based), but the result will be extremely bloated (look 
at the RTF with all options enabled: it gets huge after only a couple of 
pages!). Much of this has to do with the precise positioning inside PDF. 
Still, many objects or properties cannot be extracted at all (borders, 
backgrounds, alpha channels, overlays, partially embedded fonts).
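To make the point about positioning concrete, here is a toy sketch (not taken from any real converter) that reads the text-showing operators of a simplified, hand-written PDF content stream. It assumes uncompressed content, handles only the BT, Td and Tj operators, accepts only literal strings without escaped or nested parentheses, and ignores fonts, glyph widths and the current transformation matrix. The function name `extract_runs` is hypothetical.

```python
import re
from typing import List, Tuple, Union

# Toy tokenizer for a *simplified* PDF content stream (an assumption, not full
# PDF syntax): literal strings, names, numbers and operators.
TOKEN_RE = re.compile(
    r"\((?P<str>[^()]*)\)"           # literal string, e.g. (Invoice)
    r"|(?P<name>/[^\s()/\[\]<>]+)"   # name operand, e.g. /F1
    r"|(?P<num>[-+]?\d*\.?\d+)"      # numeric operand, e.g. 72 or -14
    r"|(?P<op>[A-Za-z*'\"]+)"        # operator, e.g. BT, Td, Tj, ET
)


def extract_runs(stream: str) -> List[Tuple[float, float, str]]:
    """Return (x, y, text) for every Tj, where (x, y) is the start of the
    current text line as set by Td. The advance after a Tj is ignored,
    because computing it would require font metrics."""
    runs: List[Tuple[float, float, str]] = []
    operands: List[Union[float, str]] = []
    line_x = line_y = 0.0
    for m in TOKEN_RE.finditer(stream):
        kind = m.lastgroup
        if kind == "num":
            operands.append(float(m.group("num")))
        elif kind in ("str", "name"):
            operands.append(m.group(kind))
        else:
            op = m.group("op")
            if op == "BT":
                # BT starts a text object with an identity text matrix.
                line_x, line_y = 0.0, 0.0
            elif op == "Td":
                # Td offsets the next line start from the current line start.
                line_x += float(operands[-2])
                line_y += float(operands[-1])
            elif op == "Tj":
                runs.append((line_x, line_y, str(operands[-1])))
            operands = []  # every operator consumes the operands before it
    return runs


if __name__ == "__main__":
    stream = "BT /F1 12 Tf 72 720 Td (Invoice) Tj 0 -14 Td (Total: 42.00) Tj ET"
    for run in extract_runs(stream):
        print(run)
```

All that comes out is positioned strings such as `(72.0, 720.0, 'Invoice')`. Nothing in the stream says that "Invoice" is a heading or that "Total: 42.00" is a field, so any XML structure has to be guessed from positions and font sizes.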


I don't see a reason why FOP would do such a thing (if PDF could be 
treated as input, then Word, RTF, TIFF, BMP etc. should also be 
considered, I guess, which makes it next to impossible). It is such a 
specialized task (compare OCR) that other tools are better suited.


Hope this answers your question,

Cheers,
-- Abel Braaksma

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Tools for reverse FOP?

2007-11-01 Thread Andreas L Delmelle

On Oct 31, 2007, at 23:59, siegfried wrote:

Hi

Are there any tools that will accept a PDF and produce XML?


What exactly do you mean? Translating a PDF into an FO document? What 
is the use case?



Might this be a feature of FOP someday?


No plans that I'm aware of.


Cheers

Andreas





Tools for reverse FOP?

2007-10-31 Thread siegfried
Are there any tools that will accept a PDF and produce XML? Might this be a
feature of FOP someday?

Thanks,

Siegfried