Re: Tools for reverse FOP?

2007-11-01 Thread Andreas L Delmelle

On Oct 31, 2007, at 23:59, siegfried wrote:

Hi

Are there any tools that will accept a PDF and produce XML?


What do you mean exactly? Translating a PDF to an FO document? What
is the use case?



Might this be a feature of FOP someday?


No plans that I'm aware of.


Cheers

Andreas


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Tools for reverse FOP?

2007-11-01 Thread Abel Braaksma

siegfried wrote:


Are there any tools that will accept a PDF and produce XML? Might this 
be a feature of FOP someday?


Thanks,

Siegfried



That's highly improbable, because PDF is a non-structured format and 
going from non-structured to structured is a daunting (and often 
theoretically and practically impossible) task.


There are tools that extract the text from a PDF and tools that extract
the images. Some create Word (iirc) and/or RTF output with the layout
preserved. Going from RTF to XSL-FO is then rather easy (RTF is
text-based), but the result will get extremely bloated (check out the
RTF when you have all options set; it gets huge for just a couple of
pages!). Much of this has to do with the precise positioning inside PDF.
Even so, many objects or properties cannot be extracted at all (borders,
backgrounds, alpha channels, overlays, partially embedded fonts).
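
To illustrate the simplest case only: once some external tool has
extracted plain text from the PDF, wrapping it in XSL-FO is mechanical.
The sketch below is an assumption-laden example, not anything FOP
provides. The function name text_to_fo, the A4 page size, the margins
and the "blank line separates paragraphs" rule are all hypothetical
choices. It recovers none of the layout, fonts, borders or images
mentioned above.

```python
import xml.etree.ElementTree as ET

# XSL-FO namespace; register the conventional "fo" prefix for output.
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def text_to_fo(text):
    """Wrap already-extracted plain text in a minimal XSL-FO document.

    Hypothetical helper: paragraphs are assumed to be separated by
    blank lines, and the page geometry (A4, 2cm margins) is arbitrary.
    """
    root = ET.Element(fo("root"))

    # One simple page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-height": "29.7cm",
        "page-width": "21cm",
        "margin-top": "2cm",
        "margin-bottom": "2cm",
        "margin-left": "2cm",
        "margin-right": "2cm",
    })
    ET.SubElement(page, fo("region-body"))

    # One page sequence whose flow targets the body region.
    seq = ET.SubElement(root, fo("page-sequence"),
                        {"master-reference": "page"})
    flow = ET.SubElement(seq, fo("flow"),
                         {"flow-name": "xsl-region-body"})

    # One fo:block per paragraph, with internal whitespace collapsed.
    count = 0
    for para in text.split("\n\n"):
        para = " ".join(para.split())
        if para:
            block = ET.SubElement(flow, fo("block"),
                                  {"space-after": "6pt"})
            block.text = para
            count += 1

    # A flow needs at least one block, so emit an empty one if needed.
    if count == 0:
        ET.SubElement(flow, fo("block"))

    return ET.tostring(root, encoding="unicode")
```

The output can be fed back to FOP like any other FO file, but it is
only as good as the text extraction that came before it.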


I don't see a reason why FOP would do such a thing: if PDF can be
treated as input, then Word, RTF, TIFF, BMP etc. should also be
considered, I guess, which makes it next to impossible. It is such a
specialized task (compare OCR) that other tools are better suited.


Hope this answers your question,

Cheers,
-- Abel Braaksma
