New To Fop
Hello; I need advice from more experienced FOP developers. The web project I am contracting on needs to be able to generate a PDF version of various pages the user may be browsing. As of now the only input I have to work with is the HTML of the page being displayed (the system can return it to me as a string at runtime). Speed is a factor, so one requirement is that the system only creates a new PDF document when the previously created one is out of sync with the content. I need to get this done fast. Can someone suggest what they think the best strategy would be for creating the document? Should I use an .fo input? Transform the HTML into XML and process it with an XSL? Any tips from someone who has done something similar would be very helpful and appreciated. With Regards, Luke - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
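The "only regenerate when out of sync" requirement could be sketched roughly as below: keep a hash of the HTML that produced each PDF and re-render only when the hash changes. This is a minimal sketch, not a recommendation from the thread. The class name `PdfCache`, the page-id key, and the `renderer` function (standing in for whatever HTML-to-PDF step is eventually chosen) are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: caches one PDF per page id, keyed on a hash of the HTML.
class PdfCache {
    private static final class Entry {
        final String htmlHash;
        final byte[] pdf;
        Entry(String htmlHash, byte[] pdf) { this.htmlHash = htmlHash; this.pdf = pdf; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, byte[]> renderer; // assumed HTML -> PDF step

    PdfCache(Function<String, byte[]> renderer) {
        this.renderer = renderer;
    }

    // Returns the cached PDF if the HTML is unchanged, otherwise re-renders.
    synchronized byte[] get(String pageId, String html) {
        String hash = sha256(html);
        Entry cached = entries.get(pageId);
        if (cached != null && cached.htmlHash.equals(hash)) {
            return cached.pdf; // still in sync with the content
        }
        byte[] pdf = renderer.apply(html);
        entries.put(pageId, new Entry(hash, pdf));
        return pdf;
    }

    private static String sha256(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Hashing the HTML string fits the stated constraint that the HTML is the only input available; if the system exposed a last-modified timestamp, comparing timestamps would be cheaper.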
Re: New To Fop
Luke, What you are looking to do comes up pretty often on this list, so you will probably get quite a bit of help. Here's mine. What you will need to do is go from HTML into XML and then into FO. Once it is in FO, FOP can render it quite quickly into a PDF, and your browser can even be used as the delivery mechanism. I wrote a Java Servlet which is invoked via an HTML page link; the link passes the necessary parameters. In your case that will be a reference to the original HTML file. The next step is not obvious, hence this e-mail. Not all HTML is XML-ready: humans make mistakes which most browsers correct, unbalanced and missing tags for example. Also, some tags need to be doctored; BR and HR come to mind, as these have no closing tags. What I did here was to use the Tidy engine/library to fix up my HTML into valid XML. Now the job gets pretty easy... The next step is to develop an XSL transform which takes HTML tags and creates FO XML. I have some transforms which I am very happy to share with you, as will others. Nobody has a complete HTML to FO implementation, as this would be huge, but you can get most of the transform working quickly and then add to it as needed. Once you have the FO XML -- BOOM, a few lines of code later and you've got your PDF. The servlet I wrote actually communicates back to the browser every second and fakes an elapsed progress timer. We had to do this as we were originally running on slow hardware and had very impatient users. With our hardware these days the transform and PDF generation run so quickly that the interaction is more of a nuisance than an aid. But at the time that is what the boss wanted, so I wrote it. --will
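The HTML-to-FO step described above might look something like the following. This is only a sketch using the JDK's built-in XSLT support, not Will's actual transform. The stylesheet is a hypothetical, deliberately tiny one that handles only h1, p and br, and it assumes the input has already been tidied into well-formed XML with elements in no namespace (if the tidied output carries the XHTML namespace, the match patterns would need a prefix). The Tidy clean-up before it and the FOP rendering after it are left out because they depend on libraries and versions not named in the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlToFo {
    // Hypothetical minimal stylesheet: page setup plus templates for h1, p and br.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/html'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page' page-height='29.7cm' page-width='21cm'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<xsl:apply-templates select='body/*'/>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='h1'>"
      + "<fo:block font-size='18pt' font-weight='bold'><xsl:apply-templates/></fo:block>"
      + "</xsl:template>"
      + "<xsl:template match='p'>"
      + "<fo:block space-after='6pt'><xsl:apply-templates/></fo:block>"
      + "</xsl:template>"
      // An empty block inside a paragraph forces a line break.
      + "<xsl:template match='br'><fo:block/></xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms well-formed (already tidied) HTML into an XSL-FO document string.
    static String toFo(String tidiedHtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(tidiedHtml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("HTML to FO transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo(
            "<html><body><h1>Title</h1><p>Line one<br/>Line two</p></body></html>"));
    }
}
```

Tags without a template fall through to XSLT's built-in rules, which emit only their text, so the stylesheet can start small and grow one template at a time, as suggested above.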
Re: New To Fop
Thanks Will. This is the sort of advice I was hoping for. From the little I have played with FOP this makes sense. I would be interested in looking at any code you would like to share. Luke - Original Message - From: Will Gilbert [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, January 04, 2005 11:30 AM Subject: Re: New To Fop
Re: New To Fop
Hi Will; "What I did here was to use the Tidy engine/library to fix up my HTML into valid XML." Which package is the library you are referring to part of? Thanks, Luke