Re: [dev] scripted multiplatform .doc to .html conversion
Kirk Israel wrote:
> And in terms of expanding on that so that it has .doc as input (right now it seems to only accept .odt) and HTML as output (currently not one of the options listed in the program), are there any gotchas I should know about, or is it just about finding some appropriate API documentation and doing the fairly obvious things?

Have a look at jooconvert (http://jooreports.sourceforge.net/).

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [dev] scripted multiplatform .doc to .html conversion
Hi Kirk,

No, you simply have to discover the right filters to use ;-) A more appropriate place to ask is:
- dev@api.openoffice.org
- http://api.openoffice.org/DevelopersGuide/DevelopersGuide.html

But once more: if you plan to use this snippet in a multi-threaded environment like your J2EE server, you need to serialize access in your application, or keep a pool of OO instances and dispatch each conversion to one of them.

One more thing: I don't think you need to have the OOo/program directory in your classpath. The OpenOffice 2 libs should locate soffice.bin themselves, but I could be mistaken here.

Tom

Kirk Israel wrote:
> Tom, thanks, that is very cool.
>
> I was able to get the snippet up and running... Through trial and error I found the correct JARs in my OOo directory to compile against, and then Google indicated I needed to include the OOo/program directory in the classpath. Was there a smarter way I should have known the above?
>
> And in terms of expanding on that so that it has .doc as input (right now it seems to only accept .odt) and HTML as output (currently not one of the options listed in the program), are there any gotchas I should know about, or is it just about finding some appropriate API documentation and doing the fairly obvious things?
>
> This was a great first step, many thanks!
> -Kirk
>
> On 4/19/06, Tom Schindl [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> there's a fully functional code snippet available which shows how document conversion can be done.
>> http://codesnippets.services.openoffice.org/Office/Office.ConvertDocuments.snip
>>
>> If you are running this from a J2EE application you need to take into consideration that ***one*** OO instance cannot deal with multiple requests at the same time, so you must either:
>> - serialize access to OO, or
>> - create a pool of instances you connect to and serialize access to each of them
>>
>> Tom
>>
>> Kirk Israel wrote:
>>> This project got backburnered but is now coming up again: the concept of integrating OOo's doc to HTML conversion as seamlessly as possible into an existing J2EE application. My understanding is that OOo must be present (copied to, but not necessarily installed, based on Mathias' previous comments). At that point, it should be fairly easy to go through with the UNO libraries...is that about the size of it? Am I missing anything, or are there any resources that might make this easier?
>>>
>>> Thanks,
>>> Kirk
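
To illustrate the "right filters" mentioned in this message: in the UNO API the output format is chosen by passing a `FilterName` property when the document is stored. Below is a minimal sketch of an extension-to-filter lookup. The filter names are the ones commonly documented for Writer documents in OOo 2.x and should be checked against your own installation; the class and method names (`ExportFilters`, `filterFor`) are hypothetical and not part of the snippet linked above.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps an output file extension to a Writer export filter name. */
final class ExportFilters {
    // Assumption: names as commonly documented for OOo 2.x Writer; verify locally.
    private static final Map<String, String> WRITER_FILTERS = Map.of(
            "html", "HTML (StarWriter)",
            "pdf", "writer_pdf_Export",
            "doc", "MS Word 97",
            "odt", "writer8");

    private ExportFilters() {}

    /** Returns the filter name for the given output file name, or throws if unknown. */
    static String filterFor(String outputFileName) {
        int dot = outputFileName.lastIndexOf('.');
        if (dot < 0 || dot == outputFileName.length() - 1) {
            throw new IllegalArgumentException("No extension: " + outputFileName);
        }
        String ext = outputFileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String filter = WRITER_FILTERS.get(ext);
        if (filter == null) {
            throw new IllegalArgumentException("No known filter for ." + ext);
        }
        return filter;
    }
}
```

The returned string would be used as the value of the `FilterName` property passed to `XStorable.storeToURL`. On the input side an explicit filter is usually not needed, since the office normally detects the format of a .doc file when loading it.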
Re: [dev] scripted multiplatform .doc to .html conversion
Hi,

there's a fully functional code snippet available which shows how document conversion can be done.
http://codesnippets.services.openoffice.org/Office/Office.ConvertDocuments.snip

If you are running this from a J2EE application you need to take into consideration that ***one*** OO instance cannot deal with multiple requests at the same time, so you must either:
- serialize access to OO, or
- create a pool of instances you connect to and serialize access to each of them

Tom

Kirk Israel wrote:
> This project got backburnered but is now coming up again: the concept of integrating OOo's doc to HTML conversion as seamlessly as possible into an existing J2EE application. My understanding is that OOo must be present (copied to, but not necessarily installed, based on Mathias' previous comments). At that point, it should be fairly easy to go through with the UNO libraries...is that about the size of it? Am I missing anything, or are there any resources that might make this easier?
>
> Thanks,
> Kirk
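
The two options in this message (serialize access, or pool several instances and serialize access to each) can be sketched with a blocking queue. This is only an illustration under assumptions: `OfficePool` is a hypothetical class, and the type parameter `T` stands for whatever object your application uses to represent a connection to one running OOo instance. A pool holding a single instance simply serializes all conversions.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/**
 * Hypothetical pool of office connections. Each instance is handed to at most
 * one thread at a time, so no instance ever sees concurrent requests.
 */
final class OfficePool<T> {
    private final BlockingQueue<T> idle;

    OfficePool(List<T> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        // Fair queue: waiting threads are served in arrival order.
        idle = new ArrayBlockingQueue<>(instances.size(), true, instances);
    }

    /** Borrows an instance (waiting until one is free), runs the task, returns the instance. */
    <R> R withInstance(Function<T, R> task) {
        T instance;
        try {
            instance = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an instance", e);
        }
        try {
            return task.apply(instance);
        } finally {
            // Cannot fail: capacity equals the number of instances in circulation.
            idle.add(instance);
        }
    }
}
```

A servlet would then wrap each conversion in something like `pool.withInstance(office -> convert(office, file))`, where `convert` is your own code built on the snippet above.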
Re: [dev] scripted multiplatform .doc to .html conversion
Tom, thanks, that is very cool.

I was able to get the snippet up and running... Through trial and error I found the correct JARs in my OOo directory to compile against, and then Google indicated I needed to include the OOo/program directory in the classpath. Was there a smarter way I should have known the above?

And in terms of expanding on that so that it has .doc as input (right now it seems to only accept .odt) and HTML as output (currently not one of the options listed in the program), are there any gotchas I should know about, or is it just about finding some appropriate API documentation and doing the fairly obvious things?

This was a great first step, many thanks!
-Kirk

On 4/19/06, Tom Schindl [EMAIL PROTECTED] wrote:
> Hi,
>
> there's a fully functional code snippet available which shows how document conversion can be done.
> http://codesnippets.services.openoffice.org/Office/Office.ConvertDocuments.snip
>
> If you are running this from a J2EE application you need to take into consideration that ***one*** OO instance cannot deal with multiple requests at the same time, so you must either:
> - serialize access to OO, or
> - create a pool of instances you connect to and serialize access to each of them
>
> Tom
>
> Kirk Israel wrote:
>> This project got backburnered but is now coming up again: the concept of integrating OOo's doc to HTML conversion as seamlessly as possible into an existing J2EE application. My understanding is that OOo must be present (copied to, but not necessarily installed, based on Mathias' previous comments). At that point, it should be fairly easy to go through with the UNO libraries...is that about the size of it? Am I missing anything, or are there any resources that might make this easier?
>>
>> Thanks,
>> Kirk
Re: [dev] scripted multiplatform .doc to .html conversion
Kirk Israel wrote:
> Sorry, I'm not being willfully dense here...I understand that if I'm doing this through the API, there has to be an instance of OOo running, but are you saying that the segment of the source responsible for reading in Doc (and the other segment, responsible for spitting out HTML) is so tightly coupled with the rest of the system as a whole that extracting those two segments isn't feasible, that saying "aha, THIS is the conversion function" wouldn't get you anywhere, because it depends on so much other stuff working to run?

I think you have a misconception about how document conversion in OOo works. There is no direct translation between input and output format: input filters always convert the input format into a representation in memory (the core of a document), and the output filter converts this into the output format. If you think about this a little you will see that anything else doesn't make sense. In the end OOo is an application and not a conversion service: why should there be code that directly translates from e.g. doc to html? OOo itself doesn't need such code.

So it will never make sense to isolate the filter code; you always need the code of the document core as well. Theoretically it is possible to take the code of the filters and the core and make it a smaller package, but until now nobody has needed this badly enough to start the work of creating such an environment. You will need a kind of application anyway, you will need UNO and its bootstrapping, you will need some of the services in OOo used by the filters, etc.

So it's possible, but it is quite some work, and all you would gain from it is saving some MB on disk. Is that worth the effort?

BTW: you don't need an *installed* version of OOo on your machine, it's enough to have a runnable *copy* (though in this case you have to create each UNO connection manually, because your system doesn't provide a hint where the OOo installation is).

Best regards,
Mathias

-- 
Mathias Bauer - OpenOffice.org Application Framework Project Lead
Please reply to the list only, [EMAIL PROTECTED] is a spam sink.
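
Regarding the runnable *copy*: since nothing on the system points to it, the client has to know both where the office binary lives and how to reach it. A minimal sketch, under assumptions, of building the start command for such a copy and the matching UNO connection URL is shown below. The `-headless`, `-norestore` and `-accept=...` options are written in the single-dash style used by OOo 2.x; the class name `OfficeLauncher`, the directory and the port are hypothetical.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for starting a copied (not installed) office in listening mode. */
final class OfficeLauncher {
    private OfficeLauncher() {}

    /** Connection description shared by the office's -accept option and the client URL. */
    static String acceptString(int port) {
        return "socket,host=localhost,port=" + port + ";urp;";
    }

    /** Command line for the soffice launcher found in the given program directory. */
    static List<String> command(Path officeProgramDir, int port) {
        // Assumption: the launcher is named "soffice" ("soffice.exe" on Windows).
        String exe = officeProgramDir.resolve("soffice").toString();
        return List.of(exe, "-headless", "-norestore", "-accept=" + acceptString(port));
    }

    /** URL a UNO client resolves to obtain the remote component context. */
    static String unoUrl(int port) {
        return "uno:" + acceptString(port) + "StarOffice.ComponentContext";
    }
}
```

The command could be started with `new ProcessBuilder(OfficeLauncher.command(dir, 8100)).start()`, and the URL would then be handed to the UNO URL resolver (`XUnoUrlResolver.resolve`) on the client side to establish the connection manually.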
Re: [dev] scripted multiplatform .doc to .html conversion
Mathias, thank you for your feedback...I have a few responses.

> I think you have a misconception about how document conversion in OOo works. There is no direct translation between input and output format: input filters always convert the input format into a representation in memory (the core of a document), and the output filter converts this into the output format. If you think about this a little you will see that anything else doesn't make sense. In the end OOo is an application and not a conversion service: why should there be code that directly translates from e.g. doc to html? OOo itself doesn't need such code.

I assumed that it would be a doc-to-internal unmarshalling followed by an internal-to-HTML marshalling, for obvious reasons (like needing 2n filters rather than n(n-1))...I guess I was envisioning a small(ish) bit of code that would do something like (in pseudojava)

    Document doc = OOoUtils.getDocument(HTML_CONVERTER, "somefile.html");
    OOoUtils.writeDocument(DOC_CONVERTER, doc, "output.doc");

maybe with some Input/Output Streams or services instead, but that's the general gist.

> So it will never make sense to isolate the filter code; you always need the code of the document core as well. Theoretically it is possible to take the code of the filters and the core and make it a smaller package, but until now nobody has needed this badly enough to start the work of creating such an environment. You will need a kind of application anyway, you will need UNO and its bootstrapping, you will need some of the services in OOo used by the filters, etc.

I see what you're getting at: the conversion process isn't self-contained but depends on a series of services, structures, and whatnot. Just from reading some recent archives of this list, I'd say this kind of scripting is fairly sought after...but maybe the people who want to cherry-pick the functionality aren't the same kind of people willing to put in the work to make it an isolated tool.

> So it's possible, but it is quite some work, and all you would gain from it is saving some MB on disk. Is that worth the effort?

Quite possibly not...I think it was a desire to embed installation of just the conversion stuff more easily, rather than having OOo be a separate install. If you could easily embed just a few filters and some supporting classes at the source code level into a larger project, that would make it more transparent to the user.

> BTW: you don't need an *installed* version of OOo on your machine, it's enough to have a runnable *copy* (though in this case you have to create each UNO connection manually, because your system doesn't provide a hint where the OOo installation is).

Aha, good to know.

> Best regards,
> Mathias

Thank you!
Kirk
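
The import, core, export design discussed in this exchange can be illustrated with a toy model. This is a sketch only: `DocumentCore` and `HubConverter` are hypothetical names, the "core" is just a list of paragraphs, and nothing here reflects OOo's real filter interfaces. It shows why each format needs one import and one export filter instead of a dedicated converter for every ordered pair of formats.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy stand-in for the in-memory document core: a list of paragraphs. */
final class DocumentCore {
    private final List<String> paragraphs;

    DocumentCore(List<String> paragraphs) {
        this.paragraphs = List.copyOf(paragraphs);
    }

    List<String> paragraphs() {
        return paragraphs;
    }
}

/** Every conversion goes input format -> core -> output format. */
final class HubConverter {
    private final Map<String, Function<String, DocumentCore>> importers = new HashMap<>();
    private final Map<String, Function<DocumentCore, String>> exporters = new HashMap<>();

    /** Registers the import and export filter for one format. */
    void register(String format,
                  Function<String, DocumentCore> importFilter,
                  Function<DocumentCore, String> exportFilter) {
        importers.put(format, importFilter);
        exporters.put(format, exportFilter);
    }

    String convert(String from, String to, String input) {
        Function<String, DocumentCore> in = importers.get(from);
        Function<DocumentCore, String> out = exporters.get(to);
        if (in == null || out == null) {
            throw new IllegalArgumentException("unsupported: " + from + " -> " + to);
        }
        return out.apply(in.apply(input));
    }

    /** Filters actually written: one importer plus one exporter per format (2n). */
    int filterCount() {
        return importers.size() + exporters.size();
    }

    /** Converters needed if every ordered pair of formats were translated directly: n(n-1). */
    int directConverterCount() {
        int n = importers.size();
        return n * (n - 1);
    }
}
```

With 10 formats this comes to 20 filters versus 90 direct converters. The catch raised in the thread is that both kinds of filter depend on the core, so the filters cannot be lifted out without it.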
Re: [dev] scripted multiplatform .doc to .html conversion
Hi Kirk,

take a look at the SDK example java\DocumentHandling\DocumentConverter. You can easily implement a Java remote client application that does the conversion for you. But you always need an installed office working as a server (for example with UI if necessary).

Juergen

Kirk Israel wrote:
> So the folks at my new job decided to really give me a trial by fire...they'd like me to put together a clear and detailed outline of how to include .doc to .html conversion in our product, in an automated kind of way.
>
> OpenOffice seems to handle the basic task gracefully through the UI. Can anyone tell me if there's a command-line version that would enable this? Or, possibly even better, is there a specific callable module responsible for this, and is there an intermediate in-memory format that can be marshalled/unmarshalled with the various file formats?
>
> I'm at a bit of a loss to know where to start code diving...would it be a better idea for a n00b to start using the CVS feed, or is there a downloadable archive lurking around on one of the websites?
>
> Thanks for any and all advice! I'm really in dire straits here, so suggestions are acts of mercy...
> -Kirk
Re: [dev] scripted multiplatform .doc to .html conversion
On 12/9/05, Laurent Godard [EMAIL PROTECTED] wrote:
> you may have a look at this, for a very first shot
> http://oooconv.free.fr/oooconv/oooconv_en.html

So that's a web page in PHP, plus a macro for use in an existing instance of OOo, making a web application for that kind of conversion?
Re: [dev] scripted multiplatform .doc to .html conversion
On 12/9/05, Jürgen Schmidt [EMAIL PROTECTED] wrote:
> Hi Kirk,
>
> take a look at the SDK example java\DocumentHandling\DocumentConverter. You can easily implement a Java remote client application that does the conversion for you. But you always need an installed office working as a server (for example with UI if necessary).

Hmm. Is your feeling, then, that just the document functionality might be too difficult to extract on a source code level?
Re: [dev] scripted multiplatform .doc to .html conversion
Kirk Israel wrote:
> On 12/9/05, Jürgen Schmidt [EMAIL PROTECTED] wrote:
>> Hi Kirk,
>>
>> take a look at the SDK example java\DocumentHandling\DocumentConverter. You can easily implement a Java remote client application that does the conversion for you. But you always need an installed office working as a server (for example with UI if necessary).
>
> Hmm. Is your feeling, then, that just the document functionality might be too difficult to extract on a source code level?

Yes, exactly: the current architecture doesn't allow extracting only this small part. Maybe it will be possible some time in the future ;-)

To use the API you always need a running office instance. The other possibility is to work directly on the XML file format with XSL transformations, but that of course is not possible for most binary formats.

Juergen
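
The XSL route mentioned above can be tried with the XSLT processor that ships with the JDK. The sketch below is illustrative only: the stylesheet handles nothing but `text:p` paragraphs, the class name `OdfParagraphsToHtml` is hypothetical, and a real OpenDocument file keeps its `content.xml` inside a zip archive with far richer markup. As noted in the message, this does not help with binary formats such as .doc.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: turns ODF text paragraphs into HTML paragraphs via XSLT. */
final class OdfParagraphsToHtml {
    // Minimal stylesheet: every text:p element becomes a <p> inside <body>.
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'"
      + " exclude-result-prefixes='text'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<body><xsl:apply-templates select='//text:p'/></body>"
      + "</xsl:template>"
      + "<xsl:template match='text:p'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private OdfParagraphsToHtml() {}

    /** Applies the stylesheet to the given content.xml text and returns the result. */
    static String transform(String contentXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(contentXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

This approach needs no running office instance, which is its main attraction, but it is only as complete as the stylesheet you are prepared to write.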
Re: [dev] scripted multiplatform .doc to .html conversion
Hi Kirk,

> So that's a web page in PHP, plus a macro for use in an existing instance of OOo, making a web application for that kind of conversion?

It's a 2 1/2 year old first shot, meant to give some ideas. It can of course be done better (and when I have time I will release a tool like this based on Python, XML-RPC and OOo).

Laurent

-- 
Laurent Godard [EMAIL PROTECTED] - Ingénierie OpenOffice.org
Indesko http://www.indesko.com
Nuxeo CPS http://www.nuxeo.com - http://www.cps-project.org
Livre Programmation OpenOffice.org, Eyrolles 2004
Re: [dev] scripted multiplatform .doc to .html conversion
On 12/9/05, Jürgen Schmidt [EMAIL PROTECTED] wrote:
> Kirk Israel wrote:
>> Hmm. Is your feeling, then, that just the document functionality might be too difficult to extract on a source code level?
>
> Yes, exactly: the current architecture doesn't allow extracting only this small part. Maybe it will be possible some time in the future ;-)
>
> To use the API you always need a running office instance. The other possibility is to work directly on the XML file format with XSL transformations, but that of course is not possible for most binary formats.

Sorry, I'm not being willfully dense here...I understand that if I'm doing this through the API, there has to be an instance of OOo running, but are you saying that the segment of the source responsible for reading in Doc (and the other segment, responsible for spitting out HTML) is so tightly coupled with the rest of the system as a whole that extracting those two segments isn't feasible, that saying "aha, THIS is the conversion function" wouldn't get you anywhere, because it depends on so much other stuff working to run?

Dang, if that IS the case my manager isn't going to like that I'm shooting down the team's preferred cool new idea :-)

Thanks,
Kirk
Re: [dev] scripted multiplatform .doc to .html conversion
Hi,

> OpenOffice seems to handle the basic task gracefully through the UI. Can anyone tell me if there's a command-line version that would enable this? Or, possibly even better, is there a specific callable module responsible for this, and is there an intermediate in-memory format that can be marshalled/unmarshalled with the various file formats?

you may have a look at this, for a very first shot:
http://oooconv.free.fr/oooconv/oooconv_en.html

Laurent

-- 
Laurent Godard [EMAIL PROTECTED] - Ingénierie OpenOffice.org
Indesko http://www.indesko.com
Nuxeo CPS http://www.nuxeo.com - http://www.cps-project.org
Livre Programmation OpenOffice.org, Eyrolles 2004