Re: Writer and .docx
Hi Dave, all

> On 10/18/2020 3:13 AM Dave Fisher wrote:
> I think it means that OpenOffice could be the arbiter of converting OOXML
> into ODF. As such I’m more interested in using tools like POI to drive that
> conversion into ODF and leave the other direction to the commercial vendors.

That is exactly my point. The import should be as perfect as possible, and there should be an "Export to..." menu option (like in Gimp) or an extension for those who wish to keep using MS XML (and this function should obviously be as perfect as possible too). But saving and working on a document should always use Open Document as the format.

> There is a similar but semantically more complex conversion of PDF into ODF.

That could be added to the already existing (and unfortunately unmaintained) extension ;)

Regards,
Pedro

-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org
RE: Writer and .docx
Hello, @all:

I am happy to see that there is a current discussion about OOXML filters, because this is an important topic.

> -----Original Message-----
> From: Dave Fisher [mailto:w...@apache.org]
> Sent: Sunday, October 18, 2020 4:14 AM
> To: dev@openoffice.apache.org
> Subject: Re: Writer and .docx
>
> Hi -
>
> Top posting as well. I think we should consider our goals as
> a project. If one of those goals is support for ODF as a
> theory of everything office that you can trust and know will
> remain parsable a century from now then what does that mean for OOXML?
>
> I think it means that OpenOffice could be the arbiter of
> converting OOXML into ODF. As such I’m more interested in
> using tools like POI to drive that conversion into ODF and
> leave the other direction to the commercial vendors.
>
> There is a similar but semantically more complex conversion
> of PDF into ODF.
>
> Alternatively, maybe there is a way to enhance plug-ability
> of filters with more modern methods like OSGi.

You're right in principle, but in practice what users actually require are the OOXML filters.

By the way: the OOXML filters in LO were not pushed through by the companies involved in LO. Their development was started by an initiative of a German voluntary organization (if I remember correctly: osb-alliance.de).

Jörg
RE: Writer and .docx
> -----Original Message-----
> From: Damjan Jovanovic [mailto:dam...@apache.org]
> Sent: Sunday, October 18, 2020 5:19 AM
> To: Apache OO
> Subject: Re: Writer and .docx
>
> On Sun, Oct 18, 2020 at 4:13 AM Dave Fisher wrote:
>
> As for ODF, LO is publishing ODF 1.3, so unless we keep up, we won't be
> able to read all ODF soon either, let alone a century from now.

Once again, on ODF: we should make our decision with the needs of the users in mind and not 'politics'. For me this means that the preferred implementation is ISO-ODF (and not OASIS-ODF). LO's approach of always implementing the very latest OASIS version not only ensures the rapid development of ODF in practice, but also creates more incompatibility than necessary.

In short: it does not seem reasonable to me to put Microsoft under pressure _if_ the interests of our users suffer as a result.

Right ... what I just said also presupposes communicating strategically with LO (and others), but we should not shy away from this effort.

And no ... I don't think we should slow down the development of ODF. So let's put pressure on ISO to quickly adopt existing new OASIS versions into its own ODF standard.

Jörg
Re: Writer and .docx
On Sun, Oct 18, 2020 at 4:13 AM Dave Fisher wrote:

> Hi -
>
> Top posting as well. I think we should consider our goals as a project. If
> one of those goals is support for ODF as a theory of everything office that
> you can trust and know will remain parsable a century from now then what
> does that mean for OOXML?

Saving to OOXML is probably our most requested feature, so users definitely want it. Office suites are all about being broadly general and interoperable. There is a reason why we have the longest and most complex clipboard handling code I've ever seen, support many raster and vector image formats, allow scripting and component development in many languages, and import and export documents in many formats from many vendors spanning many decades. 80% of the time users use 20% of the features, but there are many different groups of users, each using a different 20% of the features. Many here sound like they don't need saving to OOXML, but the user groups that do (for whatever reason) will be negatively affected if we don't support it.

As for ODF, LO is publishing ODF 1.3, so unless we keep up, we won't be able to read all ODF soon either, let alone a century from now.

> I think it means that OpenOffice could be the arbiter of converting OOXML
> into ODF. As such I’m more interested in using tools like POI to drive that
> conversion into ODF and leave the other direction to the commercial vendors.

We should certainly collaborate more with POI. I have OOXML documents which open in POI but have data missing when opened in AOO. We could compare what POI does with what AOO does to improve our OOXML filter. There are probably similar cases where AOO is better and we could improve POI from it.

> There is a similar but semantically more complex conversion of PDF into
> ODF.
>
> Alternatively, maybe there is a way to enhance plug-ability of filters
> with more modern methods like OSGi.

I haven't had good experiences with OSGi: it breaks JDBC, breaks RMI, and has problems with other cases where classes are dynamically loaded at runtime without using OSGi's own APIs to do it. AOO's UNO can only load classes at runtime and use custom classloaders.

> Regards,
> Dave

Regards
Damjan
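The class-loading problem mentioned above comes down to visibility: a lookup by name only succeeds if the class can be seen by the loader that performs the lookup, and an OSGi framework gives every bundle its own loader. The following is a minimal sketch using only the JDK, not OSGi itself. The class `LoaderVisibilityDemo` and its helper `isVisible` are made-up names, and the isolated `URLClassLoader` merely stands in for a bundle loader that has not imported the package in question.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class LoaderVisibilityDemo {

    /** Returns true if the named class can be resolved through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only visibility is tested, no static initializers run.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String appClass = LoaderVisibilityDemo.class.getName();
        ClassLoader appLoader = LoaderVisibilityDemo.class.getClassLoader();

        // Hypothetical stand-in for an isolated bundle loader: no URLs of its own,
        // and a null parent, so it delegates only to the bootstrap loader.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            // Core JDK classes are visible everywhere.
            System.out.println("String via isolated loader: "
                    + isVisible("java.lang.String", isolated));
            // The application class is visible through its own loader...
            System.out.println("App class via app loader:   "
                    + isVisible(appClass, appLoader));
            // ...but not through the isolated one.
            System.out.println("App class via isolated:     "
                    + isVisible(appClass, isolated));
        }
    }
}
```

Libraries that resolve driver or stub classes by name at runtime depend on the loader they happen to run under, which is presumably why they misbehave once each component gets its own loader.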
Re: Writer and .docx
Hi -

Top posting as well. I think we should consider our goals as a project. If one of those goals is support for ODF as a theory of everything office that you can trust and know will remain parsable a century from now then what does that mean for OOXML?

I think it means that OpenOffice could be the arbiter of converting OOXML into ODF. As such I’m more interested in using tools like POI to drive that conversion into ODF and leave the other direction to the commercial vendors.

There is a similar but semantically more complex conversion of PDF into ODF.

Alternatively, maybe there is a way to enhance plug-ability of filters with more modern methods like OSGi.

Regards,
Dave

Sent from my iPhone

> On Oct 17, 2020, at 1:55 PM, Hagar Delest wrote:
>
> Top posting.
> I fully agree with Pedro.
> [...]
Re: Writer and .docx
Top posting.

I fully agree with Pedro.

The MS Office OOXML support is a core question for the project IMHO. I think that the success of LO is heavily based on OOXML support. It provides users something they believe is a clone of MS Office (or at least good enough to meet their needs and exchange docx/xlsx/pptx with other users).

But it is very detrimental to ODF, because if it works so well for users using OOXML, then why bother with another format (ODF)?

By the way, our company just upgraded from MS Office 2010 to 2016 and it is quite a nightmare, with many documents needing readjustments, apparently due to changes in the OOXML version (it again provides a "compatibility mode"). This means that their own format is very likely to change again and again, making it always difficult for other applications to catch up with the changes.

The best method to work with OOXML is to buy MS Office. Why not focus on improving AOO first? If people do need the OOXML export, then they will all switch to LO. If an AOO user base remains, then better improve AOO for them.

Even if AOO had a very good import/export filter, since it has fewer features anyway, what would be the point exactly?

Hagar

On 17/10/2020 at 15:33, Pedro Lino wrote:
> Hi Andrew
> [...]
Re: Writer and .docx
Hi Andrew

> On 10/17/2020 1:37 PM Andrew Pitonyak wrote:
> (1) Sometimes contractually obligated to deliver some products in DOCX
> format. I am pretty good at knowing what things will export properly to DOCX
> format and which will not (just because I have done it often enough). Only
> once have I had a client (it was government DoD) that would accept (and even
> required) an ODT file.

Then you are a very valuable person for helping to fix these import/export issues.

I always send documents in Open Document (to EU, ICCAT, etc.) and they will not refuse it ;)

> (2) Frequently exchange documents with clients that will use/require DOCX.

I usually save them in DOC and I haven't had complaints. In the cases where formatting is lost, I have to switch to Windows 7 and an old copy of MS Office 2010, and even so there are issues sometimes (but I refuse to continually buy the new version that MS is pushing). I don't use LibreOffice for that because there is also format loss (sometimes even content loss).

> (3) I frequently work with people who are better off not having to deal with
> the extra steps of converting between formats.

That is indeed an obstacle. But using DOC, XLS and PPT usually solves the problem. In fact I think that the old MS formats have become the lingua franca of office documents ;)

> All else being equal, if you fall into the categories above, I usually tell
> them to use LibreOffice because it will natively support reading and writing
> DOCX format. When I ask people why they chose LibreOffice over Apache
> OpenOffice, DOCX support is the reason usually listed.

Yes, that is indeed one advantage. But as I mentioned before, there are serious glitches when using MS XML formats in LibreOffice, and in addition this will help Microsoft make their format the standard. And I believe that is the wrong option.

Many governments and organizations have accepted Open Document as a solution to be free from proprietary formats. The reason they can't switch to it is that all PCs are loaded with Windows and Office... And MS Office will only accept, without warnings, ODF documents created and edited in MS Office...

I do not have a solution for this profit-based bullying, but accepting the use of MS XML formats is becoming part of the problem and not part of the solution...

Regards,
Pedro
Re: Writer and .docx
On Saturday, October 17, 2020 04:56 EDT, Pedro Lino wrote:

> Hi all
>
>> On 10/17/2020 9:11 AM Matthias Seidel wrote:
>> My point is that one should do the work in ODF and only export to
>> "foreign" formats if needed.
>
> +1
>
> This is how Gimp works. You can import any format, work on it using the
> program's own format XCF (not Photoshop's PSD to please the majority) and in
> the end you can export to whatever format (PNG, JPG, etc.). In fact the
> "foreign" formats don't even show up in the Save options.
>
> For me this is the best solution! On the other hand (as happens in
> LibreOffice), exporting to Microsoft's XML will never be perfect (Microsoft
> will make sure!) and there will always be people complaining, but it is far
> better that there is a single conversion before sending the document!

(1) Sometimes contractually obligated to deliver some products in DOCX format. I am pretty good at knowing what things will export properly to DOCX format and which will not (just because I have done it often enough). Only once have I had a client (it was government DoD) that would accept (and even required) an ODT file.

(2) Frequently exchange documents with clients that will use/require DOCX.

(3) I frequently work with people who are better off not having to deal with the extra steps of converting between formats.

All else being equal, if you fall into the categories above, I usually tell them to use LibreOffice because it will natively support reading and writing DOCX format. When I ask people why they chose LibreOffice over Apache OpenOffice, DOCX support is the reason usually listed.

Andrew Pitonyak
Re: Writer and .docx
Hi all

> On 10/17/2020 9:11 AM Matthias Seidel wrote:
> My point is that one should do the work in ODF and only export to
> "foreign" formats if needed.

+1

This is how Gimp works. You can import any format, work on it using the program's own format XCF (not Photoshop's PSD to please the majority) and in the end you can export to whatever format (PNG, JPG, etc.). In fact the "foreign" formats don't even show up in the Save options.

For me this is the best solution! On the other hand (as happens in LibreOffice), exporting to Microsoft's XML will never be perfect (Microsoft will make sure!) and there will always be people complaining, but it is far better that there is a single conversion before sending the document!

Regards,
Pedro
Re: Writer and .docx
On Sat, 17 Oct 2020 10:05:45 +0200 Damjan Jovanovic wrote:

> On Fri, Oct 16, 2020 at 11:50 AM Bidouille wrote:
>
> > > OpenOffice users can open documents in .docx format, but they cannot
> > > save in that format.
> > Well, remember that the latest versions of Microsoft Office (since 2016)
> > can open the ODT format.
>
> Unfortunately not all MS Office editions have ODT support, and many people
> I sent ODT to complained they can't open it.

Many MS Office users expect their file associations to be set so that they can double-click a file to open it. Few of them know the longer way.

--
Rory O'Farrell
Re: Writer and .docx
Hi Damjan,

On 17.10.20 at 10:05, Damjan Jovanovic wrote:
> On Fri, Oct 16, 2020 at 11:50 AM Bidouille wrote:
>
>>> OpenOffice users can open documents in .docx format, but they cannot
>>> save in that format.
>> Well, remember that the latest versions of Microsoft Office (since 2016)
>> can open the ODT format.
>
> Unfortunately not all MS Office editions have ODT support, and many people
> I sent ODT to complained they can't open it.

Acknowledged!

My point is that one should do the work in ODF and only export to "foreign" formats if needed.

Export using POI was an idea we also got at ApacheCon Berlin in 2019. Developing it as an extension would be the best for all.

Regards,
Matthias
Re: Writer and .docx
On Fri, Oct 16, 2020 at 11:50 AM Bidouille wrote:

> > OpenOffice users can open documents in .docx format, but they cannot
> > save in that format.
> Well, remember that the latest versions of Microsoft Office (since 2016)
> can open the ODT format.

Unfortunately not all MS Office editions have ODT support, and many people I sent ODT to complained they can't open it.
Re: Writer and .docx
I'm also in support, for whatever that is worth :)

On 16.10.20 at 21:56, Carl Marcum wrote:
>>>> I might get back into this next month, especially if others want to
>>>> collaborate, but don't expect something generally usable, let alone
>>>> Excel-quality XLSX saving, any time soon.
>>>>
>>>> Regards
>>>> Damjan
>>>
>>> Yes I'm definitely interested in collaborating on this.
>>> Do you have a branch with your work in it?
>>
>> It's been 5 years and the code is in bits and pieces, but I'll try to put
>> together a working branch over the weekend.
>
> Whenever you have time. Just let me know.
>
> Thanks,
> Carl
Re: Writer and .docx
Hi Bidouille,

On 16.10.20 at 11:49, Bidouille wrote:
>> OpenOffice users can open documents in .docx format, but they cannot
>> save in that format.
> Well, remember that the latest versions of Microsoft Office (since 2016)
> can open the ODT format.

Exactly. Instead of promoting Microsoft formats, we should rather encourage users to use ODF.

Regards,
Matthias
Re: Writer and .docx
>>> I might get back into this next month, especially if others want to
>>> collaborate, but don't expect something generally usable, let alone
>>> Excel-quality XLSX saving, any time soon.
>>>
>>> Regards
>>> Damjan
>>
>> Yes I'm definitely interested in collaborating on this.
>> Do you have a branch with your work in it?
>
> It's been 5 years and the code is in bits and pieces, but I'll try to put
> together a working branch over the weekend.

Whenever you have time. Just let me know.

Thanks,
Carl
Re: Writer and .docx
On Fri, Oct 16, 2020 at 4:24 PM Carl Marcum wrote:

> Hi Damjan,
>
> On 10/16/20 9:23 AM, Damjan Jovanovic wrote:
> > [...]
> > Some of these problems can be solved. We can expose the cell iterators over
> > UNO. The memory usage might not matter that much in practice, and we could
> > patch POI to do SAX parsing/saving at a later stage. But users expect
> > fonts, styles, charts, images, custom formats, OLE, pivot tables, VBA
> > macros, form controls, mathematical formulas, change tracking, etc. all
> > saved losslessly and 100% compatible with Excel, which doesn't only require
> > work in the filter, but in the rest of AOO too, and POI probably doesn't
> > support all of those features either.
> I'm not sure if you've looked at the newer Streaming Usermodel API SXSSF.
> It may help with memory consumption in this case.

Can SXSSF work with formulas that reference earlier cells?

> > I might get back into this next month, especially if others want to
> > collaborate, but don't expect something generally usable, let alone
> > Excel-quality XLSX saving, any time soon.
> >
> > Regards
> > Damjan
>
> Yes I'm definitely interested in collaborating on this.
> Do you have a branch with your work
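The question about earlier cells comes from how a streaming writer works: the SXSSF documentation describes it as keeping only a bounded window of recent rows in memory and flushing older rows to disk, after which they can no longer be accessed. The sketch below is not POI code. `WindowedRowWriter`, `addRow`, `getRow` and `finish` are hypothetical names, and it writes plain comma-separated lines rather than XLSX, purely to show the trade-off between bounded memory and access to earlier rows.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

final class WindowedRowWriter {

    private static final class Row {
        final int index;
        final String[] cells;
        Row(int index, String[] cells) { this.index = index; this.cells = cells; }
    }

    private final Deque<Row> window = new ArrayDeque<>();
    private final int windowSize;
    private final Writer out;
    private int nextIndex = 0;

    WindowedRowWriter(int windowSize, Writer out) {
        if (windowSize < 1) throw new IllegalArgumentException("windowSize must be >= 1");
        this.windowSize = windowSize;
        this.out = out;
    }

    /** Appends a row; once the window is full, the oldest row is written out and dropped. */
    int addRow(String... cells) throws IOException {
        if (window.size() == windowSize) {
            flush(window.removeFirst());
        }
        window.addLast(new Row(nextIndex, cells.clone()));
        return nextIndex++;
    }

    /** Returns the cells of a row still in memory, or null if it was already flushed. */
    String[] getRow(int index) {
        for (Row r : window) {
            if (r.index == index) return r.cells;
        }
        return null;
    }

    /** Writes out every row still held in the window. */
    void finish() throws IOException {
        while (!window.isEmpty()) flush(window.removeFirst());
        out.flush();
    }

    private void flush(Row r) throws IOException {
        out.write(String.join(",", r.cells));
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        WindowedRowWriter writer = new WindowedRowWriter(2, sink);
        for (int i = 0; i < 5; i++) {
            writer.addRow("r" + i, Integer.toString(i * 10));
        }
        System.out.println("row 0 still readable: " + (writer.getRow(0) != null));
        System.out.println("row 4 still readable: " + (writer.getRow(4) != null));
        writer.finish();
        System.out.print(sink);
    }
}
```

In a design like this, a later row can still carry a textual reference to an earlier one, but code that needs to read an earlier row's value back after it has left the window would have to keep its own copy.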
Re: Writer and .docx
Hi Damjan,

On 10/16/20 9:23 AM, Damjan Jovanovic wrote:
> On Fri, Oct 16, 2020 at 2:05 PM Dave Fisher wrote:
> > Hi -
> >
> > Sent from my iPhone
> >
> > On Oct 16, 2020, at 4:04 AM, Mechtilde wrote:
> > > Hello Joost,
> > >
> > > I'm very happy to hear from you.
> > >
> > > On 16.10.20 at 12:50, Joost Andrae wrote:
> > > > Hi Simon,
> > > >
> > > > it's an honor for me to see a sign of life from you here. Welcome!
> > > >
> > > > Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.
> > >
> > > I offer my help to test the implementation. Sorry, but I'm not a programmer. So we as a project need help from Java programmers to work on it and contribute it.
> >
> > I’ve been a PMC member of Apache POI for over 12 years. My team donated the initial PowerPoint support and was involved in the initial support for OOXML.
> >
> > POI is embedded into Apache Solr and Tika along with commercial products. The project took over the dormant XMLBeans project and is releasing a 4.0 that supports modern Java.
> >
> > An OSGi bundle of POI will be available in the next release if you build from source.
> >
> > The Tika, POI, and PDFBox projects maintain a large regression corpus scraped from the internet using CommonCrawl. I’m sure that this could be shared in one way or another.
> >
> > Regards,
> > Dave
>
> Hi
>
> I did start writing a POI-based OOXML export filter for AOO some years ago (search the dev mailing list), and got it to the point of being able to save very basic spreadsheets (no formulas, no formatting, just text and numbers). There were several major problems with using POI.
>
> Firstly, the code in POI is at various stages of completeness. The legacy XLS filter is very good, supports SAX parsing, etc. The DOC filter is minimal and unmaintained. What we would need, the OOXML filter for at least XLSX, is somewhere in between. AFAIK it only supports DOM parsing, meaning everything needs to be in memory before it can be written to disk, so a big spreadsheet could consume gigabytes of RAM during saving, and if you don't have enough memory free, you can't save!
>
> Also, I do use POI at work, and it's outstanding for parsing spreadsheets (it can even parse some that AOO can't), but it's very memory hungry. A spreadsheet with 10 rows consumed 6 GB of RAM, compared to 200 MB in LO (30 times less). That isn't really POI's fault: Java has too much per-object overhead, and there are a great many objects in a spreadsheet that big. So DOM + Java really do not add up to efficient memory usage. By comparison, our current OOXML reading is not only SAX-based, but converts XML tags to integers for faster comparisons and lower memory usage.
>
> Finally, AOO itself had limitations that made developing a filter in Java difficult. Each sheet in a spreadsheet has 1 billion cells. Obviously only a minority of these contain data - most are empty. In C++ there are special iterators that can be used to access only the non-empty cells, but these are not exposed to UNO, or through it, to Java. The only way to tell which cells are in use is to iterate over all 1 billion cells (per sheet), which is hopelessly slow.
>
> Some of these problems can be solved. We can expose the cell iterators over UNO. The memory usage might not matter that much in practice, and we could patch POI to do SAX parsing/saving at a later stage. But users expect fonts, styles, charts, images, custom formats, OLE, pivot tables, VBA macros, form controls, mathematical formulas, change tracking, etc. all saved losslessly and 100% compatible with Excel, which doesn't only require work in the filter, but in the rest of AOO too, and POI probably doesn't support all of those features either.

I'm not sure if you've looked at the newer Streaming Usermodel API, SXSSF. It may help with memory consumption in this case.

> I might get back into this next month, especially if others want to collaborate, but don't expect something generally usable, let alone Excel-quality XLSX saving, any time soon.
>
> Regards
> Damjan

Yes, I'm definitely interested in collaborating on this. Do you have a branch with your work in it?

Thanks,
Carl

- To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org For additional commands, e-mail: dev-h...@openoffice.apache.org
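As a rough illustration of the DOM-versus-streaming point raised in this message, the sketch below writes rows one at a time with the JDK's StAX `XMLStreamWriter`, so only the current row is ever held in memory. It does not use POI or its SXSSF API. The class name `StreamingRowWriter` and the cell values are made up for the example, and the element names (`sheetData`, `row`, `c`, `v`) only loosely follow a SpreadsheetML worksheet, without namespaces or ZIP packaging.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of POI or AOO.
class StreamingRowWriter {

    /** Writes rowCount rows with one numeric cell each, emitting each row as soon as it is produced. */
    static void writeRows(Writer out, int rowCount) {
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("sheetData");
            for (int r = 1; r <= rowCount; r++) {
                // Nothing from earlier rows is retained; memory use does not grow with rowCount.
                xml.writeStartElement("row");
                xml.writeAttribute("r", Integer.toString(r));
                xml.writeStartElement("c");
                xml.writeAttribute("r", "A" + r);
                xml.writeStartElement("v");
                xml.writeCharacters(Integer.toString(r * 10)); // made-up cell value
                xml.writeEndElement(); // v
                xml.writeEndElement(); // c
                xml.writeEndElement(); // row
            }
            xml.writeEndElement(); // sheetData
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Convenience wrapper for small inputs; a real exporter would pass a file-backed Writer instead. */
    static String render(int rowCount) {
        StringWriter sw = new StringWriter();
        writeRows(sw, rowCount);
        return sw.toString();
    }
}
```

A DOM-based writer would instead build the whole tree first and serialize it at the end, which is where the memory cost described above comes from.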
Re: Writer and .docx
On Fri, Oct 16, 2020 at 2:05 PM Dave Fisher wrote:

> Hi -
>
> Sent from my iPhone
>
> > On Oct 16, 2020, at 4:04 AM, Mechtilde wrote:
> >
> > Hello Joost,
> >
> > I'm very happy to hear from you.
> >
> >> On 16.10.20 at 12:50, Joost Andrae wrote:
> >> Hi Simon,
> >>
> >> it's an honor for me to see a sign of life from you here. Welcome!
> >>
> >> Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.
> >
> > I offer my help to test the implementation. Sorry, but I'm not a programmer. So we as a project need help from Java programmers to work on it and contribute it.
>
> I’ve been a PMC member of Apache POI for over 12 years. My team donated the initial PowerPoint support and was involved in the initial support for OOXML.
>
> POI is embedded into Apache Solr and Tika along with commercial products. The project took over the dormant XMLBeans project and is releasing a 4.0 that supports modern Java.
>
> An OSGi bundle of POI will be available in the next release if you build from source.
>
> The Tika, POI, and PDFBox projects maintain a large regression corpus scraped from the internet using CommonCrawl. I’m sure that this could be shared in one way or another.
>
> Regards,
> Dave

Hi

I did start writing a POI-based OOXML export filter for AOO some years ago (search the dev mailing list), and got it to the point of being able to save very basic spreadsheets (no formulas, no formatting, just text and numbers). There were several major problems with using POI.

Firstly, the code in POI is at various stages of completeness. The legacy XLS filter is very good, supports SAX parsing, etc. The DOC filter is minimal and unmaintained. What we would need, the OOXML filter for at least XLSX, is somewhere in between. AFAIK it only supports DOM parsing, meaning everything needs to be in memory before it can be written to disk, so a big spreadsheet could consume gigabytes of RAM during saving, and if you don't have enough memory free, you can't save!

Also, I do use POI at work, and it's outstanding for parsing spreadsheets (it can even parse some that AOO can't), but it's very memory hungry. A spreadsheet with 10 rows consumed 6 GB of RAM, compared to 200 MB in LO (30 times less). That isn't really POI's fault: Java has too much per-object overhead, and there are a great many objects in a spreadsheet that big. So DOM + Java really do not add up to efficient memory usage. By comparison, our current OOXML reading is not only SAX-based, but converts XML tags to integers for faster comparisons and lower memory usage.

Finally, AOO itself had limitations that made developing a filter in Java difficult. Each sheet in a spreadsheet has 1 billion cells. Obviously only a minority of these contain data - most are empty. In C++ there are special iterators that can be used to access only the non-empty cells, but these are not exposed to UNO, or through it, to Java. The only way to tell which cells are in use is to iterate over all 1 billion cells (per sheet), which is hopelessly slow.

Some of these problems can be solved. We can expose the cell iterators over UNO. The memory usage might not matter that much in practice, and we could patch POI to do SAX parsing/saving at a later stage. But users expect fonts, styles, charts, images, custom formats, OLE, pivot tables, VBA macros, form controls, mathematical formulas, change tracking, etc. all saved losslessly and 100% compatible with Excel, which doesn't only require work in the filter, but in the rest of AOO too, and POI probably doesn't support all of those features either.

I might get back into this next month, especially if others want to collaborate, but don't expect something generally usable, let alone Excel-quality XLSX saving, any time soon.

Regards
Damjan
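To make the cell-iteration problem in this message concrete, here is a minimal sketch of a sparse sheet that stores only occupied cells, so a walk over them costs time proportional to the data rather than to the roughly one billion addressable positions. This is not AOO's internal C++ structure or any UNO interface; `SparseSheet`, its methods, and the row/column limits (chosen so that their product is about one billion, matching the figure in the message) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a sheet; not an AOO or UNO class.
class SparseSheet {
    static final int MAX_COLS = 1024;       // assumed column limit
    static final int MAX_ROWS = 1_048_576;  // assumed row limit

    // Only occupied cells are stored, keyed by row * MAX_COLS + col (row-major order).
    private final TreeMap<Long, String> cells = new TreeMap<>();

    void set(int row, int col, String value) {
        cells.put((long) row * MAX_COLS + col, value);
    }

    /** Visits occupied cells in row-major order; empty cells are never touched. */
    List<String> nonEmptyCells() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : cells.entrySet()) {
            long key = e.getKey();
            out.add("R" + (key / MAX_COLS) + "C" + (key % MAX_COLS) + "=" + e.getValue());
        }
        return out;
    }

    /** Number of addressable cells per sheet, i.e. what a naive full scan would have to visit. */
    static long totalCells() {
        return (long) MAX_COLS * MAX_ROWS;
    }
}
```

With two cells set, `nonEmptyCells()` performs two steps, whereas a scan driven by `totalCells()` would perform 1,073,741,824. Exposing an iterator of this kind over UNO is the fix the message proposes.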
Re: Writer and .docx
Hi -

Sent from my iPhone

> On Oct 16, 2020, at 4:04 AM, Mechtilde wrote:
>
> Hello Joost,
>
> I'm very happy to hear from you.
>
>> On 16.10.20 at 12:50, Joost Andrae wrote:
>> Hi Simon,
>>
>> it's an honor for me to see a sign of life from you here. Welcome!
>>
>> Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.
>
> I offer my help to test the implementation. Sorry, but I'm not a programmer. So we as a project need help from Java programmers to work on it and contribute it.

I’ve been a PMC member of Apache POI for over 12 years. My team donated the initial PowerPoint support and was involved in the initial support for OOXML.

POI is embedded into Apache Solr and Tika along with commercial products. The project took over the dormant XMLBeans project and is releasing a 4.0 that supports modern Java.

An OSGi bundle of POI will be available in the next release if you build from source.

The Tika, POI, and PDFBox projects maintain a large regression corpus scraped from the internet using CommonCrawl. I’m sure that this could be shared in one way or another.

Regards,
Dave

>> Document conversions do not necessarily need to be a native implementation within AOO or LO.
>>
>> Kind regards, Joost
>
> Kind regards
>
> --
> Mechtilde Stehmann
> ## Apache OpenOffice
> ## Free office suite for Linux, MacOSX, Windows and OS/2
> ## Debian Developer
> ## PGP encryption welcome
> ## F0E3 7F3D C87A 4998 2899 39E7 F287 7BBA 141A AD7F
Re: Writer and .docx
On Fri, Oct 16, 2020 at 12:26 PM Joost Andrae wrote:

> regarding the documentliberation stuff:
> I have known some of those filters for a long time. AFAIK these were implementations from one of the Novell guys (Fridrich Strba as far as I remember; see http://fridrich.blogspot.com/ )

Yes, DLP is his project.

S.
Re: Writer and .docx
Hi,

regarding the documentliberation stuff: I have known some of those filters for a long time. AFAIK these were implementations from one of the Novell guys (Fridrich Strba as far as I remember; see http://fridrich.blogspot.com/ )

Best, Joost

On 16.10.2020 at 13:17, Joost Andrae wrote:
> Hi Simon,
>
> some developer just needs to try to implement this kind of approach. Fortunately, this developer doesn't really need to dive deeply into AOO implementation details beyond creating an extension. To my knowledge, the Apache POI implementation is quite stable and has been used by a lot of software projects for a long time.
>
> Best regards, Joost
>
> On 16.10.2020 at 13:05, Simon Phipps wrote:
> > Hi Joost!
> >
> > On Fri, Oct 16, 2020 at 11:49 AM Joost Andrae wrote:
> > > Hi Simon,
> > >
> > > it's an honor for me to see a sign of life from you here. Welcome!
> >
> > I've been a relatively active member here from the beginning!
> >
> > > Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.
> >
> > That sounds like an interesting new user feature. Rather than only using POI, a pluggable approach that could also use libraries from the Document Liberation Project https://www.documentliberation.org/ would be excellent.
> >
> > Cheers, Simon
Re: Writer and .docx
Hi Simon,

some developer just needs to try to implement this kind of approach. Fortunately, this developer doesn't really need to dive deeply into AOO implementation details beyond creating an extension. To my knowledge, the Apache POI implementation is quite stable and has been used by a lot of software projects for a long time.

Best regards, Joost

On 16.10.2020 at 13:05, Simon Phipps wrote:
> Hi Joost!
>
> On Fri, Oct 16, 2020 at 11:49 AM Joost Andrae wrote:
> > Hi Simon,
> >
> > it's an honor for me to see a sign of life from you here. Welcome!
>
> I've been a relatively active member here from the beginning!
>
> > Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.
>
> That sounds like an interesting new user feature. Rather than only using POI, a pluggable approach that could also use libraries from the Document Liberation Project https://www.documentliberation.org/ would be excellent.
>
> Cheers, Simon
Re: Writer and .docx
Hi Joost!

On Fri, Oct 16, 2020 at 11:49 AM Joost Andrae wrote:

> Hi Simon,
>
> it's an honor for me to see a sign of life from you here. Welcome!

I've been a relatively active member here from the beginning!

> Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.

That sounds like an interesting new user feature. Rather than only using POI, a pluggable approach that could also use libraries from the Document Liberation Project https://www.documentliberation.org/ would be excellent.

Cheers,
Simon
Re: Writer and .docx
Hello Joost,

I'm very happy to hear from you.

On 16.10.20 at 12:50, Joost Andrae wrote:
> Hi Simon,
>
> it's an honor for me to see a sign of life from you here. Welcome!
>
> Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.

I offer my help to test the implementation. Sorry, but I'm not a programmer. So we as a project need help from Java programmers to work on it and contribute it.

> Document conversions do not necessarily need to be a native implementation within AOO or LO.
>
> Kind regards, Joost

Kind regards

--
Mechtilde Stehmann
## Apache OpenOffice
## Free office suite for Linux, MacOSX, Windows and OS/2
## Debian Developer
## PGP encryption welcome
## F0E3 7F3D C87A 4998 2899 39E7 F287 7BBA 141A AD7F
Re: Writer and .docx
Hi Simon,

it's an honor for me to see a sign of life from you here. Welcome!

Instead of picking off users here to get them to leave AOO for LO, a developer could create a Java-based OOo/LO extension that uses Apache POI to export OpenDocument documents to MSXML formats, using the binary MSO export as an intermediate step. Or maybe it's possible to transform this document format with XSL by using OpenOffice together with Apache POI. Document conversions using XSL scripts (in the AOO menu item XML filter settings) are possible within OOo.

Document conversions do not necessarily need to be a native implementation within AOO or LO.

Kind regards, Joost

On 16.10.2020 at 12:16, Simon Phipps wrote:
> Hi!
>
> As Peter said, it seems unlikely this branch of OpenOffice.org will be enhanced with the ability to write .DOCX format. However, another branch has added this capability and offers all the other convenient options you mention as well. You can get it from our "sister" community at https://libreoffice.org/download
>
> Cheers
> Simon
>
> On Fri, Oct 16, 2020 at 8:44 AM Наталья Василенко wrote:
> > Hello! I would like to know whether there is any hope that users will be able to save documents in Writer in .docx format. In the latest version of your OpenOffice users can open documents in .docx format, but they cannot save in that format. I think this is inconvenient for many users, given that your product is very convenient in other respects.
> > Could this feature be enabled in later versions of your product?
> >
> > Thank you for your response.
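The XSL idea in Joost's message can be sketched with the XSLT 1.0 processor that ships with the JDK. The example maps a single ODF `text:p` paragraph to a WordprocessingML `w:p`/`w:r`/`w:t` run and nothing more: styles, lists, tables, and the ZIP packaging of a real .docx are all left out. The class name `OdfParagraphToOoxml` and the stylesheet are assumptions for illustration and say nothing about how AOO's XML filter settings apply stylesheets internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical converter; not part of AOO, LO, or POI.
class OdfParagraphToOoxml {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Minimal stylesheet: every ODF paragraph becomes one WordprocessingML paragraph with one run.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:text='" + TEXT_NS + "'"
        + " xmlns:w='" + W_NS + "'"
        + " exclude-result-prefixes='text'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='text:p'>"
        + "<w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an ODF fragment and returns the transformed XML as a string. */
    static String convert(String odfFragment) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(odfFragment)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the transform is plain XSLT, the same stylesheet could in principle be run from an extension or from an external tool, in line with the remark that conversions need not be a native implementation.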
Re: Writer and .docx
Hi!

As Peter said, it seems unlikely this branch of OpenOffice.org will be enhanced with the ability to write .DOCX format. However, another branch has added this capability and offers all the other convenient options you mention as well. You can get it from our "sister" community at https://libreoffice.org/download

Cheers
Simon

On Fri, Oct 16, 2020 at 8:44 AM Наталья Василенко wrote:

> Hello! I would like to know whether there is any hope that users will be able to save documents in Writer in .docx format. In the latest version of your OpenOffice users can open documents in .docx format, but they cannot save in that format. I think this is inconvenient for many users, given that your product is very convenient in other respects.
> Could this feature be enabled in later versions of your product?
>
> Thank you for your response.
Re: Writer and .docx
> OpenOffice users can open documents in .docx format, but they cannot
> save in that format.

Well, remember that recent versions of Microsoft Office (since 2016) can open the ODT format.
Re: Writer and .docx
On 16.10.20 at 09:25, Наталья Василенко wrote:

> Hello! I would like to know whether there is any hope that users will be able to save documents in Writer in .docx format. In the latest version of your OpenOffice users can open documents in .docx format, but they cannot save in that format. I think this is inconvenient for many users, given that your product is very convenient in other respects.
> Could this feature be enabled in later versions of your product?

Yes, we want to improve the capability to read and write documents produced by Microsoft Office. However, our development community is made up of volunteers working in their free time, alongside a regular job. We have no timeline for when we will add this feature, but we know it is in high demand within the OpenOffice user base. I do not expect that this feature will be improved soon.

> Thank you for your response.

You are welcome.