Re: Writer and .docx

2020-10-18 Thread Pedro Lino
Hi Dave, all

> On 10/18/2020 3:13 AM Dave Fisher  wrote:

> I think it means that OpenOffice could be the arbiter of converting OOXML 
> into ODF. As such I’m more interested of using tools like POI to drive that 
> conversion into ODF and leave the other direction to the commercial vendors.

That is exactly my point. The Import should be as perfect as possible and there 
should be an Export to... (like in Gimp) menu option or an Extension for those 
who wish to keep using MS XML (and this function should be as perfect as 
possible, obviously). But saving and working on a document should always use 
Open Document as a format.
 
> There is a similar but more complex semantically conversion of PDF into ODF.

That could be added to the already existing (and unfortunately unmaintained) 
extension ;) 

Regards,
Pedro

-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org



RE: Writer and .docx

2020-10-18 Thread Jörg Schmidt
Hello,

@all:
I am happy to see that there is a current discussion about OOXML filters, 
because this is an important topic. 

> -Original Message-
> From: Dave Fisher [mailto:w...@apache.org] 
> Sent: Sunday, October 18, 2020 4:14 AM
> To: dev@openoffice.apache.org
> Subject: Re: Writer and .docx
> 
> Hi -
> 
> Top posting as well. I think we should consider our goals as 
> a project. If one of those goals is support for ODF as a 
> theory of everything office that you can trust and know will 
> remain parsable a century from now then what does that mean for OOXML?
> 
> I think it means that OpenOffice could be the arbiter of 
> converting OOXML into ODF. As such I’m more interested of 
> using tools like POI to drive that conversion into ODF and 
> leave the other direction to the commercial vendors.
> 
> There is a similar but more complex semantically conversion 
> of PDF into ODF. 
> 
> Alternatively, maybe there is a way to enhance plug-ability 
> of filters with more modern methods like OSGi.

You're right in principle, but what users actually need in practice is working 
OOXML filters.

By the way:
The OOXML filters there were not pushed through by the companies involved in LO; 
their development was initially started by a voluntary German organization 
(if I remember correctly: osb-alliance.de).



Jörg





RE: Writer and .docx

2020-10-18 Thread Jörg Schmidt
> -Original Message-
> From: Damjan Jovanovic [mailto:dam...@apache.org] 
> Sent: Sunday, October 18, 2020 5:19 AM
> To: Apache OO
> Subject: Re: Writer and .docx
> 
> On Sun, Oct 18, 2020 at 4:13 AM Dave Fisher  wrote:
> 

> As for ODF, LO is publishing ODF 1.3, so unless we keep up, 
> we won't be
> able to read all ODF soon either, let alone a century from now.

Once again, on the subject of ODF:

We should make our decision with the needs of the users in mind, not 
'politics'.
For me this means that the preferred implementation is ISO-ODF (and not 
OASIS-ODF).

LO's approach of always implementing the very latest OASIS version does ensure 
the rapid development of ODF in practice, but it also creates more 
incompatibility than necessary.
In short: it does not seem reasonable to me to put Microsoft under 
pressure _if_ the interests of our users suffer as a result.

Right ... what I just said also presupposes that we communicate strategically 
with LO (and others), but we should not shy away from this effort.

And no ... I don't think we should slow down the development of ODF. So let's 
put pressure on ISO to quickly adopt existing new OASIS versions into its 
own ODF standard.



Jörg





Re: Writer and .docx

2020-10-17 Thread Damjan Jovanovic
On Sun, Oct 18, 2020 at 4:13 AM Dave Fisher  wrote:

> Hi -
>
> Top posting as well. I think we should consider our goals as a project. If
> one of those goals is support for ODF as a theory of everything office that
> you can trust and know will remain parsable a century from now then what
> does that mean for OOXML?
>
>
Saving to OOXML is probably our most requested feature, so users definitely
want it. Office suites are all about being broadly general and
interoperable. There is a reason why we have the longest and most complex
clipboard handling code I've ever seen, support for many raster and vector
image formats, allow scripting and component development in many languages,
and why we import and export documents in many formats from many vendors
spanning many decades. 80% of the time users use 20% of the features, but
there are many different groups of users each using a different 20% of the
features. Many here sound like they don't need saving to OOXML, but the
user groups that do (for whatever reason) will be negatively affected if we
don't support it.

As for ODF, LO is publishing ODF 1.3, so unless we keep up, we won't be
able to read all ODF soon either, let alone a century from now.


> I think it means that OpenOffice could be the arbiter of converting OOXML
> into ODF. As such I’m more interested of using tools like POI to drive that
> conversion into ODF and leave the other direction to the commercial vendors.
>
>
We should certainly collaborate more with POI. I have OOXML documents which
open in POI but have data missing when opened in AOO. We could compare what
POI does vs what AOO does to improve our OOXML filter. There are probably
similar cases where AOO is better and we could improve POI from it.


> There is a similar but more complex semantically conversion of PDF into
> ODF.
>
> Alternatively, maybe there is a way to enhance plug-ability of filters
> with more modern methods like OSGi.
>
>
I haven't had good experiences with OSGi: it breaks JDBC, breaks RMI, and
has problems with other cases where classes are dynamically loaded at
runtime without using OSGi's own APIs to do it. AOO's UNO can only load
classes at runtime, and it uses custom classloaders to do so.



> Regards,
> Dave
>
>
Regards
Damjan


Re: Writer and .docx

2020-10-17 Thread Dave Fisher
Hi -

Top posting as well. I think we should consider our goals as a project. If one 
of those goals is support for ODF as a theory of everything office that you can 
trust and know will remain parsable a century from now, then what does that mean 
for OOXML?

I think it means that OpenOffice could be the arbiter of converting OOXML into 
ODF. As such I’m more interested in using tools like POI to drive that 
conversion into ODF and leaving the other direction to the commercial vendors.

There is a similar but semantically more complex conversion of PDF into ODF. 

Alternatively, maybe there is a way to enhance the pluggability of filters with 
more modern methods like OSGi.

Regards,
Dave


> On Oct 17, 2020, at 1:55 PM, Hagar Delest  wrote:
> 
> Top posting.
> I fully agree with Pedro.
> The MS Office OOXML support is a core question for the project IMHO.



Re: Writer and .docx

2020-10-17 Thread Hagar Delest

Top posting.
I fully agree with Pedro.
The MS Office OOXML support is a core question for the project IMHO.
I think that the success of LO is heavily based on OOXML support. It 
provides users something they believe is a clone of MS Office (or at 
least good enough to meet their needs and exchange docx/xlsx/pptx with 
other users).


But it is very detrimental to ODF because if it works so well for users 
using OOXML, then why bother with another format (ODF)???


By the way, our company just upgraded from MS Office 2010 to 2016 and it 
is quite a nightmare with many documents needing readjustments due to 
changes in the OOXML version it seems (it provides again a 
"compatibility mode").
Meaning that their own format is very likely to change again and again, 
making it always difficult for the other applications to catch up with 
the changes.


The best method to work with OOXML is to buy MS Office. Why not focus 
on improving AOO first?
If people do need the OOXML export, then they will all switch to LO. If 
an AOO user base remains, then better to improve AOO for them.


Even if AOO had a very good import/export filter, since it has fewer 
features anyway, what would be the point exactly?


Hagar




Re: Writer and .docx

2020-10-17 Thread Pedro Lino
Hi Andrew

> On 10/17/2020 1:37 PM Andrew Pitonyak  wrote:

> (1) Sometimes contractually obligated to deliver some products in DOCX 
> format. I am pretty good at knowing what things will export properly to DOCX 
> format and which will not (just because I have done it often enough). Only 
> once have I had a client (it was government DoD) that would accept (and even 
> required) and ODT file.

Then you are a very valuable person for helping to fix these Import/Export 
issues.
I always send documents in Open Document (to EU, ICCAT, etc) and they will not 
refuse it ;)


> (2) Frequently exchange documents with clients that will use/require DOCX. 

I usually save them in DOC and I haven't had complaints. In the cases where 
formatting is lost, then I have to switch to Windows 7 and an old copy of MS 
Office 2010 and even so there are issues sometimes (but I refuse to continually 
buy the new version that MS is pushing). I don't use LibreOffice for that 
because there is also format loss (sometimes even content loss)

> (3) I frequently work with people who are better off not having to deal with 
> the extra steps of converting between formats. 

That is indeed an obstacle. But using DOC, XLS and PPT usually solves the 
problem. In fact I think that MS old formats have become the lingua franca of 
the office documents ;)

> All else being equal, if you fall into the categories above, I usually tell 
> them to use LibreOffice because it will natively support reading and writing 
> DOCX format. When I ask people why they chose LibreOffice over Apache 
> OpenOffice, DOCX support is the reason usually listed.

Yes, that is indeed one advantage. But as I mentioned before, there are serious 
glitches when using MS XML formats in LibreOffice and in addition this will 
help Microsoft make their format the standard. And I believe that is the wrong 
option.

Many governments and organizations have accepted Open Document as a solution to 
be free from proprietary formats. The reason they can't switch to it is because 
all PCs are loaded with Windows and Office... And MS Office will only accept 
without warnings ODF documents created and edited in MS Office...

I do not have a solution for this profit based bullying but accepting to use MS 
XML formats is becoming part of the problem and not part of the solution...

Regards,
Pedro




Re: Writer and .docx

2020-10-17 Thread Andrew Pitonyak

On Saturday, October 17, 2020 04:56 EDT, Pedro Lino wrote:

> Hi all
>
>> On 10/17/2020 9:11 AM Matthias Seidel wrote:
>>
>> My point is that one should do the work in ODF and only export to
>> "foreign" formats if needed.
>
> +1
> This is how Gimp works. You can import any format, work on it using the 
> program's own format XCF (not Photoshop's PSD to please the majority) and in 
> the end you can export to whatever format (PNG, JPG, etc)
>
> In fact the "foreign" formats don't even show up in the Save options. For me 
> this is the best solution!
>
> On the other hand (as it happens in LibreOffice) exporting to Microsoft's XML 
> will never be perfect (Microsoft will make sure!) and there will always be 
> people complaining but it is far better that there is a single conversion 
> before sending the document!

(1) Sometimes contractually obligated to deliver some products in DOCX format. 
I am pretty good at knowing what things will export properly to DOCX format and 
which will not (just because I have done it often enough). Only once have I had 
a client (it was government DoD) that would accept (and even required) an ODT 
file. 

(2) Frequently exchange documents with clients that will use/require DOCX. 

(3) I frequently work with people who are better off not having to deal with 
the extra steps of converting between formats. 

All else being equal, if you fall into the categories above, I usually tell 
them to use LibreOffice because it will natively support reading and writing 
DOCX format. When I ask people why they chose LibreOffice over Apache 
OpenOffice, DOCX support is the reason usually listed. 

Andrew Pitonyak


 


Re: Writer and .docx

2020-10-17 Thread Pedro Lino
Hi all

> On 10/17/2020 9:11 AM Matthias Seidel  wrote:

> My point is that one should do the work in ODF and only export to
> "foreign" formats if needed.

+1
This is how Gimp works. You can import any format, work on it using the 
program's own format XCF (not Photoshop's PSD to please the majority) and in 
the end you can export to whatever format (PNG, JPG, etc)

In fact the "foreign" formats don't even show up in the Save options. For me 
this is the best solution!

On the other hand (as it happens in LibreOffice) exporting to Microsoft's XML 
will never be perfect (Microsoft will make sure!) and there will always be 
people complaining but it is far better that there is a single conversion 
before sending the document!

Regards,
Pedro




Re: Writer and .docx

2020-10-17 Thread Rory O'Farrell
On Sat, 17 Oct 2020 10:05:45 +0200
Damjan Jovanovic  wrote:

> On Fri, Oct 16, 2020 at 11:50 AM Bidouille  wrote:
> 
> > > OpenOffice users can open documents in .docx format, but they cannot
> > > save in that format.
> > Well, remember that last version of Microsoft Office (since 2016) can open
> > ODT format.
> >
> >
> Unfortunately not all MS Office editions have ODT support, and many people
> I sent ODT to complained they can't open it.

Many MS Office users expect their file associations to be set so that they can 
double-click on a file to open it.  Few of them know the longer way.

-- 
Rory O'Farrell 




Re: Writer and .docx

2020-10-17 Thread Matthias Seidel
Hi Damjan,

On 17.10.20 at 10:05, Damjan Jovanovic wrote:
> On Fri, Oct 16, 2020 at 11:50 AM Bidouille  wrote:
>
>>> OpenOffice users can open documents in .docx format, but they cannot
>>> save in that format.
>> Well, remember that last version of Microsoft Office (since 2016) can open
>> ODT format.
>>
>>
> Unfortunately not all MS Office editions have ODT support, and many people
> I sent ODT to complained they can't open it.

Acknowledged!

My point is that one should do the work in ODF and only export to
"foreign" formats if needed.

Export using POI was an idea we also got at ApacheCon Berlin in 2019.
Developing it as an extension would be the best for all.

Regards,

   Matthias



Re: Writer and .docx

2020-10-17 Thread Damjan Jovanovic
On Fri, Oct 16, 2020 at 11:50 AM Bidouille  wrote:

> > OpenOffice users can open documents in .docx format, but they cannot
> > save in that format.
> Well, remember that last version of Microsoft Office (since 2016) can open
> ODT format.
>
>
Unfortunately not all MS Office editions have ODT support, and many people
I sent ODT to complained they can't open it.


Re: Writer and .docx

2020-10-16 Thread Peter Kovacs

I'm also in support, whatever that is worth :)

On 16.10.20 at 21:56, Carl Marcum wrote:
>>>> I might get back into this next month, especially if others want to
>>>> collaborate, but don't expect something generally usable, let alone
>>>> Excel-quality XLSX saving, any time soon.
>>>>
>>>> Regards
>>>> Damjan
>>>
>>> Yes I'm definitely interested in collaborating on this.
>>> Do you have a branch with your work in it?
>>
>> It's been 5 years and the code is in bits and pieces, but I'll try to put
>> together a working branch over the weekend.
>
> Whenever you have time.
> Just let me know.
>
> Thanks,
> Carl




Re: Writer and .docx

2020-10-16 Thread Matthias Seidel
Hi Bidouille,

On 16.10.20 at 11:49, Bidouille wrote:
>> OpenOffice users can open documents in .docx format, but they cannot
>> save in that format. 
> Well, remember that last version of Microsoft Office (since 2016) can open 
> ODT format.

Exactly, instead of promoting Microsoft formats we should better
encourage users to use ODF.

Regards,

   Matthias



Re: Writer and .docx

2020-10-16 Thread Carl Marcum



>>> I might get back into this next month, especially if others want to
>>> collaborate, but don't expect something generally usable, let alone
>>> Excel-quality XLSX saving, any time soon.
>>>
>>> Regards
>>> Damjan
>>
>> Yes I'm definitely interested in collaborating on this.
>> Do you have a branch with your work in it?
>
> It's been 5 years and the code is in bits and pieces, but I'll try to put
> together a working branch over the weekend.


Whenever you have time.
Just let me know.

Thanks,
Carl




Re: Writer and .docx

2020-10-16 Thread Damjan Jovanovic
On Fri, Oct 16, 2020 at 4:24 PM Carl Marcum  wrote:

> I'm not sure if you've look at the newer Streaming Usermodel API SXSSF.
> It may help for memory consumption in this case.
>
>
Can SXSSF work with formulas that reference earlier cells?



Re: Writer and .docx

2020-10-16 Thread Carl Marcum

Hi Damjan,

On 10/16/20 9:23 AM, Damjan Jovanovic wrote:
> On Fri, Oct 16, 2020 at 2:05 PM Dave Fisher wrote:
>
>> Hi -
>>
>> Sent from my iPhone
>>
>>> On Oct 16, 2020, at 4:04 AM, Mechtilde wrote:
>>>
>>> Hello Joost,
>>>
>>> I'm very happy to read from you.
>>>
>>>> On 16.10.20 at 12:50, Joost Andrae wrote:
>>>> Hi Simon,
>>>>
>>>> it's an honor to me to see a sign of life of you here. Welcome !
>>>>
>>>> Instead of user picking here to get users leave from AOO to LO a
>>>> developer could create a Java based OOo/LO extension that uses Apache
>>>> POI to export OpenDocument type documents to MSXML formats by using the
>>>> binary MSO export to export those documents to the MSXML format in
>>>> between. Or maybe it's possible to XSL this document format by using
>>>> OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
>>>> XML filter settings) to make document conversions is possible within
>>>> OOo.
>>>
>>> I offer my help to test the implementation. Sorry, but I'm not a
>>> programmer. So we as the project need help from Java programmers to work
>>> on it and contribute it.
>>
>> I’ve been a PMC member of Apache POI for over 12 years. My team donated the
>> initial PowerPoint support and was involved in the initial support for
>> OOXML.
>>
>> POI is embedded into Apache Solr and Tika along with commercial products.
>> The project took over the dormant XMLBeans project and is releasing a 4.0
>> that supports modern Java.
>>
>> An OSGi bundle of POI will be available in the next release if you build
>> from source.
>>
>> The Tika, POI, and PDFBox projects maintain a large regression corpus
>> scraped from the internet using CommonCrawl. I’m sure that this could be
>> shared in one way or another.
>>
>> Regards,
>> Dave
>
> Hi
>
> I did start writing a POI-based OOXML export filter for AOO some years ago
> (search the dev mailing list), and got it to the point of being able to
> save very basic spreadsheets (no formulas, no formatting, just text and
> numbers).
>
> There were several major problems with using POI.
>
> Firstly the code in POI is at various stages of completeness. The legacy
> XLS filter is very good, supports SAX parsing, etc. The DOC filter is
> minimal and unmaintained. What we would need, the OOXML filter for at least
> XLSX, is somewhere in between. AFAIK it only supports DOM parsing, meaning
> everything needs to be in memory before it can be written to disk, so a big
> spreadsheet could consume gigabytes of RAM during saving, and if you don't
> have enough memory free, you can't save!
>
> Also I do use POI at work, and it's outstanding for parsing spreadsheets
> (it can even parse some that AOO can't), but it's very memory hungry. A
> spreadsheet with 10 rows consumed 6 GB of RAM, compared to 200 MB in LO
> (30 times less). That isn't really POI's fault, Java has too much
> per-object overhead and there are a great many objects in a spreadsheet
> that big. So DOM + Java really do not add up to efficient memory usage. By
> comparison, our current OOXML reading is not only SAX-based, but converts
> XML tags to integers for faster comparisons and lower memory usage.

Finally AOO itself had limitations that made developing a filter in Java
difficult. Each sheet in a spreadsheet has 1 billion cells. Obviously only
a minority of these contain data - most are empty. In C++ there are special
iterators that can be used to access only the non-empty cells, but these
are not exposed to UNO, or through it, to Java. The only way to tell which
cells are in use is to iterate over all 1 billion cells (per sheet), which
is hopelessly slow.

Some of these problems can be solved. We can expose the cell iterators over
UNO. The memory usage might not matter that much in practice, and we could
patch POI to do SAX parsing/saving at a later stage. But users expect
fonts, styles, charts, images, custom formats, OLE, pivot tables, VBA
macros, form controls, mathematical formulas, change tracking, etc. all
saved losslessly and 100% compatible with Excel, which doesn't only require
work in the filter, but in the rest of AOO too, and POI probably doesn't
support all of those features either.

I'm not sure if you've looked at the newer Streaming Usermodel API, SXSSF.
It may help with memory consumption in this case.
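
To make the streaming idea concrete, here is a minimal sketch using only the JDK's StAX writer. It is not POI's SXSSF API and not AOO code; the class name StreamingRowWriter and the simplified element layout (sheetData/row/c/v) are assumptions chosen for illustration, and the output is not a complete XLSX part. It only shows rows being emitted one at a time instead of being collected in a document tree first.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration only; this is neither POI's SXSSF API nor AOO code.
final class StreamingRowWriter {

    // Writes rows x cols numeric cells to `out`, one element at a time.
    // The writer keeps only the currently open elements, so with a
    // file-backed Writer the sheet never has to sit in memory as a whole.
    static void writeSheet(Writer out, int rows, int cols) {
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("sheetData");
            for (int r = 1; r <= rows; r++) {
                xml.writeStartElement("row");
                xml.writeAttribute("r", Integer.toString(r));
                for (int c = 0; c < cols; c++) {
                    xml.writeStartElement("c");
                    xml.writeStartElement("v");
                    xml.writeCharacters(Integer.toString(r * cols + c));
                    xml.writeEndElement(); // closes v
                    xml.writeEndElement(); // closes c
                }
                xml.writeEndElement(); // closes row
            }
            xml.writeEndElement(); // closes sheetData
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write sheet", e);
        }
    }

    public static void main(String[] args) {
        // A StringWriter keeps the demo self-contained; a real export
        // would pass a writer backed by a file or a zip entry instead.
        StringWriter out = new StringWriter();
        writeSheet(out, 3, 2);
        System.out.println(out);
    }
}
```

Whether SXSSF's own sliding-window approach fits the AOO filter would still need to be checked against the POI documentation.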




I might get back into this next month, especially if others want to
collaborate, but don't expect something generally usable, let alone
Excel-quality XLSX saving, any time soon.

Regards
Damjan


Yes I'm definitely interested in collaborating on this.
Do you have a branch with your work in it?

Thanks,
Carl




Re: Writer and .docx

2020-10-16 Thread Damjan Jovanovic
On Fri, Oct 16, 2020 at 2:05 PM Dave Fisher  wrote:

> Hi -
>
> Sent from my iPhone
>
> > On Oct 16, 2020, at 4:04 AM, Mechtilde  wrote:
> >
> > Hello Joost,
> >
> > I'm very happy to read from you.
> >
> >> On 16.10.20 at 12:50, Joost Andrae wrote:
> >> Hi Simon,
> >>
> >> it's an honor to me to see a sign of life of you here. Welcome !
> >>
> >> Instead of user picking here to get users leave from AOO to LO a
> >> developer could create a Java based OOo/LO extension that uses Apache
> >> POI to export OpenDocument type documents to MSXML formats by using the
> >> binary MSO export to export those documents to the MSXML format in
> >> between. Or maybe it's possible to XSL this document format by using
> >> OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
> >> XML filter settings) to make document conversions is possible within
> OOo.
> >
> > I offer my help to test the implementation. sorry but I'm not a
> > programmer. So we as the project need help from Java programmers to work
> > on it and contribute it.
>
> I’m a PMC Member of Apache POI for over 12 years. My team donated the
> initial PowerPoint support and were involved in the initial support for
> OOXML.
>
> POI is embedded into Apache Solr and Tika along with commercial products.
> The project took over the dormant XMLBeans project and is releasing a 4.0
> that supports modern Java.
>
> An OSGi bundle of POI will be available in the next release if you build
> from source.
>
> The Tika, POI, and PDFBox projects maintain a large regression corpus
> scraped from the internet using CommonCrawl. I’m sure that this could be
> shared in one way or another.
>
> Regards,
> Dave
>
>
Hi

I did start writing a POI-based OOXML export filter for AOO some years ago
(search the dev mailing list), and got it to the point of being able to
save very basic spreadsheets (no formulas, no formatting, just text and
numbers).

There were several major problems with using POI.

Firstly the code in POI is at various stages of completeness. The legacy
XLS filter is very good, supports SAX parsing, etc. The DOC filter is
minimal and unmaintained. What we would need, the OOXML filter for at least
XLSX, is somewhere in between. AFAIK it only supports DOM parsing, meaning
everything needs to be in memory before it can be written to disk, so a big
spreadsheet could consume gigabytes of RAM during saving, and if you don't
have enough memory free, you can't save!
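
The DOM versus SAX difference described above can be sketched with the JDK's own parsers. This is an illustration only, not POI or AOO code; the class DomVsSax and the tiny sample markup are assumptions. Both methods count row elements, but the DOM version builds the whole tree first, while the SAX version only keeps a counter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not POI or AOO code.
final class DomVsSax {

    // Builds a small sample document with the given number of rows.
    static byte[] sampleSheet(int rows) {
        StringBuilder sb = new StringBuilder("<sheetData>");
        for (int r = 1; r <= rows; r++) {
            sb.append("<row r=\"").append(r).append("\"><c><v>")
              .append(r).append("</v></c></row>");
        }
        return sb.append("</sheetData>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // DOM: the entire document becomes an object tree before we can count.
    static int countRowsDom(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getElementsByTagName("row").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // SAX: elements are reported as events; only the counter is retained.
    static int countRowsSax(byte[] xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        byte[] xml = sampleSheet(1000);
        System.out.println("DOM rows: " + countRowsDom(xml));
        System.out.println("SAX rows: " + countRowsSax(xml));
    }
}
```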

Also I do use POI at work, and it's outstanding for parsing spreadsheets
(it can even parse some that AOO can't), but it's very memory hungry. A
spreadsheet with 10 rows consumed 6 GB of RAM, compared to 200 MB in LO
(30 times less). That isn't really POI's fault, Java has too much
per-object overhead and there are a great many objects in a spreadsheet
that big. So DOM + Java really do not add up to efficient memory usage. By
comparison, our current OOXML reading is not only SAX-based, but converts
XML tags to integers for faster comparisons and lower memory usage.
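
The tag-to-integer idea can be sketched roughly as follows. This is not how the AOO filter is actually written; the class TagTokens, the token values and the three tag names are assumptions for illustration. Each tag name is looked up once, and all later dispatch compares ints instead of strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the real AOO tokenizer is not shown here.
final class TagTokens {
    static final int UNKNOWN = 0;
    static final int ROW = 1;
    static final int CELL = 2;
    static final int VALUE = 3;

    // One string lookup per tag; afterwards only ints are compared.
    private static final Map<String, Integer> TOKENS = new HashMap<>();
    static {
        TOKENS.put("row", ROW);
        TOKENS.put("c", CELL);
        TOKENS.put("v", VALUE);
    }

    // Maps a tag name to its integer token, or UNKNOWN if unregistered.
    static int tokenOf(String tagName) {
        return TOKENS.getOrDefault(tagName, UNKNOWN);
    }

    // Dispatches on the integer token, as a parser callback might.
    static String describe(int token) {
        switch (token) {
            case ROW:
                return "row";
            case CELL:
                return "cell";
            case VALUE:
                return "cell value";
            default:
                return "ignored";
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"row", "c", "v", "extLst"}) {
            System.out.println(tag + " -> " + describe(tokenOf(tag)));
        }
    }
}
```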

Finally AOO itself had limitations that made developing a filter in Java
difficult. Each sheet in a spreadsheet has 1 billion cells. Obviously only
a minority of these contain data - most are empty. In C++ there are special
iterators that can be used to access only the non-empty cells, but these
are not exposed to UNO, or through it, to Java. The only way to tell which
cells are in use is to iterate over all 1 billion cells (per sheet), which
is hopelessly slow.
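
To illustrate why a non-empty-cell iterator matters, here is a small sketch. It is not the AOO C++ iterator or any UNO interface; the class SparseSheet is hypothetical, and the grid size of 1024 columns by 1,048,576 rows is an assumption that yields the roughly 1 billion cells per sheet mentioned above. Only cells that hold data are stored, so visiting them costs time proportional to the data, not to the grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the AOO iterator or a UNO interface.
final class SparseSheet {
    static final int COLUMNS = 1024;      // assumed sheet width
    static final int ROWS = 1_048_576;    // assumed sheet height

    // Key is row * COLUMNS + col, so the map is ordered row by row.
    private final TreeMap<Long, String> cells = new TreeMap<>();

    // Size of the full grid: 1,073,741,824 cells under the assumptions above.
    static long totalCells() {
        return (long) ROWS * COLUMNS;
    }

    void set(int row, int col, String value) {
        cells.put((long) row * COLUMNS + col, value);
    }

    // Visits only the stored cells, in row-major order.
    List<String> nonEmptyCells() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : cells.entrySet()) {
            long row = e.getKey() / COLUMNS;
            long col = e.getKey() % COLUMNS;
            result.add("R" + row + "C" + col + "=" + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        SparseSheet sheet = new SparseSheet();
        sheet.set(0, 0, "header");
        sheet.set(5, 2, "42");
        System.out.println("grid size: " + totalCells());
        System.out.println("visited:   " + sheet.nonEmptyCells());
    }
}
```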

Some of these problems can be solved. We can expose the cell iterators over
UNO. The memory usage might not matter that much in practice, and we could
patch POI to do SAX parsing/saving at a later stage. But users expect
fonts, styles, charts, images, custom formats, OLE, pivot tables, VBA
macros, form controls, mathematical formulas, change tracking, etc. all
saved losslessly and 100% compatible with Excel, which doesn't only require
work in the filter, but in the rest of AOO too, and POI probably doesn't
support all of those features either.

I might get back into this next month, especially if others want to
collaborate, but don't expect something generally usable, let alone
Excel-quality XLSX saving, any time soon.

Regards
Damjan


Re: Writer and .docx

2020-10-16 Thread Dave Fisher
Hi -

Sent from my iPhone

> On Oct 16, 2020, at 4:04 AM, Mechtilde  wrote:
> 
> Hello Joost,
> 
> I'm very happy to read from you.
> 
>> On 16.10.20 at 12:50, Joost Andrae wrote:
>> Hi Simon,
>> 
>> it's an honor to me to see a sign of life of you here. Welcome !
>> 
>> Instead of user picking here to get users leave from AOO to LO a
>> developer could create a Java based OOo/LO extension that uses Apache
>> POI to export OpenDocument type documents to MSXML formats by using the
>> binary MSO export to export those documents to the MSXML format in
>> between. Or maybe it's possible to XSL this document format by using
>> OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
>> XML filter settings) to make document conversions is possible within OOo.
> 
> I offer my help to test the implementation. sorry but I'm not a
> programmer. So we as the project need help from Java programmers to work
> on it and contribute it.

I’m a PMC Member of Apache POI for over 12 years. My team donated the initial 
PowerPoint support and were involved in the initial support for OOXML.

POI is embedded into Apache Solr and Tika along with commercial products. The 
project took over the dormant XMLBeans project and is releasing a 4.0 that 
supports modern Java. 

An OSGi bundle of POI will be available in the next release if you build from 
source.

The Tika, POI, and PDFBox projects maintain a large regression corpus scraped 
from the internet using CommonCrawl. I’m sure that this could be shared in one 
way or another.

Regards,
Dave

> 
>> 
>> Document conversions do not necessarily need to be a native
>> implementation within AOO or LO.
>> 
>> Kind regards, Joost
> 
> Kind regards
> 
> 
> -- 
> Mechtilde Stehmann
> ## Apache OpenOffice
> ## Free office suite for Linux, MacOSX, Windows and OS/2
> ## Debian Developer
> ## PGP encryption welcome
> ## F0E3 7F3D C87A 4998 2899  39E7 F287 7BBA 141A AD7F
> 





Re: Writer and .docx

2020-10-16 Thread Simon Phipps
On Fri, Oct 16, 2020 at 12:26 PM Joost Andrae  wrote:

> regarding the documentliberation stuff:
> I have known some of those filters for a long time. AFAIK these were
> implementations by one of the Novell guys (Fridrich Strba, as far as I
> remember; see http://fridrich.blogspot.com/ )
>

Yes, DLP is his project.

S.


Re: Writer and .docx

2020-10-16 Thread Joost Andrae

Hi,

regarding the documentliberation stuff:
I have known some of those filters for a long time. AFAIK these were
implementations by one of the Novell guys (Fridrich Strba, as far as I
remember; see http://fridrich.blogspot.com/ )


Best, Joost

On 16.10.2020 at 13:17, Joost Andrae wrote:

Hi Simon,

some developer just needs to try to implement this kind of approach.
Fortunately this developer doesn't really need to dive deeply into AOO
implementation details beyond creating an extension. To my knowledge,
the Apache POI implementation is quite stable and has been used by a lot of
software projects for a long time.


Best regards, Joost

On 16.10.2020 at 13:05, Simon Phipps wrote:

Hi Joost!

On Fri, Oct 16, 2020 at 11:49 AM Joost Andrae wrote:



Hi Simon,

it's an honor to me to see a sign of life of you here. Welcome !



I've been a relatively active member here from the beginning!

Instead of user picking here to get users leave from AOO to LO a
developer could create a Java based OOo/LO extension that uses Apache
POI to export OpenDocument type documents to MSXML formats by using the
binary MSO export to export those documents to the MSXML format in
between. Or maybe it's possible to XSL this document format by using
OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
XML filter settings) to make document conversions is possible within OOo.




That sounds an interesting new user feature. Rather than only using POI, a
pluggable approach that could also use libraries from the Document
Liberation Project https://www.documentliberation.org/ would be excellent.


Cheers,

Simon








Re: Writer and .docx

2020-10-16 Thread Joost Andrae

Hi Simon,

some developer just needs to try to implement this kind of approach.
Fortunately this developer doesn't really need to dive deeply into AOO
implementation details beyond creating an extension. To my knowledge,
the Apache POI implementation is quite stable and has been used by a lot of
software projects for a long time.


Best regards, Joost

On 16.10.2020 at 13:05, Simon Phipps wrote:

Hi Joost!

On Fri, Oct 16, 2020 at 11:49 AM Joost Andrae  wrote:


Hi Simon,

it's an honor to me to see a sign of life of you here. Welcome !



I've been a relatively active member here from the beginning!

Instead of user picking here to get users leave from AOO to LO a
developer could create a Java based OOo/LO extension that uses Apache
POI to export OpenDocument type documents to MSXML formats by using the
binary MSO export to export those documents to the MSXML format in
between. Or maybe it's possible to XSL this document format by using
OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
XML filter settings) to make document conversions is possible within OOo.



That sounds an interesting new user feature. Rather than only using POI, a
pluggable approach that could also use libraries from the Document
Liberation Project https://www.documentliberation.org/ would be excellent.

Cheers,

Simon








Re: Writer and .docx

2020-10-16 Thread Simon Phipps
Hi Joost!

On Fri, Oct 16, 2020 at 11:49 AM Joost Andrae  wrote:

> Hi Simon,
>
> it's an honor to me to see a sign of life of you here. Welcome !
>

I've been a relatively active member here from the beginning!

Instead of user picking here to get users leave from AOO to LO a
> developer could create a Java based OOo/LO extension that uses Apache
> POI to export OpenDocument type documents to MSXML formats by using the
> binary MSO export to export those documents to the MSXML format in
> between. Or maybe it's possible to XSL this document format by using
> OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
> XML filter settings) to make document conversions is possible within OOo.
>

That sounds an interesting new user feature. Rather than only using POI, a
pluggable approach that could also use libraries from the Document
Liberation Project https://www.documentliberation.org/ would be excellent.

Cheers,

Simon


Re: Writer and .docx

2020-10-16 Thread Mechtilde
Hello Joost,

I'm very happy to read from you.

On 16.10.20 at 12:50, Joost Andrae wrote:
> Hi Simon,
> 
> it's an honor to me to see a sign of life of you here. Welcome !
> 
> Instead of user picking here to get users leave from AOO to LO a
> developer could create a Java based OOo/LO extension that uses Apache
> POI to export OpenDocument type documents to MSXML formats by using the
> binary MSO export to export those documents to the MSXML format in
> between. Or maybe it's possible to XSL this document format by using
> OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
> XML filter settings) to make document conversions is possible within OOo.

I offer my help to test the implementation. sorry but I'm not a
programmer. So we as the project need help from Java programmers to work
on it and contribute it.

> 
> Document conversions do not necessarily need to be a native
> implementation within AOO or LO.
> 
> Kind regards, Joost

Kind regards


-- 
Mechtilde Stehmann
## Apache OpenOffice
## Free office suite for Linux, MacOSX, Windows and OS/2
## Debian Developer
## PGP encryption welcome
## F0E3 7F3D C87A 4998 2899  39E7 F287 7BBA 141A AD7F





Re: Writer and .docx

2020-10-16 Thread Joost Andrae

Hi Simon,

it's an honor to me to see a sign of life of you here. Welcome !

Instead of user picking here to get users leave from AOO to LO a 
developer could create a Java based OOo/LO extension that uses Apache 
POI to export OpenDocument type documents to MSXML formats by using the 
binary MSO export to export those documents to the MSXML format in 
between. Or maybe it's possible to XSL this document format by using 
OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item 
XML filter settings) to make document conversions is possible within OOo.


Document conversions do not necessarily need to be a native 
implementation within AOO or LO.


Kind regards, Joost

On 16.10.2020 at 12:16, Simon Phipps wrote:

Hi!  As Peter said, it seems unlikely this branch of OpenOffice.org will be
enhanced with the ability to write .DOCX format. However, another branch
has added this capability and offers all the other convenient options you
mention as well. You can get it from our "sister" community at
https://libreoffice.org/download

Cheers

Simon

On Fri, Oct 16, 2020 at 8:44 AM Наталья Василенко wrote:


Hello! I would like to know is there any hope that users can save
documents in Writer in .docx format? In the latest version of your
OpenOffice users can open documents in .docx format, but they cannot save
in that format. I think it is not comfortable for many users with the fact
that your product is very convenient in other options.
Could this feature be enabled in later versions of your product?

Thank you for your response.










Re: Writer and .docx

2020-10-16 Thread Simon Phipps
Hi!  As Peter said, it seems unlikely this branch of OpenOffice.org will be
enhanced with the ability to write .DOCX format. However, another branch
has added this capability and offers all the other convenient options you
mention as well. You can get it from our "sister" community at
https://libreoffice.org/download

Cheers

Simon

On Fri, Oct 16, 2020 at 8:44 AM Наталья Василенко wrote:

> Hello! I would like to know is there any hope that users can save
> documents in Writer in .docx format? In the latest version of your
> OpenOffice users can open documents in .docx format, but they cannot save
> in that format. I think it is not comfortable for many users with the fact
> that your product is very convenient in other options.
> Could this feature be enabled in later versions of your product?
>
> Thank you for your response.
>


Re: Writer and .docx

2020-10-16 Thread Bidouille
> OpenOffice users can open documents in .docx format, but they cannot
> save in that format. 
Well, remember that recent versions of Microsoft Office (since 2016) can open
the ODT format.
 




Re: Writer and .docx

2020-10-16 Thread Peter Kovacs



On 16.10.20 at 09:25, Наталья Василенко wrote:

Hello! I would like to know is there any hope that users can save documents in 
Writer in .docx format? In the latest version of your OpenOffice users can open 
documents in .docx format, but they cannot save in that format. I think it is 
not comfortable for many users with the fact that your product is very 
convenient in other options.
Could this feature be enabled in later versions of your product?

Yes, we want to improve the capability to read and write documents
produced by Microsoft Office. However, our development community is made
up of volunteers working in their free time, alongside a regular job.

We have no timeline for when we will add this feature, but we know it
is in high demand within the user base of OpenOffice.

I do not expect that this feature will be improved soon.


Thank you for your response.

You are Welcome.
