Re: [GitHub] [openoffice] DamjanJovanovic opened a new pull request, #164: Update EditEngine code to use 32 bit paragraph storage

2023-01-03 Thread Dirk-Willem van Gulik
On 3 Jan 2023, at 05:51, GitBox  wrote:
> 
> DamjanJovanovic opened a new pull request, #164:
> URL: https://github.com/apache/openoffice/pull/164
> 
>   Our EditEngine, in main/editeng, has a number of containers for paragraphs, 
> paragraph portions, lines, etc. These are based on svl's "PTRARR" type 
> classes, which is just an array of some caller

> -becomes a new EditEngine paragraph, and after 65534 paragraphs, all further 
> cells are ignored, so when it's time to populate the spreadsheet, the data 
> isn't there to add.

Hurray to this getting attention* ! 

Dw

* This bug has bitten me many a time, usually in the final late-night hours of 
delivering some report with a deadline looming, when the work of several teams 
had to be integrated into one final document or call-for-tender submission.


-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org



[GitHub] [openoffice] DamjanJovanovic opened a new pull request, #164: Update EditEngine code to use 32 bit paragraph storage

2023-01-02 Thread GitBox


DamjanJovanovic opened a new pull request, #164:
URL: https://github.com/apache/openoffice/pull/164

   Our EditEngine, in main/editeng, has a number of containers for paragraphs, 
paragraph portions, lines, etc. These are based on svl's "PTRARR" type classes, 
each of which is just an array of some caller-defined class that grows as 
needed, like ::std::vector but hardcoded to 16 bit limits. Worse, a lot of code, 
even outside of main/editeng, is written under these assumptions: it passes 
16 bit types and constants to EditEngine classes and retrieves 16 bit results.
   
   One particularly notorious bug emerging from this design is that when you 
try to paste an HTML table of more than 65534 cells into Calc, the first 65534 
cells are pasted, but all further cells are silently lost. This has been 
reported in at least 3 bugs in the last 18 years 
([57176](https://bz.apache.org/ooo/show_bug.cgi?id=57176), 
[110486](https://bz.apache.org/ooo/show_bug.cgi?id=110486) and 
[117225](https://bz.apache.org/ooo/show_bug.cgi?id=117225)). This apparently 
happens because our HTML import is similar to XML DOM parsing, done in 2 
phases, first parsing the results into memory, then processing those in-memory 
results to populate data into the spreadsheet (ScHTMLImport::WriteToDocument() 
in main/sc/source/filter/html/htmlimp.cxx). When parsing, each HTML cell (<td>) 
becomes a new EditEngine paragraph, and after 65534 paragraphs, all further 
cells are ignored, so when it's time to populate the spreadsheet, the data 
isn't there to add.
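
The failure mode described above can be illustrated with a minimal sketch. This is not the real svl PTRARR or HTML import code: `NarrowParagraphList` and `ParseCells` are hypothetical names, and the only detail taken from the description is the 65534-entry cap. The point is that an insert which quietly refuses new entries, combined with a caller that does not check for it, loses data without any error.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 16-bit-limited paragraph container.
// It only mimics the cap described above; it is not the svl implementation.
class NarrowParagraphList {
public:
    // Assumed cap, taken from the 65534-cell limit in the bug description.
    static constexpr std::size_t kMaxCount = 65534;

    // Returns false once the container is full. A caller that ignores
    // the result drops the paragraph without noticing.
    bool Insert(int paragraph) {
        if (items_.size() >= kMaxCount) {
            return false;
        }
        items_.push_back(paragraph);
        return true;
    }

    std::size_t Count() const { return items_.size(); }

private:
    std::vector<int> items_;
};

// Simulates the first import phase: one paragraph per table cell,
// with the insert result deliberately ignored.
inline std::size_t ParseCells(NarrowParagraphList& list, std::size_t cellCount) {
    for (std::size_t i = 0; i < cellCount; ++i) {
        list.Insert(static_cast<int>(i));
    }
    return list.Count();
}
```

Feeding 70000 cells through `ParseCells` leaves only 65534 paragraphs in the list, so a later phase that writes paragraphs into the spreadsheet has nothing to write for the remaining cells.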
   
   This enormous patchset changes the container for paragraphs to a new 
BaseList class, which wraps ::std::vector and uses 32 bit integers to access 
it. All EditEngine methods are changed to take 32 bit paragraph index 
parameters, all EditEngine classes store any paragraph indices as 32 bit 
fields, all 16 bit paragraph index constants like 0xFFFF are changed to 32 bit 
0xFFFFFFFF (and to a more readable constant like EE_PARA_NOT_FOUND), and all 
(known) calling code everywhere in OpenOffice is updated to take these changes 
into account.
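
As a rough sketch of what such a wrapper might look like: the class below wraps ::std::vector and exposes 32 bit positions. The interface is invented for illustration (`BaseListSketch`, its method names and `kParaNotFoundSketch` are all assumptions) and is not the actual BaseList from the patchset. The sentinel value is likewise assumed, not copied from EE_PARA_NOT_FOUND.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Assumed "not found" sentinel for this sketch only.
constexpr std::uint32_t kParaNotFoundSketch =
    std::numeric_limits<std::uint32_t>::max();

// Hypothetical 32-bit-indexed list wrapping std::vector.
template <typename T>
class BaseListSketch {
public:
    std::uint32_t Count() const {
        return static_cast<std::uint32_t>(items_.size());
    }

    void Append(const T& item) { items_.push_back(item); }

    // Inserts before position pos; positions past the end append.
    void Insert(const T& item, std::uint32_t pos) {
        if (pos > Count()) {
            pos = Count();
        }
        items_.insert(items_.begin() + pos, item);
    }

    // Bounds-checked access; throws std::out_of_range on a bad index.
    const T& GetObject(std::uint32_t pos) const { return items_.at(pos); }

    // Linear search; returns the sentinel when the item is absent.
    std::uint32_t GetPos(const T& item) const {
        for (std::uint32_t i = 0; i < Count(); ++i) {
            if (items_[i] == item) {
                return i;
            }
        }
        return kParaNotFoundSketch;
    }

private:
    std::vector<T> items_;
};
```

With 32 bit positions, an index such as 69999 is representable and the sentinel no longer collides with any realistic paragraph count, which is why callers comparing against the old 16 bit constant also have to change.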
   
   The bug is definitely fixed: all sample documents from Bugzilla paste fully 
and correctly. What is harder to prove is that nothing else broke. Much had to 
change: 136 files in 9 modules. OpenGrok helped tremendously in finding obscure 
places where EditEngine methods were getting called from. I've really tried to 
avoid introducing any new bugs, but it is hard to be sure with a change of this 
size. I am making a PR instead of directly pushing to allow you to test a lot 
;).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org