[
https://issues.apache.org/jira/browse/ODFTOOLKIT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234195#comment-15234195
]
Svante Schubert commented on ODFTOOLKIT-426:
--------------------------------------------
Thank you very much for the patches, Nimarukan!
They all look very professional, I have committed them (hopefully, as I lack
some professional tooling,) all.
It took me some time today to read myself into the functionality of the simple
API, which had been improved by the patch, here some general comments:
The basic problem being solved is that when content is being copied the styles
being used by the copied-content should be copied as well.
This happens as well for hard styles (part of office:automatic-styles) and
template styles (part of office:styles). When a style exists in the target, the
copied style is being renamed.
This had been added once by the IBM team from Beijing as part of a fork, which
has been rejoined here at Apache, again.
The design is questionable at the point when a referenced styles already
exists.
Think of copying a "Heading 1" style, the target document suddenly has two
different heading styles at level 1 in the target document.
But what really surprised me, is that styles/formats of graphical objects are
ALWAYS being copied, independent if they are part of the copied content or not.
This happens, In the second part of "copyForeignStyleRef(,,), where not a style
name is being searched for and copied, but everything that has a "draw:name" in
the styles.xml.
The spec writes: "The draw:name attribute specifies names that are used for
referencing graphical elements."
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1417112_253892949
Why should someone copy styles of graphical objects, when they are not part of
the content being copied?
This seems quite a burden and is unfortunately not part of any existing test
case.
I removed it and the tests are still running, but there is a certainly a reason
why the IBM developers once put it in there.
The worst part to me is that this simple API functionality of copying used
styles would have benefit a lot in performance in an extension of the
underlying ODFDOM API to receive a set of Maps. Not only office:styles as
already existing, but also those from the office:automatic-styles (hard
formatting styles). Instead of receiving such a hash map the content.xml and/or
styles.xml DOM is being searched over and over again from the top using XPATH
and XSLT in the method "copyForeignStyleRef(,,), which is calling
"getCopyStyleList(..)".
You improved private method "getCopyStyleList(..)" being called by
"copyForeignStyleRef(,,) being used by several copy methods of the various ODF
documents.
Any bet that exchanging the usage of XPATH & XSLT by a better underling ODFDOM
API support will remove the bottleneck at all.
NOTE: That there is a possibility that the office:automatic-styles can be in
the styles.xml, whenever a footer or header has styles, as the content of the
footer/header is part of the page style. Nowadays I would design this feature
different and move all contents to the content.xml, referencing from the
styles.xml to the "content fragments" within content.xml
After reading the existing code, I am uncertain if copying styles from a
content being part of a header/footer will work..
> PERFORMANCE: simple Document.getCopyStyleList improvements to reduce String
> allocation, hashing, etc.
> -----------------------------------------------------------------------------------------------------
>
> Key: ODFTOOLKIT-426
> URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-426
> Project: ODF Toolkit
> Issue Type: Improvement
> Components: simple api
> Environment: 0.8.2-incubating-SNAPSHOT, jdk1.8.0_77, win7
> Reporter: Nimarukan
> Priority: Minor
> Labels: patch, performance
> Attachments: Issue-426-patches.zip,
> result-ods-xml-before-vs-after-patches_diff.zip
>
>
> org.odftoolkit.simple
> Document.getCopyStyleList
> is a top hotspot in a (NetBeans) sampled profiling run of test_1 of
> https://issues.apache.org/jira/secure/attachment/12795379/odftoolkit-performance-test.zip
> When a sheet is copied from one document to another, simple.Document also
> merges metadata such as styles. For each style it iterates over every cell
> with that style. The current code can be improved by reusing constructed
> strings (also reducing garbage collection), removing unnecessary hash table
> lookups, etc.
> After these changes, Document.getCopyStyleList takes about 1/16th the
> original time in the profiled run.
> (This is surprising. Maybe it enables some optimizations that eliminate
> String creation in an inner loop in common cases.)
> (The gain is not due to an error of omission: The template.ods source file
> used in this performance test is a spreadsheet with styled cells. The
> differences between the ods xml files of the result spreadsheet produced
> before these patches are applied, and the ods xml files of the result
> spreadsheet produced after these patches are applied, are only in times and
> in random strings appended to style names. So these changes did not
> significantly change the result spreadsheet.)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)