Hi,

I think the main reason this is not done currently is that nobody
spent enough time to do this cleanly.

Not sure if comparing the single items of CellStyle would be a good
idea, as this is likely to change at some point with new features
being added, so you would probably compare the XML representation, but
this can quickly become a performance problem if done for each
save-operation, so I would opt for something like this being an
optional operation that the user of POI has to call. Many users care
more about runtime of the application than the size of the resulting
document!

If you are interesting in working on something like this, please
create a bug in our bugtracker and post patches there fore
review/inclusion.

Dominik.

On Mon, Sep 21, 2015 at 3:12 PM, Murphy, Mark <[email protected]> wrote:
> POI 3.12
>
> In looking at the styles.xml file generated by POI, It appears that a lot of 
> borders and fills are created that are not used. Maybe POI needs a method to 
> trim off those unused fills and borders on save. This method would also be 
> able to normalize the fills and borders to remove duplicates and system 
> colors (i.e. fgColor = 64 or bgColor = 65). I notice that Excel does this 
> when it saves a spreadsheet created by POI. Here is an example of the fills, 
> first as created by POI, second as saved by Excel (same spreadsheet):
>
> POI 3.12
>        <fills count="9">
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="darkGray" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="64" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="64" />
>                            <bgColor indexed="65" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill>
>                            <fgColor indexed="64" />
>                            <bgColor indexed="65" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="44" />
>                            <bgColor indexed="65" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="solid">
>                            <fgColor indexed="44" />
>                            <bgColor indexed="65" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <bgColor indexed="65" />
>                      </patternFill>
>               </fill>
>        </fills>
>
> Excel 2010 Save As (same spreadsheet)
>                 <fills count="4">
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="gray125" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="solid">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>        </fills>
>
> Note, not even Excel was perfect as it left a duplicate fill (0, and 2). 
> Borders was even more dramatic POI generated 94, Excel saved 13. And now that 
> I look at it, POI generated 42 styles while Excel kept 23. I understand that 
> you are keeping all the fills borders and styles as they are specified, and 
> re-using them as you get to a final setting, but normalizing and trimming 
> seem to be what you are missing. That can also be applied to fonts and 
> formats.
>
> Some additional thoughts, based on not yet looking at all the code. I can see 
> three steps to collecting all the garbage: Normalization, Duplicate removal, 
> Orphan removal. Normalization would involve setting default values, and 
> removing system colors. On the fill above, the POI generated fill would look 
> like this after normalization:
>
>        <fills count="9">
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="darkGray" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="solid">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>        </fills>
>
> Duplicate removal would remove the many duplicates left by normalization. 
> This would have to be carried back to the styles themselves and potentially 
> to individual cells in the spreadsheet that contain fill id's. After 
> duplicate removal you would have something like the following in the fills:
>
>        <fills count="4">
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="darkGray" />
>               </fill>
>               <fill>
>                      <patternFill patternType="none">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>               <fill>
>                      <patternFill patternType="solid">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>        </fills>
>
> And I just happen to know that my spreadsheet never uses fill "none" - 44, so 
> that could d be stripped out during orphan removal to bring the fill to:
>
>        <fills count="3">
>               <fill>
>                      <patternFill patternType="none" />
>               </fill>
>               <fill>
>                      <patternFill patternType="darkGray" />
>               </fill>
>               <fill>
>                      <patternFill patternType="solid">
>                            <fgColor indexed="44" />
>                      </patternFill>
>               </fill>
>        </fills>
>
> I suspect that the same process could be used for borders, fonts, formats and 
> styles where the order of operation would be normalization and duplicate 
> removal for fills, borders, fonts, formats, then styles. This followed by 
> orphan removal for styles then fills, borders, fonts, and formats.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to