If you're using a HSSFWorkbook, HSSFOptimiser [1] might be helpful.
Unfortunately, a XSSFOptimiser or generic SpreadsheetOptimiser
counterpart hasn't been written yet.

[1] 
https://poi.apache.org/apidocs/org/apache/poi/hssf/usermodel/HSSFOptimiser.html
Javen O'Neal


On Tue, Sep 22, 2015 at 1:54 PM, Javen O'Neal <[email protected]> wrote:
> This problem isn't limited to just fill and border cell styles, but
> cell styles in general. I have a similar problem when creating a
> workbook with many (identical) styles
> Along the same note, creating a new workbook, then calling:
>
> for (int i=0; i<1000; i++) {
>   wb.createCellStyle()
> }
>
> And saving the workbook produces a workbook with 1000 styles, even
> though the styles are all duplicates (easy to check) and never used (a
> little more difficult/expensive to check). Opening, saving, and
> closing the 1000 style workbook in Excel will consolidate the styles.
> I believe the style consolidation is done when the file is saved, but
> it's possible it's done when the file is opened in Excel. I'd need to
> write some VBA code to verify that
>
> In my POI application, I want to take a cell and change just the data
> format of the cell style (either 0.00 or 0.000)
> Let's say I'm starting with a workbook where all numbers are formatted
> as 0.00 and I want to format a few columns as 0.000.
> cell.getCellStyle().setDataFormat(format) will change all cells with
> the same style to 0.000, so I'm left with a workbook with all 0.000.
> cell.setCellStyle(cell.getCellStyle().clone().setDataFormat(format))
> will change the data format for just the one cell, but creates a new
> style for every cell.
> What I'm doing is temporarily changing the cell style with the new
> data format, checking if there is another cell style in the styles
> table that matches the modified style, and if so setting the cell's
> style to the matching cell style, otherwise clone the cell style.
> Finally, revert the cell style back to the original data format. It
> looks something like this:
>
> CellStyle originalStyle = cell.getCellStyle();
> DataFormat originalDataFormat = originalStyle.getDataFormat();
> // temporarily modify originalStyle to be the desired style
> originalStyle.setDataFormat(newDataFormat);
> try {
>     // search for existing styles that match the desired style
>     CellStyle matchingStyle = null;
>     for (CellStyle style : stylesSource) {
>         // styles match and do not refer to the same style
>         if (originalStyle.equals(style) && originalStyle != style) {
>             matchingStyle = style;
>             break;
>         }
>     }
>     // only create a new style if there are no styles that match the
> desired style
>     if (matchingStyle != null) {
>         cell.setCellStyle(matchingStyle);
>     } else {
>         cell.setCellStyle(originalStyle.clone());
>     }
> } finally {
>     // revert originalStyle back to what it was before
>     originalStyle.setDataFormat(originalDataFormat);
> }
>
> This consolidation is done when the styles are created rather than
> when the workbook is written to file.
> This seems like something that many users write into their own POI
> applications if they modify cell styles and are concerned about file
> size or memory limitations, but I'm not sure how to standardize this
> as part of the POI API.
>
> I agree that calling a method to consolidate existing cell styles
> should be called explicitly, as it might invalidate any existing
> CellStyle variables the user has access to. Might want to only make
> this change when transforming the in-memory objects into XML to write
> to file, and leave the in-memory objects unmodified as a result of
> this consolidation. Thus, either the workbook holds a
> "consolidateCellStylesOnWrite" state or the write method is passed an
> optional boolean consolidateCellStyles parameter. The former seems
> cleaner, though adds state into the workbook.
>
> I'd be happy to help you write a fix.
>
>
> On Tue, Sep 22, 2015 at 12:24 PM, Dominik Stadler
> <[email protected]> wrote:
>> Hi,
>>
>> I think the main reason this is not done currently is that nobody
>> spent enough time to do this cleanly.
>>
>> Not sure if comparing the single items of CellStyle would be a good
>> idea, as this is likely to change at some point with new features
>> being added, so you would probably compare the XML representation, but
>> this can quickly become a performance problem if done for each
>> save-operation, so I would opt for something like this being an
>> optional operation that the user of POI has to call. Many users care
>> more about runtime of the application than the size of the resulting
>> document!
>>
>> If you are interesting in working on something like this, please
>> create a bug in our bugtracker and post patches there fore
>> review/inclusion.
>>
>> Dominik.
>>
>> On Mon, Sep 21, 2015 at 3:12 PM, Murphy, Mark <[email protected]> 
>> wrote:
>>> POI 3.12
>>>
>>> In looking at the styles.xml file generated by POI, It appears that a lot 
>>> of borders and fills are created that are not used. Maybe POI needs a 
>>> method to trim off those unused fills and borders on save. This method 
>>> would also be able to normalize the fills and borders to remove duplicates 
>>> and system colors (i.e. fgColor = 64 or bgColor = 65). I notice that Excel 
>>> does this when it saves a spreadsheet created by POI. Here is an example of 
>>> the fills, first as created by POI, second as saved by Excel (same 
>>> spreadsheet):
>>>
>>> POI 3.12
>>>        <fills count="9">
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="darkGray" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="64" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="64" />
>>>                            <bgColor indexed="65" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill>
>>>                            <fgColor indexed="64" />
>>>                            <bgColor indexed="65" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="44" />
>>>                            <bgColor indexed="65" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="solid">
>>>                            <fgColor indexed="44" />
>>>                            <bgColor indexed="65" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <bgColor indexed="65" />
>>>                      </patternFill>
>>>               </fill>
>>>        </fills>
>>>
>>> Excel 2010 Save As (same spreadsheet)
>>>                 <fills count="4">
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="gray125" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="solid">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>        </fills>
>>>
>>> Note, not even Excel was perfect as it left a duplicate fill (0, and 2). 
>>> Borders was even more dramatic POI generated 94, Excel saved 13. And now 
>>> that I look at it, POI generated 42 styles while Excel kept 23. I 
>>> understand that you are keeping all the fills borders and styles as they 
>>> are specified, and re-using them as you get to a final setting, but 
>>> normalizing and trimming seem to be what you are missing. That can also be 
>>> applied to fonts and formats.
>>>
>>> Some additional thoughts, based on not yet looking at all the code. I can 
>>> see three steps to collecting all the garbage: Normalization, Duplicate 
>>> removal, Orphan removal. Normalization would involve setting default 
>>> values, and removing system colors. On the fill above, the POI generated 
>>> fill would look like this after normalization:
>>>
>>>        <fills count="9">
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="darkGray" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="solid">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>        </fills>
>>>
>>> Duplicate removal would remove the many duplicates left by normalization. 
>>> This would have to be carried back to the styles themselves and potentially 
>>> to individual cells in the spreadsheet that contain fill id's. After 
>>> duplicate removal you would have something like the following in the fills:
>>>
>>>        <fills count="4">
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="darkGray" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="none">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="solid">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>        </fills>
>>>
>>> And I just happen to know that my spreadsheet never uses fill "none" - 44, 
>>> so that could d be stripped out during orphan removal to bring the fill to:
>>>
>>>        <fills count="3">
>>>               <fill>
>>>                      <patternFill patternType="none" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="darkGray" />
>>>               </fill>
>>>               <fill>
>>>                      <patternFill patternType="solid">
>>>                            <fgColor indexed="44" />
>>>                      </patternFill>
>>>               </fill>
>>>        </fills>
>>>
>>> I suspect that the same process could be used for borders, fonts, formats 
>>> and styles where the order of operation would be normalization and 
>>> duplicate removal for fills, borders, fonts, formats, then styles. This 
>>> followed by orphan removal for styles then fills, borders, fonts, and 
>>> formats.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to