If you're using a HSSFWorkbook, HSSFOptimiser [1] might be helpful. Unfortunately, a XSSFOptimiser or generic SpreadsheetOptimiser counterpart hasn't been written yet.
[1] https://poi.apache.org/apidocs/org/apache/poi/hssf/usermodel/HSSFOptimiser.html Javen O'Neal On Tue, Sep 22, 2015 at 1:54 PM, Javen O'Neal <[email protected]> wrote: > This problem isn't limited to just fill and border cell styles, but > cell styles in general. I have a similar problem when creating a > workbook with many (identical) styles > Along the same note, creating a new workbook, then calling: > > for (int i=0; i<1000; i++) { > wb.createCellStyle() > } > > And saving the workbook produces a workbook with 1000 styles, even > though the styles are all duplicates (easy to check) and never used (a > little more difficult/expensive to check). Opening, saving, and > closing the 1000 style workbook in Excel will consolidate the styles. > I believe the style consolidation is done when the file is saved, but > it's possible it's done when the file is opened in Excel. I'd need to > write some VBA code to verify that > > In my POI application, I want to take a cell and change just the data > format of the cell style (either 0.00 or 0.000) > Let's say I'm starting with a workbook where all numbers are formatted > as 0.00 and I want to format a few columns as 0.000. > cell.getCellStyle().setDataFormat(format) will change all cells with > the same style to 0.000, so I'm left with a workbook with all 0.000. > cell.setCellStyle(cell.getCellStyle().clone().setDataFormat(format)) > will change the data format for just the one cell, but creates a new > style for every cell. > What I'm doing is temporarily changing the cell style with the new > data format, checking if there is another cell style in the styles > table that matches the modified style, and if so setting the cell's > style to the matching cell style, otherwise clone the cell style. > Finally, revert the cell style back to the original data format. It > looks something like this: > > CellStyle originalStyle = cell.getCellStyle(); > DataFormat originalDataFormat = originalStyle.getDataFormat(); > // temporarily modify originalStyle to be the desired style > originalStyle.setDataFormat(newDataFormat); > try { > // search for existing styles that match the desired style > CellStyle matchingStyle = null; > for (CellStyle style : stylesSource) { > // styles match and do not refer to the same style > if (originalStyle.equals(style) && originalStyle != style) { > matchingStyle = style; > break; > } > } > // only create a new style if there are no styles that match the > desired style > if (matchingStyle != null) { > cell.setCellStyle(matchingStyle); > } else { > cell.setCellStyle(originalStyle.clone()); > } > } finally { > // revert originalStyle back to what it was before > originalStyle.setDataFormat(originalDataFormat); > } > > This consolidation is done when the styles are created rather than > when the workbook is written to file. > This seems like something that many users write into their own POI > applications if they modify cell styles and are concerned about file > size or memory limitations, but I'm not sure how to standardize this > as part of the POI API. > > I agree that calling a method to consolidate existing cell styles > should be called explicitly, as it might invalidate any existing > CellStyle variables the user has access to. Might want to only make > this change when transforming the in-memory objects into XML to write > to file, and leave the in-memory objects unmodified as a result of > this consolidation. Thus, either the workbook holds a > "consolidateCellStylesOnWrite" state or the write method is passed an > optional boolean consolidateCellStyles parameter. The former seems > cleaner, though adds state into the workbook. > > I'd be happy to help you write a fix. > > > On Tue, Sep 22, 2015 at 12:24 PM, Dominik Stadler > <[email protected]> wrote: >> Hi, >> >> I think the main reason this is not done currently is that nobody >> spent enough time to do this cleanly. >> >> Not sure if comparing the single items of CellStyle would be a good >> idea, as this is likely to change at some point with new features >> being added, so you would probably compare the XML representation, but >> this can quickly become a performance problem if done for each >> save-operation, so I would opt for something like this being an >> optional operation that the user of POI has to call. Many users care >> more about runtime of the application than the size of the resulting >> document! >> >> If you are interesting in working on something like this, please >> create a bug in our bugtracker and post patches there fore >> review/inclusion. >> >> Dominik. >> >> On Mon, Sep 21, 2015 at 3:12 PM, Murphy, Mark <[email protected]> >> wrote: >>> POI 3.12 >>> >>> In looking at the styles.xml file generated by POI, It appears that a lot >>> of borders and fills are created that are not used. Maybe POI needs a >>> method to trim off those unused fills and borders on save. This method >>> would also be able to normalize the fills and borders to remove duplicates >>> and system colors (i.e. fgColor = 64 or bgColor = 65). I notice that Excel >>> does this when it saves a spreadsheet created by POI. Here is an example of >>> the fills, first as created by POI, second as saved by Excel (same >>> spreadsheet): >>> >>> POI 3.12 >>> <fills count="9"> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="darkGray" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="64" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="64" /> >>> <bgColor indexed="65" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill> >>> <fgColor indexed="64" /> >>> <bgColor indexed="65" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="44" /> >>> <bgColor indexed="65" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="solid"> >>> <fgColor indexed="44" /> >>> <bgColor indexed="65" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <bgColor indexed="65" /> >>> </patternFill> >>> </fill> >>> </fills> >>> >>> Excel 2010 Save As (same spreadsheet) >>> <fills count="4"> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="gray125" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="solid"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> </fills> >>> >>> Note, not even Excel was perfect as it left a duplicate fill (0, and 2). >>> Borders was even more dramatic POI generated 94, Excel saved 13. And now >>> that I look at it, POI generated 42 styles while Excel kept 23. I >>> understand that you are keeping all the fills borders and styles as they >>> are specified, and re-using them as you get to a final setting, but >>> normalizing and trimming seem to be what you are missing. That can also be >>> applied to fonts and formats. >>> >>> Some additional thoughts, based on not yet looking at all the code. I can >>> see three steps to collecting all the garbage: Normalization, Duplicate >>> removal, Orphan removal. Normalization would involve setting default >>> values, and removing system colors. On the fill above, the POI generated >>> fill would look like this after normalization: >>> >>> <fills count="9"> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="darkGray" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="solid"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> </fills> >>> >>> Duplicate removal would remove the many duplicates left by normalization. >>> This would have to be carried back to the styles themselves and potentially >>> to individual cells in the spreadsheet that contain fill id's. After >>> duplicate removal you would have something like the following in the fills: >>> >>> <fills count="4"> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="darkGray" /> >>> </fill> >>> <fill> >>> <patternFill patternType="none"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> <fill> >>> <patternFill patternType="solid"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> </fills> >>> >>> And I just happen to know that my spreadsheet never uses fill "none" - 44, >>> so that could d be stripped out during orphan removal to bring the fill to: >>> >>> <fills count="3"> >>> <fill> >>> <patternFill patternType="none" /> >>> </fill> >>> <fill> >>> <patternFill patternType="darkGray" /> >>> </fill> >>> <fill> >>> <patternFill patternType="solid"> >>> <fgColor indexed="44" /> >>> </patternFill> >>> </fill> >>> </fills> >>> >>> I suspect that the same process could be used for borders, fonts, formats >>> and styles where the order of operation would be normalization and >>> duplicate removal for fills, borders, fonts, formats, then styles. This >>> followed by orphan removal for styles then fills, borders, fonts, and >>> formats. >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
