On Wed, 4 May 2011, Alex Geller wrote:
The good news is that it is fast enough, uses little memory and handles all the decorational items we were worrying about (CellStyles, MergedRegions and Images), just fine. A .xlsx file with 300,000 rows and 33 columns is generated in 50 seconds using the default heap space of 64 MB.

Great news

The BigGridDemo should be improved to replace the "sheetData" element with the real data rather then than replacing the complete file because MergedRegions, Image references and other information such as margins are otherwise lost on overwriting the file.

I think (as you hint later) that the demo is probably approaching the point where it can be re-done as a streaming api similar to the read one, rather than an example you write your code into.

Since we need to continue supporting HSSF as well as XSSF we implemented an API compatible streaming version of XSSF called SXSSF ("S" for streaming) so that we wouldn't have to change the code and could have identical code for both file formats (Apart from the "Workbook" construction of course (change from "new XSSFWorkbook" to "new SXSSFWorkbook")).

Interesting idea. I'd probably say that an event model writing code would be the first thing to nail down, then possibly an event backed workbook implementation would be next after that. If you've done much of that then it could help!


If you're happy, what I'd suggest is you post your code to a new bug in bugzilla, then we all work together to get the code into svn


We have some open points which don't bother us for the moment because we are using only string, numeric and date cell values. We are not using formulas, rich text, hyperlinks and comments but we may in the future:

What cell types and values have to be written into the "c" elements of the "sheetData" structure for the cell types BLANK, FORMULA and ERROR? How is rich text written?

The quick way to check is just to look at some example files. Full details are in the microsoft specifications, which should be linked from the POI website.

How are comments and hyperlinks written into the file?

With a fair bit of xml around them... Take a look at some example files to see. It's quite fiddly, the xssf usermodel code should show you the details too

What are the exact semantics of Sheet.shiftRows()? Why is the code in HSSFSheet complicated? Is it perhaps because of formulas that are cell relative that need to be recomputed in the new location?

HSSF is often more complicated than XSSF, as more of the work needs to be done and more things need to be kept in sync. Formulas are just one bit

What should the return value for Row.getZeroHeight() be if Row.setZeroHeight() has not been called previously?

I'd suggest you crib off what xssf does?

How are cell formulas parsed? How is the type determined? What is the lifecycle of formulas? What is the precomputed value and when is is precomputed? What is the difference between "precomputed" and "cached" and when are values cached?

XSSF is much simpler than HSSF for this. Formulas are stored as text, and the cached/precomputed (same thing) value goes inline with them. You normally set those values at the end of writing the cells, using the formula evaluator. However, as you don't have all the cells in memory at once, any references will be very tricky. The best bet might be to set the force calculation flag, skip writing the cached values, and let excel do them on next load

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to