I am doing exactly what Kai is, with the specification that the user input and output can come from web representations (or RESTfully) of the spreadsheet, so POI isn't exactly involved in those. (I used the To-HTML example from Ken Arnold—and what a treat to run into one of the fathers of Rogue here!—as a jumping off point.) And because it's a web app, it has an unknown and variable number of users. The page updates dynamically, as well, so with each cell entered (or even each key pressed!), the input data is sent to the server, the spreadsheet is populated, and the results sent back. Each action is done on a "fresh" copy; there's no persistent state for any given user.
For most of the spreadsheets, it doesn't really matter. They're under 100K and it's a trivial thing to load, parse and process things of that size. One of the spreadsheets is 4MB (taking up around 500MB in memory!) which takes POI some time to parse—10 seconds!*—which is too much time to wait for a recalc (which only starts after the 10s parsing). You can anticipate the need, as you say, and preload, but it becomes a lot trickier (and ultimately not much different from caching and pooling) with multiple users or (as happens with a web interface) the user makes requests in short order, such that the previous request may not have finished by the time the next request comes in. That's one reason a clone would be useful—presuming, of course, that cloning would be much faster than loading and parsing. *N.B. this spreadsheet will load nigh instantly in Excel, but on the other hand it's a thing of beauty that we're loading up a massive data-intensive, calculation heavy workbook without Excel.
