> ---------------------------------------
> From: dennis <[email protected]>
> To: [email protected]
> Subject: [dev] Parallel functions in OpenOffice spreadsheet
> Date: Fri, 12 Aug 2011 21:23:42 +0300
>
> Hello,
> My name is Dennis Groisman and I am a student at Ben Gurion University
> in Be'er Sheva, Israel.
Hi Dennis --

As you've probably noticed, I've forwarded your note to the ooo-dev list at Apache. We'll try to keep you copied on this thread. You are also welcome to subscribe to the list by sending an email to: [email protected]

> I'm studying electronics and computer engineering. Me and my
> colleague, Tal Benach, are starting a project that involves speeding
> up spreadsheet functions with long computation time by parallel run on
> more than one core or even on a computing cloud/grid.
> We would be very glad if you could assist us by answering some of our
> questions:
>
> * Do you have any information about projects similar to ours?

I'm not aware of similar work. But I see three reasonable approaches:

1) Parallelize the predefined spreadsheet functions, e.g. SUM(), AVERAGE(), etc.

2) Parallelize the calculation of the formulas in individual cells: determine a topological sort of the cell dependencies, then dispatch the calculation of the uncalculated leaf cells to multiple threads.

3) Decompose a spreadsheet into multiple independent graphs of cells, each graph being one calculation chain, and then calculate the independent graphs in parallel.

> * Where can we get a list of functions implemented for OpenOffice?

You can find a list here:
http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_Functions_listed_by_category

> * Do you know about "slow" functions that need a speed-up in
> spreadsheet OpenOffice?

I don't know of any such analysis, but you can see the wiki page on the Calc performance work here:
http://wiki.services.openoffice.org/wiki/Performance

Generally, a spreadsheet is intended to be interactive from an end-user's perspective, so any calculation with a delay of more than a second or two is annoying.
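To make the second approach (topological sort, then dispatching ready cells to threads) a bit more concrete, here is a rough sketch. It assumes a hypothetical cell model: the `Sheet` dictionary layout and `calculate_in_waves` are made up for illustration and are not part of the OOo Calc code. Cells whose inputs are all calculated form a "wave" and are submitted to a thread pool together. Note that in CPython the GIL keeps pure-Python formulas from actually running faster on threads, so this only shows the scheduling idea; a real implementation would live in the C++ calculation core.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical cell model, for illustration only (not the OOo Calc API):
# each cell name maps to (names of the cells it reads, function of those values).
# Every referenced cell must itself be a key in the sheet; constants have no inputs.
Sheet = Dict[str, Tuple[Sequence[str], Callable[..., float]]]


def calculate_in_waves(sheet: Sheet, workers: int = 4) -> Dict[str, float]:
    """Evaluate cells in topological 'waves': all cells whose inputs are
    already calculated are submitted to the thread pool together."""
    # Number of not-yet-calculated inputs per cell, plus reverse edges.
    pending = {cell: len(deps) for cell, (deps, _) in sheet.items()}
    readers: Dict[str, List[str]] = {cell: [] for cell in sheet}
    for cell, (deps, _) in sheet.items():
        for dep in deps:
            readers[dep].append(cell)

    values: Dict[str, float] = {}
    ready = [cell for cell, n in pending.items() if n == 0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while ready:
            # Cells in one wave do not depend on each other, so they may run concurrently.
            futures = {
                cell: pool.submit(sheet[cell][1], *[values[d] for d in sheet[cell][0]])
                for cell in ready
            }
            next_ready: List[str] = []
            for cell, future in futures.items():
                values[cell] = future.result()
                # A reader becomes ready once its last pending input is calculated.
                for reader in readers[cell]:
                    pending[reader] -= 1
                    if pending[reader] == 0:
                        next_ready.append(reader)
            ready = next_ready

    # Cells on a circular reference never reach zero pending inputs.
    if len(values) != len(sheet):
        raise ValueError("circular reference detected")
    return values


# Example: C1 = (A1 + A2) * (A1 * 10) with A1 = 2 and A2 = 3.
sheet: Sheet = {
    "A1": ([], lambda: 2.0),
    "A2": ([], lambda: 3.0),
    "B1": (["A1", "A2"], lambda a, b: a + b),
    "B2": (["A1"], lambda a: a * 10),
    "C1": (["B1", "B2"], lambda x, y: x * y),
}
result = calculate_in_waves(sheet)  # result["C1"] == 100.0
```

The third approach could reuse the same scheduler: split the dependency graph into its connected components first and run each component's waves independently.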
The hard part is maintaining that responsiveness as users scale from hundreds of rows of data to hundreds of thousands of rows. So there are a few ways of looking at this:

1) Look at the most used functions, like SUM, AVERAGE, IF, etc. These will be frequent in large spreadsheets as well, so they will have a large influence on overall calculation time in those cases.

2) Look at the functions that are more specialized but lend themselves to large speedups from parallel execution, e.g., SUMSQ:
http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_SUMSQ_function

3) Find a large spreadsheet file that is particularly slow, and optimize that.

The formulas that you might think are intrinsically expensive to calculate, like the Bessel functions, are rare in spreadsheets, and they don't operate on large ranges of cells. So my guess is that the most common functions, used in large spreadsheets, would benefit most from taking advantage of multi-core.

The grid/cloud opportunity is less clear. In most cases, calculations are interactive, taking 1-2 seconds at most. There are some less common cases, such as linear or non-linear programming and other constrained optimization problems, typically done via add-ins. These can take several minutes or even hours to run on large models, so they might be good candidates for cloud/grid. But this is not the typical case for the core calculation code.

> * Which OS is better when working in OpenOffice? Windows or Linux?

It is easier to build OOo on Linux or Mac than on Windows.

> * Where can we get the most up-to-date version of a spreadsheet source code?
>

OOo is stored in Mercurial here:
http://hg.services.openoffice.org/OOO340

You might also look at these instructions, which show how to add a new spreadsheet function to OOo. This might be good for experimenting:
http://wiki.services.openoffice.org/wiki/Calc/Add-In/Simple_Calc_Add-In

> We both thank you for your assistance,
> Dennis Groisman and Tal Benach
>
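For the SUMSQ suggestion above, a parallel version is essentially a chunked reduction: split the range into contiguous chunks, compute each chunk's sum of squares independently, then add the partial results. The sketch below is only an illustration under assumptions: `parallel_sumsq` and `_partial_sumsq` are hypothetical names, not OOo code, and with CPython threads the GIL prevents a real speedup for pure-Python arithmetic. The point is the split/compute/combine structure, which carries over to native threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def _partial_sumsq(chunk: Sequence[float]) -> float:
    # Work done by one worker: the sum of squares of its own slice of the range.
    return sum(x * x for x in chunk)


def parallel_sumsq(values: Sequence[float], workers: int = 4) -> float:
    """Hypothetical SUMSQ-style reduction split across worker threads."""
    if not values:
        return 0.0
    chunk_size = -(-len(values) // workers)  # ceiling division
    chunks: List[Sequence[float]] = [
        values[i:i + chunk_size] for i in range(0, len(values), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(_partial_sumsq, chunks))
    # Combine step: addition is associative, so the split does not change the
    # result (apart from possible floating-point rounding differences).
    return sum(partials)
```

The same pattern fits SUM and, with a (sum, count) pair per chunk, AVERAGE, which is why the most common functions are natural first targets.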
