Re: [Libreoffice] Calc - a better design ...

2011-10-20 Thread Kevin Hunter

Hi Michael,

I'm hesitant to ask this because I cannot personally promise time toward 
LO (only on an as-can basis, which is dismally small ATM), but hey, you 
can easily say no.  :-)  You mention really need[ing] a whiteboard to 
elaborate properly.  I submit that putting your thoughts together, 
perhaps in picture form and available on the LO wiki, or put together as 
a small video to Youtube, would be extremely useful to casual LO coders 
like myself.  As an individual volunteer without a face-to-face LO team 
member against whom to bounce ideas, I'd thoroughly love an 
actual-paid-engineer's thoughts on how best to proceed on this front.


I'm personally motivated for Calc because in my science career, I really 
have to bend over backwards to make Calc work effectively for my needs 
where Excel works just plain better/faster/smaller, yet I 
philosophically have stuck myself with Free software.  To me, one of the 
biggest areas of weakness for LO, after the various random crashes 
(which are getting better!), is the memory bloat, and speed.  It's not 
features.


I'm happy to mess with Ixion (and indeed have poked at it some already), 
but I imagine the going would be much more useful with a LO core 
developer's guiding thoughts, including technical end-goals and 
migration paths.


So, would you have the time to create a screencast or video of yourself 
in front of a whiteboard?


Cheers,

Kevin

At 9:42am -0400 Wed, 27 Jul 2011, Michael Meeks wrote:

Hi Kevin,

On Wed, 2011-07-27 at 06:56 -0400, Kevin Hunter wrote:

At 6:07am -0400 Wed, 27 Jul 2011, Michael Meeks wrote:

but IMHO -the- fundamental design problem with calc, is something
quite banal - the concept that a spreadsheet is built from cells:
without breaking that basic misconception I don't think we can do
any of the really interesting space / time optimisations we need to
do.


Can you elaborate a little on this fundamental design flaw ?


Hah - so, yes - and not easily - I'd really need a whiteboard.


  As a naive and unfortunately focused elsewhere personality, I don't
immediately know of a better model for creating a spreadsheet.


So - of course, the first thing to say is that (at a first pass), I'd
want the UI to continue to look the same - all that would change would
be the underlying model.


   Is it just a problem of sparsity?  Or is there a much more
sophisticated method for memory sharing of various similar cell
attributes, perhaps analogous to CSS?


Here is the thing - since we work on a per-cell basis, all of our type
checking, formula adaptation, storage, dependencies etc. are ultimately
per-cell based. This (crudely) means that vast numbers of operations are
O(n), where n is the number of cells, when we do not want them to be. It
also tends to explode storage and computation time around dependencies -
which is a substantial cost.

NB. much of this is quite normal for a spreadsheet FWIW, I believe
Excel is not completely dis-similar; however I think we can do better
with much improved (much more complex / slow) data structure design that
will ultimately be far faster to execute.

Take a banal example; when we do a SUM() of a million rows containing
plain doubles, we would want to (at root) have a function that [ in the
ideal case ] does:

double sum (double *array, sal_Int64 nItems);

Which we can shove onto a GPU, multi-thread if nItems is big, etc. etc.
Unfortunately, approaching this limit is basically impossible in calc.
Instead for this case we would do a very slow, pointer-chasing iteration
over a million scattered ScCell's - we would do type checking - to
ensure that each one was a double, and only then would we add /
accumulate them.

Of course none of that can be pushed to a GPU, none of it can be SIMD
accelerated, and threading it is rather hard.
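
To make the contrast concrete, here is a minimal sketch of the two loops being compared. The `Cell` struct and `CellType` enum are hypothetical stand-ins invented for illustration, not the real ScCell classes, and `std::size_t` replaces `sal_Int64` so the example compiles on its own.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a per-cell object; not the real ScCell.
enum class CellType { Empty, Double, Text };

struct Cell {
    CellType type = CellType::Empty;
    double value = 0.0;
};

// Per-cell model: one heap object per cell, so every iteration follows a
// pointer and checks the type before it can accumulate.
double sumPerCell(const std::vector<std::unique_ptr<Cell>>& column) {
    double total = 0.0;
    for (const auto& cell : column) {
        if (cell && cell->type == CellType::Double)  // branch per cell
            total += cell->value;                    // pointer chase per cell
    }
    return total;
}

// Contiguous model: the ideal-case function from the text. The loop body has
// no branches and walks memory linearly, which is the shape a compiler can
// vectorise and that is easy to split across threads.
double sumContiguous(const double* array, std::size_t nItems) {
    double total = 0.0;
    for (std::size_t i = 0; i < nItems; ++i)
        total += array[i];
    return total;
}
```

Both functions return the same total for the same numbers; the difference is only in how much work surrounds each addition.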

Now ... if we stored contiguous spans of identically typed items [ ie.
a column of doubles ] as value runs in our data structures (currently,
amazingly, we have arrays of (row/cell*) pairs), then we could push a
lot of branch-heavy, expensive type checking and lots of pointer chasing
outside the main loop; we'd also use rather less memory.
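
One possible shape for such value runs, as a rough sketch only: the `DoubleRun`, `TextRun` and `Column` names are assumptions made up for this example (C++17, using `std::variant`), not anything that exists in calc.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical run-based column: each run covers a contiguous span of rows
// that all hold values of one type.
struct DoubleRun { std::size_t firstRow; std::vector<double> values; };
struct TextRun   { std::size_t firstRow; std::vector<std::string> values; };
using Run = std::variant<DoubleRun, TextRun>;

struct Column {
    std::vector<Run> runs;  // assumed sorted by firstRow and non-overlapping
};

double sumColumn(const Column& column) {
    double total = 0.0;
    for (const Run& run : column.runs) {
        // One type check per run instead of one per cell.
        if (const auto* doubles = std::get_if<DoubleRun>(&run)) {
            for (double v : doubles->values)  // tight loop over contiguous doubles
                total += v;
        }
    }
    return total;
}
```

With a million doubles in a single run, the type check happens once and the inner loop is the plain array sum from the ideal case.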

That's fine for doubles of course; but the really huge wins come from
an entirely new model of dependencies and areas containing formulae - I
propose only storing a base formula, and an iterator to describe how
that formula changes down a row: to compress an entire column of
formulae to a single formula. Furthermore on top of that substantially
discarding the existing dependency mechanism such that a single-cell
change in a contiguous range of doubles would have a shared broadcast on
that whole range (with the specific detail of what cell changed), such
that that could be turned into a specific row (or row range) to
re-compute in any dependent by comparing the base formula, plus its
iterator with the range that changed ;-)
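
A very reduced sketch of that last step, under heavy assumptions: the "iterator" is taken to be the simplest possible rule (row r of the formula run reads source row r + rowOffset, like a relative reference filled down a column), and the `FormulaRun` and `dirtyRow` names are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical compressed formula column: one base formula shared by
// rowCount rows, plus a rule for how its reference moves per row.
struct FormulaRun {
    std::int64_t firstRow;   // first row covered by the run
    std::int64_t rowCount;   // number of rows sharing the base formula
    std::int64_t rowOffset;  // source row = formula row + rowOffset
};

// Given the row that changed in the source range (the detail carried by the
// shared broadcast), return the single formula row that needs recomputing,
// or nothing if the change falls outside what the run reads.
std::optional<std::int64_t> dirtyRow(const FormulaRun& run,
                                     std::int64_t changedRow) {
    const std::int64_t row = changedRow - run.rowOffset;
    if (row < run.firstRow || row >= run.firstRow + run.rowCount)
        return std::nullopt;
    return row;
}
```

The point of the sketch is that no per-cell dependency records are consulted: one listener on the source range plus a little arithmetic on the run is enough to find the row to re-compute.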


Re: [Libreoffice] Calc - a better design ...

2011-10-20 Thread Michael Meeks
Hi Kevin,

On Thu, 2011-10-20 at 02:00 -0400, Kevin Hunter wrote:
 I'm hesitant to ask this because I cannot personally promise time toward 
 LO (only on an as-can basis, which is dismally small ATM), but hey, you 
 can easily say no.  :-)  You mention really need[ing] a whiteboard to 
 elaborate properly.

That really helps; having said that - I sat down with Eike & Kohei to
discuss this in Paris, and (I hope) managed to communicate the essence
of the idea.

   I submit that putting your thoughts together, perhaps in picture
 form and available on the LO wiki, or put together as a small video
 to Youtube, would be extremely useful to casual LO coders 
 like myself.

Sure - so first off, since I'm not actively hacking on calc (as of
now), this is really not my call. I tried to persuade Kohei & Eike of
the intrinsic improvements possible with the new design - if I'm lucky
then they agree that I'm not mad & might think about that. Of course - I
can create a video too, but ... ;-)

   As an individual volunteer without a face-to-face LO team 
 member against whom to bounce ideas, I'd thoroughly love an 
 actual-paid-engineer's thoughts on how best to proceed on this front.

So - the very essence of what I'd like to see happen in calc, and the
foundation for it - is to remove the idea that a spreadsheet is a
collection of 'Cell' objects. This seems (to me) to be the foundation of
our scalability problems.

 I'm personally motivated for Calc because in my science career, I really 
 have to bend over backwards to make Calc work effectively for my needs 
 where Excel works just plain better/faster/smaller, yet I 
 philosophically have stuck myself with Free software.  To me, one of the 
 biggest areas of weakness for LO, after the various random crashes 
 (which are getting better!), is the memory bloat, and speed.  It's not 
 features.

Right. So the biggest piece (I see) that needs tackling here before we
can take advantage of the new code is to start restricting the scope of
'ScBaseCell' pointers in LibreOffice calc. Last I looked (which was a
while ago) we use ScBaseCell pointers all around the place for things
like undo/redo, change tracking, copy/paste, document construction etc.

If you wanted to re-start the effort to remove ScBaseCell's mpNote
pointer (which is very infrequently used) - that'd be a great place to
see some of the problems: ultimately I think we want to remove
ScBaseCell (and its derivatives) entirely - leaving a (numeric) cell as
a single 'double' inside a fixed column-array of entries of the same
type.

Of course, even without the grand vision coming to fruition, saving 4
(or 8) bytes per cell would be worthwhile, and improving the above areas
to handle storage of ranges of cell contents in a better encapsulated
way would be rather valuable - I think.
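
As an illustration of where the 4 (or 8) bytes go, here is a small sketch. The structs below are hypothetical layouts made up for the example, not the real ScBaseCell definitions, and the side table is only one way the notes could be stored.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical layout with a per-cell note pointer: null for almost every
// cell, but every cell still pays for the pointer.
struct CellWithNote {
    double value;
    std::string* note;
};

// Hypothetical layout once the pointer is gone.
struct CellWithoutNote {
    double value;
};

// Notes kept in a side table keyed by row, so only annotated cells cost
// anything.
struct ColumnNotes {
    std::map<std::int32_t, std::string> byRow;

    // Returns the note for a row, or nullptr if the row has none.
    const std::string* find(std::int32_t row) const {
        auto it = byRow.find(row);
        return it == byRow.end() ? nullptr : &it->second;
    }
};

static_assert(sizeof(CellWithNote) > sizeof(CellWithoutNote),
              "the note pointer makes every cell larger");
```

The lookup becomes slower for the rare cell that has a note, in exchange for a smaller footprint for every cell that does not.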

But of course, you really want to talk to Eike / Kohei / Markus.

 I'm happy to mess with Ixion (and indeed have poked at it some already)

Right - IMHO, the real problem we have is not so much Ixion (which is
great), but massaging the existing code into a good shape to be ready
for its heart transplant ;-) The above would be a great step in that
direction.

Of course, if the calc developers don't object, I'm happy to create a
video of me making a fool of myself with a whiteboard too if you think
it helps :-)

All the best,

Michael.

-- 
michael.me...@suse.com  , Pseudo Engineer, itinerant idiot

___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: [Libreoffice] Calc - a better design ...

2011-10-20 Thread Noel Grandin
Hi

Sounds like a good idea would be to create a master tracking bug in bugzilla 
around this plan,
then split off the different changes into blocking sub-bugs,
and mark some of the easier ones so that other people can start doing them.

Regards, Noel.





Re: [Libreoffice] Calc - a better design ...

2011-07-27 Thread Kevin Hunter

At 9:42am -0400 Wed, 27 Jul 2011, Michael Meeks wrote:

Anyhow - as text, this is not terribly convincing; drawing on a
whiteboard would be more so, and with lots of working code - even
more so ;-) I keep hoping to have time to play with ixion with Kohei
in this regard.


Actually, as text it is convincing.  So my original analogy to CSS 
works, but your example of sum(*array) vs for(i = 0...) and making use 
of SIMD is also telling.


Who is working on these particular internals, if anybody?  Do I take 
from your last sentence that the answer is partially nobody?


I would /love/ to get my hands dirty on exactly this kind of 
restructuring and code work ... if only my hands weren't currently tied 
to a graduate degree that I also enjoy.  Sigh.


Cheers (and thanks for the extensive reply!),

Kevin