Is this an issue with any subtraction of similar magnitude, or only if it changes the exponent of the number (i.e., it gets closer to 0)? In other words, when you say "magnitude", do you mean "value", or "order of magnitude (log2(x))"?
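A quick sketch of the distinction being asked about (not from the thread itself; the values below are standard, reproducible binary64 behavior): the damage tracks how many leading digits the two operands share, i.e. their relative closeness, not whether the exponent of the result changes.

```python
# Cancellation is about relative closeness (shared leading digits).
# 0.1 + 0.2 and 0.3 agree in ~16 leading digits, so subtracting them
# cancels essentially everything and leaves only the inputs'
# representation error:
a = 0.1 + 0.2            # -> 0.30000000000000004 in binary64
print(a - 0.3)           # -> 5.551115123125783e-17, should be exactly 0

# 0.4 - 0.3 lowers the exponent of the result by two binades, yet the
# operands share no leading digits, so the subtraction itself is exact
# (Sterbenz lemma) and the result keeps ~16 good digits:
print(0.4 - 0.3)         # -> 0.10000000000000003 (error comes only from
                         #    representing 0.4 and 0.3 as doubles)
```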
Aaron Meurer

On Mon, Mar 17, 2014 at 5:59 PM, Ondřej Čertík <[email protected]> wrote:
> On Mon, Mar 17, 2014 at 4:08 PM, Jason Moore <[email protected]> wrote:
>> Thanks. I'm a dummy when it comes to C, living my life mostly in high
>> level languages. Can I use a floating point number? I'm using doubles
>> in the C code because that is what this proprietary product generates
>> for similar systems that we used to use. It is pretty battle tested and
>> I've never had issues with the double precision nor heard anyone
>> complain about it. Maybe the kinds of expressions we generate are not
>> as susceptible. Not sure.
>
> The example above is in Python, it has nothing to do with C. It's using
> Python floats, which are usually machine precision doubles. This is
> what is usually called a "floating point number".
>
> There is the famous essay "What Every Computer Scientist Should Know
> About Floating-Point Arithmetic":
>
> http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
>
> It's kind of long, but the gist is that +, *, and / don't lose
> significant accuracy, while "-" does. This is easy to see in Python:
>
> In [16]: a = 1+2e-16
>
> In [17]: a
> Out[17]: 1.0000000000000002
>
> We still have around 16 significant digits. But if we do "minus":
>
> In [18]: a-1
> Out[18]: 2.220446049250313e-16
>
> We suddenly have only 1 significant digit correct; the rest is (in
> general) random garbage (to be precise, the number we got is the
> machine epsilon: http://en.wikipedia.org/wiki/Machine_epsilon, but
> that's not the point).
>
> The conclusion is that when you use "minus", you need to *avoid*
> subtracting numbers of similar magnitude, to minimize the loss. E.g.
> if you subtract 1 and 2e-16, then it's fine:
>
> In [21]: 1-2e-16
> Out[21]: 0.9999999999999998
>
> No issue.
>
> In practice, the Legendre polynomial has "minus" of numbers of similar
> magnitude, so you are totally out of luck and you simply can't use it.
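A classic illustration of the "avoid the minus" advice (my own example, not from the thread): when the expression can be rearranged algebraically so that the subtraction of nearly equal values disappears, double precision recovers full accuracy.

```python
import math

x = 1e12  # sqrt(x+1) and sqrt(x) agree in ~13 leading digits

# Naive form: the subtraction cancels those shared digits and amplifies
# the sqrt rounding error; only the first ~5 digits of the result survive.
naive = math.sqrt(x + 1) - math.sqrt(x)

# Algebraically equivalent form with no cancelling minus:
#   sqrt(x+1) - sqrt(x) = 1 / (sqrt(x+1) + sqrt(x))
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

print(naive, stable)  # both near 5e-7, but only `stable` is fully accurate
```

The same idea is behind evaluating special functions from recurrences or rearranged formulas rather than from their expanded coefficient form.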
> So that's why I was asking if you are sure that what you get for your
> complicated systems is actually correct and not garbage. For example,
> if you were doing an equivalent of legendre_poly(50, x) by evaluating
> the expression directly in C, you would get garbage. So it's not some
> academic exercise.
>
> Ondrej
>
>>
>>
>> Jason
>> moorepants.info
>> +01 530-601-9791
>>
>>
>> On Mon, Mar 17, 2014 at 5:54 PM, Ondřej Čertík <[email protected]> wrote:
>>>
>>> On Mon, Mar 17, 2014 at 3:35 PM, Jason Moore <[email protected]> wrote:
>>> > Ok, if we had some concrete examples of that non-cancellation then
>>> > we could do something about it. This goes for everything that we've
>>> > been implementing: if we don't have a good benchmark problem then it
>>> > is hard to design something that solves it. We can look into this
>>> > more when it becomes an issue, I guess.
>>>
>>> Just try this:
>>>
>>> In [1]: from sympy import *
>>>
>>> In [2]: var("x")
>>> Out[2]: x
>>>
>>> In [3]: e = legendre_poly(50, x)
>>>
>>> In [4]: e.subs(x, 0.9)
>>> Out[4]: 1.25000000000000
>>>
>>> In [5]: e.subs(x, S(9)/10).n()
>>> Out[5]: -0.170037659943837
>>>
>>> and it only gets worse:
>>>
>>> In [6]: e = legendre_poly(100, x)
>>>
>>> In [7]: e.subs(x, 0.9)
>>> Out[7]: -3.65635994747142e+17
>>>
>>> In [8]: e.subs(x, S(9)/10).n()
>>> Out[8]: 0.102265820558719
>>>
>>>
>>> In other words, double precision is simply not enough at all, and
>>> that's just the Legendre polynomial of order 50, which is not much, i.e.
>>> it looks like this:
>>>
>>> In [9]: legendre_poly(50, x)
>>> Out[9]: 12611418068195524166851562157*x**50/140737488355328 -
>>> 156050375086257748529223875175*x**48/140737488355328 +
>>> 226836112238787036521861509275*x**46/35184372088832 -
>>> 823773249709279237895181270525*x**44/35184372088832 +
>>> 4189728463575151392735706892025*x**42/70368744177664 -
>>> 7928255400303748020099876118755*x**40/70368744177664 +
>>> 5790298887862287879848224131675*x**38/35184372088832 -
>>> 6684039602901787158511168414725*x**36/35184372088832 +
>>> 24770264410753681822717859419275*x**34/140737488355328 -
>>> 18602568051449552212241926551825*x**32/140737488355328 +
>>> 1423900270604780539702468452115*x**30/17592186044416 -
>>> 712769410486857922635873160725*x**28/17592186044416 +
>>> 583174972216520118520259858775*x**26/35184372088832 -
>>> 194391657405506706173419952925*x**24/35184372088832 +
>>> 26248579962778792027330678575*x**22/17592186044416 -
>>> 5693353963757653481984400705*x**20/17592186044416 +
>>> 7838675747202566388239392275*x**18/140737488355328 -
>>> 1052956443654076082002306425*x**16/140737488355328 +
>>> 26998883170617335435956575*x**14/35184372088832 -
>>> 2052546673789621992207225*x**12/35184372088832 +
>>> 222078820442811559812585*x**10/70368744177664 -
>>> 8065816723104536070675*x**8/70368744177664 +
>>> 90048990529077755175*x**6/35184372088832 -
>>> 1067774591253886425*x**4/35184372088832 +
>>> 20146690401016725*x**2/140737488355328 -
>>> 15801325804719/140737488355328
>>>
>>>
>>> By "minus" sign, I mean all the minus signs in there, which cause the
>>> cancellation.
>>>
>>> Ondrej
>>>
>>>
>>> >
>>> > Yes, we should have an option in the code generation to generate a
>>> > function that can evaluate the Jacobian of the RHS. This can be used
>>> > directly for implicit ODE integration routines that are best suited
>>> > for stiff problems.
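Worth noting alongside the expanded Out[9] form above (my addition, not from the thread): the same polynomial evaluated through the standard three-term (Bonnet) recurrence is perfectly well behaved in plain double precision, so the catastrophic cancellation is a property of the expanded monomial form, not of P_50 itself. A minimal sketch:

```python
def legendre_p(n, x):
    """Evaluate the Legendre polynomial P_n(x) in double precision via the
    three-term (Bonnet) recurrence, which does not suffer the massive
    cancellation of the expanded monomial form:
        n * P_n(x) = (2n - 1) * x * P_{n-1}(x) - (n - 1) * P_{n-2}(x)
    """
    p_prev, p = 1.0, x  # P_0 and P_1
    if n == 0:
        return p_prev
    for k in range(2, n + 1):
        # p holds P_{k-1}, p_prev holds P_{k-2}; advance one degree.
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p

# The same evaluations that return garbage from the expanded form:
print(legendre_p(50, 0.9))    # close to SymPy's exact -0.170037659943837
print(legendre_p(100, 0.9))   # close to SymPy's exact 0.102265820558719
```

This is why codegen that emits the recurrence (or Horner on a stable basis) can stay in doubles where the expanded coefficients cannot.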
>>> > >
>>> >
>>> > Jason
>>> > moorepants.info
>>> > +01 530-601-9791
>>> >
>>> >
>>> > On Mon, Mar 17, 2014 at 5:26 PM, Gilbert Gede <[email protected]> wrote:
>>> >>
>>> >> Jason, I've definitely seen examples where small errors in subtraction
>>> >> can lead to larger errors later on - especially if things are supposed
>>> >> to cancel out. In our domain, I think it pops up in mass/inertia
>>> >> division operations. I'm not sure how often it happens with real-world
>>> >> examples vs. academic/idealized examples though.
>>> >>
>>> >> I would definitely agree that step-size (for stiff systems) is the
>>> >> bigger challenge in practice - although I believe providing the
>>> >> Jacobian of your RHS function can help with that. That might be
>>> >> something we want to automate in the code-gen module.
>>> >>
>>> >> BTW, thanks for starting this discussion. Very informative.
>>> >>
>>> >> -Gilbert
>>> >>
>>> >>
>>> >> On Mon, Mar 17, 2014 at 2:13 PM, Jason Moore <[email protected]> wrote:
>>> >>>
>>> >>> For numerical ODE integration, I don't think double precision presents
>>> >>> nearly as much of an issue for accuracy as picking the correct step
>>> >>> size. We usually fight the step size issue, and once that is good the
>>> >>> accuracy is adequate for the solutions we need. I've never run into an
>>> >>> issue with double precision being limiting. So I don't really know. I
>>> >>> think we can often get away with much lower accuracy than double
>>> >>> precision for realistic simulations of typical multibody systems.
>>> >>>
>>> >>> Wow, so a minus sign can cause errors so big that you only get 1 or 2
>>> >>> digits of accuracy. That's new to me. I'm not sure what minus signs
>>> >>> you are talking about. What would you do instead of minus signs?
>>> >>> >>> >>> But like I said, numerical accuracy is generally not an issue for our >>> >>> systems, unless the ODE integration routine is bad. >>> >>> >>> >>> >>> >>> Jason >>> >>> moorepants.info >>> >>> +01 530-601-9791 >>> >>> >>> >>> >>> >>> On Mon, Mar 17, 2014 at 1:00 PM, Ondřej Čertík >>> >>> <[email protected]> >>> >>> wrote: >>> >>>> >>> >>>> On Mon, Mar 17, 2014 at 9:21 AM, Jason Moore <[email protected]> >>> >>>> wrote: >>> >>>> > I'm still digesting what Matthew and Max wrote. Lots of new words >>> >>>> > for >>> >>>> > me :) >>> >>>> > But here is a simple example taken from C code we generate for a >>> >>>> > simple 2 >>> >>>> > link pendulum. >>> >>>> > >>> >>>> > First the C code with SymPy's CSE expressions automatically >>> >>>> > generated: >>> >>>> > >>> >>>> > #include <math.h> >>> >>>> > #include "multibody_system_c.h" >>> >>>> > >>> >>>> > void mass_forcing(double constants[6], // constants = [g, m0, l0, >>> >>>> > m1, >>> >>>> > l1, >>> >>>> > m2] >>> >>>> > double coordinates[3], // coordinates = [q0, q1, >>> >>>> > q2] >>> >>>> > double speeds[3], // speeds = [u0, u1, u2] >>> >>>> > double mass_matrix[36], // computed >>> >>>> > double forcing_vector[6]) // computed >>> >>>> > { >>> >>>> > // common subexpressions >>> >>>> > double z_0 = coordinates[1]; >>> >>>> > double z_1 = sin(z_0); >>> >>>> > double z_2 = constants[2]*z_1; >>> >>>> > double z_3 = -constants[3]*z_2 - constants[5]*z_2; >>> >>>> > double z_4 = coordinates[2]; >>> >>>> > double z_5 = sin(z_4); >>> >>>> > double z_6 = -constants[4]*constants[5]*z_5; >>> >>>> > double z_7 = pow(constants[2], 2); >>> >>>> > double z_8 = constants[2]*constants[4]*constants[5]; >>> >>>> > double z_9 = cos(z_0); >>> >>>> > double z_10 = cos(z_4); >>> >>>> > double z_11 = z_8*(z_1*z_5 + z_10*z_9); >>> >>>> > double z_12 = speeds[1]; >>> >>>> > double z_13 = speeds[2]; >>> >>>> > double z_14 = pow(z_12, 2); >>> >>>> > double z_15 = constants[2]*z_14*z_9; >>> >>>> > double z_16 = pow(z_13, 2); 
>>> >>>> > double z_17 = constants[4]*constants[5]*z_10; >>> >>>> > double z_18 = constants[0]*constants[2]*z_9; >>> >>>> > double z_19 = z_5*z_9; >>> >>>> > double z_20 = z_1*z_10; >>> >>>> > >>> >>>> > // mass matrix >>> >>>> > mass_matrix[0] = 1; >>> >>>> > mass_matrix[1] = 0; >>> >>>> > mass_matrix[2] = 0; >>> >>>> > mass_matrix[3] = 0; >>> >>>> > mass_matrix[4] = 0; >>> >>>> > mass_matrix[5] = 0; >>> >>>> > mass_matrix[6] = 0; >>> >>>> > mass_matrix[7] = 1; >>> >>>> > mass_matrix[8] = 0; >>> >>>> > mass_matrix[9] = 0; >>> >>>> > mass_matrix[10] = 0; >>> >>>> > mass_matrix[11] = 0; >>> >>>> > mass_matrix[12] = 0; >>> >>>> > mass_matrix[13] = 0; >>> >>>> > mass_matrix[14] = 1; >>> >>>> > mass_matrix[15] = 0; >>> >>>> > mass_matrix[16] = 0; >>> >>>> > mass_matrix[17] = 0; >>> >>>> > mass_matrix[18] = 0; >>> >>>> > mass_matrix[19] = 0; >>> >>>> > mass_matrix[20] = 0; >>> >>>> > mass_matrix[21] = constants[1] + constants[3] + constants[5]; >>> >>>> > mass_matrix[22] = z_3; >>> >>>> > mass_matrix[23] = z_6; >>> >>>> > mass_matrix[24] = 0; >>> >>>> > mass_matrix[25] = 0; >>> >>>> > mass_matrix[26] = 0; >>> >>>> > mass_matrix[27] = z_3; >>> >>>> > mass_matrix[28] = constants[3]*z_7 + constants[5]*z_7; >>> >>>> > mass_matrix[29] = z_11; >>> >>>> > mass_matrix[30] = 0; >>> >>>> > mass_matrix[31] = 0; >>> >>>> > mass_matrix[32] = 0; >>> >>>> > mass_matrix[33] = z_6; >>> >>>> > mass_matrix[34] = z_11; >>> >>>> > mass_matrix[35] = pow(constants[4], 2)*constants[5]; >>> >>>> > >>> >>>> > // forcing vector >>> >>>> > forcing_vector[0] = speeds[0]; >>> >>>> > forcing_vector[1] = z_12; >>> >>>> > forcing_vector[2] = z_13; >>> >>>> > forcing_vector[3] = constants[3]*z_15 + constants[5]*z_15 + >>> >>>> > z_16*z_17; >>> >>>> > forcing_vector[4] = -constants[3]*z_18 - constants[5]*z_18 + >>> >>>> > z_16*z_8*(z_19 - z_20); >>> >>>> > forcing_vector[5] = -constants[0]*z_17 + z_14*z_8*(-z_19 + >>> >>>> > z_20); >>> >>>> > } >>> >>>> > >>> >>>> > >>> >>>> > Now I manually group these 
expression evaluations into "stacks", >>> >>>> > i.e. >>> >>>> > those >>> >>>> > calls which could happen in parallel (there is of course a bit more >>> >>>> > complicated dependency graph you can draw so that you maximize the >>> >>>> > time that >>> >>>> > your cores have a task). >>> >>>> > >>> >>>> > // These are not computations but just value assignments. >>> >>>> > z_0 = coordinates[1]; >>> >>>> > z_4 = coordinates[2]; >>> >>>> > z_12 = speeds[1]; >>> >>>> > z_13 = speeds[2]; >>> >>>> > mass_matrix[0] = 1; >>> >>>> > mass_matrix[1] = 0; >>> >>>> > mass_matrix[2] = 0; >>> >>>> > mass_matrix[3] = 0; >>> >>>> > mass_matrix[4] = 0; >>> >>>> > mass_matrix[5] = 0; >>> >>>> > mass_matrix[6] = 0; >>> >>>> > mass_matrix[7] = 1; >>> >>>> > mass_matrix[8] = 0; >>> >>>> > mass_matrix[9] = 0; >>> >>>> > mass_matrix[10] = 0; >>> >>>> > mass_matrix[11] = 0; >>> >>>> > mass_matrix[12] = 0; >>> >>>> > mass_matrix[13] = 0; >>> >>>> > mass_matrix[14] = 1; >>> >>>> > mass_matrix[15] = 0; >>> >>>> > mass_matrix[16] = 0; >>> >>>> > mass_matrix[17] = 0; >>> >>>> > mass_matrix[18] = 0; >>> >>>> > mass_matrix[19] = 0; >>> >>>> > mass_matrix[20] = 0; >>> >>>> > mass_matrix[24] = 0; >>> >>>> > mass_matrix[25] = 0; >>> >>>> > mass_matrix[26] = 0; >>> >>>> > mass_matrix[30] = 0; >>> >>>> > mass_matrix[31] = 0; >>> >>>> > mass_matrix[32] = 0; >>> >>>> > forcing_vector[0] = speeds[0]; >>> >>>> > forcing_vector[1] = z_12; >>> >>>> > forcing_vector[2] = z_13; >>> >>>> > >>> >>>> > // These are computations that involve the initial values passed >>> >>>> > into >>> >>>> > the >>> >>>> > // function, i.e. stack #1. 
>>> >>>> > z_7 = pow(constants[2], 2); >>> >>>> > z_8 = constants[2]*constants[4]*constants[5]; >>> >>>> > z_14 = pow(z_12, 2); >>> >>>> > z_16 = pow(z_13, 2); >>> >>>> > mass_matrix[21] = constants[1] + constants[3] + constants[5]; >>> >>>> > mass_matrix[35] = pow(constants[4], 2)*constants[5]; >>> >>>> > >>> >>>> > // Stack #2 >>> >>>> > z_1 = sin(z_0); >>> >>>> > z_5 = sin(z_4); >>> >>>> > z_9 = cos(z_0); >>> >>>> > z_10 = cos(z_4); >>> >>>> > z_2 = constants[2]*z_1; >>> >>>> > mass_matrix[28] = constants[3]*z_7 + constants[5]*z_7; >>> >>>> > >>> >>>> > // Stack #3 >>> >>>> > z_3 = -constants[3]*z_2 - constants[5]*z_2; >>> >>>> > z_6 = -constants[4]*constants[5]*z_5; >>> >>>> > z_11 = z_8*(z_1*z_5 + z_10*z_9); >>> >>>> > z_15 = constants[2]*z_14*z_9; >>> >>>> > z_17 = constants[4]*constants[5]*z_10; >>> >>>> > z_18 = constants[0]*constants[2]*z_9; >>> >>>> > z_19 = z_5*z_9; >>> >>>> > z_20 = z_1*z_10; >>> >>>> > >>> >>>> > // Stack #4 >>> >>>> > mass_matrix[22] = z_3; >>> >>>> > mass_matrix[23] = z_6; >>> >>>> > mass_matrix[27] = z_3; >>> >>>> > mass_matrix[29] = z_11; >>> >>>> > mass_matrix[33] = z_6; >>> >>>> > mass_matrix[34] = z_11; >>> >>>> > forcing_vector[3] = constants[3]*z_15 + constants[5]*z_15 + >>> >>>> > z_16*z_17; >>> >>>> > forcing_vector[4] = -constants[3]*z_18 - constants[5]*z_18 + >>> >>>> > z_16*z_8*(z_19 >>> >>>> > - z_20); >>> >>>> > forcing_vector[5] = -constants[0]*z_17 + z_14*z_8*(-z_19 + z_20); >>> >>>> > >>> >>>> > >>> >>>> > So this simplified example of the dependencies in the CSE's shows >>> >>>> > that >>> >>>> > if I >>> >>>> > had enough cores available I could parallelize each stack, >>> >>>> > potentially >>> >>>> > increasing the execution speed. So instead of 31 evaluations, you >>> >>>> > could have >>> >>>> > 4 evaluations in parallel, ideally a 7.75x speedup. 
For more
>>> >>>> > complicated problems, there could be thousands and thousands of
>>> >>>> > these CSEs, but I'll need to generate their dependencies with code
>>> >>>> > to see if things stack this nicely for the big problems. I suspect
>>> >>>> > the dependency chain could be such that the higher number stacks
>>> >>>> > could have hundreds of expressions whereas the lower stacks have
>>> >>>> > fewer, or vice versa.
>>> >>>> >
>>> >>>> > How do I generate a DAG for long expressions in SymPy? Is this part
>>> >>>> > of the internal architecture of SymPy expressions? I don't
>>> >>>> > understand how the cse() code works yet either, but it seems like
>>> >>>> > this information should be computed already. I just need to
>>> >>>> > visualize the graph for some of our bigger problems.
>>> >>>> >
>>> >>>> > Also, note the number of scalars and the number of operations in
>>> >>>> > each. Here is a bigger problem with 2000 or so CSE's:
>>> >>>> >
>>> >>>> > https://github.com/moorepants/dissertation/blob/master/src/extensions/arms/ArmsDynamics.c
>>> >>>>
>>> >>>> One thing I wonder is ---- how accurate are your double precision
>>> >>>> results from the C code? Did you try to compare it with high accuracy
>>> >>>> (e.g. 100 digits) in SymPy using Floats?
>>> >>>>
>>> >>>> The "minus" sign always causes some numerical cancellation.
>>> >>>>
>>> >>>> In my experience, if the symbolic expressions grow into hundreds of
>>> >>>> terms and you use a few "minus" signs in there, you always get
>>> >>>> numerical cancellations that are sometimes just too big for double
>>> >>>> precision to handle, and you get bogus numbers, or only 1 or 2 digits
>>> >>>> of accuracy.
>>> >>>>
>>> >>>> One has to be very careful about this.
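The stack grouping described earlier can be computed mechanically once the CSE dependency graph is in hand. Here is a toy sketch (my own code, not SymPy's internal representation; the `deps` dict is a hand-transcribed subset of the pendulum example's z_* dependencies, treating inputs like `coordinates` and `constants` as always available):

```python
from collections import defaultdict

# Dependency graph of the CSEs from the two-link pendulum C code above.
deps = {
    "z_0": [], "z_4": [], "z_7": [], "z_8": [], "z_12": [], "z_13": [],
    "z_1": ["z_0"], "z_2": ["z_1"], "z_3": ["z_2"],
    "z_5": ["z_4"], "z_6": ["z_5"],
    "z_9": ["z_0"], "z_10": ["z_4"],
    "z_11": ["z_8", "z_1", "z_5", "z_10", "z_9"],
    "z_14": ["z_12"], "z_15": ["z_14", "z_9"], "z_16": ["z_13"],
    "z_17": ["z_10"], "z_18": ["z_9"],
    "z_19": ["z_5", "z_9"], "z_20": ["z_1", "z_10"],
}

def stacks(deps):
    """Group expressions into 'stacks' by longest path from the inputs:
    every member of a stack depends only on earlier stacks, so the
    members of one stack can all be evaluated in parallel."""
    level = {}
    def lv(name):
        if name not in level:
            level[name] = 1 + max((lv(d) for d in deps[name]), default=0)
        return level[name]
    grouped = defaultdict(list)
    for name in deps:
        grouped[lv(name)].append(name)
    return [sorted(grouped[k]) for k in sorted(grouped)]

for i, stack in enumerate(stacks(deps)):
    print(f"stack {i}: {stack}")
```

The same leveling applied to the output of SymPy's `cse()` (whose replacement pairs give exactly such a dependency list via the free symbols of each right-hand side) would reproduce the manual grouping, and the per-level sizes answer the "how wide are the independent chunks" question from later in the thread.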
>>> >>>>
>>> >>>>
>>> >>>> Otherwise I think it's a great idea to parallelize the evaluation.
>>> >>>> Btw, with CSymPy I am also interested in parallelizing the symbolic
>>> >>>> manipulation, including also the numerical evaluation (both double
>>> >>>> precision and higher accuracy). One can do it on the "library" level
>>> >>>> as well as on the code generation level, which is the problem you
>>> >>>> are tackling now.
>>> >>>>
>>> >>>> Ondrej
>>> >>>>
>>> >>>> >
>>> >>>> > This problem has 12 scalars that have 2000+ CSE's and there are
>>> >>>> > 5840 additions and subtractions, 9847 multiplications and
>>> >>>> > divisions, 14 cosines, and 14 sines. So roughly 1300 operations
>>> >>>> > per scalar.
>>> >>>> >
>>> >>>> >
>>> >>>> > Jason
>>> >>>> > moorepants.info
>>> >>>> > +01 530-601-9791
>>> >>>> >
>>> >>>> >
>>> >>>> > On Mon, Mar 17, 2014 at 12:06 AM, Matthew Rocklin <[email protected]> wrote:
>>> >>>> >>
>>> >>>> >> Response from Max follows (for some reason he was getting bounced
>>> >>>> >> by the mailing list).
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Sun, Mar 16, 2014 at 8:55 PM, Max Hutchinson <[email protected]> wrote:
>>> >>>> >>>
>>> >>>> >>> tl;dr it depends on the DAG, but improved ILP is likely possible
>>> >>>> >>> (if difficult) and there could be room for multi-core
>>> >>>> >>> parallelism as well.
>>> >>>> >>>
>>> >>>> >>> As I understand it, we're talking about a long computation
>>> >>>> >>> applied to short input vectors. If the computation can be applied
>>> >>>> >>> to many input vectors at once, independent of each other, then
>>> >>>> >>> all levels of parallelism (multiple instructions, multiple cores,
>>> >>>> >>> multiple sockets, multiple nodes) can be used. This is
>>> >>>> >>> data-parallelism, which is great!
However, >>> >>>> >>> it >>> >>>> >>> doesn't >>> >>>> >>> sound like this is the case. >>> >>>> >>> >>> >>>> >>> It sounds like you're thinking of building a DAG of these CSEs >>> >>>> >>> and >>> >>>> >>> trying >>> >>>> >>> to use task-parallelism over independent parts of it >>> >>>> >>> (automatically >>> >>>> >>> using >>> >>>> >>> sympy or theano or what have you). The tension here is going to >>> >>>> >>> be >>> >>>> >>> between >>> >>>> >>> locality and parallelism: how much compute hardware can you >>> >>>> >>> spread >>> >>>> >>> your data >>> >>>> >>> across without losing the nice cache performance that your small >>> >>>> >>> input >>> >>>> >>> vectors gain you. I'd bet that going off-socket is way too wide. >>> >>>> >>> Modern >>> >>>> >>> multi-core architectures have core-local L2 and L1 caches, so if >>> >>>> >>> your input >>> >>>> >>> data fits nicely into L2 and your DAG isn't really local, you >>> >>>> >>> probably won't >>> >>>> >>> get anything out of multiple-cores. Your last stand is >>> >>>> >>> single-core >>> >>>> >>> parallelism (instruction-level parallelism), which sympy et al >>> >>>> >>> may >>> >>>> >>> or may >>> >>>> >>> not be well equipped to influence. >>> >>>> >>> >>> >>>> >>> To start, I'd recommend that you take a look at your DAGs and try >>> >>>> >>> to >>> >>>> >>> figure out how large the independent chunks are. Then, estimate >>> >>>> >>> the >>> >>>> >>> amount >>> >>>> >>> of instruction level parallelism when you run in 'serial' (which >>> >>>> >>> you >>> >>>> >>> can do >>> >>>> >>> with flop-counting). If your demonstrated ILP is less than your >>> >>>> >>> independent >>> >>>> >>> chunk size, then at least improved ILP should be possible. >>> >>>> >>> Automatically >>> >>>> >>> splitting up these DAGs and expressing them in a low-level enough >>> >>>> >>> way to >>> >>>> >>> affect ILP is a considerable task, though. 
>>> >>>> >>>
>>> >>>> >>> To see if multi-core parallelism is worth it, you need to
>>> >>>> >>> estimate how many extra L3 loads you'd incur by spreading your
>>> >>>> >>> data over multiple L2s. I don't have great advice for that;
>>> >>>> >>> maybe someone else here does. The good news is that if your
>>> >>>> >>> problem has this level of locality, then you can probably get
>>> >>>> >>> away with emitting C code with pthreads or even OpenMP. Just
>>> >>>> >>> bear in mind the thread creation/annihilation overhead (standing
>>> >>>> >>> thread-pools are your friend) and pin them to cores.
>>> >>>> >>>
>>> >>>> >>> Good luck,
>>> >>>> >>> Max
>>> >>>> >>
>>> >>>> >> --
>>> >>>> >> You received this message because you are subscribed to the
>>> >>>> >> Google Groups "sympy" group.
>>> >>>> >> To unsubscribe from this group and stop receiving emails from it,
>>> >>>> >> send an email to [email protected].
>>> >>>> >> To post to this group, send email to [email protected].
>>> >>>> >> Visit this group at http://groups.google.com/group/sympy.
>>> >>>> >> To view this discussion on the web visit
>>> >>>> >> https://groups.google.com/d/msgid/sympy/CAJ8oX-Hc2y9C7FO07kkeraDAv7NNRGPkMJ2DvjgF2Oq7PzeS6g%40mail.gmail.com.
>>> >>>> >> For more options, visit https://groups.google.com/d/optout.
