Re: [sympy] Parallel evaluation of a mathematical expression

Matthew Rocklin Mon, 17 Mar 2014 09:56:34 -0700

theano.printing.pydotprint

SymPy has something similar in sympy.printing.dot



On Mon, Mar 17, 2014 at 9:52 AM, Jason Moore <[email protected]> wrote:

> How did you create the DAG visualization here:
>
> http://matthewrocklin.com/blog/work/2013/04/05/SymPy-Theano-part-3/
>
>
> Jason
> moorepants.info
> +01 530-601-9791
>
>
> On Mon, Mar 17, 2014 at 12:46 PM, Matthew Rocklin <[email protected]>wrote:
>
>> Logically we think of SymPy expressions as Trees.  When you consider
>> common sub-expressions then repeated branches of these trees get merged.
>> The result is a DAG.
>>
>>
>>  On Mon, Mar 17, 2014 at 9:42 AM, Jason Moore <[email protected]>wrote:
>>
>>>  Thanks Max. I'll look into the FLOP counting as you suggest.
>>>
>>> Matthew has created visual DAG's before, I'm just not sure what software
>>> he uses for it. He'll probably pipe in about that.
>>>
>>>
>>> Jason
>>> moorepants.info
>>> +01 530-601-9791
>>>
>>>
>>> On Mon, Mar 17, 2014 at 12:39 PM, Max Hutchinson <[email protected]>wrote:
>>>
>>>> Before trying to improve ILP, you should probably see how well the
>>>> compiler is already doing.  You can calculate a lower bound for the actual
>>>> ILP by dividing your FLOP rate (FLOPs/sec) by the core frequency
>>>> (cycles/sec) to get the number of FLOPs per cycle.  Then compare that to
>>>> the number of execution units (or whatever they're calling them now) for
>>>> your processor.
>>>>
>>>> That number may be hard to find, and it is highly idealized, so better
>>>> comparison might be your FLOP rate compared to a well known compute-bound
>>>> vectorized problem: the linpack benchmark [1].  If your single core FLOP
>>>> rate is near the linpack number, there isn't going to much room for
>>>> single-core (ILP) improvement.  Be sure to run linpack large enough to get
>>>> it compute-bound.
>>>>
>>>> What to do to actually improve ILP is probably architecture/compiler
>>>> specific and very much outside my area of expertise.  Stack Overflow or the
>>>> Computational Science SE [2] might be able to help.
>>>>
>>>> Hopefully someone else on this list can help with DAGs and SymPy.
>>>>
>>>> Max
>>>>
>>>> [1] http://www.top500.org/project/linpack/
>>>> [2] http://scicomp.stackexchange.com/
>>>>
>>>>
>>>> On Mon, Mar 17, 2014 at 11:13 AM, Jason Moore <[email protected]>wrote:
>>>>
>>>>> Thanks for that clarification on the locality. I understand that now.
>>>>>
>>>>> How do I generate DAG's? Is this something SymPy does automatically?
>>>>> Or are there other software that does it? Or is it easy to code up myself?
>>>>>
>>>>> How would I "help" the compiler with more information for better ILP?
>>>>>
>>>>>
>>>>> Jason
>>>>> moorepants.info
>>>>> +01 530-601-9791
>>>>>
>>>>>
>>>>> On Mon, Mar 17, 2014 at 11:49 AM, Max Hutchinson 
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> If you think about what the DAG would look like, your 'stacks' are
>>>>>> like horizontal layers in the graph.  The width of each layer (length of
>>>>>> each stack) gives an upper bound on the speedup, but it doesn't tell the
>>>>>> whole story: you need a way to deal with data locality.
>>>>>>
>>>>>> For example, let's look at stack #3.  You have 8 independent
>>>>>> expressions, so it would seem like you should be able to use 8 pieces of
>>>>>> computational hardware (let's call it core).  However, z_6, z_11, and 
>>>>>> z_19
>>>>>> all depend on z_5.  Therefore, either z_6, z_11, and z_19 need to be
>>>>>> computed local to z_5, or z_5 needs to be copied somewhere else.  The
>>>>>> copying is much more expensive than the computing (50-100 cycles [1]), so
>>>>>> if you only have 3 things that depend on z_5, you're going to want to 
>>>>>> just
>>>>>> compute them all on the same core as z_5.
>>>>>>
>>>>>> The complicated thing is that z_5 and z_10 both share a dependency,
>>>>>> z_4, so they should be computed locally.  Now, we have to compute
>>>>>> everything that depends on z_5 or z_10 on the same core.  If we don't 
>>>>>> break
>>>>>> locality anywhere, we won't have any available parallelism.  This is the
>>>>>> tension: copies are expensive but without them we can't expose any
>>>>>> parallelism and will be stuck with one core.  This is why we really need 
>>>>>> to
>>>>>> build a DAG, not just stacks, and then try to break it into chunks with 
>>>>>> the
>>>>>> fewest edges between them.  The number of chunks is the amount of
>>>>>> parallelism and the number of edges are the number of copies.
>>>>>>
>>>>>> Fortunately, even if the DAGs are strongly connected and you're stuck
>>>>>> with one core there is still ILP.  In a nutshell: each core can actually 
>>>>>> do
>>>>>> a couple operations at the same time.  The core uses a single cache, so 
>>>>>> the
>>>>>> data is local and doesn't require copies.  The compiler is supposed to
>>>>>> figure out ILP for you, but you might be able to help it out using all 
>>>>>> the
>>>>>> extra information sympy/theano knows about your computation.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> [1]
>>>>>> http://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 17, 2014 at 10:21 AM, Jason Moore 
>>>>>> <[email protected]>wrote:
>>>>>>
>>>>>>> I'm still digesting what Matthew and Max wrote. Lots of new words
>>>>>>> for me :) But here is a simple example taken from C code we generate 
>>>>>>> for a
>>>>>>> simple 2 link pendulum.
>>>>>>>
>>>>>>> First the C code with SymPy's CSE expressions automatically
>>>>>>> generated:
>>>>>>>
>>>>>>> #include <math.h>
>>>>>>> #include "multibody_system_c.h"
>>>>>>>
>>>>>>> void mass_forcing(double constants[6], // constants = [g, m0, l0,
>>>>>>> m1, l1, m2]
>>>>>>>                   double coordinates[3], // coordinates = [q0, q1,
>>>>>>> q2]
>>>>>>>                   double speeds[3], // speeds = [u0, u1, u2]
>>>>>>>                   double mass_matrix[36], // computed
>>>>>>>                   double forcing_vector[6]) // computed
>>>>>>> {
>>>>>>>     // common subexpressions
>>>>>>>     double z_0 = coordinates[1];
>>>>>>>     double z_1 = sin(z_0);
>>>>>>>     double z_2 = constants[2]*z_1;
>>>>>>>     double z_3 = -constants[3]*z_2 - constants[5]*z_2;
>>>>>>>     double z_4 = coordinates[2];
>>>>>>>     double z_5 = sin(z_4);
>>>>>>>     double z_6 = -constants[4]*constants[5]*z_5;
>>>>>>>     double z_7 = pow(constants[2], 2);
>>>>>>>     double z_8 = constants[2]*constants[4]*constants[5];
>>>>>>>     double z_9 = cos(z_0);
>>>>>>>     double z_10 = cos(z_4);
>>>>>>>     double z_11 = z_8*(z_1*z_5 + z_10*z_9);
>>>>>>>     double z_12 = speeds[1];
>>>>>>>     double z_13 = speeds[2];
>>>>>>>     double z_14 = pow(z_12, 2);
>>>>>>>     double z_15 = constants[2]*z_14*z_9;
>>>>>>>     double z_16 = pow(z_13, 2);
>>>>>>>     double z_17 = constants[4]*constants[5]*z_10;
>>>>>>>     double z_18 = constants[0]*constants[2]*z_9;
>>>>>>>     double z_19 = z_5*z_9;
>>>>>>>     double z_20 = z_1*z_10;
>>>>>>>
>>>>>>>     // mass matrix
>>>>>>>     mass_matrix[0] = 1;
>>>>>>>     mass_matrix[1] = 0;
>>>>>>>     mass_matrix[2] = 0;
>>>>>>>     mass_matrix[3] = 0;
>>>>>>>     mass_matrix[4] = 0;
>>>>>>>     mass_matrix[5] = 0;
>>>>>>>     mass_matrix[6] = 0;
>>>>>>>     mass_matrix[7] = 1;
>>>>>>>     mass_matrix[8] = 0;
>>>>>>>     mass_matrix[9] = 0;
>>>>>>>     mass_matrix[10] = 0;
>>>>>>>     mass_matrix[11] = 0;
>>>>>>>     mass_matrix[12] = 0;
>>>>>>>     mass_matrix[13] = 0;
>>>>>>>     mass_matrix[14] = 1;
>>>>>>>     mass_matrix[15] = 0;
>>>>>>>     mass_matrix[16] = 0;
>>>>>>>     mass_matrix[17] = 0;
>>>>>>>     mass_matrix[18] = 0;
>>>>>>>     mass_matrix[19] = 0;
>>>>>>>     mass_matrix[20] = 0;
>>>>>>>     mass_matrix[21] = constants[1] + constants[3] + constants[5];
>>>>>>>     mass_matrix[22] = z_3;
>>>>>>>     mass_matrix[23] = z_6;
>>>>>>>     mass_matrix[24] = 0;
>>>>>>>     mass_matrix[25] = 0;
>>>>>>>     mass_matrix[26] = 0;
>>>>>>>     mass_matrix[27] = z_3;
>>>>>>>     mass_matrix[28] = constants[3]*z_7 + constants[5]*z_7;
>>>>>>>     mass_matrix[29] = z_11;
>>>>>>>     mass_matrix[30] = 0;
>>>>>>>     mass_matrix[31] = 0;
>>>>>>>     mass_matrix[32] = 0;
>>>>>>>     mass_matrix[33] = z_6;
>>>>>>>     mass_matrix[34] = z_11;
>>>>>>>     mass_matrix[35] = pow(constants[4], 2)*constants[5];
>>>>>>>
>>>>>>>     // forcing vector
>>>>>>>     forcing_vector[0] = speeds[0];
>>>>>>>     forcing_vector[1] = z_12;
>>>>>>>     forcing_vector[2] = z_13;
>>>>>>>     forcing_vector[3] = constants[3]*z_15 + constants[5]*z_15 +
>>>>>>> z_16*z_17;
>>>>>>>     forcing_vector[4] = -constants[3]*z_18 - constants[5]*z_18 +
>>>>>>> z_16*z_8*(z_19 - z_20);
>>>>>>>     forcing_vector[5] = -constants[0]*z_17 + z_14*z_8*(-z_19 + z_20);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> Now I manually group these expression evaluations into "stacks",
>>>>>>> i.e. those calls which could happen in parallel (there is of course a 
>>>>>>> bit
>>>>>>> more complicated dependency graph you can draw so that you maximize the
>>>>>>> time that your cores have a task).
>>>>>>>
>>>>>>> // These are not computations but just value assignments.
>>>>>>> z_0 = coordinates[1];
>>>>>>> z_4 = coordinates[2];
>>>>>>> z_12 = speeds[1];
>>>>>>> z_13 = speeds[2];
>>>>>>> mass_matrix[0] = 1;
>>>>>>> mass_matrix[1] = 0;
>>>>>>> mass_matrix[2] = 0;
>>>>>>> mass_matrix[3] = 0;
>>>>>>> mass_matrix[4] = 0;
>>>>>>> mass_matrix[5] = 0;
>>>>>>> mass_matrix[6] = 0;
>>>>>>> mass_matrix[7] = 1;
>>>>>>> mass_matrix[8] = 0;
>>>>>>> mass_matrix[9] = 0;
>>>>>>> mass_matrix[10] = 0;
>>>>>>> mass_matrix[11] = 0;
>>>>>>> mass_matrix[12] = 0;
>>>>>>> mass_matrix[13] = 0;
>>>>>>> mass_matrix[14] = 1;
>>>>>>> mass_matrix[15] = 0;
>>>>>>> mass_matrix[16] = 0;
>>>>>>> mass_matrix[17] = 0;
>>>>>>> mass_matrix[18] = 0;
>>>>>>> mass_matrix[19] = 0;
>>>>>>> mass_matrix[20] = 0;
>>>>>>> mass_matrix[24] = 0;
>>>>>>> mass_matrix[25] = 0;
>>>>>>> mass_matrix[26] = 0;
>>>>>>> mass_matrix[30] = 0;
>>>>>>> mass_matrix[31] = 0;
>>>>>>> mass_matrix[32] = 0;
>>>>>>> forcing_vector[0] = speeds[0];
>>>>>>> forcing_vector[1] = z_12;
>>>>>>> forcing_vector[2] = z_13;
>>>>>>>
>>>>>>> // These are computations that involve the initial values passed
>>>>>>> into the
>>>>>>> // function, i.e. stack #1.
>>>>>>> z_7 = pow(constants[2], 2);
>>>>>>> z_8 = constants[2]*constants[4]*constants[5];
>>>>>>> z_14 = pow(z_12, 2);
>>>>>>> z_16 = pow(z_13, 2);
>>>>>>> mass_matrix[21] = constants[1] + constants[3] + constants[5];
>>>>>>> mass_matrix[35] = pow(constants[4], 2)*constants[5];
>>>>>>>
>>>>>>> // Stack #2
>>>>>>> z_1 = sin(z_0);
>>>>>>> z_5 = sin(z_4);
>>>>>>> z_9 = cos(z_0);
>>>>>>> z_10 = cos(z_4);
>>>>>>> z_2 = constants[2]*z_1;
>>>>>>> mass_matrix[28] = constants[3]*z_7 + constants[5]*z_7;
>>>>>>>
>>>>>>> // Stack #3
>>>>>>> z_3 = -constants[3]*z_2 - constants[5]*z_2;
>>>>>>> z_6 = -constants[4]*constants[5]*z_5;
>>>>>>> z_11 = z_8*(z_1*z_5 + z_10*z_9);
>>>>>>> z_15 = constants[2]*z_14*z_9;
>>>>>>> z_17 = constants[4]*constants[5]*z_10;
>>>>>>> z_18 = constants[0]*constants[2]*z_9;
>>>>>>> z_19 = z_5*z_9;
>>>>>>> z_20 = z_1*z_10;
>>>>>>>
>>>>>>> // Stack #4
>>>>>>> mass_matrix[22] = z_3;
>>>>>>> mass_matrix[23] = z_6;
>>>>>>> mass_matrix[27] = z_3;
>>>>>>> mass_matrix[29] = z_11;
>>>>>>> mass_matrix[33] = z_6;
>>>>>>> mass_matrix[34] = z_11;
>>>>>>> forcing_vector[3] = constants[3]*z_15 + constants[5]*z_15 +
>>>>>>> z_16*z_17;
>>>>>>> forcing_vector[4] = -constants[3]*z_18 - constants[5]*z_18 +
>>>>>>> z_16*z_8*(z_19 - z_20);
>>>>>>> forcing_vector[5] = -constants[0]*z_17 + z_14*z_8*(-z_19 + z_20);
>>>>>>>
>>>>>>>
>>>>>>> So this simplified example of the dependencies in the CSE's shows
>>>>>>> that if I had enough cores available I could parallelize each stack,
>>>>>>> potentially increasing the execution speed. So instead of 31 
>>>>>>> evaluations,
>>>>>>> you could have 4 evaluations in parallel, ideally a 7.75x speedup. For 
>>>>>>> more
>>>>>>> complicated problems, there could be thousands and thousands of these 
>>>>>>> CSEs,
>>>>>>> but I'll need to generate their dependencies with code to see if things
>>>>>>> stack this nicely for the big problems. I suspect the dependency chain
>>>>>>> could be such that the higher number stacks could have hundreds of
>>>>>>> expressions whereas the lower stacks have fewer, or vice versa.
>>>>>>>
>>>>>>> How do I generate a DAG for long expressions in SymPy? Is this part
>>>>>>> of the internal architecture of SymPy expressions? I don't understand 
>>>>>>> how
>>>>>>> the cse() code works yet either, but it seems like this information 
>>>>>>> should
>>>>>>> be computed already. I just need to visualize the graph for some of our
>>>>>>> bigger problems.
>>>>>>>
>>>>>>> Also, the for the number of scalars and number of operations in
>>>>>>> each. Here is an bigger problem with 2000 or so CSE's:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/moorepants/dissertation/blob/master/src/extensions/arms/ArmsDynamics.c
>>>>>>>
>>>>>>> This problem has 12 scalars that have 2000+ CSE's and there are 5840
>>>>>>> additions and subtractions, 9847 multiplications and divisions, 14 
>>>>>>> cosines,
>>>>>>> and 14 sines. So roughly 1300 operations per scalar.
>>>>>>>
>>>>>>>
>>>>>>> Jason
>>>>>>> moorepants.info
>>>>>>> +01 530-601-9791
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 17, 2014 at 12:06 AM, Matthew Rocklin <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Response from Max follows (for some reason he was getting bounced
>>>>>>>> by the mailing list).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Mar 16, 2014 at 8:55 PM, Max Hutchinson <[email protected]
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> tl;dr it depends on the DAG, but improved ILP is is likely
>>>>>>>>> possible (if difficult) and there could be room for multi-core 
>>>>>>>>> parallelism
>>>>>>>>> as well.
>>>>>>>>>
>>>>>>>>> As I understand it, we're talking about a long computation applied
>>>>>>>>> to short input vectors.  If the computation can be applied to many 
>>>>>>>>> input
>>>>>>>>> vectors at once, independent of each other, then all levels of 
>>>>>>>>> parallelism
>>>>>>>>> (multiple instructions, multiple cores, multiple sockets, multiple 
>>>>>>>>> nodes)
>>>>>>>>> can be used.  This is data-parallelism, which is great! However, it 
>>>>>>>>> doesn't
>>>>>>>>> sound like this is the case.
>>>>>>>>>
>>>>>>>>> It sounds like you're thinking of building a DAG of these CSEs and
>>>>>>>>> trying to use task-parallelism over independent parts of it 
>>>>>>>>> (automatically
>>>>>>>>> using sympy or theano or what have you).  The tension here is going 
>>>>>>>>> to be
>>>>>>>>> between locality and parallelism: how much compute hardware can you 
>>>>>>>>> spread
>>>>>>>>> your data across without losing the nice cache performance that your 
>>>>>>>>> small
>>>>>>>>> input vectors gain you.  I'd bet that going off-socket is way too 
>>>>>>>>> wide.
>>>>>>>>>  Modern multi-core architectures have core-local L2 and L1 caches, so 
>>>>>>>>> if
>>>>>>>>> your input data fits nicely into L2 and your DAG isn't really local, 
>>>>>>>>> you
>>>>>>>>> probably won't get anything out of multiple-cores.  Your last stand is
>>>>>>>>> single-core parallelism (instruction-level 
>>>>>>>>> parallelism<http://en.wikipedia.org/wiki/Instruction-level_parallelism>),
>>>>>>>>> which sympy et al may or may not be well equipped to influence.
>>>>>>>>>
>>>>>>>>> To start, I'd recommend that you take a look at your DAGs and try
>>>>>>>>> to figure out how large the independent chunks are.  Then, estimate 
>>>>>>>>> the
>>>>>>>>> amount of instruction level parallelism when you run in 'serial' 
>>>>>>>>> (which you
>>>>>>>>> can do with flop-counting).  If your demonstrated ILP is less than 
>>>>>>>>> your
>>>>>>>>> independent chunk size, then at least improved ILP should be possible.
>>>>>>>>>  Automatically splitting up these DAGs and expressing them in a 
>>>>>>>>> low-level
>>>>>>>>> enough way to affect ILP is a considerable task, though.
>>>>>>>>>
>>>>>>>>> To see if multi-core parallelism is worth it, you need to estimate
>>>>>>>>> how many extra L3 loads you'd incur by spreading your data of 
>>>>>>>>> multiple L2s.
>>>>>>>>>  I don't have great advice for that, maybe someone else here does.  
>>>>>>>>> The
>>>>>>>>> good news is that if your problem has this level of locality, then 
>>>>>>>>> you can
>>>>>>>>> probably get away with emitting C code with pthreads or even openmp.  
>>>>>>>>> Just
>>>>>>>>> bear in mind the thread creation/annihilation overhead (standing
>>>>>>>>> thread-pools are your friend) and pin them to cores.
>>>>>>>>>
>>>>>>>>> Good luck,
>>>>>>>>> Max
>>>>>>>>>
>>>>>>>>  --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "sympy" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> Visit this group at http://groups.google.com/group/sympy.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/sympy/CAJ8oX-Hc2y9C7FO07kkeraDAv7NNRGPkMJ2DvjgF2Oq7PzeS6g%40mail.gmail.com<https://groups.google.com/d/msgid/sympy/CAJ8oX-Hc2y9C7FO07kkeraDAv7NNRGPkMJ2DvjgF2Oq7PzeS6g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "sympy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/sympy.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/sympy/CAP7f1Ajyc6RSDmTecMk4P9GF8%2BjF6qYQ32rNj8wZPtFh-G5zfA%40mail.gmail.com<https://groups.google.com/d/msgid/sympy/CAP7f1Ajyc6RSDmTecMk4P9GF8%2BjF6qYQ32rNj8wZPtFh-G5zfA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "sympy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/sympy.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/sympy/CAJ8oX-GLamse8tVYw2WecjgQWbRxDUSCAKsRYmbC4qKBwWY%3D-w%40mail.gmail.com<https://groups.google.com/d/msgid/sympy/CAJ8oX-GLamse8tVYw2WecjgQWbRxDUSCAKsRYmbC4qKBwWY%3D-w%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/CAP7f1AgODZ7c4dwL7HF6pXo%2BLpDFSFG3GhjAwmJSLjQcM-NQUg%40mail.gmail.com<https://groups.google.com/d/msgid/sympy/CAP7f1AgODZ7c4dwL7HF6pXo%2BLpDFSFG3GhjAwmJSLjQcM-NQUg%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/CAJ8oX-HwT%2B2FDrXxe2DyHY6L09mCzUAYXain9unh-hfCCHAJeA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [sympy] Parallel evaluation of a mathematical expression

Reply via email to