Ok, if we had some concrete examples of that non-cancellation then we could do something about it. This is goes for everything that we've been implementing, if we don't have a good benchmark problem then it is hard to design something that solve it. We can look into this more when it becomes and issue, I guess.
Yes, we should have an option in the code generation to generate a function that can evaluate the Jacobian of the RHS. This can be used directly for implicit ODE integration routines that are best suited for for stiff problems. Jason moorepants.info +01 530-601-9791 On Mon, Mar 17, 2014 at 5:26 PM, Gilbert Gede <[email protected]> wrote: > Jason, I've definitely seen examples where small errors in subtraction can > lead to larger errors later on - especially if things are supposed to > cancel out. In our domain, I think it pops up in mass/inertia division > operations. I'm not sure how often it happens with real-world examples vs. > academic/idealized examples though. > > I would definitely agree that step-size (for stiff systems) is the bigger > challenge in practice - although I believe providing the Jacobian of your > RHS function can help with that. That might be something we want to > automate in the code-gen module. > > BTW, thanks for starting this discussion. Very informative. > > -Gilbert > > > On Mon, Mar 17, 2014 at 2:13 PM, Jason Moore <[email protected]> wrote: > >> For numerical ODE integration, I don't think double precision presents >> near as much an issue in accuracy as picking the correct step size. We >> usually fight the step size issue and once that is good the accuracy is >> adequate for the solutions we need. I've never run into an issue with >> double precision being limiting. So I don't really know. I think we can >> often get away with much lower accuracy than double precision for realistic >> simulations of typically multibody systems. >> >> Wow, so a minus sign can cause errors so big that you only get 1 or 2 >> digits of accuracy. That's new to me. I'm not sure what minus signs you are >> talking about. What would you do instead of minus signs? >> >> But like I said, numerical accuracy is generally not an issue for our >> systems, unless the ODE integration routine is bad. >> >> >> Jason >> moorepants.info >> +01 530-601-9791 >> >> >> On Mon, Mar 17, 2014 at 1:00 PM, Ondřej Čertík >> <[email protected]>wrote: >> >>> On Mon, Mar 17, 2014 at 9:21 AM, Jason Moore <[email protected]> >>> wrote: >>> > I'm still digesting what Matthew and Max wrote. Lots of new words for >>> me :) >>> > But here is a simple example taken from C code we generate for a >>> simple 2 >>> > link pendulum. >>> > >>> > First the C code with SymPy's CSE expressions automatically generated: >>> > >>> > #include <math.h> >>> > #include "multibody_system_c.h" >>> > >>> > void mass_forcing(double constants[6], // constants = [g, m0, l0, m1, >>> l1, >>> > m2] >>> > double coordinates[3], // coordinates = [q0, q1, q2] >>> > double speeds[3], // speeds = [u0, u1, u2] >>> > double mass_matrix[36], // computed >>> > double forcing_vector[6]) // computed >>> > { >>> > // common subexpressions >>> > double z_0 = coordinates[1]; >>> > double z_1 = sin(z_0); >>> > double z_2 = constants[2]*z_1; >>> > double z_3 = -constants[3]*z_2 - constants[5]*z_2; >>> > double z_4 = coordinates[2]; >>> > double z_5 = sin(z_4); >>> > double z_6 = -constants[4]*constants[5]*z_5; >>> > double z_7 = pow(constants[2], 2); >>> > double z_8 = constants[2]*constants[4]*constants[5]; >>> > double z_9 = cos(z_0); >>> > double z_10 = cos(z_4); >>> > double z_11 = z_8*(z_1*z_5 + z_10*z_9); >>> > double z_12 = speeds[1]; >>> > double z_13 = speeds[2]; >>> > double z_14 = pow(z_12, 2); >>> > double z_15 = constants[2]*z_14*z_9; >>> > double z_16 = pow(z_13, 2); >>> > double z_17 = constants[4]*constants[5]*z_10; >>> > double z_18 = constants[0]*constants[2]*z_9; >>> > double z_19 = z_5*z_9; >>> > double z_20 = z_1*z_10; >>> > >>> > // mass matrix >>> > mass_matrix[0] = 1; >>> > mass_matrix[1] = 0; >>> > mass_matrix[2] = 0; >>> > mass_matrix[3] = 0; >>> > mass_matrix[4] = 0; >>> > mass_matrix[5] = 0; >>> > mass_matrix[6] = 0; >>> > mass_matrix[7] = 1; >>> > mass_matrix[8] = 0; >>> > mass_matrix[9] = 0; >>> > mass_matrix[10] = 0; >>> > mass_matrix[11] = 0; >>> > mass_matrix[12] = 0; >>> > mass_matrix[13] = 0; >>> > mass_matrix[14] = 1; >>> > mass_matrix[15] = 0; >>> > mass_matrix[16] = 0; >>> > mass_matrix[17] = 0; >>> > mass_matrix[18] = 0; >>> > mass_matrix[19] = 0; >>> > mass_matrix[20] = 0; >>> > mass_matrix[21] = constants[1] + constants[3] + constants[5]; >>> > mass_matrix[22] = z_3; >>> > mass_matrix[23] = z_6; >>> > mass_matrix[24] = 0; >>> > mass_matrix[25] = 0; >>> > mass_matrix[26] = 0; >>> > mass_matrix[27] = z_3; >>> > mass_matrix[28] = constants[3]*z_7 + constants[5]*z_7; >>> > mass_matrix[29] = z_11; >>> > mass_matrix[30] = 0; >>> > mass_matrix[31] = 0; >>> > mass_matrix[32] = 0; >>> > mass_matrix[33] = z_6; >>> > mass_matrix[34] = z_11; >>> > mass_matrix[35] = pow(constants[4], 2)*constants[5]; >>> > >>> > // forcing vector >>> > forcing_vector[0] = speeds[0]; >>> > forcing_vector[1] = z_12; >>> > forcing_vector[2] = z_13; >>> > forcing_vector[3] = constants[3]*z_15 + constants[5]*z_15 + >>> z_16*z_17; >>> > forcing_vector[4] = -constants[3]*z_18 - constants[5]*z_18 + >>> > z_16*z_8*(z_19 - z_20); >>> > forcing_vector[5] = -constants[0]*z_17 + z_14*z_8*(-z_19 + z_20); >>> > } >>> > >>> > >>> > Now I manually group these expression evaluations into "stacks", i.e. >>> those >>> > calls which could happen in parallel (there is of course a bit more >>> > complicated dependency graph you can draw so that you maximize the >>> time that >>> > your cores have a task). >>> > >>> > // These are not computations but just value assignments. >>> > z_0 = coordinates[1]; >>> > z_4 = coordinates[2]; >>> > z_12 = speeds[1]; >>> > z_13 = speeds[2]; >>> > mass_matrix[0] = 1; >>> > mass_matrix[1] = 0; >>> > mass_matrix[2] = 0; >>> > mass_matrix[3] = 0; >>> > mass_matrix[4] = 0; >>> > mass_matrix[5] = 0; >>> > mass_matrix[6] = 0; >>> > mass_matrix[7] = 1; >>> > mass_matrix[8] = 0; >>> > mass_matrix[9] = 0; >>> > mass_matrix[10] = 0; >>> > mass_matrix[11] = 0; >>> > mass_matrix[12] = 0; >>> > mass_matrix[13] = 0; >>> > mass_matrix[14] = 1; >>> > mass_matrix[15] = 0; >>> > mass_matrix[16] = 0; >>> > mass_matrix[17] = 0; >>> > mass_matrix[18] = 0; >>> > mass_matrix[19] = 0; >>> > mass_matrix[20] = 0; >>> > mass_matrix[24] = 0; >>> > mass_matrix[25] = 0; >>> > mass_matrix[26] = 0; >>> > mass_matrix[30] = 0; >>> > mass_matrix[31] = 0; >>> > mass_matrix[32] = 0; >>> > forcing_vector[0] = speeds[0]; >>> > forcing_vector[1] = z_12; >>> > forcing_vector[2] = z_13; >>> > >>> > // These are computations that involve the initial values passed into >>> the >>> > // function, i.e. stack #1. >>> > z_7 = pow(constants[2], 2); >>> > z_8 = constants[2]*constants[4]*constants[5]; >>> > z_14 = pow(z_12, 2); >>> > z_16 = pow(z_13, 2); >>> > mass_matrix[21] = constants[1] + constants[3] + constants[5]; >>> > mass_matrix[35] = pow(constants[4], 2)*constants[5]; >>> > >>> > // Stack #2 >>> > z_1 = sin(z_0); >>> > z_5 = sin(z_4); >>> > z_9 = cos(z_0); >>> > z_10 = cos(z_4); >>> > z_2 = constants[2]*z_1; >>> > mass_matrix[28] = constants[3]*z_7 + constants[5]*z_7; >>> > >>> > // Stack #3 >>> > z_3 = -constants[3]*z_2 - constants[5]*z_2; >>> > z_6 = -constants[4]*constants[5]*z_5; >>> > z_11 = z_8*(z_1*z_5 + z_10*z_9); >>> > z_15 = constants[2]*z_14*z_9; >>> > z_17 = constants[4]*constants[5]*z_10; >>> > z_18 = constants[0]*constants[2]*z_9; >>> > z_19 = z_5*z_9; >>> > z_20 = z_1*z_10; >>> > >>> > // Stack #4 >>> > mass_matrix[22] = z_3; >>> > mass_matrix[23] = z_6; >>> > mass_matrix[27] = z_3; >>> > mass_matrix[29] = z_11; >>> > mass_matrix[33] = z_6; >>> > mass_matrix[34] = z_11; >>> > forcing_vector[3] = constants[3]*z_15 + constants[5]*z_15 + z_16*z_17; >>> > forcing_vector[4] = -constants[3]*z_18 - constants[5]*z_18 + >>> z_16*z_8*(z_19 >>> > - z_20); >>> > forcing_vector[5] = -constants[0]*z_17 + z_14*z_8*(-z_19 + z_20); >>> > >>> > >>> > So this simplified example of the dependencies in the CSE's shows that >>> if I >>> > had enough cores available I could parallelize each stack, potentially >>> > increasing the execution speed. So instead of 31 evaluations, you >>> could have >>> > 4 evaluations in parallel, ideally a 7.75x speedup. For more >>> complicated >>> > problems, there could be thousands and thousands of these CSEs, but >>> I'll >>> > need to generate their dependencies with code to see if things stack >>> this >>> > nicely for the big problems. I suspect the dependency chain could be >>> such >>> > that the higher number stacks could have hundreds of expressions >>> whereas the >>> > lower stacks have fewer, or vice versa. >>> > >>> > How do I generate a DAG for long expressions in SymPy? Is this part of >>> the >>> > internal architecture of SymPy expressions? I don't understand how the >>> cse() >>> > code works yet either, but it seems like this information should be >>> computed >>> > already. I just need to visualize the graph for some of our bigger >>> problems. >>> > >>> > Also, the for the number of scalars and number of operations in each. >>> Here >>> > is an bigger problem with 2000 or so CSE's: >>> > >>> > >>> https://github.com/moorepants/dissertation/blob/master/src/extensions/arms/ArmsDynamics.c >>> >>> One thing I wonder is ---- how accurate are your double precision >>> results from the C code? >>> Did you try to compare it with high accuracy (e.g. 100 digits) in >>> SymPy using Floats? >>> >>> The "minus" sign always causes some numerical cancellation. >>> >>> In my experience, if the symbolic expressions grow into hundrends of >>> terms and if you use few >>> "minus" signs in there, you always get numerical cancellations that >>> sometimes are just too big >>> for double precision to handle and you get bogus numbers, or only 1 or >>> 2 digits accuracy. >>> >>> One has to be very careful about this. >>> >>> >>> Otherwise I think it's a great idea to parallelize the evaluation. >>> Btw, with CSymPy I am also >>> interested in parallelizing the symbolic manipulation, including also >>> the numerical evaluation >>> (both double precision and higher accuracy). One can do it on the >>> "library" level as well >>> as on the code generation level, which is the problem you are tackling >>> now. >>> >>> Ondrej >>> >>> > >>> > This problem has 12 scalars that have 2000+ CSE's and there are 5840 >>> > additions and subtractions, 9847 multiplications and divisions, 14 >>> cosines, >>> > and 14 sines. So roughly 1300 operations per scalar. >>> > >>> > >>> > Jason >>> > moorepants.info >>> > +01 530-601-9791 >>> > >>> > >>> > On Mon, Mar 17, 2014 at 12:06 AM, Matthew Rocklin <[email protected]> >>> > wrote: >>> >> >>> >> Response from Max follows (for some reason he was getting bounced by >>> the >>> >> mailing list). >>> >> >>> >> >>> >> On Sun, Mar 16, 2014 at 8:55 PM, Max Hutchinson <[email protected]> >>> >> wrote: >>> >>> >>> >>> tl;dr it depends on the DAG, but improved ILP is is likely possible >>> (if >>> >>> difficult) and there could be room for multi-core parallelism as >>> well. >>> >>> >>> >>> As I understand it, we're talking about a long computation applied to >>> >>> short input vectors. If the computation can be applied to many input >>> >>> vectors at once, independent of each other, then all levels of >>> parallelism >>> >>> (multiple instructions, multiple cores, multiple sockets, multiple >>> nodes) >>> >>> can be used. This is data-parallelism, which is great! However, it >>> doesn't >>> >>> sound like this is the case. >>> >>> >>> >>> It sounds like you're thinking of building a DAG of these CSEs and >>> trying >>> >>> to use task-parallelism over independent parts of it (automatically >>> using >>> >>> sympy or theano or what have you). The tension here is going to be >>> between >>> >>> locality and parallelism: how much compute hardware can you spread >>> your data >>> >>> across without losing the nice cache performance that your small >>> input >>> >>> vectors gain you. I'd bet that going off-socket is way too wide. >>> Modern >>> >>> multi-core architectures have core-local L2 and L1 caches, so if >>> your input >>> >>> data fits nicely into L2 and your DAG isn't really local, you >>> probably won't >>> >>> get anything out of multiple-cores. Your last stand is single-core >>> >>> parallelism (instruction-level parallelism), which sympy et al may >>> or may >>> >>> not be well equipped to influence. >>> >>> >>> >>> To start, I'd recommend that you take a look at your DAGs and try to >>> >>> figure out how large the independent chunks are. Then, estimate the >>> amount >>> >>> of instruction level parallelism when you run in 'serial' (which you >>> can do >>> >>> with flop-counting). If your demonstrated ILP is less than your >>> independent >>> >>> chunk size, then at least improved ILP should be possible. >>> Automatically >>> >>> splitting up these DAGs and expressing them in a low-level enough >>> way to >>> >>> affect ILP is a considerable task, though. >>> >>> >>> >>> To see if multi-core parallelism is worth it, you need to estimate >>> how >>> >>> many extra L3 loads you'd incur by spreading your data of multiple >>> L2s. I >>> >>> don't have great advice for that, maybe someone else here does. The >>> good >>> >>> news is that if your problem has this level of locality, then you can >>> >>> probably get away with emitting C code with pthreads or even openmp. >>> Just >>> >>> bear in mind the thread creation/annihilation overhead (standing >>> >>> thread-pools are your friend) and pin them to cores. >>> >>> >>> >>> Good luck, >>> >>> Max >>> >> >>> >> -- >>> >> You received this message because you are subscribed to the Google >>> Groups >>> >> "sympy" group. >>> >> To unsubscribe from this group and stop receiving emails from it, >>> send an >>> >> email to [email protected]. >>> >> To post to this group, send email to [email protected]. >>> >> Visit this group at http://groups.google.com/group/sympy. >>> >> To view this discussion on the web visit >>> >> >>> https://groups.google.com/d/msgid/sympy/CAJ8oX-Hc2y9C7FO07kkeraDAv7NNRGPkMJ2DvjgF2Oq7PzeS6g%40mail.gmail.com >>> . >>> >> >>> >> For more options, visit https://groups.google.com/d/optout. >>> > >>> > >>> > -- >>> > You received this message because you are subscribed to the Google >>> Groups >>> > "sympy" group. >>> > To unsubscribe from this group and stop receiving emails from it, send >>> an >>> > email to [email protected]. >>> > To post to this group, send email to [email protected]. >>> > Visit this group at http://groups.google.com/group/sympy. >>> > To view this discussion on the web visit >>> > >>> https://groups.google.com/d/msgid/sympy/CAP7f1AggyuHprs7H%2B_PSVP37kG%2BWQv%3DuzSsBpiExh9U4HQe0dA%40mail.gmail.com >>> . >>> > >>> > For more options, visit https://groups.google.com/d/optout. >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "sympy" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To post to this group, send email to [email protected]. >>> Visit this group at http://groups.google.com/group/sympy. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/sympy/CADDwiVC2Zd4tdJRoAxK7iBin9PWc%2BEihSvf-6X5DTW9m9ph7XA%40mail.gmail.com >>> . >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "sympy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at http://groups.google.com/group/sympy. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/sympy/CAP7f1AiZccNH3DdtCvqi7FaeZx5ARXJMAzkPFJ2nci9ChvEXpg%40mail.gmail.com<https://groups.google.com/d/msgid/sympy/CAP7f1AiZccNH3DdtCvqi7FaeZx5ARXJMAzkPFJ2nci9ChvEXpg%40mail.gmail.com?utm_medium=email&utm_source=footer> >> . >> >> For more options, visit https://groups.google.com/d/optout. >> > > -- > You received this message because you are subscribed to the Google Groups > "sympy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/sympy. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sympy/CAGZqWdr6ye5LMvVfvMjDYDterj5oVEA_qObyLQx96Wq9c8SbiQ%40mail.gmail.com<https://groups.google.com/d/msgid/sympy/CAGZqWdr6ye5LMvVfvMjDYDterj5oVEA_qObyLQx96Wq9c8SbiQ%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/sympy. To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAP7f1AhznhOHzKBNrRqxHsuKAWHfdDjF2Bwv0HJuL29HVXR1Jg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
