Roman Pearce said he would put up some code to explain this. (The 
conditions Daniel uses are quite complex in comparison.) I therefore think 
that it's ok to quote the following from correspondence with Roman (since I 
put up my blog):

[The idea is based on ] "Ellis Horowitz, A Sorting Algorithm for Polynomial 
Multiplication
Journal of the ACM, Volume 22 Issue 4, Oct. 1975, Pages 450-462
http://dl.acm.org/citation.cfm?doid=321906.321908

Before inserting f_i * g_j into the heap, check if there exists
f_k * g_j in the heap where k < i, and if so, do not insert the
product.  We simply check if f_{i-1} * g_j is the next unmerged
product for f_{i-1} whether it is in the heap or not.  Then you
have to put the dropped products back in the heap later.  After
extracting f_i * g_j from the heap, check if f_{i+1} has a term
in the heap, and if not, insert f_{i+1} * g_j."

This is what I had remembered from what Roman explained at ISSAC. But 
Daniel wasn't able to figure out something along those lines that improved 
timings. Clearly it does though, since Roman uses it to great effect. And 
thanks very much to Roman for suggesting this very important improvement!

I haven't yet looked in detail at Daniel's conditions. It will be 
interesting to compare. You can see then in fmpz_mpoly/mul_heap_threaded.c 
in the Flint repository [1]. To my eye, they don't look that complicated. I 
think it is the lines 187 and following. Perhaps they will end up being 
equivalent.

[1] https://github.com/wbhart/flint2/blob/trunk/fmpz_mpoly/mul_heap_threaded.c
 
On Tuesday, 26 September 2017 07:52:27 UTC+2, parisse wrote:
>
>
>
> Le lundi 25 septembre 2017 18:56:26 UTC+2, Bill Hart a écrit :
>>
>> Do you use anything special for memory management? Mickael Gastineau 
>> recommended jemalloc, which we haven't tried yet. I assume you expected to 
>> see better times for the threaded benchmarks with giac? 
>>
>
> I'm not using anything specific for memory management. Thread-efficient 
> memory management would certainly help improve performances, because more 
> than half of the time is done in conversions inside giac, and conversions 
> to mpz types are not parallelized because it would be slower to wait for 
> locks. I was asking about the compiler because I looked at my code and saw 
> that some optimizations are disabled for clang. 
> Can you give some precisions about what you call delayed insertion in the 
> heap? I can delay insertion of the initial product monomials f[i]*g[0] but 
> that does not improve much the timings, at least with 1 thread. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To post to this group, send email to sage-devel@googlegroups.com.
Visit this group at https://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

Reply via email to