I have some questions about the results: the benchmarks of the AVL 
implementation seem to have high variance, and in figures 3 and 4 the AVL 
implementation looks slower, which doesn't seem to be explained anywhere.
On Thursday, January 18, 2024 at 4:25:38 PM UTC+3 Sangyub Lee wrote:

> I’d like to see experiments done with a Python implementation. As noted 
> above, Python objects can’t take full advantage of such data structures, 
> because they have relatively high overhead (they use a __dict__ 
> implementation under the hood).
> I have seen in many issues that beating Python’s built-in data structures 
> is a very hard problem, and anyone attempting it would be better off 
> spending the time working in C/C++ with a Python wrapper.
>
> I think another factor that can affect success is that the SymPy 
> polynomial library is very mature, which makes foundational changes to it 
> hard.
>
> Because you would have to reimplement addition and multiplication from 
> scratch, I’m concerned about the coverage and the amount of work that 
> would follow.
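>
> For a sense of what that entails, here is a minimal sketch (my own 
> illustration, not SymPy’s actual code) of addition and multiplication 
> over a sparse dict representation mapping exponent -> coefficient; every 
> such operation would need a counterpart for the new structure:
>
> def poly_add(p, q):
>     # p, q map exponent -> coefficient; missing exponents mean zero.
>     r = dict(p)
>     for e, c in q.items():
>         r[e] = r.get(e, 0) + c
>         if r[e] == 0:
>             del r[e]  # drop zero terms to keep the form canonical
>     return r
>
> def poly_mul(p, q):
>     # Pairwise term products: X^a * X^b = X^(a+b).
>     r = {}
>     for e1, c1 in p.items():
>         for e2, c2 in q.items():
>             r[e1 + e2] = r.get(e1 + e2, 0) + c1 * c2
>     return {e: c for e, c in r.items() if c != 0}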
>
> I suggest that it would be much easier to first continue in the direction 
> of the incomplete gcd PR.
>
> On Thursday, January 18, 2024 at 3:38:48 PM UTC+3 Spiros Maggioros wrote:
>
>> Yes, here is the paper. Note that I wrote it during the last semester of 
>> my degree, while I was also writing another paper for a major 
>> conference, so I have not yet submitted the paper we are discussing 
>> right now, but I'm planning to in the near future.
>> I would love to hear your feedback.
>>
>> Spiros.
>>
>> On Thursday, January 18, 2024 at 1:47:20 PM UTC+2 syle...@gmail.com 
>> wrote:
>>
>>> Do you have a link to a publication or draft of your paper?
>>>
>>> On Wednesday, January 17, 2024 at 10:45:35 PM UTC+3 Spiros Maggioros 
>>> wrote:
>>>
>>>> I understand. The hash table (unordered_map in C++) is the only data 
>>>> structure that beats the tree representation in C++. It has drawbacks, 
>>>> though, as you mentioned, and one more is that you can't really keep 
>>>> the polynomial sorted with this data structure, since hashing doesn't 
>>>> preserve order; the only way to do that is to sort the pairs from the 
>>>> beginning, which requires O(n log n) time. Note that with the tree 
>>>> representation I can traverse the tree in-order and get the terms back 
>>>> sorted, so no further functions are needed.
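>>>>
>>>> As a minimal sketch of that idea (illustrative only, not my actual 
>>>> implementation): each node is keyed by exponent, and an in-order 
>>>> traversal yields the terms already sorted:
>>>>
>>>> class Node:
>>>>     def __init__(self, exp, coeff):
>>>>         self.exp, self.coeff = exp, coeff
>>>>         self.left = self.right = None
>>>>
>>>> def insert(root, exp, coeff):
>>>>     # Plain BST insert keyed by exponent; a real AVL tree would
>>>>     # also rebalance here to guarantee O(log n) height.
>>>>     if root is None:
>>>>         return Node(exp, coeff)
>>>>     if exp < root.exp:
>>>>         root.left = insert(root.left, exp, coeff)
>>>>     elif exp > root.exp:
>>>>         root.right = insert(root.right, exp, coeff)
>>>>     else:
>>>>         root.coeff += coeff  # merge like terms
>>>>     return root
>>>>
>>>> def inorder(root):
>>>>     # Yields (exp, coeff) in ascending exponent order, no extra sort.
>>>>     if root is not None:
>>>>         yield from inorder(root.left)
>>>>         yield (root.exp, root.coeff)
>>>>         yield from inorder(root.right)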
>>>>
>>>> Yes, I compared it to other data structures like arrays (lists in 
>>>> Python) and linked lists, which are the most common ways to represent 
>>>> polynomials. If the user gives the coefficients and exponents already 
>>>> ordered, say [ {1,3}, {3,2}, {1,1} ], which is P(x) = X^3 + 3X^2 + X, 
>>>> then the list wins, as we just have to push_back the pairs (O(1) time 
>>>> complexity). But if I keep adding pairs as I go, say I want to add 
>>>> {2,5}, which is 2X^5, to my polynomial, then I can't just push back 
>>>> the pair, because I would lose the order. In this case the AVL tree 
>>>> wins: to keep a polynomial sorted with a linked list, insertion costs 
>>>> O(n), versus O(log n) for the tree.
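>>>>
>>>> For example (a quick illustration with Python's standard bisect 
>>>> module, not my C++ code): keeping a flat list of (exponent, 
>>>> coefficient) pairs sorted means an out-of-order insertion has to shift 
>>>> elements, which is O(n), whereas the AVL tree does it in O(log n):
>>>>
>>>> import bisect
>>>>
>>>> terms = [(1, 1), (2, 3), (3, 1)]  # (exponent, coefficient), ascending
>>>> bisect.insort(terms, (5, 2))      # insert 2X^5 in order: O(n) shifts
>>>> print(terms)                      # [(1, 1), (2, 3), (3, 1), (5, 2)]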
>>>>
>>>> Now, if you use lists in Python: say I want to represent the 
>>>> polynomial P(x) = X^1000 + X; then I need deg(P) + 1 = 1001 slots in 
>>>> my list. But with an AVL tree I just need 2 nodes.
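>>>>
>>>> Concretely (illustrative Python):
>>>>
>>>> # Dense: a coefficient list indexed by exponent, deg(P) + 1 slots.
>>>> dense = [0] * 1001
>>>> dense[1] = 1       # the X term
>>>> dense[1000] = 1    # the X^1000 term
>>>>
>>>> # Sparse: only nonzero terms are stored -- 2 entries, matching the
>>>> # 2 nodes the AVL tree would need.
>>>> sparse = {1000: 1, 1: 1}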
>>>>
>>>> I understand what is happening in Python; that's why intense testing 
>>>> is needed. Just because something seems faster in theory does not mean 
>>>> that's always the case in practice.
>>>>
>>>> Spiros.
>>>>
>>>> On Wednesday, January 17, 2024 at 9:29:47 PM UTC+2 Oscar wrote:
>>>>
>>>>> On Wed, 17 Jan 2024 at 15:54, Spiros Maggioros <maggior...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> So we showed that using AVL trees instead of arrays is much better 
>>>>>> (note that even a linked list is slower, because its insertion time 
>>>>>> complexity is O(n)).
>>>>>>
>>>>>
>>>>> Interesting. Did you compare the AVL tree with other sparse data 
>>>>> structures?
>>>>>  
>>>>>
>>>>>> I have not seen the data structure that is used in SymPy, but I'm 
>>>>>> planning to look into what I need to see; right now I'm in an exam 
>>>>>> period and have no time at all.
>>>>>>
>>>>>
>>>>> No problem. If you want to learn more about how these things are 
>>>>> implemented in SymPy then I recommend starting by learning how to use the 
>>>>> lower-level data structures. This doc page is a little out of date since 
>>>>> (as of current master) SymPy can make use of python-flint in some places, 
>>>>> but it shows how to access things:
>>>>>
>>>>> https://docs.sympy.org/latest/modules/polys/domainsintro.html
>>>>>
>>>>> The DUP representation is what you describe as an "array" (a "list" in 
>>>>> Python terminology). The DMP representation uses this recursively for 
>>>>> multivariate polynomials. Sparse polynomials are implemented using 
>>>>> hash-tables (dicts). The doc page I just linked explains how to create 
>>>>> and 
>>>>> introspect these data structures and how they are used within SymPy.
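>>>>>
>>>>> For example, something like this (from memory, so check that doc page 
>>>>> for the exact API) shows the sparse dict representation directly:
>>>>>
>>>>> from sympy import ZZ
>>>>> from sympy.polys.rings import ring
>>>>>
>>>>> R, x = ring("x", ZZ)
>>>>> p = x**3 + 3*x**2 + x
>>>>>
>>>>> # A sparse poly element maps monomial tuples to coefficients:
>>>>> print(dict(p))  # {(3,): 1, (2,): 3, (1,): 1} (order may vary)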
>>>>>
>>>>> The situation in Python is a bit different from C or other languages 
>>>>> with lower interpreter overhead because the downsides of using say a 
>>>>> hash-table vs an array are much lower in a relative sense. This is a 
>>>>> quick 
>>>>> and dirty measurement of the time to lookup an item in a dict vs a list 
>>>>> using ipython:
>>>>>
>>>>> In [28]: hashtable = dict(zip(range(100000), range(1, 100000+1)))
>>>>>
>>>>> In [29]: array = list(range(100000))
>>>>>
>>>>> In [30]: %timeit hashtable[1000]
>>>>> 56.2 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 
>>>>> loops each)
>>>>>
>>>>> In [31]: %timeit array[1000]
>>>>> 22.2 ns ± 0.12 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 
>>>>> loops each)
>>>>>
>>>>> In C the difference in lookup time between a hash-table and an array 
>>>>> would be much more than 2.5x (an array lookup would be more like 1ns). 
>>>>> The 
>>>>> reason they are not so different in Python is that there is so much 
>>>>> interpreter overhead in both cases that the real underlying operation 
>>>>> does 
>>>>> not really take a majority of the runtime. I think that probably tends to 
>>>>> shift what data structures seem fastest in the context of SymPy when 
>>>>> compared to implementations of the same operations in other languages.
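>>>>>
>>>>> (If you don't have IPython, roughly the same measurement can be made 
>>>>> with the standard timeit module; the numbers will vary by machine:)
>>>>>
>>>>> import timeit
>>>>>
>>>>> setup = """
>>>>> hashtable = dict(zip(range(100000), range(1, 100000 + 1)))
>>>>> array = list(range(100000))
>>>>> """
>>>>> n = 10_000_000
>>>>> for stmt in ("hashtable[1000]", "array[1000]"):
>>>>>     t = timeit.timeit(stmt, setup=setup, number=n)
>>>>>     print(f"{stmt}: {t / n * 1e9:.1f} ns per lookup")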
>>>>>
>>>>> --
>>>>> Oscar
>>>>>
>>>>
