Dear Oscar,

Thank you so much for such a detailed response! I hope it will be useful to 
others as well. (I had a hard time trying to google out the answer.) For me 
and for now the DomainMatrix “recipe” seems to be sufficient. But it's good 
to know that FLINT lets us take it to the next level of efficiency.

Thanks again and good luck with developing the excellent library even 
further!

Cheers!
Oleg

On Monday, June 13, 2022 at 3:39:38 AM UTC+3 Oscar wrote:

> On Mon, 13 Jun 2022 at 00:51, Pure Virtual <1mi...@gmail.com> wrote:
>
>> Hi there!
>
>
> Hi!
>  
>
>> I've got a really simple task of finding the rank of a matrix of 
>> *integers*. NumPy's matrix_rank() appeared to use floats internally (and 
>> gave a clearly wrong answers when the matrix contained both small and huge 
>> coefficients like 1 and 10**16), so I decided to try SymPy instead. 
>> Unfortunately, I found it pretty much impossible to find the rank of a 
>> dense matrix of size 30×30 (and above), since it took forever. I timed it 
>> and found out that runtime grows *exponentially* (which is really 
>> mysterious for me).
>>
>> Am I doing something wrong? Is there a better way?
>>
>
> You're not doing anything wrong. This is just something that can (and 
> will) be improved in SymPy.
>
> However it is important to understand that exact arbitrary precision 
> numerics do not necessarily have the same computational complexity as 
> numerics over fixed precision types. In this case you see that NumPy fails 
> because it uses fixed precision floating point and your question about rank 
> is really a question that needs exact numerics to be answered robustly.
>
> Computing the rank of a matrix is done essentially through something like 
> Gaussian elimination (GE) but there are many different types of Gaussian 
> elimination. In the case of a matrix of integers we need to decide how to 
> handle division because the basic form of GE uses exact division but with 
> integers that leads to rational numbers. There are several approaches:
>
> 1. Use rational numbers (i.e. work over a field).
> 2. Use division free GE
> 3. Use fraction free GE (still use divisions but only carefully chosen 
> exact integer divisions).
>
> Using division free GE leads to exponential costs so that the time taken 
> for an NxN matrix has worst case complexity bounds like O(2^N). This is 
> because although the *number* of arithmetic operations is bounded like 
> O(N^3)) the *cost* of those operations depends on the size of those 
> integers and they get big really fast during division free GE.
>
> Another approach is to use fraction-free methods and these can bring the 
> bound down from O(2^N) to something like O(N^5) or potentially better if 
> combined with other techniques like fast matrix multiplication. Still 
> though, O(N^5) is a lot worse than the O(N^3) that you might have expected 
> if you were thinking about how this works with fixed precision types.
>
> If you want to work with something more complicated than integers or 
> rationals then the complexity bounds can be much worse!
>
> Here's a trivial example of what I do.
>>
>> PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(10).rank()*
>> "
>> 20 loops, best of 5: *13 msec* per loop
>> PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(15).rank()*
>> "
>> 5 loops, best of 5: *47.7 msec* per loop
>> PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(20).rank()*
>> "
>> 1 loop, best of 5: *2.26 sec* per loop
>>
>
> SymPy has a newer faster implementation of matrices that works for 
> matrices with simple expression elements. In this case your elements are 
> all integers so we can use it with the ZZ domain:
> https://docs.sympy.org/latest/modules/polys/domainmatrix.html
> https://docs.sympy.org/latest/modules/polys/domainsintro.html
>
> This faster implementation is already present internally in any Matrix but 
> is only used by default for certain limited operations and only when the 
> matrix has only integer or rational elements:
>
> In [*1*]: Matrix([[1, 2], [3, 4]])._rep
> Out[*1*]: DomainMatrix({0: {0: 1, 1: 2}, 1: {0: 3, 1: 4}}, (2, 2), ZZ)
>
> Note that the _rep attribute should be considered private but it gives a 
> more efficient representation for a matrix like this. You can use it to 
> compute the rank like this:
>
> In [*6*]: M = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>
>
> In [*7*]: M.rank()
>
> Out[*7*]: 2
>
>
> In [*8*]: M._rep
>
> Out[*8*]: DomainMatrix({0: {0: 1, 1: 2, 2: 3}, 1: {0: 4, 1: 5, 2: 6}, 2: 
> {0: 7, 1: 8, 2: 9}}, (3, 3), ZZ)
>
>
> In [*9*]: rank = len(M._rep.to_field().rref()[-1])
>
>
> In [*10*]: rank
>
> Out[*10*]: 2
>
>
> Here are some timings:
>
>
> In [*16*]: *for* N *in* [1, 2, 5, 10, 20, 50, 100]:
>
>     ...:     M = randMatrix(N)
>
>     ...:     print('N =', N)
>
>     ...:     %time len(M._rep.to_field().rref()[-1])
>
>     ...: 
>
> N = 1
>
> CPU times: user 143 µs, sys: 1e+03 ns, total: 144 µs
>
> Wall time: 151 µs
>
> N = 2
>
> CPU times: user 204 µs, sys: 39 µs, total: 243 µs
>
> Wall time: 207 µs
>
> N = 5
>
> CPU times: user 547 µs, sys: 38 µs, total: 585 µs
>
> Wall time: 618 µs
>
> N = 10
>
> CPU times: user 2.77 ms, sys: 68 µs, total: 2.83 ms
>
> Wall time: 3.02 ms
>
> N = 20
>
> CPU times: user 21.8 ms, sys: 136 µs, total: 21.9 ms
>
> Wall time: 28.7 ms
>
> N = 50
>
> CPU times: user 545 ms, sys: 2.48 ms, total: 547 ms
>
> Wall time: 550 ms
>
> N = 100
>
> CPU times: user 7.37 s, sys: 58.9 ms, total: 7.43 s
>
> Wall time: 7.51 s
>
>
> That scales a lot better than the current Matrix rank method (I gave up at 
> N=20 here):
>
>
> In [*15*]: *for* N *in* [1, 2, 5, 10, 20, 50, 100]:
>
>     ...:     M = randMatrix(N)
>
>     ...:     print('N =', N)
>
>     ...:     %time M.rank()
>
>     ...: 
>
> N = 1
>
> CPU times: user 270 µs, sys: 0 ns, total: 270 µs
>
> Wall time: 277 µs
>
> N = 2
>
> CPU times: user 1.12 ms, sys: 724 µs, total: 1.85 ms
>
> Wall time: 1.18 ms
>
> N = 5
>
> CPU times: user 8.01 ms, sys: 96 µs, total: 8.1 ms
>
> Wall time: 8.2 ms
>
> N = 10
>
> CPU times: user 23.3 ms, sys: 110 µs, total: 23.4 ms
>
> Wall time: 23.5 ms
>
> N = 20
>
> CPU times: user 5.56 s, sys: 46.5 ms, total: 5.61 s
>
> Wall time: 5.66 s
>
> N = 50
>
> ^C
> ---------------------------------------------------------------------------
>
> KeyboardInterrupt
>
>
>
> This can all be made more efficient and the plan is to do so in future 
> SymPy versions. For now you can use DomainMatrix directly if these examples 
> are representative of what you want to do. It's also faster to use flint 
> through the python_flint library. I plan to make it possible to use flint 
> internally to speed SymPy up but that hasn't happened yet:
>
>
> In [*1*]: *import* *flint*
>
>
> In [*2*]: M = randMatrix(100)
>
>
> In [*3*]: fM = flint.fmpz_mat([[int(i) *for* i *in* row] *for* row *in* 
> M.tolist()])
>
>
> In [*4*]: %time fM.rank()
>
> CPU times: user 3.96 ms, sys: 3.48 ms, total: 7.44 ms
>
> Wall time: 18 ms
>
> Out[*4*]: 100
>
>
> In [*5*]: M = randMatrix(1000)
>
>
> In [*6*]: fM = flint.fmpz_mat([[int(i) *for* i *in* row] *for* row *in* 
> M.tolist()])
>
>
> In [*7*]: %time fM.rank()
>
> CPU times: user 632 ms, sys: 19.9 ms, total: 652 ms
>
> Wall time: 655 ms
>
> Out[*7*]: 1000
>
>
> --
>
> Oscar
>

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sympy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/b4ca3007-1667-4a8d-b2eb-80d30057ae3dn%40googlegroups.com.

Reply via email to