Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread Igor Shirkalin via Digitalmars-d-announce

On Thursday, 10 December 2020 at 14:49:08 UTC, jmh530 wrote:
On Thursday, 10 December 2020 at 11:07:06 UTC, Igor Shirkalin 
wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:

"no need to calculate inverse matrix" What? Since when?


Since highly optimized algorithms became required. This does 
not mean that you should not know the algorithms for 
calculating the inverse matrix.


I still find myself inverting large matrices from time to time. 
Maybe there are ways to reduce the number of times I do it, but 
it still needs to get done for some types of problems.


That is what I do for scientific purposes too, from time to time.


Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread jmh530 via Digitalmars-d-announce
On Thursday, 10 December 2020 at 11:07:06 UTC, Igor Shirkalin 
wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:

"no need to calculate inverse matrix" What? Since when?


Since highly optimized algorithms became required. This does 
not mean that you should not know the algorithms for 
calculating the inverse matrix.


I still find myself inverting large matrices from time to time. 
Maybe there are ways to reduce the number of times I do it, but 
it still needs to get done for some types of problems.


Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread Igor Shirkalin via Digitalmars-d-announce

On Monday, 7 December 2020 at 13:07:23 UTC, 9il wrote:
On Monday, 7 December 2020 at 12:28:39 UTC, data pulverizer 
wrote:

On Monday, 7 December 2020 at 02:14:41 UTC, 9il wrote:
I don't know. Tensors aren't so complex. The complex part is 
a design that allows Mir to construct and iterate various 
kinds of lazy tensors of any complexity and have quite a 
universal API, and all of this is boosted by the fact that 
the user-provided kernel (lambda) function is optimized by 
the compiler without overhead.


I agree that a basic tensor is not hard to implement, but the 
specific design to choose is not always obvious. Your 
benchmarks show that design choices have a large impact on 
performance, and performance is certainly a very important 
consideration in tensor design.


For example, I had no idea that your ndslice variant was using 
more than one array internally to achieve its performance - it 
wasn't obvious to me.


The ndslice tensor type uses exactly one iterator. However, the 
iterator is generic, and lazy iterators may contain any number 
of other iterators and pointers.


How does Mir's iterator differ from the usual concept of an 
iterator in D, and from designing your own tensors and the 
operations performed on them, in terms of execution speed, if 
we already know how to achieve it?





Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread Igor Shirkalin via Digitalmars-d-announce
On Monday, 7 December 2020 at 13:54:26 UTC, Ola Fosheim Grostad 
wrote:

On Monday, 7 December 2020 at 13:48:51 UTC, jmh530 wrote:
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim 
Grostad wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:

[snip]

"no need to calculate inverse matrix" What? Since when?


I don't know what he meant in this context, but a common 
technique in computer graphics is to build the inverse as 
you apply computations.


Ah, well if you have a small matrix, then it's not so hard to 
calculate the inverse anyway.


It is an optimization, maybe also for accuracy, dunno.
So, instead of ending up with a transform from coordinate 
system A to B, you also get the transform from B to A for 
cheap. This may matter when the next step is to go from B to 
C... And so on...


A good example is the Simplex method for linear programming. It 
can be implemented so that you have to compute the inverse of an 
[m x m] matrix at every step. It is better to transform one 
inverse matrix into the next, which reduces the per-step cost 
from O(m^3) to O(m^2) and even less. You don't even need to 
compute the first inverse matrix if the algorithm is built in 
such a way that it is trivial. It is just one example.
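
To make the idea concrete, here is a minimal sketch in plain D 
(hypothetical helper names, not Mir or any benchmark code) of the 
product-form update used in the revised simplex method: when one 
column of the basis B changes, the known inverse is updated in 
O(m^2) instead of being recomputed in O(m^3).

import std.stdio;

/// Binv is B^-1 (m x m) and d = Binv * a, where `a` replaces column
/// `p` of B. Updates Binv in place to the inverse of the new basis
/// in O(m^2) (an "eta"/product-form update); d[p] must be nonzero.
void updateInverse(double[][] Binv, const double[] d, size_t p)
{
    immutable m = Binv.length;
    immutable pivot = d[p];
    foreach (j; 0 .. m)
    {
        immutable t = Binv[p][j] / pivot;
        foreach (i; 0 .. m)
            if (i != p)
                Binv[i][j] -= d[i] * t; // eliminate the p-th component
        Binv[p][j] = t;                 // scale the pivot row
    }
}

void main()
{
    // start from B = I (so Binv = I) and replace column 0 with [2, 1]
    auto Binv = [[1.0, 0.0], [0.0, 1.0]];
    double[] d = [2.0, 1.0]; // d = Binv * a, and Binv = I here
    updateInverse(Binv, d, 0);
    writeln(Binv); // [[0.5, 0], [-0.5, 1]], the inverse of [[2, 0], [1, 1]]
}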


Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread Igor Shirkalin via Digitalmars-d-announce
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim Grostad 
wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
On Monday, 7 December 2020 at 11:21:16 UTC, Igor Shirkalin 
wrote:

[snip]

Agreed. As a matter of fact, the simplest tensor convolutions 
are out of date. It is like saying there's no need to calculate 
the inverse matrix. Mir is useful work for its author, of 
course, but in practice it is almost unused. Everyone who needs 
something fast for their own tasks has to make the same things 
again in D.


"no need to calculate inverse matrix" What? Since when?


I don't know what he meant in this context, but a common 
technique in computer graphics is to build the inverse as 
you apply computations.


It makes sense in the simplest cases, when the matrices are 
small (2x2, 3x3, or even 4x4) and you have to multiply them at 
least thousands of times.
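
At those sizes the closed-form inverse is only a handful of 
operations anyway. A minimal sketch in D (a hypothetical helper, 
not from Mir):

/// Closed-form inverse of a 2x2 matrix [[a, b], [c, d]].
/// Returns false if the matrix is singular.
bool invert2x2(const double[2][2] m, out double[2][2] inv)
{
    immutable det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    if (det == 0)
        return false;
    immutable r = 1.0 / det;
    inv[0][0] =  m[1][1] * r;
    inv[0][1] = -m[0][1] * r;
    inv[1][0] = -m[1][0] * r;
    inv[1][1] =  m[0][0] * r;
    return true;
}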


Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread Igor Shirkalin via Digitalmars-d-announce
On Monday, 7 December 2020 at 13:54:26 UTC, Ola Fosheim Grostad 
wrote:

On Monday, 7 December 2020 at 13:48:51 UTC, jmh530 wrote:
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim 
Grostad wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:

[snip]

"no need to calculate inverse matrix" What? Since when?


I don't know what he meant in this context, but a common 
technique in computer graphics is to build the inverse as 
you apply computations.


Ah, well if you have a small matrix, then it's not so hard to 
calculate the inverse anyway.


It is an optimization, maybe also for accuracy, dunno.


Exactly. Optimization plus accuracy.



Re: Mir vs. Numpy: Reworked!

2020-12-10 Thread Igor Shirkalin via Digitalmars-d-announce

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
On Monday, 7 December 2020 at 11:21:16 UTC, Igor Shirkalin 
wrote:

[snip]

Agreed. As a matter of fact, the simplest tensor convolutions 
are out of date. It is like saying there's no need to calculate 
the inverse matrix. Mir is useful work for its author, of 
course, but in practice it is almost unused. Everyone who needs 
something fast for their own tasks has to make the same things 
again in D.


"no need to calculate inverse matrix" What? Since when?


Since highly optimized algorithms became required. This does 
not mean that you should not know the algorithms for calculating 
the inverse matrix.


Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread Ola Fosheim Grostad via Digitalmars-d-announce

On Monday, 7 December 2020 at 13:48:51 UTC, jmh530 wrote:
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim Grostad 
wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:

[snip]

"no need to calculate inverse matrix" What? Since when?


I don't know what he meant in this context, but a common 
technique in computer graphics is to build the inverse as 
you apply computations.


Ah, well if you have a small matrix, then it's not so hard to 
calculate the inverse anyway.


It is an optimization, maybe also for accuracy, dunno.
So, instead of ending up with a transform from coordinate system 
A to B, you also get the transform from B to A for cheap. This 
may matter when the next step is to go from B to C... And so on...


Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread jmh530 via Digitalmars-d-announce
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim Grostad 
wrote:

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:

[snip]

"no need to calculate inverse matrix" What? Since when?


I don't know what he meant in this context, but a common 
technique in computer graphics is to build the inverse as 
you apply computations.


Ah, well if you have a small matrix, then it's not so hard to 
calculate the inverse anyway.




Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread Ola Fosheim Grostad via Digitalmars-d-announce

On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
On Monday, 7 December 2020 at 11:21:16 UTC, Igor Shirkalin 
wrote:

[snip]

Agreed. As a matter of fact, the simplest tensor convolutions 
are out of date. It is like saying there's no need to calculate 
the inverse matrix. Mir is useful work for its author, of 
course, but in practice it is almost unused. Everyone who needs 
something fast for their own tasks has to make the same things 
again in D.


"no need to calculate inverse matrix" What? Since when?


I don't know what he meant in this context, but a common technique 
in computer graphics is to build the inverse as you apply 
computations.
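
For affine transforms the trick works because (R*T)^-1 = T^-1*R^-1 
and every elementary step (rotation, translation, scale) has a 
trivial inverse. A hedged plain-D sketch (hypothetical names, 2D 
homogeneous coordinates):

/// Keeps a transform and its inverse in sync while steps are applied.
struct Transform
{
    double[3][3] fwd = id; // A -> B
    double[3][3] inv = id; // B -> A, maintained for ~one extra multiply per step

    enum double[3][3] id = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];

    /// `step` must come with its closed-form inverse `stepInv`,
    /// e.g. rotate(t)/rotate(-t) or translate(x, y)/translate(-x, -y).
    void apply(const double[3][3] step, const double[3][3] stepInv)
    {
        fwd = mul(step, fwd);    // prepend the step
        inv = mul(inv, stepInv); // append its inverse on the other side
    }
}

double[3][3] mul(const double[3][3] a, const double[3][3] b)
{
    double[3][3] c = 0;
    foreach (i; 0 .. 3)
        foreach (j; 0 .. 3)
            foreach (k; 0 .. 3)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

After any number of steps from A to B, `inv` is already the B-to-A 
transform, ready for the next hop to C.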


Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread jmh530 via Digitalmars-d-announce

On Monday, 7 December 2020 at 11:21:16 UTC, Igor Shirkalin wrote:

[snip]

Agreed. As a matter of fact, the simplest tensor convolutions 
are out of date. It is like saying there's no need to calculate 
the inverse matrix. Mir is useful work for its author, of 
course, but in practice it is almost unused. Everyone who needs 
something fast for their own tasks has to make the same things 
again in D.


"no need to calculate inverse matrix" What? Since when?


Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread 9il via Digitalmars-d-announce

On Monday, 7 December 2020 at 12:28:39 UTC, data pulverizer wrote:

On Monday, 7 December 2020 at 02:14:41 UTC, 9il wrote:
I don't know. Tensors aren't so complex. The complex part is a 
design that allows Mir to construct and iterate various kinds 
of lazy tensors of any complexity and have quite a universal 
API, and all of this is boosted by the fact that the 
user-provided kernel (lambda) function is optimized by the 
compiler without overhead.


I agree that a basic tensor is not hard to implement, but the 
specific design to choose is not always obvious. Your 
benchmarks show that design choices have a large impact on 
performance, and performance is certainly a very important 
consideration in tensor design.


For example, I had no idea that your ndslice variant was using 
more than one array internally to achieve its performance - it 
wasn't obvious to me.


The ndslice tensor type uses exactly one iterator. However, the 
iterator is generic, and lazy iterators may contain any number of 
other iterators and pointers.
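
A hedged illustration (assuming a recent mir-algorithm; the 
iterator types named in the comments are schematic):

import mir.ndslice.topology : iota, map, zip;

void main()
{
    auto a = iota(3, 4);         // one Slice over an iota iterator
    auto b = a.map!(x => x * 2); // one Slice over a map-of-iota iterator
    auto c = zip(a, b);          // one Slice over a zip iterator
    // `c` still carries exactly one iterator; the composition lives
    // inside that iterator's type, not in stacked containers.
}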


Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread data pulverizer via Digitalmars-d-announce

On Monday, 7 December 2020 at 02:14:41 UTC, 9il wrote:
I don't know. Tensors aren't so complex. The complex part is a 
design that allows Mir to construct and iterate various kinds 
of lazy tensors of any complexity and have quite a universal 
API, and all of this is boosted by the fact that the 
user-provided kernel (lambda) function is optimized by the 
compiler without overhead.


I agree that a basic tensor is not hard to implement, but the 
specific design to choose is not always obvious. Your benchmarks 
show that design choices have a large impact on performance, and 
performance is certainly a very important consideration in tensor 
design.


For example, I had no idea that your ndslice variant was using 
more than one array internally to achieve its performance - it 
wasn't obvious to me.


I think literature that discusses various design choices and 
approaches would be useful and informative. There is plenty of 
literature on creating tree structures, linked lists, stacks, 
queues, hash tables, and so forth, but virtually nothing on tensor 
data structures. It isn't as if implementing a tensor is any 
more complex than a linked list. I just think it's a bit strange 
that there is so little on the topic, given the widespread use of 
tensors in computational science.




Re: Mir vs. Numpy: Reworked!

2020-12-07 Thread Igor Shirkalin via Digitalmars-d-announce

On Monday, 7 December 2020 at 02:14:41 UTC, 9il wrote:
On Sunday, 6 December 2020 at 17:30:13 UTC, data pulverizer 
wrote:

On Saturday, 5 December 2020 at 07:44:33 UTC, 9il wrote:


sweep_ndslice uses (2*N - 1) arrays to index U; this allows 
LDC to unroll the loop.




I don't know. Tensors aren't so complex. The complex part is a 
design that allows Mir to construct and iterate various kinds 
of lazy tensors of any complexity and have quite a universal 
API, and all of this is boosted by the fact that the 
user-provided kernel (lambda) function is optimized by the 
compiler without overhead.


Agreed. As a matter of fact, the simplest tensor convolutions 
are out of date. It is like saying there's no need to calculate 
the inverse matrix. Mir is useful work for its author, of course, 
but in practice it is almost unused. Everyone who needs something 
fast for their own tasks has to make the same things again in D.




Re: Mir vs. Numpy: Reworked!

2020-12-06 Thread 9il via Digitalmars-d-announce

On Sunday, 6 December 2020 at 17:30:13 UTC, data pulverizer wrote:

On Saturday, 5 December 2020 at 07:44:33 UTC, 9il wrote:


sweep_ndslice uses (2*N - 1) arrays to index U; this allows 
LDC to unroll the loop.


For example, in the 2D case, withNeighboursSum [2] will store the 
pointer to the result and the pointers to the rows above and below.


matrix:
--
--a--- above iterator
--r--- the result
--b--- below iterator
--

Also, for AVX-512 targets it allows vectorizing the loop [1]. 
The benchmark has been run on an AVX2 CPU.


[1] https://github.com/typohnebild/numpy-vs-mir/issues/4
[2] 
http://mir-algorithm.libmir.org/mir_ndslice_topology.html#.withNeighboursSum


Very interesting, thank you for the explanations. Are there 
journal, book, or other implementation references for these 
approaches to implementing tensor-like multidimensional arrays?


I don't know. Tensors aren't so complex. The complex part is a 
design that allows Mir to construct and iterate various kinds of 
lazy tensors of any complexity and have quite a universal API, 
and all of this is boosted by the fact that the user-provided 
kernel (lambda) function is optimized by the compiler without 
overhead.





Re: Mir vs. Numpy: Reworked!

2020-12-06 Thread data pulverizer via Digitalmars-d-announce

On Saturday, 5 December 2020 at 07:44:33 UTC, 9il wrote:


sweep_ndslice uses (2*N - 1) arrays to index U; this allows LDC 
to unroll the loop.


For example, in the 2D case, withNeighboursSum [2] will store the 
pointer to the result and the pointers to the rows above and below.


matrix:
--
--a--- above iterator
--r--- the result
--b--- below iterator
--

Also, for AVX-512 targets it allows vectorizing the loop [1]. 
The benchmark has been run on an AVX2 CPU.


[1] https://github.com/typohnebild/numpy-vs-mir/issues/4
[2] 
http://mir-algorithm.libmir.org/mir_ndslice_topology.html#.withNeighboursSum


Very interesting, thank you for the explanations. Are there 
journal, book, or other implementation references for these 
approaches to implementing tensor-like multidimensional arrays? 
Tensor-like multidimensional array data structures are among the 
worst covered in research and conventional literature compared to 
almost anything else, which can be rather frustrating.




Re: Mir vs. Numpy: Reworked!

2020-12-05 Thread Andre Pany via Digitalmars-d-announce

On Saturday, 5 December 2020 at 07:04:59 UTC, 9il wrote:

On Thursday, 3 December 2020 at 16:50:39 UTC, Andre Pany wrote:

On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:

[...]


Hi Ilya,

Thanks a lot for sharing the update. I am currently working on 
porting a Python package called FMPy to D. This package makes 
use of numpy, and I hope I can use MIR here.


You probably want to express FMI entities as Algebraic types 
rather than classes. mir.algebraic can be really helpful 
here:


http://mir-core.libmir.org/mir_algebraic.html

Somehow it is hard to get started learning MIR. What might 
help Python developers is to have some articles showing numpy 
code side by side with the equivalent MIR code.


It is hard for me to write articles. I will try to write a 
small one this year, but it would be Mir-only. Maybe this 
benchmark can be used as an example, and if someone wishes to 
write a side-by-side comparison with NumPy, I would be happy to 
comment and explain the D implementation and what it is doing 
internally.


What I miss in MIR is a function to read and write CSV files. 
Is something like numpy.genfromtxt planned?


It's unlikely that I would add it myself, but I can do a code review.

Currently, we can load/save NumPy binary data with numir

https://libmir.github.io/numir/io.html

Kind regards,
Ilya


Thanks a lot for the tips. I will work through the documentation 
of mir_algebraic. Currently I follow the strategy of keeping the 
D code as similar as possible to the Python code. FMPy is in 
active development, and I want to backport changes easily.


I totally missed the fact that MIR can load/save numpy binary 
data. This is really great.


Kind regards
Andre


Re: Mir vs. Numpy: Reworked!

2020-12-05 Thread 9il via Digitalmars-d-announce

On Friday, 4 December 2020 at 20:26:17 UTC, data pulverizer wrote:

On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:


It looks like all the `sweep_XXX` functions are only defined 
for contiguous slices, as that would be the default if you 
define a Slice!(T, N).


How the functions access the data is a big difference. If you 
compare the `sweep_field` version with the `sweep_naive` 
version, the `sweep_field` function is able to access the data 
through one index, whereas the `sweep_naive` function has to use 
two indices in the 2D version and three in the 3D version.


Also, the main difference in the NDSlice version is that it 
uses *built-in* MIR functionality, like how `sweep_ndslice` 
uses the `each` function from MIR, whereas `sweep_field` uses 
a for loop. I think this is partially to show that the 
built-in MIR functionality is as fast as if you tried to do it 
with a for loop yourself.


I see; looking at some of the code, the field case is literally 
doing the indexing calculation right there. I guess ndslice is 
doing the same thing, just with "Mir magic" in the background?


sweep_ndslice uses (2*N - 1) arrays to index U; this allows LDC 
to unroll the loop.


More details here
https://forum.dlang.org/post/qejwviqovawnuniua...@forum.dlang.org

I'm still not sure why slice is so slow. Doesn't that 
completely rely on the opSlice implementations? The choice of 
indexing method and underlying data structure?


sweep_slice is slower because it iterates the data in several 
loops rather than a single one. For small matrices this makes 
the JMP/FLOP ratio higher; for large matrices that can't fit 
into the CPU cache, it is less memory efficient.





Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread 9il via Digitalmars-d-announce

On Friday, 4 December 2020 at 03:48:15 UTC, Walter Bright wrote:

On 12/3/2020 8:27 AM, 9il wrote:
Since the first announcement [0] the original benchmark [1] 
has been boosted [2] with Mir-like implementations.


This is really great! Can you write an article about it? It 
would be really helpful in letting people know about it.


Thanks! The README is really great as a description of the 
benchmark. I will do a small article about Mir this year.


Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread 9il via Digitalmars-d-announce

On Friday, 4 December 2020 at 02:35:49 UTC, data pulverizer wrote:

On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:

Am I correct in assuming that the data in the NDSlice is also a 
single array?


sweep_ndslice uses (2*N - 1) arrays to index U; this allows LDC 
to unroll the loop.


For example, in the 2D case, withNeighboursSum [2] will store the 
pointer to the result and the pointers to the rows above and below.


matrix:
--
--a--- above iterator
--r--- the result
--b--- below iterator
--

Also, for AVX-512 targets it allows vectorizing the loop [1]. The 
benchmark has been run on an AVX2 CPU.


[1] https://github.com/typohnebild/numpy-vs-mir/issues/4
[2] 
http://mir-algorithm.libmir.org/mir_ndslice_topology.html#.withNeighboursSum
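
In plain D the same layout looks roughly like this (a sketch of 
the idea, not the benchmark code): one cursor per stencil row, so 
the hot loop is single-index arithmetic that the backend can 
unroll and vectorize.

/// One Jacobi-style sweep over the interior of an n x m grid stored
/// as a flat array; `above`, `cur`, `below` mirror the a/r/b rows
/// in the diagram above.
void sweep(const double[] u, double[] result, size_t n, size_t m)
{
    foreach (i; 1 .. n - 1)
    {
        const above = u[(i - 1) * m .. i * m];
        const cur   = u[i * m .. (i + 1) * m];
        const below = u[(i + 1) * m .. (i + 2) * m];
        auto res    = result[i * m .. (i + 1) * m];
        foreach (j; 1 .. m - 1)
            res[j] = 0.25 * (above[j] + below[j] + cur[j - 1] + cur[j + 1]);
    }
}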


Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread 9il via Digitalmars-d-announce

On Thursday, 3 December 2020 at 17:08:58 UTC, jmh530 wrote:

On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
Looks good, but a few typos:


Thanks!


Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread 9il via Digitalmars-d-announce

On Thursday, 3 December 2020 at 16:50:39 UTC, Andre Pany wrote:

On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:

Hi all,

Since the first announcement [0] the original benchmark [1] 
has been boosted [2] with Mir-like implementations.


D+Mir:
 1. is more abstract than NumPy
 2. requires less code for multidimensional algorithms
 3. doesn't require indexing
 4. uses recursion across dimensions
 5. is a few times faster than NumPy for non-trivial real-world 
applications.


Why is Mir faster than NumPy?

1. Mir allows the compiler to generate specialized kernels 
while NumPy constrains the user to write code that needs to 
access memory two or more times.


Another Mir killer feature is the ability to write generalized 
N-dimensional implementations, while NumPy code needs to have 
separate implementations for the 1D, 2D, and 3D cases. For 
example, the main D loop in the benchmark can be compiled for 
4D, 5D, and higher-dimensional cases as well.


2. A @nogc iteration loop. @nogc helps when you need to control 
what is going on with your memory allocations in the critical 
part of the code.


[0] 
https://forum.dlang.org/post/pemharpztorlqkxdo...@forum.dlang.org

[1] https://github.com/typohnebild/numpy-vs-mir
[2] https://github.com/typohnebild/numpy-vs-mir/pull/1

The benchmark [1] has been created by Christoph Alt and Tobias 
Schmidt.


Kind regards,
Ilya


Hi Ilya,

Thanks a lot for sharing the update. I am currently working on 
porting a Python package called FMPy to D. This package makes 
use of numpy, and I hope I can use MIR here.


You probably want to express FMI entities as Algebraic types 
rather than classes. mir.algebraic can be really helpful here:


http://mir-core.libmir.org/mir_algebraic.html
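
A hedged sketch of what that could look like (the FMI entity 
names are made up for illustration; Variant and visit are from 
mir-core's mir.algebraic):

import mir.algebraic : Variant, visit;
import std.stdio : writeln;

struct RealVariable    { string name; double start; }
struct IntegerVariable { string name; int start; }

// one algebraic type instead of a class hierarchy
alias ScalarVariable = Variant!(RealVariable, IntegerVariable);

void main()
{
    ScalarVariable v = RealVariable("h", 1.0);
    // dispatch without virtual functions
    v.visit!(
        (RealVariable r) => writeln(r.name, " = ", r.start),
        (IntegerVariable i) => writeln(i.name, " = ", i.start)
    );
}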

Somehow it is hard to get started learning MIR. What might 
help Python developers is to have some articles showing numpy 
code side by side with the equivalent MIR code.


It is hard for me to write articles. I will try to write a small 
one this year, but it would be Mir-only. Maybe this benchmark can 
be used as an example, and if someone wishes to write a 
side-by-side comparison with NumPy, I would be happy to comment 
and explain the D implementation and what it is doing internally.


What I miss in MIR is a function to read and write CSV files. 
Is something like numpy.genfromtxt planned?


It's unlikely that I would add it myself, but I can do a code review.
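
In the meantime a workaround is not much code; a hedged sketch 
using Phobos' std.csv plus mir's fuse (assumes a purely numeric, 
rectangular CSV):

import std.array : array;
import std.csv : csvReader;
import std.file : readText;
import mir.ndslice.fuse : fuse;

auto readCsvMatrix(string path)
{
    double[][] rows;
    foreach (record; readText(path).csvReader!double)
        rows ~= record.array;
    return rows.fuse; // contiguous 2D Slice
}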

Currently, we can load/save NumPy binary data with numir

https://libmir.github.io/numir/io.html
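
For example (hedged; going by numir's io documentation at the 
link above):

import numir.io : loadNpy, saveNpy;
import mir.ndslice.allocation : slice;

void main()
{
    auto a = slice!double(2, 3); // 2x3 matrix
    a[] = 1.0;
    saveNpy("a.npy", a);                   // NumPy .npy format
    auto b = loadNpy!(double, 2)("a.npy"); // round-trips into a Slice
    assert(a == b);
}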

Kind regards,
Ilya


Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread jmh530 via Digitalmars-d-announce

On Friday, 4 December 2020 at 20:26:17 UTC, data pulverizer wrote:

[snip]

I see; looking at some of the code, the field case is literally 
doing the indexing calculation right there. I guess ndslice is 
doing the same thing, just with "Mir magic" in the background? 
Still, ndslice is able to get a consistently higher rate of 
FLOPs than the field case - interesting. One thing I 
discovered about these kinds of plots is that introducing a log 
scale on one or both axes, particularly for timing comparisons, 
can make the differences between methods that look close clearer. 
A log plot might show some consistent difference between the 
timings of ndslice and the field case. Underneath they should 
be doing essentially the same thing, so teasing out what is 
causing the difference would be interesting. Is Mir doing some 
more efficient form of the indexing calculation than naked 
field calculations?


I'm still not sure why slice is so slow. Doesn't that 
completely rely on the opSlice implementations? The choice of 
indexing method and underlying data structure? Isn't it just a 
symbolic interface where you write whatever you want?


Ilya might have a better ability to answer that than me.


Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread data pulverizer via Digitalmars-d-announce

On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:


It looks like all the `sweep_XXX` functions are only defined 
for contiguous slices, as that would be the default if you 
define a Slice!(T, N).


How the functions access the data is a big difference. If you 
compare the `sweep_field` version with the `sweep_naive` 
version, the `sweep_field` function is able to access the data 
through one index, whereas the `sweep_naive` function has to use 
two indices in the 2D version and three in the 3D version.


Also, the main difference in the NDSlice version is that it 
uses *built-in* MIR functionality, like how `sweep_ndslice` 
uses the `each` function from MIR, whereas `sweep_field` uses a 
for loop. I think this is partially to show that the built-in 
MIR functionality is as fast as if you tried to do it with a 
for loop yourself.


I see; looking at some of the code, the field case is literally 
doing the indexing calculation right there. I guess ndslice is 
doing the same thing, just with "Mir magic" in the background? 
Still, ndslice is able to get a consistently higher rate of FLOPs 
than the field case - interesting. One thing I discovered about 
these kinds of plots is that introducing a log scale on one or 
both axes, particularly for timing comparisons, can make the 
differences between methods that look close clearer. A log plot 
might show some consistent difference between the timings of 
ndslice and the field case. Underneath they should be doing 
essentially the same thing, so teasing out what is causing the 
difference would be interesting. Is Mir doing some more efficient 
form of the indexing calculation than naked field calculations?


I'm still not sure why slice is so slow. Doesn't that completely 
rely on the opSlice implementations? The choice of indexing 
method and underlying data structure? Isn't it just a symbolic 
interface where you write whatever you want?




Re: Mir vs. Numpy: Reworked!

2020-12-04 Thread jmh530 via Digitalmars-d-announce

On Friday, 4 December 2020 at 02:35:49 UTC, data pulverizer wrote:

[snip]
NDSlice is even faster for this case - cool. Am I correct in 
assuming that the data in the NDSlice is also a single array?


It looks like all the `sweep_XXX` functions are only defined for 
contiguous slices, as that would be the default if you define a 
Slice!(T, N).


How the functions access the data is a big difference. If you 
compare the `sweep_field` version with the `sweep_naive` version, 
the `sweep_field` function is able to access the data through one 
index, whereas the `sweep_naive` function has to use two indices 
in the 2D version and three in the 3D version.
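
Schematically (a plain-D sketch, not the benchmark code):

// naive: nested arrays, one index per dimension
double naiveAt(const double[][] u, size_t i, size_t j)
{
    return u[i][j]; // two loads: row pointer, then element
}

// field: flattened matrix, one computed index
double fieldAt(const double[] u, size_t m, size_t i, size_t j)
{
    return u[i * m + j]; // one load after a little index arithmetic
}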


Also, the main difference in the NDSlice version is that it uses 
*built-in* MIR functionality, like how `sweep_ndslice` uses the 
`each` function from MIR, whereas `sweep_field` uses a for loop. 
I think this is partially to show that the built-in MIR 
functionality is as fast as if you tried to do it with a for loop 
yourself.


Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread Walter Bright via Digitalmars-d-announce

On 12/3/2020 8:27 AM, 9il wrote:
Since the first announcement [0] the original benchmark [1] has been boosted [2] 
with Mir-like implementations.


This is really great! Can you write an article about it? It would be really 
helpful in letting people know about it.


Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread data pulverizer via Digitalmars-d-announce

On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:

The document says:
Slice: Python like. Uses D Slices and Strides for grouping 
(Red-Black).
Naive: one for-loop for each dimension. Matrix-Access via 
multi-dimensional Array.
Field: one for-loop. Matrix is flattened. Access via 
flattened index.

NdSlice: D like. Uses just MIR functionalities.


It's quite interesting because it says that it's well worth 
implementing a field index as opposed to naive access - at least 
for this algorithm. It makes sense because in the field case at 
least you know that all the data is in the same array - and 
therefore in close proximity in memory, whereas the individual 
arrays of a multidimensional array could be far apart in memory. 
NDSlice is even faster for this case - cool. Am I correct in 
assuming that the data in the NDSlice is also a single array?




Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread data pulverizer via Digitalmars-d-announce

On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:

The document says:
Slice: Python like. Uses D Slices and Strides for grouping 
(Red-Black).
Naive: one for-loop for each dimension. Matrix-Access via 
multi-dimensional Array.
Field: one for-loop. Matrix is flattened. Access via 
flattened index.

NdSlice: D like. Uses just MIR functionalities.


Thanks, evidently I should have been more thorough in reading the 
document. :-)


Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread mw via Digitalmars-d-announce

On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
On Thursday, 3 December 2020 at 20:25:11 UTC, data pulverizer 
wrote:

[snip]

Very interesting work. What is the difference between Mir's 
field, slice, naive, and ndslice? [...]


The document says:
Slice: Python like. Uses D Slices and Strides for grouping 
(Red-Black).
Naive: one for-loop for each dimension. Matrix-Access via 
multi-dimensional Array.
Field: one for-loop. Matrix is flattened. Access via 
flattened index.

NdSlice: D like. Uses just MIR functionalities.



As Andre said:

"""
What might help Python developers is to have some articles 
showing numpy code side by side with the equivalent MIR code.

"""

I think such a NumPy vs. Mir side-by-side equivalence (or 
improvement) document would greatly boost the adoption of Mir.
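
Even a few such pairs would go a long way. A hedged sample of 
what an entry could look like (Mir names per mir-algorithm's 
documentation):

// NumPy:  a = np.arange(12).reshape(3, 4)
//         b = (a * 2.0).sum()
import mir.ndslice.topology : iota, map;
import mir.math.sum : sum;

void main()
{
    auto a = iota(3, 4);               // lazy 3x4 counting tensor
    auto b = a.map!(x => x * 2.0).sum; // fused: no `a * 2.0` temporary
    assert(b == 132);
}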





Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread jmh530 via Digitalmars-d-announce
On Thursday, 3 December 2020 at 20:25:11 UTC, data pulverizer 
wrote:

[snip]

Very interesting work. What is the difference between Mir's 
field, slice, naive, and ndslice? [...]


The document says:
Slice: Python like. Uses D Slices and Strides for grouping 
(Red-Black).
Naive: one for-loop for each dimension. Matrix-Access via 
multi-dimensional Array.
Field: one for-loop. Matrix is flattened. Access via 
flattened index.

NdSlice: D like. Uses just MIR functionalities.



Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread data pulverizer via Digitalmars-d-announce

On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:

Hi all,

Since the first announcement [0] the original benchmark [1] has 
been boosted [2] with Mir-like implementations.


... [SNIP]

Kind regards,
Ilya


Very interesting work. What is the difference between Mir's 
field, slice, naive, and ndslice? Looks like your packages are 
really hotting up. I wonder how the performance compares with an 
equivalent Julia implementation.


Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread jmh530 via Digitalmars-d-announce

On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:

Hi all,

Since the first announcement [0] the original benchmark [1] has 
been boosted [2] with Mir-like implementations.


D+Mir:
 1. is more abstract than NumPy
 2. requires less code for multidimensional algorithms
 3. doesn't require indexing
 4. uses recursion across dimensions
 5. is a few times faster than NumPy for non-trivial real-world 
applications.


Why is Mir faster than NumPy?

1. Mir allows the compiler to generate specialized kernels 
while NumPy constrains the user to write code that needs to 
access memory two or more times.


Another Mir killer feature is the ability to write generalized 
N-dimensional implementations, while NumPy code needs to have 
separate implementations for the 1D, 2D, and 3D cases. For 
example, the main D loop in the benchmark can be compiled for 
4D, 5D, and higher-dimensional cases as well.


2. A @nogc iteration loop. @nogc helps when you need to control 
what is going on with your memory allocations in the critical 
part of the code.


[0] 
https://forum.dlang.org/post/pemharpztorlqkxdo...@forum.dlang.org

[1] https://github.com/typohnebild/numpy-vs-mir
[2] https://github.com/typohnebild/numpy-vs-mir/pull/1

The benchmark [1] has been created by Christoph Alt and Tobias 
Schmidt.


Kind regards,
Ilya


Looks good, but a few typos:

"The big difference is especially visible in this figures."

"For bigger prolem sizes the FLOP/s slightly drop and finally 
level out."


"Propably this is mainly caused by the overhead of the Python 
interpreter and might be reduced by more optimization efforts."


Re: Mir vs. Numpy: Reworked!

2020-12-03 Thread Andre Pany via Digitalmars-d-announce

On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:

Hi all,

Since the first announcement [0] the original benchmark [1] has 
been boosted [2] with Mir-like implementations.


D+Mir:
 1. is more abstract than NumPy
 2. requires less code for multidimensional algorithms
 3. doesn't require indexing
 4. uses recursion across dimensions
 5. is a few times faster than NumPy for non-trivial real-world 
applications.


Why is Mir faster than NumPy?

1. Mir allows the compiler to generate specialized kernels 
while NumPy constrains the user to write code that needs to 
access memory two or more times.


Another Mir killer feature is the ability to write generalized 
N-dimensional implementations, while NumPy code needs to have 
separate implementations for the 1D, 2D, and 3D cases. For 
example, the main D loop in the benchmark can be compiled for 
4D, 5D, and higher-dimensional cases as well.


2. A @nogc iteration loop. @nogc helps when you need to control 
what is going on with your memory allocations in the critical 
part of the code.


[0] 
https://forum.dlang.org/post/pemharpztorlqkxdo...@forum.dlang.org

[1] https://github.com/typohnebild/numpy-vs-mir
[2] https://github.com/typohnebild/numpy-vs-mir/pull/1

The benchmark [1] has been created by Christoph Alt and Tobias 
Schmidt.


Kind regards,
Ilya


Hi Ilya,

Thanks a lot for sharing the update. I am currently working on 
porting a Python package called FMPy to D. This package makes 
use of numpy, and I hope I can use MIR here.


Somehow it is hard to get started learning MIR. What might help 
Python developers is to have some articles showing numpy code 
side by side with the equivalent MIR code.


What I miss in MIR is a function to read and write CSV files. Is 
something like numpy.genfromtxt planned?


Kind regards
Andre


Mir vs. Numpy: Reworked!

2020-12-03 Thread 9il via Digitalmars-d-announce

Hi all,

Since the first announcement [0] the original benchmark [1] has 
been boosted [2] with Mir-like implementations.


D+Mir:
 1. is more abstract than NumPy
 2. requires less code for multidimensional algorithms
 3. doesn't require indexing
 4. uses recursion across dimensions
 5. is a few times faster than NumPy for non-trivial real-world 
applications.


Why is Mir faster than NumPy?

1. Mir allows the compiler to generate specialized kernels while 
NumPy constrains the user to write code that needs to access 
memory two or more times.
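
As a hedged illustration (assuming mir-algorithm's each; this is 
not code from the benchmark): NumPy's y += a * x materializes the 
temporary a * x first, while the user lambda below is fused into 
a single pass over both tensors.

import mir.algorithm.iteration : each;

/// y[i] += a * x[i] in one pass over memory.
void axpy(X, Y)(double a, X x, Y y)
{
    each!((xi, ref yi) { yi += a * xi; })(x, y);
}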


Another Mir killer feature is the ability to write generalized 
N-dimensional implementations, while NumPy code needs to have 
separate implementations for the 1D, 2D, and 3D cases. For 
example, the main D loop in the benchmark can be compiled for 
4D, 5D, and higher-dimensional cases as well.
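
Concretely, one template covers every rank; the rank N is just a 
compile-time parameter (a minimal sketch, not the benchmark loop):

import mir.ndslice.slice : Slice, SliceKind;
import mir.algorithm.iteration : each;

/// Compiles unchanged for 1D, 2D, 3D, 4D, ...
void scale(Iterator, size_t N, SliceKind kind)(Slice!(Iterator, N, kind) tensor, double k)
{
    tensor.each!((ref x) { x *= k; });
}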


2. A @nogc iteration loop. @nogc helps when you need to control 
what is going on with your memory allocations in the critical 
part of the code.
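
For example, the hot loop can be statically checked (a sketch; 
the attribute makes the compiler reject any hidden allocation 
inside the function):

import mir.ndslice.slice : Slice, SliceKind;
import mir.algorithm.iteration : each;

@nogc void relax(Iterator, SliceKind kind)(Slice!(Iterator, 2, kind) u, double omega)
{
    u.each!((ref x) { x *= omega; }); // no GC allocation possible here
}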


[0] 
https://forum.dlang.org/post/pemharpztorlqkxdo...@forum.dlang.org

[1] https://github.com/typohnebild/numpy-vs-mir
[2] https://github.com/typohnebild/numpy-vs-mir/pull/1

The benchmark [1] has been created by Christoph Alt and Tobias 
Schmidt.


Kind regards,
Ilya



Re: MIR vs. Numpy

2020-11-20 Thread Tobias Schmidt via Digitalmars-d-announce

Thanks for all of your feedback!

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
It's been a while since I've used numba, so I was a little 
confused on the numba 1 and numba 8 runs.


The number refers to the number of threads used in each run. 
The prefix indicates whether numba was used ('numba') or not 
('nonumba').

We have added a section to clarify this. Thanks for the hint.





Re: MIR vs. Numpy

2020-11-18 Thread Max Haughton via Digitalmars-d-announce

On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:


It also looks like you are compiling on ldc with -mcpu=native 
--boundscheck=off. Why not -O as well?


-O is added by DUB


Just -O? LDC is quite impressive with LTO and 
cross-module inlining turned on.
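
For reference, a hypothetical invocation along those lines (flags 
taken from LDC's documented options, not the benchmark's actual 
build line):

ldc2 -O -release -mcpu=native -flto=full -enable-cross-module-inlining app.d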


Re: MIR vs. Numpy

2020-11-18 Thread jmh530 via Digitalmars-d-announce

On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:

[snip]

-O is added by DUB


Ah, the -release-nobounds


Re: MIR vs. Numpy

2020-11-18 Thread 9il via Digitalmars-d-announce
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:

Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir, and 
performed some benchmarks with them.


You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir

Feedback is very welcome. Please feel free to open issues, pull 
requests or simply post your thoughts below.


Kind regards,
Tobias


Thank you a lot! It is a huge benefit for Mir and D to have such 
high-quality benchmarks.


Python's sweep_3D accesses memory only once per element 
computation, while the old D sweep_slice accesses it 7 times.


A PR [1] with a new version of sweep_slice has been added; I 
expect it to be at least twice as fast. The new sweep_slice uses 
a more D-ish approach and a single memory access per computed 
element.


[1] https://github.com/typohnebild/numpy-vs-mir/pull/1

Cheers,
Ilya


Re: MIR vs. Numpy

2020-11-18 Thread 9il via Digitalmars-d-announce

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:


It also looks like you are compiling on ldc with -mcpu=native 
--boundscheck=off. Why not -O as well?


-O is added by DUB


Re: MIR vs. Numpy

2020-11-18 Thread John Colvin via Digitalmars-d-announce
On Wednesday, 18 November 2020 at 13:01:42 UTC, Bastiaan Veelo 
wrote:
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:

Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir, and 
performed some benchmarks with them.


You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir


Nice numbers. I’m not a Python guy but I was under the 
impression that Numpy actually is written in C, so that when 
you benchmark Numpy you’re mostly benchmarking C, not Python. 
Therefore I had expected the Numpy performance to be much 
closer to D’s. An important factor I think, which I’m not sure 
you have discussed (didn’t look too closely), is the compiler 
backend that was used to compile D and Numpy. Then again, as a 
user one is mostly interested in the out-of-the-box 
performance, which this seems to be a good measure of.


— Bastiaan.


A lot of numpy is in C, C++, Fortran, asm, etc.

But when you chain a bunch of things together, you are going via 
Python. The language boundary (and Python being slow) means that 
internal iteration in native code is a requirement for 
performance, which leads to eager allocation for composability 
via Python, which then hurts performance. Numpy makes a very good 
effort, but it is always constrained by this. Clever schemes with 
laziness, where operations in Python are actually just composing 
operations for execution later or on demand, can work as an 
alternative, but a) that's hard and b) even if you can completely 
avoid calling back into Python during iteration, you would still 
need a JIT to really unlock the performance.


Julia fixes this by having all or most of it in one language, which is JIT'd.

D can do the same ahead of time with templates, like C++/Eigen 
does, but with more flexible and less terrifying code. That's 
(one part of) what Mir provides.
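
In Mir's case the composition stays lazy until a single consuming 
pass. A hedged sketch:

import mir.ndslice.topology : iota, map;
import mir.math.sum : sum;

void main()
{
    // `map` only builds a lazy view; `sum` is the one pass that
    // evaluates it - no intermediate array is ever allocated
    auto total = iota(1_000_000).map!(x => x * x).sum;
}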


Re: MIR vs. Numpy

2020-11-18 Thread jmh530 via Digitalmars-d-announce
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:

Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir, and 
performed some benchmarks with them.


You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir

Feedback is very welcome. Please feel free to open issues, pull 
requests or simply post your thoughts below.


Kind regards,
Tobias


Very nice write-up.

It's been a while since I've used numba, so I was a little 
confused on the numba 1 and numba 8 runs.


It also looks like you are compiling on ldc with -mcpu=native 
--boundscheck=off. Why not -O as well?


Re: MIR vs. Numpy

2020-11-18 Thread Bastiaan Veelo via Digitalmars-d-announce
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:

Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir, and 
performed some benchmarks with them.


You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir


Nice numbers. I’m not a Python guy but I was under the impression 
that Numpy actually is written in C, so that when you benchmark 
Numpy you’re mostly benchmarking C, not Python. Therefore I had 
expected the Numpy performance to be much closer to D’s. An 
important factor I think, which I’m not sure you have discussed 
(didn’t look too closely), is the compiler backend that was used 
to compile D and Numpy. Then again, as a user one is mostly 
interested in the out-of-the-box performance, which this seems to 
be a good measure of.


— Bastiaan.


MIR vs. Numpy

2020-11-18 Thread Tobias Schmidt via Digitalmars-d-announce

Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir, and 
performed some benchmarks with them.


You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir

Feedback is very welcome. Please feel free to open issues, pull 
requests or simply post your thoughts below.


Kind regards,
Tobias