Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-06 Thread Juha Jeronen

Hi all,

Thanks Jon and Ryan for the suggestions. Both asanyarray() and 
atleast_1d() sound good.


There's the technical detail that Cython needs to know the object type 
(e.g. in the parameters to quartic_z_array()), so I think atleast_1d() 
may be safer. I've taken this option for now.
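A minimal sketch of how the two helpers differ on plain NumPy inputs (independent of the solver's API):

```python
import numpy as np

a = np.atleast_1d(3.0)           # scalar -> 1-d array of shape (1,)
b = np.atleast_1d([1.0, 2.0])    # list   -> 1-d array of shape (2,)

m = np.ma.masked_array([1.0, 2.0])
c = np.asanyarray(m)             # subclass (MaskedArray) passes through
d = np.asarray(m)                # forced down to a plain base-class ndarray
```

So atleast_1d guarantees a usable ndim >= 1 result for Cython's type declarations, at the cost of normalizing the type.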


This simplified the code somewhat. The *_scalar() functions have been 
removed, as they are no longer needed. The *_array() versions have been 
renamed, removing the _array suffix.


The return values have stayed as they were - if there is only one 
problem to solve, the singleton dimension is dropped, and otherwise a 2D 
array is returned.


(The exception is linear(), which does not need the second dimension, 
since the solution for each problem is unique. It will return a scalar 
in the case of a single problem, or a 1D array in the case of multiple 
problems.)



I've pushed the new version. It's available from the same repository:

https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy


Other comments? The next step?

 -J


P.S.:

I'm not sure how exact the object type must be - i.e. whether Cython 
wants to know that the object stores its data somewhat like an ndarray, 
or that its C API exactly matches that of an ndarray.


In Cython there are some surprising details regarding this kind of 
thing, such as the ctypedef'ing of scalar types. For example, see the 
comments about complex support near the beginning of polysolve2.pyx, and 
the commented-out SSE2 intrinsics experiment in 
https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/misc/tworods/tworods.pyx 
.


In the latter, it was somewhat tricky to get Cython to recognize __m128d 
- it turns out it is enough to tell Cython that it behaves like a 
double, although it actually contains two doubles.


(Note that these ctypedefs never end up in the generated C code; Cython 
uses them as extra context information when mapping the Python code into C.)


(And no need to worry, I'm not planning to put any SSE into polysolve2 :) )



On 02.10.2015 17:18, Ryan May wrote:
numpy.asanyarray() would be my preferred goto, as it will leave 
subclasses of ndarray untouched; asarray() and atleast_1d() force 
ndarray. It's nice to do that whenever possible.


Ryan

On Fri, Oct 2, 2015 at 6:52 AM, Slavin, Jonathan 
<jsla...@cfa.harvard.edu> wrote:


Personally I like atleast_1d, which will convert a scalar into a
1d array but will leave arrays untouched (i.e. won't change the
dimensions). Not sure what the advantages/disadvantages are
relative to asarray.

Jon


On Fri, Oct 2, 2015 at 7:05 AM,
<numpy-discussion-requ...@scipy.org> wrote:

    From: Juha Jeronen <juha.jero...@jyu.fi>
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Cc:
Date: Fri, 2 Oct 2015 13:31:47 +0300
Subject: Re: [Numpy-discussion] Cython-based
OpenMP-accelerated quartic polynomial solver

On 02.10.2015 13:07, Daπid wrote:


On 2 October 2015 at 11:58, Juha Jeronen <juha.jero...@jyu.fi> wrote:



First version done and uploaded:


https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy


Small comment: now you are checking if the input is a scalar
or a ndarray, but it should also accept any array-like. If I
pass a list, I expect it to work, internally converting it
into an array.


Good catch.

Is there an official way to test for array-likes? Or should I
always convert with asarray()? Or something else?


 -J





-- 

Jonathan D. Slavin
Harvard-Smithsonian CfA
60 Garden Street, MS 83
Cambridge, MA 02138-1516, USA
phone: (617) 496-7981
cell: (781) 363-0035
jsla...@cfa.harvard.edu



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion




--
Ryan May





Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-02 Thread Juha Jeronen

On 01.10.2015 03:32, Sturla Molden wrote:

On 01/10/15 02:20, Juha Jeronen wrote:


Then again, the matter is further complicated by considering codes that
run on a single machine, versus codes that run on a cluster. Threads
being local to each node in a cluster,


You can run MPI programs on a single machine and you get OpenMP 
implementations for clusters. Just pick an API and stick with it.


Mm. I've quite often run MPI locally (it's nice for multicore scientific 
computing on Python), but I had no idea that OpenMP had cluster 
implementations. Thanks for the tip.


I've got the impression that the way these APIs market themselves is 
that MPI is for processes, while OpenMP is for threads, even if this is 
not completely true across all implementations.


(If I wanted maximal control over what each process/thread is doing, I'd 
go for ZeroMQ :) )



 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-02 Thread Juha Jeronen

On 02.10.2015 13:07, Daπid wrote:


On 2 October 2015 at 11:58, Juha Jeronen <juha.jero...@jyu.fi> wrote:




First version done and uploaded:


https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy


Small comment: now you are checking if the input is a scalar or a 
ndarray, but it should also accept any array-like. If I pass a list, I 
expect it to work, internally converting it into an array.


Good catch.

Is there an official way to test for array-likes? Or should I always 
convert with asarray()? Or something else?
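For reference, a minimal sketch of accepting any array-like via asarray()/atleast_1d(); the function name and the coefficient layout here are illustrative, not the solver's actual API:

```python
import numpy as np

def accept_arraylike(coeffs):
    # Converts lists, tuples, or ndarrays uniformly; asarray makes
    # no copy when the input is already a float64 ndarray.
    c = np.asarray(coeffs, dtype=np.float64)
    return np.atleast_1d(c)

r1 = accept_arraylike([1.0, 0.0, -2.0])            # plain list works
r2 = accept_arraylike(np.array([1.0, 0.0, -2.0]))  # ndarray passes through
```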



 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-02 Thread Juha Jeronen

On 01.10.2015 03:52, Sturla Molden wrote:

On 01/10/15 02:32, Juha Jeronen wrote:


Sounds good. Out of curiosity, are there any standard fork-safe
threadpools, or would this imply rolling our own?


You have to roll your own.

Basically use pthread_atfork to register a callback that shuts down 
the threadpool before a fork and another callback that restarts it. 
Python's threading module does not expose the pthread_atfork 
function, so you must call it from Cython.


I believe there is also a tiny atfork module in PyPI.


Ok. Thanks. This approach fixes the issue of the threads not being there 
for the child process.


I think it still leaves open the issue of creating the correct number of 
threads in the pools for each of the processes when the pool is 
restarted (so that in total there will be as many threads as cores 
(physical or virtual, whichever the user desires)). But this is again 
something that requires context...
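A pure-Python sketch of the same before/after-fork hook idea, using os.register_at_fork() (available since Python 3.7); start_pool and shutdown_pool are hypothetical placeholders, not part of the proposed solver:

```python
import os
from concurrent.futures import ThreadPoolExecutor

_pool = None

def start_pool():
    # Rebuild the pool; sizing it to the core count is one policy choice.
    global _pool
    _pool = ThreadPoolExecutor(max_workers=os.cpu_count())

def shutdown_pool():
    global _pool
    if _pool is not None:
        _pool.shutdown(wait=True)
        _pool = None

if hasattr(os, "register_at_fork"):
    # Tear the pool down before fork; rebuild it in parent and child,
    # so the child never inherits dead worker threads.
    os.register_at_fork(before=shutdown_pool,
                        after_in_parent=start_pool,
                        after_in_child=start_pool)

start_pool()
result = _pool.submit(sum, [1, 2, 3]).result()
```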




So maybe it would be better, at least at first, to make a pure-Cython
version with no attempt at multithreading?


I would start by making a pure Cython version that works correctly. 
The next step would be to ensure that it releases the GIL. After that 
you can worry about parallel processing, or just tell the user to use 
threads or joblib.


First version done and uploaded:

https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy

OpenMP support removed; this version uses only Cython.

The example program has been renamed to main.py, and setup.py has been 
cleaned, removing the irrelevant module.


This folder contains only the files for the polynomial solver.


As I suspected, removing OpenMP support only required changing a few 
lines, and dropping the import for Cython.parallel. The "prange"s have 
been replaced with "with nogil" and "range".


Note that both the original version and this version release the GIL 
when running the processing loops.



It may be better to leave this single-threaded for now. Using Python 
threads isn't that difficult and joblib sounds nice, too.


What's the next step?

 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

Hi,

On 30.09.2015 03:48, Chris Barker - NOAA Federal wrote:

This sounds pretty cool -- and I've had a use case. So it would be
nice to get into Numpy.

But: I doubt we'd want OpenMP dependence in Numpy itself.


It is indeed a good point not to add new dependencies for a small 
feature such as this. From a software engineering point of view, I fully 
agree.


Note however that with the current version of the code, not having 
OpenMP will severely limit the performance, especially on quad-core 
machines, since an important factor in the speed comes from the parallel 
processing of the independent problem instances.


But I suppose there may be another technical option to support multiple 
cores (doesn't NumPy already do this in some of its routines?). OpenMP 
was just the most convenient solution from the implementation viewpoint.




But maybe a pure Cython non-MP version?


That is of course possible. IIRC, changing the processing to occur on a 
single core should only require replacing Cython.parallel.prange() with 
range() in the array processing loops.


(Note range(), not xrange(), even for Python 2.x - Cython compiles a 
loop using range() into an efficient C loop if some simple compile-time 
conditions are fulfilled.)




Are we putting Cython in Numpy now? I lost track.


I thought so? But then again, I'm just a user :)


 -J

-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

Hi,

On 30.09.2015 04:37, Charles R Harris wrote:



On Tue, Sep 29, 2015 at 6:48 PM, Chris Barker - NOAA Federal 
<chris.bar...@noaa.gov> wrote:


This sounds pretty cool -- and I've had a use case. So it would be
nice to get into Numpy.

But: I doubt we'd want OpenMP dependence in Numpy itself.

But maybe a pure Cython non-MP version?

Are we putting Cython in Numpy now? I lost track.


Yes, but we need to be careful of Numerical Recipes.


True.

As I said, only the algorithms come from NR, and the code was written 
from scratch. I haven't even looked at any of their code, precisely due 
to the restrictive license.



In this particular case, the referred pages (pp. 227-229, 3rd ed.) 
contain only a description of the solution process in English, with 
equations - there is no example code printed into the book in section 
5.6, Quadratic and Cubic Equations, which in its entirety is contained 
on pp. 227-229.


Furthermore, on p. 228, equation (5.6.12), which contains the solutions 
of the cubic equation, is attributed to Francois Viete. The reference 
given is the treatise "De emendatione", published in 1615.


The same solution, also with attribution to Francois Viete, is given in 
Wikipedia, but without especially defining temporary variables to 
eliminate some common subexpressions:


https://en.wikipedia.org/wiki/Cubic_function#Trigonometric_.28and_hyperbolic.29_method


The solution to the quadratic equation is the standard one, but with 
special care taken to avoid catastrophic cancellation in computing the 
other root.


Wikipedia mentions that in the standard formula, this issue exists:

https://en.wikipedia.org/wiki/Quadratic_equation#Avoiding_loss_of_significance

but does not give an explicit remedy. References given are

Kahan, William (November 20, 2004), On the Cost of Floating-Point 
Computation Without Extra-Precise Arithmetic. ( 
http://www.cs.berkeley.edu/~wkahan/Qdrtcs.pdf , see esp. p. 8, which 
provides the same remedy as NR )


Higham, Nicholas (2002), Accuracy and Stability of Numerical Algorithms 
(2nd ed.), SIAM, p. 10, ISBN 978-0-89871-521-7.



(Browsing through Kahan's text, I see now that I could improve the 
discriminant computation to handle marginal cases better. Might do this 
in a future version.)
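A minimal sketch of the cancellation-avoiding formula for the real-root case (the actual solver also handles complex coefficients and roots):

```python
import math

def stable_quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0, avoiding catastrophic
    cancellation when b*b >> 4*a*c (sketch; complex case omitted)."""
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("complex roots not handled in this sketch")
    # q carries the sign of b, so b + sign(b)*sqrt(disc) never cancels
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    return q / a, c / q   # second root from Viete's relation x1*x2 = c/a
```

For x**2 + 1e8*x + 1, the naive formula (-b + sqrt(disc))/(2a) loses nearly all significant digits of the small root, while c/q recovers it to full precision.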



For both of these cases, I've used NR as a reference, because the 
authors have taken special care to choose only numerically stable 
approaches - which is absolutely critical, yet something that sadly not 
all mathematics texts care about.



The quartic equation is not treated in NR. The reduction trick is 
reported in Wikipedia:


https://en.wikipedia.org/wiki/Quartic_function#Solving_by_factoring_into_quadratics

where the original reference given is

Brookfield, G. (2007). "Factoring quartic polynomials: A lost art". 
Mathematics Magazine 80 (1): 67–70.


(This seemed an elegant approach, given stable solvers for cubics and 
quadratics.)



 -J

---------
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

Hi,

What qualifies as an NR part?

(See my previous message concerning the references; NR is not the only 
source which has these algorithms. Again, the code itself is original to 
this solver, I haven't even looked at the code provided with NR.)


 -J


-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-


On 30.09.2015 12:35, Matthieu Brucher wrote:

Yes, obviously, the code has NR parts, so it can't be licensed as BSD
as it is...

Matthieu

2015-09-30 2:37 GMT+01:00 Charles R Harris <charlesr.har...@gmail.com>:


On Tue, Sep 29, 2015 at 6:48 PM, Chris Barker - NOAA Federal
<chris.bar...@noaa.gov> wrote:

This sounds pretty cool -- and I've had a use case. So it would be
nice to get into Numpy.

But: I doubt we'd want OpenMP dependence in Numpy itself.

But maybe a pure Cython non-MP version?

Are we putting Cython in Numpy now? I lost track.


Yes, but we need to be careful of Numerical Recipes.



Chuck









Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 30.09.2015 19:20, Chris Barker wrote:
On Wed, Sep 30, 2015 at 6:57 AM, Nathan Goldbaum wrote:


Note however that with the current version of the code, not
having OpenMP will severely limit the performance, especially
on quad-core machines, since an important factor in the speed
comes from the parallel processing of the independent problem
instances.


It seems you got much more than 4 times performance improvement


True, good point. That's the power of a short explicit algorithm for a 
specific problem :)


(And maybe some Cython, too.)


-- which is the most you'll get from four cores, presumably :-). not 
that another 4 times or so isn't a good thing.


True in general :)

To be precise, in this case, HyperThreading seems to offer some extra 
performance.


And there seems to be something wonky with the power management on my 
laptop. The number I posted (1.2e7 quartics per second) seemed slightly 
low compared to earlier informal benchmarking I had done, and indeed, 
benchmarking with 8 threads again I'm now getting 1.6e7 quartics per 
second (i7-4710MQ at 2.5 GHz, RAM at 1333 MHz, running on Linux Mint 17.2).


Here are some more complete benchmark results. Because the polynomials 
to be solved are randomly generated during each run of the program, 
unless otherwise noted, I've rerun the benchmark three times for each 
number of threads and picked the lowest-performance result, rounded down.


Number of threads vs. quartics solved per second:

1: 3.0e6
2: 6.1e6
4: 1.1e7
8: 1.6e7  (note: higher than previously!)

Threads vs. cubics:

1: 8.4e6
2: 1.6e7
4: 3.0e7
8: 4.6e7

Threads vs. quadratics:

1: 2.5e7
2: 4.8e7
4: 8.5e7 (typical across 9 benchmarks; lowest 5.7e7)
8: 9.5e7

From these, in general the weak scaling seems pretty good, and for 
quadratics and cubics, HyperThreading offers some extra performance, 
namely about 50% over the 4-thread result in both cases. The total 
speedup for quartics is 5.3x (3.6x without HT), and for cubics, 5.4x 
(3.5x without HT).


"About 4x, as expected" is still a good characterization, though :)


For quadratics, 4 threads seems to be the useful maximum; improvement 
with HT is only 11%, with a total speedup of 3.8x (vs. 3.4x without HT). 
These require far fewer arithmetic operations than the higher-order 
cases, so a possible explanation is that this case is memory-bound. 
Running the numbers on a simplistic estimate, the total of input and 
output is far from hitting the DDR3 bandwidth limit, but of course 
that's ignoring (at least) latencies and the pattern of data accesses.
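For concreteness, the simplistic estimate might run as follows, assuming a data layout of three float64 coefficients in and two complex128 roots out per quadratic (an assumption about the layout, not a measured figure):

```python
solves_per_s = 9.5e7                  # measured 8-thread quadratic rate
bytes_per_solve = 3 * 8 + 2 * 16      # 3 float64 in + 2 complex128 out = 56 B
gb_per_s = solves_per_s * bytes_per_solve / 1e9
# ~5.3 GB/s of useful traffic, well under typical dual-channel
# DDR3-1333 bandwidth (~21 GB/s), so raw bandwidth alone is not the limit.
```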


Testing the completely silly "linear" case that is there for documenting 
code structure only, I get:


1: 1.9e8
2: 3.1e8
4: 3.6e8
8: 2.9e8

which should give a reasonable first approximation for the maximum 
practical throughput on this machine, when performing almost no arithmetic.


Anyway, returning from the sidetrack...



But I suppose there may be another technical option to support
multiple cores


python threads with nogil?


...maybe?

I hadn't thought about this option. I suppose that in pseudocode, the 
code structure would need to be something like this?


---8<---8<---8<---

from threading import Thread

cdef void fast_solver(...) nogil:
    # ...do stuff without touching Python objects...

def my_thread_entry(j_start, j_end):
    cdef unsigned int j
    with nogil:
        for j in range(j_start, j_end):
            fast_solver(...)  # process local slice

t1 = Thread(target=my_thread_entry, args=(0, j_split))
t2 = Thread(target=my_thread_entry, args=(j_split, j_end_global))
t1.start()
t2.start()
t1.join()
t2.join()

---8<---8<---8<---


 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 30.09.2015 19:20, Nathaniel Smith wrote:
The challenges to providing transparent multithreading in numpy 
generally are:


- gcc + OpenMP on linux still breaks multiprocessing. There's a patch 
to fix this but they still haven't applied it; alternatively there's a 
workaround you can use in multiprocessing (not using fork mode), but 
this requires every user update their code and the workaround has 
other limitations. We're unlikely to use OpenMP while this is the case.




Ah, I didn't know this. Thanks.


- parallel code in general is not very composable. If someone is 
calling a numpy operation from one thread, great, transparently using 
multiple threads internally is a win. If they're exploiting some 
higher-level structure in their problem to break it into pieces and 
process each in parallel, and then using numpy on each piece, then 
numpy spawning threads internally will probably destroy performance. 
And numpy is too low-level to know which case it's in. This problem 
exists to some extent already with multi-threaded BLAS, so people use 
various BLAS-specific knobs to manage it in ad hoc ways, but this 
doesn't scale.


Very good point. I've had both kinds of use cases myself.

It would be nice if there was some way to tell NumPy to either use 
additional threads or not, but that adds complexity. It's also not a 
good solution, considering that any higher-level code building on NumPy, 
if it is designed to be at all reusable, may find *itself* in either 
role. Only the code that, at any particular point of time in the 
development of a software project, happens to form the top level at that 
time, has the required context...


Then again, the matter is further complicated by considering codes that 
run on a single machine, versus codes that run on a cluster. Threads 
being local to each node in a cluster, it may make sense in a solver 
targeted for a cluster to split the problem at the process level, 
distribute the processes across the network, and use the threading 
capability to accelerate computation on each node.


A complex issue with probably no easy solutions :)


 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 01.10.2015 01:04, Sturla Molden wrote:

There are two principal problems with using OpenMP in NumPy:

One is that the GNU OpenMP threadpool is not fork-safe, and can break 
multiprocessing on some platforms (e.g. when using Python 2.7 on 
Linux). Anything that uses GCD has this nasty effect on Apple and 
FreeBSD as well. Note that the problem is actually in multiprocessing. 
It is not present on Windows (there is no fork system call) and it is 
avoidable even on Linux with Python 3.4 or later. Also the default 
builds of NumPy and SciPy on MacOSX use GCD by default.


The other problem is that NumPy with its iterator objects and gufuncs 
is not particularly fit for multithreading. There will be a lot of 
contention for the iterator object. Without a major redesign, one 
thread would do useful work while most of the others would be 
busy-waiting on a spinlock and fighting for iterator access. Not to 
mention that the iterator object would be false shared between the 
processors, which would trash the performance if you have more than 
one CPU, even when compared to using a single thread. This means that 
for multithreading to be useful in NumPy, the loops will have to be 
redesigned so that the work sharing between the threads is taken care 
of ahead of creating the iterator, i.e. allowing us to use one 
iterator per thread. Each iterator would then iterate over a slice of 
the original array. This is ok, but we cannot simply do this by adding 
OpenMP pragmas in the C code.


Thanks for the details.


Given the two problems mentioned above, it would likely be better to 
use a fork-safe threadpool instead of OpenMP.


Sounds good. Out of curiosity, are there any standard fork-safe 
threadpools, or would this imply rolling our own?



Also, after Chris's, Nathaniel's and your replies, I'm no longer sure an 
attempt at transparent multithreading belongs in the polynomial solver. 
Although the additional performance would be (very) nice when running it 
from a single-threaded program - after all, quad-cores are common these 
days - it is not just this specific solver that encounters the issue, 
but instead the same considerations apply to basically all NumPy operations.


So maybe it would be better, at least at first, to make a pure-Cython 
version with no attempt at multithreading?



 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 30.09.2015 13:21, Pauli Virtanen wrote:

Juha Jeronen <juha.jero...@jyu.fi> writes:

I recently developed a Cython-based, OpenMP-accelerated quartic (and
cubic, quadratic) polynomial solver to address a personal research need
for quickly solving a very large number of independent low-degree
polynomial equations for both real and complex coefficients.

My 2c in this context would be to also think how this best fits
in how collections of polynomials are represented in Numpy.

AFAICS, Numpy's polynomial module supports evaluation of polynomial
collections (cf. numpy.polynomial.polynomial.polyval),
but the corresponding root finding routine
(numpy.polynomial.polynomial.polyroots) only deals with one
polynomial at a time.

The present code in principle could be used to speed up the latter
routine after it is generalized to multiple polynomials. The general
case is probably doable just by coding up the companion matrix
approach using low-level routines (or maybe with the new vectorized
np.linalg routines).

Some relevant code elsewhere for similar purposes can be
found in scipy.interpolate.PPoly/BPoly (and in the future BSpline).

https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PPoly.html

However, since it's piecewise, there's purposefully support only
for real-valued roots.


Thanks.

It sounds like numpy.polynomial.polynomial.polyroots() is a good 
candidate place for this, if as you suggest, it is first generalized to 
handle multiple polynomials.
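A sketch of that generalization via stacked companion matrices, relying on np.linalg.eigvals accepting (..., n, n) input (supported since NumPy 1.8); polyroots_many is a hypothetical name, and coefficients follow numpy.polynomial's low-to-high order:

```python
import numpy as np

def polyroots_many(coeffs):
    """coeffs: array of shape (m, n+1), row c giving the polynomial
    c[0] + c[1]*x + ... + c[n]*x**n (leading coefficient nonzero).
    Returns an (m, n) array of roots."""
    c = np.asarray(coeffs, dtype=float)
    m, k = c.shape
    n = k - 1
    comp = np.zeros((m, n, n))
    comp[:, 1:, :-1] = np.eye(n - 1)           # ones on the subdiagonal
    comp[:, :, -1] = -c[:, :-1] / c[:, -1:]    # last column: -c_i / c_n
    # Eigenvalues of each companion matrix are the roots of its polynomial.
    return np.linalg.eigvals(comp)

# roots of 2 - 3x + x**2 (i.e. 1, 2) and of -1 + x**2 (i.e. -1, 1)
r = polyroots_many([[2.0, -3.0, 1.0], [-1.0, 0.0, 1.0]])
```

This is the general fallback; the explicit closed-form solver discussed in this thread should still be much faster for degrees up to four.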


 -J



[Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-29 Thread Juha Jeronen

Hi all,

I recently developed a Cython-based, OpenMP-accelerated quartic (and 
cubic, quadratic) polynomial solver to address a personal research need 
for quickly solving a very large number of independent low-degree 
polynomial equations for both real and complex coefficients.


For example, on an Intel i7-4710MQ, running with 8 threads this code 
solves approximately 1.2e7 quartic polynomial equations per second. 
(With OMP_NUM_THREADS=4 the solver gives approximately 8.9e6 solutions 
per second, so it seems HyperThreading in this case helps about 30%.)


The algorithms for cubics and quadratics come from Numerical Recipes 
(3rd ed.), and the quartic problem is internally reduced to a cubic and 
two quadratics, using well-known standard tricks.



Since to my understanding, for solving polynomial equations NumPy only 
provides roots(), which works one problem at a time, I'm writing to 
inquire whether there would be interest in including this functionality in 
NumPy, if I contribute the code (and clean it up for integration)?



I have some previous open source contribution experience. I have 
authored the IVTC and Phosphor deinterlacers for VLC, and a modular 
postprocessing filter framework for the Panda3D game engine (posted at 
the forums on panda3d.org, projected for version 1.10).


Currently the polynomial solver is available in a git repository hosted 
by our university, the relevant files being:


https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/misc/tworods/polysolve2.pyx
https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/misc/tworods/polysolve.py
https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/misc/tworods/polysolve2example.py
https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/misc/tworods/setup.py
https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/misc/tworods/compile.sh

I'm working on a research grant, which allows me to hold the copyright; 
license is BSD.


Polysolve2 is the fast Cython+OpenMP version, while polysolve is an 
earlier (slower) experimental version using only NumPy array operations. 
The example module contains a usage example, and setup (in the standard 
manner) instructs distutils and Cython as to how to build polysolve2 
(including setting the relevant math flags and enabling OpenMP).


(The setup script includes also some stuff specific to my solver for the 
Ziegler problem; basically, the "tworods" module can be ignored. I 
apologize for the inconvenience.)



Thoughts?


Best regards,

 -J

-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-29 Thread Juha Jeronen

Hi,

On 29.09.2015 19:16, Matti Picus wrote:

The algorithms for cubics and quadratics come from Numerical Recipes
(3rd ed.), and the quartic problem is internally reduced to a cubic and
two quadratics, using well-known standard tricks.

Nice, well-documented code.
Just to be sure you are on safe ground, you took the *algorithms* but no *code* 
from Numerical Recipes, right? They are pretty clear about their licensing, see
http://www.nr.com/licenses/redistribute.html
This bit from the site might be relevant:
"We never give permission for Numerical Recipes source code to be posted on any 
public server, or distributed with any freeware or shareware package."
Matti


Yes, the code is original (written from scratch), only the algorithms 
come from the book.



 -J

-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-
