Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Feng Yu  wrote:
 
> In most (half?) of the situations, the result can be written directly
> back into a shared array preallocated before the workers are spawned.
> Then there is no need to pass data back with named segments.

You can work around it in various ways, this being one of them.

Personally I prefer a parallel programming style with queues – either
scattering arrays to workers and collecting arrays from workers, or
chaining workers together in a pipeline (without using coroutines). But exactly how
you program is a matter of taste. I want to make it as inexpensive as
possible to pass a NumPy array through a queue. If anyone else wants to
help improve parallel programming with NumPy using a different paradigm,
that is fine too. I just wanted to clarify why I stopped working on shared
memory arrays.
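
(For reference, a minimal sketch of the scatter/gather queue style using only
the standard multiprocessing module; note that every put/get pickles the whole
array, which is the per-array cost under discussion here:)

```
import multiprocessing as mp
import numpy as np

def worker(in_q, out_q):
    # Pull arrays off the input queue until a sentinel arrives,
    # process each one, and push the result onto the output queue.
    while True:
        a = in_q.get()
        if a is None:
            break
        out_q.put(a * 2.0)  # placeholder computation

if __name__ == "__main__":
    in_q, out_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(in_q, out_q)) for _ in range(4)]
    for w in workers:
        w.start()

    chunks = np.array_split(np.arange(1000000, dtype=float), 16)
    for chunk in chunks:                     # scatter
        in_q.put(chunk)
    results = [out_q.get() for _ in chunks]  # gather (order not guaranteed)
    for _ in workers:                        # shut the workers down
        in_q.put(None)
    for w in workers:
        w.join()
```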

(As for the implementation, I am also experimenting with platform dependent
asynchronous I/O (IOCP, GCD or kqueue, epoll) to pass NumPy arrays through a
queue as inexpensively and scalably as possible. And no, there is no public
repo, as I like to experiment with my pet project undisturbed before I let
it out in the wild.)


Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Dave
Antoine Pitrou writes:

> 
> On Thu, 12 May 2016 06:27:43 +0000 (UTC)
> Sturla Molden wrote:
> 
> > Allan Haldane wrote:
> > 
> > > You probably already know this, but I just wanted to note that the
> > > mpi4py module has worked around pickle too. They discuss how they
> > > efficiently transfer numpy arrays in mpi messages here:
> > > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
> > 
> > Unless I am mistaken, they use the PEP 3118 buffer interface to support
> > NumPy as well as a number of other Python objects. However, this protocol
> > makes buffer acquisition an expensive operation.
> 
> Can you define "expensive"?
> 
> > You can see this in Cython
> > if you use typed memory views. Assigning a NumPy array to a typed
> > memoryview (i.e. buffer acquisition) is slow.
> 
> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.
> 
> Buffer acquisition itself only calls a single C callback and uses a
> stack-allocated C structure. It shouldn't be "expensive".
> 
> Regards
> 
> Antoine.
> 


When I looked at it, using a typed memoryview was between 7 and 50 times
slower than using numpy directly:

http://thread.gmane.org/gmane.comp.python.cython.devel/14626


It looks like there was some improvement since then:

https://github.com/numpy/numpy/pull/3779


...and repeating my experiment shows the deficit is down to 3 to 11 times
slower.


In [5]: x = randn(1)

In [6]: %timeit echo_memview(x)
The slowest run took 14.98 times longer than the fastest. This could mean
that an intermediate result is being cached.
10 loops, best of 3: 5.31 µs per loop

In [7]: %timeit echo_memview_nocast(x)
The slowest run took 10.80 times longer than the fastest. This could mean
that an intermediate result is being cached.
100 loops, best of 3: 1.58 µs per loop

In [8]: %timeit echo_numpy(x)
The slowest run took 58.81 times longer than the fastest. This could mean
that an intermediate result is being cached.
100 loops, best of 3: 474 ns per loop



-Dave


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Niki Spahiev  wrote:

> Apparently next Win10 will have fork as part of bash integration.

That would be great. The lack of fork on Windows is very annoying.

Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Antoine Pitrou  wrote:

> Can you define "expensive"?

Slow enough to cause complaints on the Cython mailing list.

> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.

Constructing a typed memoryview from a typed memoryview or a slice is fast.
Numerical code doing this intensively is still within 80-90% of the speed
of plain C code using pointer arithmetic.

 
> Buffer acquisition itself only calls a single C callback and uses a
> stack-allocated C structure. It shouldn't be "expensive".

I don't know the reason, only that buffer acquisition from NumPy arrays
with typed memoryviews is very expensive compared to assigning a typed
memoryview to another or slicing a typed memoryview.

Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Feng Yu
> Again, not everyone uses Unix.
>
> And on Unix it is not trivial to pass data back from the child process. I
> solved that problem with Sys V IPC (pickling the name of the segment).
>

I wonder if it is necessary to insist on being able to pass large amounts of
data back from the child to the parent process.

In most (half?) of the situations, the result can be written directly back
into a shared array preallocated before the workers are spawned. Then there
is no need to pass data back with named segments.

Here I am just doodling some possible use cases along the OpenMP line.
The sample below just copies the data from s to r, in two different
ways. On systems that do not support multiprocessing + fork, the
semantics are still preserved if threading is used.

```
import .. as mp

# the access attribute of inherited variables is at least 'privatecopy',
# but with the threading backend it becomes 'shared'
s = numpy.arange(1)

with mp.parallel(num_threads=8) as section:
    # variables defined via section.empty will always be 'shared'
    r = section.empty(1)

    def work():
        # variables defined in the body are 'private'
        tid = section.get_thread_num()
        size = section.get_num_threads()
        sl = slice(tid * r.size // size, (tid + 1) * r.size // size)
        r[sl] = s[sl]

    status = section.run(work)
    assert not any(status.errors)

    # support for the following could be implemented with section.run

    chunksize = 1000

    def work(i):
        sl = slice(i, i + chunksize)
        r[sl] = s[sl]
        return s[sl].sum()

    status = section.loop(work, range(0, r.size, chunksize), schedule='static')
    assert not any(status.errors)
    total = sum(status.results)
```
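
(For comparison, a rough approximation of the chunked loop above using only
today's standard library: a thread pool over a preallocated result array.
Threads genuinely share the arrays, so the write-back works; this is of course
not the proposed mp API.)

```
import numpy as np
from concurrent.futures import ThreadPoolExecutor

s = np.arange(100000)
r = np.empty_like(s)   # preallocated result array, shared by all threads
chunksize = 1000

def work(i):
    sl = slice(i, i + chunksize)
    r[sl] = s[sl]      # write directly into the shared result
    return s[sl].sum()

with ThreadPoolExecutor(max_workers=8) as pool:
    partial_sums = list(pool.map(work, range(0, r.size, chunksize)))

total = sum(partial_sums)
assert np.array_equal(r, s)
```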

>> 6. If we are to define a set of operations, I would recommend taking a
>> look at OpenMP as a reference -- it has been out there for decades and
>> is widely used. An equivalent to the 'omp parallel for' construct in
>> Python will be a very good starting point and immediately useful.
>
> If you are on Unix, you can just use a context manager. Call os.fork in
> __enter__ and os.waitpid in __exit__.
>
> Sturla
>
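
(For concreteness, a minimal sketch of the fork-based context manager Sturla
describes: fork in __enter__, waitpid in __exit__. Unix only; every process
executes the with-block and can split the work by its rank, OpenMP-style. All
names below are made up for illustration.)

```
import os

class parallel_fork:
    """Sketch only: fork workers in __enter__, reap them in __exit__."""

    def __init__(self, num_workers=4):
        self.num_workers = num_workers
        self.pids = []
        self.rank = 0              # 0 = parent, 1..num_workers = children

    def __enter__(self):
        for i in range(1, self.num_workers + 1):
            pid = os.fork()
            if pid == 0:           # child: remember rank, stop forking
                self.pids = []
                self.rank = i
                break
            self.pids.append(pid)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.rank != 0:         # children never return past the block
            os._exit(0 if exc_type is None else 1)
        for pid in self.pids:      # parent reaps all children
            os.waitpid(pid, 0)
        return False

# every process (parent + 4 children) runs the block; split work by rank
with parallel_fork(num_workers=4) as sec:
    print("hello from rank", sec.rank)
```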


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Antoine Pitrou
On Thu, 12 May 2016 06:27:43 +0000 (UTC)
Sturla Molden wrote:

> Allan Haldane  wrote:
> 
> > You probably already know this, but I just wanted to note that the
> > mpi4py module has worked around pickle too. They discuss how they
> > efficiently transfer numpy arrays in mpi messages here:
> > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
> 
> Unless I am mistaken, they use the PEP 3118 buffer interface to support
> NumPy as well as a number of other Python objects. However, this protocol
> makes buffer acquisition an expensive operation.

Can you define "expensive"?

> You can see this in Cython
> if you use typed memory views. Assigning a NumPy array to a typed
> memoryview (i.e. buffer acquisition) is slow.

You're assuming this is the cost of "buffer acquisition", while most
likely it's the cost of creating the memoryview object itself.

Buffer acquisition itself only calls a single C callback and uses a
stack-allocated C structure. It shouldn't be "expensive".
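
(A rough way to get a feel for this from pure Python is to time CPython's own
memoryview construction, which performs the PEP 3118 acquisition and builds
the memoryview object in one step; it deliberately conflates the two costs
discussed here, and is only an analogy for Cython's typed-memoryview coercion.)

```
import timeit
import numpy as np

x = np.random.randn(1000)
n = 100000

# PEP 3118 buffer acquisition + memoryview object creation, together
t_view = timeit.timeit(lambda: memoryview(x), number=n)

# a plain attribute access on the same array, for scale
t_attr = timeit.timeit(lambda: x.shape, number=n)

print("memoryview(x): %.0f ns per call" % (t_view / n * 1e9))
print("x.shape:       %.0f ns per call" % (t_attr / n * 1e9))
```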

Regards

Antoine.




[Numpy-discussion] scipy 0.17.1 release

2016-05-12 Thread Evgeni Burovski
Hi,

On behalf of the scipy development team, I'm pleased to announce the
availability of scipy 0.17.1.

This is a bugfix release with no new features compared to 0.17.0.

Source tarballs and OS X wheels are available from PyPI or from GitHub
releases at https://github.com/scipy/scipy/releases/tag/v0.17.1

We recommend that all users upgrade from scipy 0.17.0.

Cheers,

Evgeni



==========================
SciPy 0.17.1 Release Notes
==========================

SciPy 0.17.1 is a bug-fix release with no new features compared to 0.17.0.


Issues closed for 0.17.1
------------------------

- #5817: BUG: skew, kurtosis return np.nan instead of "propagate"
- #5850: Test failed with sgelsy
- #5898: interpolate.interp1d crashes using float128
- #5953: Massive performance regression in cKDTree.query with L_inf distance...
- #6062: mannwhitneyu breaks backward compatibility in 0.17.0
- #6134: T test does not handle nans


Pull requests for 0.17.1
------------------------

- #5902: BUG: interpolate: make interp1d handle np.float128 again
- #5957: BUG: slow down with p=np.inf in 0.17 cKDTree.query
- #5970: Actually propagate nans through stats functions with nan_policy="propagate"
- #5971: BUG: linalg: fix lwork check in *gelsy
- #6074: BUG: special: fixed violation of strict aliasing rules.
- #6083: BUG: Fix dtype for sum of linear operators
- #6100: BUG: Fix mannwhitneyu to be backward compatible
- #6135: Don't pass null pointers to LAPACK, even during workspace queries.
- #6148: stats: fix handling of nan values in T tests and kendalltau


[Numpy-discussion] ANN: SfePy 2016.2

2016-05-12 Thread Robert Cimrman

I am pleased to announce release 2016.2 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is software for solving systems of
coupled partial differential equations by the finite element method or by
isogeometric analysis (preliminary support). It is distributed under the new
BSD license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker, wiki: http://github.com/sfepy

Highlights of this release
--------------------------

- partial shell10x element implementation
- parallel computation of homogenized coefficients
- clean up of elastic terms
- read support for the msh mesh file format of gmsh

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Best regards,
Robert Cimrman on behalf of the SfePy development team

---

Contributors to this release in alphabetical order:

Robert Cimrman
Vladimir Lukes


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Niki Spahiev

On 12.05.2016 02:02, Sturla Molden wrote:
> Feng Yu wrote:
>
>> 1. If we are talking about shared memory and copy-on-write
>> inheritance, then we are using 'fork'.
>
> Not available on Windows. On Unix it only allows one-way communication,
> from parent to child.

Apparently next Win10 will have fork as part of bash integration.

Niki



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Allan Haldane  wrote:

> You probably already know this, but I just wanted to note that the
> mpi4py module has worked around pickle too. They discuss how they
> efficiently transfer numpy arrays in mpi messages here:
> http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data

Unless I am mistaken, they use the PEP 3118 buffer interface to support
NumPy as well as a number of other Python objects. However, this protocol
makes buffer acquisition an expensive operation. You can see this in Cython
if you use typed memory views. Assigning a NumPy array to a typed
memoryview (i.e. buffer acquisition) is slow. They are correct that avoiding
pickle means we save some memory. It also avoids creating and destroying
temporary Python objects, and the associated reference counting. However,
because of the expensive buffer acquisition, I am not sure how much faster
their approach will be. I prefer to use the NumPy C API and bypass any
unnecessary overhead. The idea is to make IPC of NumPy arrays fast, and
then we cannot have an expensive buffer acquisition in there.

Sturla
