Re: [Numpy-discussion] Fast decrementation of indices

2014-02-03 Thread Mads Ipsen
Hi,

Thanks to everybody for all you valuable responses. This approach by
Rick White seems to nail it all down:

>> b = np.array([
>>  [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 
>> 11],
>>  [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5]
>> ])
>> 
>> a = [1,2,3,7,8]
>> 
>> keepdata = np.ones(12, dtype=np.bool)
>> keepdata[a] = False
>> w = np.where(keepdata[b[0]] & keepdata[b[1]])
>> newindex = keepdata.cumsum()-1
>> c = newindex[b[:,w[0]]]

Also, I'd like to mention that I did think about using the graph module
from SciPy. But the index bookkeeping done by numpy is in fact index
pointers to memory location in a C++ driver - and not just labels.

An when atoms are deleted, there memory chunks are also cleared, and
therefore all pointers to these must be decremented. So using numpy for
the bookkeeping seems a natural choice.

Best regards,

Mads

On 02/03/2014 02:36 PM, Rick White wrote:
> b = np.array([
>  [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 11],
>  [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5]
> ])
> 
> a = [1,2,3,7,8]
> 
> keepdata = np.ones(12, dtype=np.bool)
> keepdata[a] = False
> w = np.where(keepdata[b[0]] & keepdata[b[1]])
> newindex = keepdata.cumsum()-1
> c = newindex[b[:,w[0]]]

-- 
+-+
| Mads Ipsen  |
+--+--+
| Gåsebæksvej 7, 4. tv | phone:  +45-29716388 |
| DK-2500 Valby| email:  mads.ip...@gmail.com |
| Denmark  | map  :   www.tinyurl.com/ns52fpa |
+--+--+
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Scipy 0.13.3 release

2014-02-03 Thread Ralf Gommers
Hi,

I'm happy to announce the availability of the scipy 0.13.3 release. This is
a bugfix only release; it contains fixes for regressions in ndimage and
weave.

Source tarballs can be found at
https://sourceforge.net/projects/scipy/files/scipy/0.13.3/ and on PyPi.
Release notes copied below, binaries will follow later (the regular build
machine is not available for the next two weeks).

Cheers,
Ralf



==
SciPy 0.13.3 Release Notes
==

SciPy 0.13.3 is a bug-fix release with no new features compared to 0.13.2.
Both the weave and the ndimage.label bugs were severe regressions in 0.13.0,
hence this release.

Issues fixed

- 3148: fix a memory leak in ``ndimage.label``.
- 3216: fix weave issue with too long file names for MSVC.

Other changes
-
- Update Sphinx theme used for html docs so ``>>>`` in examples can be
toggled.

Checksums
=
0547c1f8e8afad4009cc9b5ef17a2d4d  release/installers/scipy-0.13.3.tar.gz
20ff3a867cc5925ef1d654aed2ff7e88  release/installers/scipy-0.13.3.zip
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing changes in 1.9

2014-02-03 Thread Charles R Harris
On Mon, Feb 3, 2014 at 2:32 PM, Travis Oliphant  wrote:

> Hey Sebastien,
>
> I didn't mean to imply that you would need to necessarily work on it.
> But the work Jay has done could use review.
>
> There are also conversations to have about what to do to resolve the
> ambiguity that led to the current behavior.
>
> Thank you or all the great work on the indexing code paths.
>
> Is their a roadmap for 1.9?
>

We don't have a roadmap, but the indexing code is something I'd like to see
go in. Are you asking in relation to Jay's work? I have a loose idea to
branch 1.9 sometime in late April/May. The other important bit to fix up,
which isn't started yet, is datetime.



Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing changes in 1.9

2014-02-03 Thread Travis Oliphant
Hey Sebastien,

I didn't mean to imply that you would need to necessarily work on it.   But
the work Jay has done could use review.

There are also conversations to have about what to do to resolve the
ambiguity that led to the current behavior.

Thank you or all the great work on the indexing code paths.

Is their a roadmap for 1.9?

Travis
 On Feb 3, 2014 1:26 PM, "Sebastian Berg" 
wrote:

> On Sun, 2014-02-02 at 13:11 -0600, Travis Oliphant wrote:
> > This sounds like a great and welcome work and improvements.
> >
> > Does it make sense to also do something about the behavior of advanced
> > indexing when slices are interleaved between lists and integers.
> >
> > I know that jay borque has some  preliminary work to fix this.  There
> > are a some straightforward fixes -- like doing iterative application
> > of indexing in those cases which would be more sensical in the cases
> > where current code gets tripped up.
> >
>
> I guess you are talking about the funky transposing logic and maybe the
> advanced indexing logic as such? I didn't really think about changing
> any of that, not sure if we easily can?
> Personally, I always wondered if it would make sense to add some new
> type of indexing mechanism to switch to R/matlab style non-advanced
> integer-array indexing. I don't think this will make it substantially
> easier to do (the basic logic remains the same -- we need an
> extra/different preparation and then transpose the result differently),
> though it might be a bit more obvious where/how to plug it in.
>
> But it seems very unlikely I will look into that in the near future (but
> if someone wants hints on how to go about it, just ask).
>
> - Sebastian
>
> > Travis
> >
> > On Feb 2, 2014 11:07 AM, "Charles R Harris"
> >  wrote:
> > Sebastian has done a lot of work to refactor/rationalize numpy
> > indexing. The changes are extensive enough that it would be
> > good to have more public review, so here is the release note.
> >
> > The NumPy indexing has seen a complete rewrite in this
> > version. This makes
> > most advanced integer indexing operations much faster
> > and should have no
> > other implications.
> > However some subtle changes and deprecations were
> > introduced in advanced
> > indexing operations:
> >
> >   * Boolean indexing into scalar arrays will always
> > return a new 1-d array.
> > This means that ``array(1)[array(True)]`` gives
> > ``array([1])`` and
> > not the original array.
> >   * Advanced indexing into one dimensional arrays used
> > to have (undocumented)
> > special handling regarding repeating the value
> > array in assignments
> > when the shape of the value array was too small or
> > did not match.
> > Code using this will raise an error. For
> > compatibility you can use
> > ``arr.flat[index] = values``, which uses the old
> > code branch.
> >   * The iteration order over advanced indexes used to
> > be always C-order.
> > In NumPy 1.9. the iteration order adapts to the
> > inputs and is not
> > guaranteed (with the exception of a *single*
> > advanced index which is
> > never reversed for compatibility reasons). This
> > means that the result is
> > undefined if multiple values are assigned to the
> > same element.
> > An example for this is ``arr[[0, 0], [1, 1]] = [1,
> > 2]``, which may
> > set ``arr[0, 1]`` to either 1 or 2.
> >   * Equivalent to the iteration order, the memory
> > layout of the advanced
> > indexing result is adapted for faster indexing and
> > cannot be predicted.
> >   * All indexing operations return a view or a copy.
> > No indexing operation
> > will return the original array object.
> >   * In the future Boolean array-likes (such as lists
> > of python bools)
> > will always be treated as Boolean indexes and
> > Boolean scalars (including
> > python `True`) will be a legal *boolean* index. At
> > this time, this is
> > already the case for scalar arrays to allow the
> > general
> > ``positive = a[a > 0]`` to work when ``a`` is zero
> > dimensional.
> >   * In NumPy 1.8 it was possible to use 

Re: [Numpy-discussion] Indexing changes in 1.9

2014-02-03 Thread Sebastian Berg
On Sun, 2014-02-02 at 13:11 -0600, Travis Oliphant wrote:
> This sounds like a great and welcome work and improvements.
> 
> Does it make sense to also do something about the behavior of advanced
> indexing when slices are interleaved between lists and integers.
> 
> I know that jay borque has some  preliminary work to fix this.  There
> are a some straightforward fixes -- like doing iterative application
> of indexing in those cases which would be more sensical in the cases
> where current code gets tripped up.
> 

I guess you are talking about the funky transposing logic and maybe the
advanced indexing logic as such? I didn't really think about changing
any of that, not sure if we easily can?
Personally, I always wondered if it would make sense to add some new
type of indexing mechanism to switch to R/matlab style non-advanced
integer-array indexing. I don't think this will make it substantially
easier to do (the basic logic remains the same -- we need an
extra/different preparation and then transpose the result differently),
though it might be a bit more obvious where/how to plug it in.

But it seems very unlikely I will look into that in the near future (but
if someone wants hints on how to go about it, just ask).

- Sebastian

> Travis 
> 
> On Feb 2, 2014 11:07 AM, "Charles R Harris"
>  wrote:
> Sebastian has done a lot of work to refactor/rationalize numpy
> indexing. The changes are extensive enough that it would be
> good to have more public review, so here is the release note.
> 
> The NumPy indexing has seen a complete rewrite in this
> version. This makes
> most advanced integer indexing operations much faster
> and should have no
> other implications.
> However some subtle changes and deprecations were
> introduced in advanced
> indexing operations:
> 
>   * Boolean indexing into scalar arrays will always
> return a new 1-d array.
> This means that ``array(1)[array(True)]`` gives
> ``array([1])`` and
> not the original array.
>   * Advanced indexing into one dimensional arrays used
> to have (undocumented)
> special handling regarding repeating the value
> array in assignments
> when the shape of the value array was too small or
> did not match.
> Code using this will raise an error. For
> compatibility you can use
> ``arr.flat[index] = values``, which uses the old
> code branch.
>   * The iteration order over advanced indexes used to
> be always C-order.
> In NumPy 1.9. the iteration order adapts to the
> inputs and is not
> guaranteed (with the exception of a *single*
> advanced index which is
> never reversed for compatibility reasons). This
> means that the result is
> undefined if multiple values are assigned to the
> same element.
> An example for this is ``arr[[0, 0], [1, 1]] = [1,
> 2]``, which may
> set ``arr[0, 1]`` to either 1 or 2.
>   * Equivalent to the iteration order, the memory
> layout of the advanced
> indexing result is adapted for faster indexing and
> cannot be predicted.
>   * All indexing operations return a view or a copy.
> No indexing operation
> will return the original array object.
>   * In the future Boolean array-likes (such as lists
> of python bools)
> will always be treated as Boolean indexes and
> Boolean scalars (including
> python `True`) will be a legal *boolean* index. At
> this time, this is
> already the case for scalar arrays to allow the
> general
> ``positive = a[a > 0]`` to work when ``a`` is zero
> dimensional.
>   * In NumPy 1.8 it was possible to use `array(True)`
> and `array(False)`
> equivalent to 1 and 0 if the result of the
> operation was a scalar.
> This will raise an error in NumPy 1.9 and, as
> noted above, treated as a
> boolean index in the future.
>   * All non-integer array-likes are deprecated, object
> arrays of custom
> integer like objects may have to be cast
> explicitly.
> 

Re: [Numpy-discussion] Indexing changes in 1.9

2014-02-03 Thread Sebastian Berg
On Mon, 2014-02-03 at 00:41 -0800, Dinesh Vadhia wrote:
> Does the numpy indexing refactorizing address the performance of fancy
> indexing highlighted in wes mckinney's blog some years back -
> http://wesmckinney.com/blog/?p=215 - where numpy.take() was shown to
> be preferable than fancy indexing?

Well, technically there is a pretty big difference in how those two
approach the problem and the indexing is much more general[1] and
(especially now) optimized for large subspaces[2]. So while the new code
is faster even for small subspaces, the linked test hits the performance
wise worst spot (of course make the thing 1x2 will be a bit worse):

In [4]: arr = np.random.randn(1, 5)
In [5]: indexer = np.arange(1)
In [6]: random.shuffle(indexer)

In [7]: %timeit arr[indexer]
1000 loops, best of 3: 300 us per loop

In [8]: %timeit arr.take(indexer, axis=0)
1 loops, best of 3: 95.4 us per loop

# With a larger subspace the result is (now) different though:
In [9]: arr = np.random.randn(1, 100)

In [10]: %timeit arr[indexer]
100 loops, best of 3: 4.85 ms per loop

In [11]: %timeit arr.take(indexer, axis=0)
100 loops, best of 3: 5.02 ms per loop


So the performance in this use case improved (maybe a factor of 3, which
is neither shabby nor sensational), but the subspace design and no big
special cases for this operation leads to overhead. You could likely
squeeze out a bit, but squeezing out a lot is non-trivial [3].

However this should be one of the few cases where take should still
outperform advanced indexing.

- Sebastian


[1] Mostly multiple integer array indices, which take does not support
and arbitrary memory order.
[2] subspace here means the non-advanced indexing part. In the case of
arr[integer_array, :] the (or a) subspace is arr[0, :].
[3] Basically, we have the value and the indexing arrays to iterate,
since they are arbitrary (unlike take which assumes contiguous) and
broadcasting etc. I did not try to squeeze them into a single iterator
when there is a subspace (it is complicating, basically I want buffering
for the index arrays, but if you use buffering on the value array you
can't easily get the pointer into the subspace, etc.).
This means you have two iterators, which again means that the external
loop optimization is not quite trivial (since the two inner loops can
have different sizes). So *if* you were to just add the logic for the
different sized inner loops, then you could use the external loop
optimization of NpyIter and probably remove the overhead causing (most
of) this discrepancy. Though I would suggest timing first to be sure
this is the reason ^^.

>  
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast decrementation of indices

2014-02-03 Thread Rick White
I think you'll find the algorithm below to be a lot faster, especially if the 
arrays are big.  Checking each array index against the list of included or 
excluded elements is must slower than simply creating a secondary array and 
looking up whether the elements are included or not.

b = np.array([
 [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 11],
 [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5]
])

a = [1,2,3,7,8]

keepdata = np.ones(12, dtype=np.bool)
keepdata[a] = False
w = np.where(keepdata[b[0]] & keepdata[b[1]])
newindex = keepdata.cumsum()-1
c = newindex[b[:,w[0]]]

Cheers,
Rick

On 2 February 2014 20:58, Mads Ipsen  wrote:

> Hi,
> 
> I have run into a potential 'for loop' bottleneck. Let me outline:
> 
> The following array describes bonds (connections) in a benzene molecule
> 
>b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7,
> 8, 9, 10, 11],
> [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1,
> 2, 3,  4,  5]]
> 
> ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In
> practical examples, the list can be much larger (N > 100.000 connections.
> 
> Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds
> connecting those atoms must be deleted. I achieve this doing
> 
> i_0 = numpy.in1d(b[0], a)
> i_1 = numpy.in1d(b[1], a)
> b_i = numpy.where(i_0 | i_1)[0]
> b = b[:,~(i_0 | i_1)]
> 
> If you find this approach lacking, feel free to comment.
> 
> This results in the following updated bond list
> 
> b = [[0,  0,  4,  4,  5,  5,  5,  6, 10, 11]
> [5,  6, 10,  5,  4, 11,  0,  0,  4,  5]]
> 
> This list is however not correct: Since atoms [1,2,3,7,8] have been
> deleted, the remaining atoms with indices larger than the deleted atoms
> must be decremented. I do this as follows:
> 
> for i in a:
>b = numpy.where(b > i, bonds-1, bonds)  (*)
> 
> yielding the correct result
> 
> b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
> [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]
> 
> The Python for loop in (*) may easily contain 50.000 iteration. Is there
> a smart way to utilize numpy functionality to avoid this?
> 
> Thanks and best regards,
> 
> Mads
> 
> -- 
> +-+
> | Mads Ipsen  |
> +--+--+
> | G?seb?ksvej 7, 4. tv | phone:  +45-29716388 |
> | DK-2500 Valby| email:  mads.ip...@gmail.com |
> | Denmark  | map  :   www.tinyurl.com/ns52fpa |
> +--+--+

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast decrementation of indices

2014-02-03 Thread Eelco Hoogendoorn
Seconding Jaime; I use this trick in mesh manipulations a lot as well.
There are a lot of graph-type manipulations you can express effectively in
numpy using np.unique and related functionality.


On Sun, Feb 2, 2014 at 11:57 PM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> Cannot test right now, but np.unique(b, return_inverse=True)[1].reshape(2,
> -1) should do what you are after, I think.
>  On Feb 2, 2014 11:58 AM, "Mads Ipsen"  wrote:
>
>> Hi,
>>
>> I have run into a potential 'for loop' bottleneck. Let me outline:
>>
>> The following array describes bonds (connections) in a benzene molecule
>>
>> b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7,
>> 8, 9, 10, 11],
>>  [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1,
>> 2, 3,  4,  5]]
>>
>> ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In
>> practical examples, the list can be much larger (N > 100.000 connections.
>>
>> Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds
>> connecting those atoms must be deleted. I achieve this doing
>>
>> i_0 = numpy.in1d(b[0], a)
>> i_1 = numpy.in1d(b[1], a)
>> b_i = numpy.where(i_0 | i_1)[0]
>> b = b[:,~(i_0 | i_1)]
>>
>> If you find this approach lacking, feel free to comment.
>>
>> This results in the following updated bond list
>>
>> b = [[0,  0,  4,  4,  5,  5,  5,  6, 10, 11]
>>  [5,  6, 10,  5,  4, 11,  0,  0,  4,  5]]
>>
>> This list is however not correct: Since atoms [1,2,3,7,8] have been
>> deleted, the remaining atoms with indices larger than the deleted atoms
>> must be decremented. I do this as follows:
>>
>> for i in a:
>> b = numpy.where(b > i, bonds-1, bonds)  (*)
>>
>> yielding the correct result
>>
>> b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
>>  [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]
>>
>> The Python for loop in (*) may easily contain 50.000 iteration. Is there
>> a smart way to utilize numpy functionality to avoid this?
>>
>> Thanks and best regards,
>>
>> Mads
>>
>> --
>> +-+
>> | Mads Ipsen  |
>> +--+--+
>> | Gåsebæksvej 7, 4. tv | phone:  +45-29716388 |
>> | DK-2500 Valby| email:  mads.ip...@gmail.com |
>> | Denmark  | map  :   www.tinyurl.com/ns52fpa |
>> +--+--+
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast decrementation of indices

2014-02-03 Thread Daπid
On 2 February 2014 20:58, Mads Ipsen  wrote:

> ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In
> practical examples, the list can be much larger (N > 100.000 connections.
>


Perhaps you should consider an alternative approach. You could consider it
a graph, and you could use Networkx or Scipy to work with them (provided it
actually works well with the rest of your problem)

In the case of Scipy, the graph is described by its adjacency matrix, and
you just want to delete a row and a column.

But, in any case, not knowing at all what is your overall project,
renumbering nodes is not something one has to usually do when working with
graphs, except for final results. The labels are that, labels, with no
further meaning.


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing changes in 1.9

2014-02-03 Thread Dinesh Vadhia
Does the numpy indexing refactorizing address the performance of fancy indexing 
highlighted in wes mckinney's blog some years back - 
http://wesmckinney.com/blog/?p=215 - where numpy.take() was shown to be 
preferable than fancy indexing?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion