Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Charles R Harris
On Fri, Oct 16, 2009 at 9:35 PM, Travis Oliphant wrote:

>
> On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:
>
> I recently committed a regression test and bugfix for object pointers in
> record arrays of unaligned size (meaning where each record is not a
> multiple of sizeof(PyObject **)).
>
> For example:
>
>    a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
>    a2 = np.zeros((10,), 'S10')
>    # This copying would segfault
>    a1['o'] = a2
>
> http://projects.scipy.org/numpy/ticket/1198
>
> Unfortunately, this unit test has opened up a whole hornet's nest of
> alignment issues on Solaris.  The various reference counting functions
> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
> for instance.  Interestingly, there are comments in there saying
> "handles misaligned data" (eg. line 190), but in fact it doesn't, and
> doesn't look to me like it would.  But I won't rule out a mistake in
> building it on my part.
>
>
> Thanks for this bug report.  It would be very helpful if you could
> provide the line number where the code is giving a bus error and explain why
> you think the code in question does not handle misaligned data (it still
> seems like it should to me --- but perhaps I must be missing something --- I
> don't have a Solaris box to test on).   Perhaps, the real problem is
> elsewhere (such as other places where the mistake of forgetting about
> striding needing to be aligned also before pursuing the fast alignment path
> that you pointed out in another place of code).
>
> This was the thinking for why the code (that I think is in question) should
> handle mis-aligned data:
>
> 1) pointers that are not aligned to the correct size need to be copied to
> an aligned memory area before being de-referenced.
> 2) local (stack) variables defined in a function will be aligned by the C
> compiler.
>
> So, what the code in refcnt.c does is to copy the value in the NumPy
> data-area (i.e. pointed to by it->dataptr) to another memory location (the
> stack variable temp), dereference it and then increment its reference
> count.
>
> 196:  temp = (PyObject **)it->dataptr;
> 197:  Py_XINCREF(*temp);
>

Doesn't it->dataptr need to be copied to temp, not just assigned?

Chuck
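For concreteness, the record layout in Michael's failing test packs a pointer-sized object field next to a single char, so every record after the first starts at an offset that is not a multiple of the pointer size. A small sketch (sizes are expressed relative to the platform's pointer dtype, so it is not tied to a 64-bit build; note that `align=True` on the list-of-tuples form works in current NumPy, though the thread discusses it only for the dict and comma-string forms):

```python
import numpy as np

# Packed layout from the failing test: a pointer-sized 'O' field plus 1 char.
dt = np.dtype([('o', 'O'), ('c', 'c')])
ptr = np.dtype('O').itemsize  # sizeof(PyObject *)

# Record size is ptr + 1, so record k's object pointer sits at offset
# k * (ptr + 1) -- unaligned for every k > 0.
assert dt.itemsize == ptr + 1

# The workaround discussed in the thread: request aligned (padded) fields.
dt_aligned = np.dtype([('o', 'O'), ('c', 'c')], align=True)
assert dt_aligned.itemsize % ptr == 0
```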
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Travis Oliphant


On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:

I recently committed a regression test and bugfix for object pointers in
record arrays of unaligned size (meaning where each record is not a
multiple of sizeof(PyObject **)).

For example:

   a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
   a2 = np.zeros((10,), 'S10')
   # This copying would segfault
   a1['o'] = a2

http://projects.scipy.org/numpy/ticket/1198

Unfortunately, this unit test has opened up a whole hornet's nest of
alignment issues on Solaris.  The various reference counting functions
(PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
for instance.  Interestingly, there are comments in there saying
"handles misaligned data" (eg. line 190), but in fact it doesn't, and
doesn't look to me like it would.  But I won't rule out a mistake in
building it on my part.


Thanks for this bug report.  It would be very helpful if you could  
provide the line number where the code is giving a bus error and  
explain why you think the code in question does not handle misaligned  
data (it still seems like it should to me --- but perhaps I must be  
missing something --- I don't have a Solaris box to test on).
Perhaps, the real problem is elsewhere (such as other places where the  
mistake of forgetting about striding needing to be aligned also before  
pursuing the fast alignment path that you pointed out in another place  
of code).


This was the thinking for why the code (that I think is in question)  
should handle mis-aligned data:


1) pointers that are not aligned to the correct size need to be copied  
to an aligned memory area before being de-referenced.
2) local (stack) variables defined in a function will be aligned by the C
compiler.


So, what the code in refcnt.c does is to copy the value in the NumPy  
data-area (i.e. pointed to by it->dataptr) to another memory location  
(the stack variable temp), dereference it and then increment its
reference count.


196:  temp = (PyObject **)it->dataptr;
197:  Py_XINCREF(*temp);

I'm puzzled why this should fail.  The stack trace showing where
this fails would be very useful in figuring out what to fix.



This is all independent of defining a variable to decide whether or  
not to even care about worrying about un-aligned data (which we could  
avoid worrying about on Intel and AMD).


I'm all in favor of such a flag if it would speed up code, but I don't  
see it as the central issue here.


Any more details about the bug you have found would be greatly  
appreciated.


-Travis






Re: [Numpy-discussion] intersect1d for N input arrays

2009-10-16 Thread Martin Spacek
Robert Cimrman writes:

> 
> Hi Martin,
> 
> thanks for your ideas and contribution.
> 
> A few notes: I would leave intersect1d as it is, and create a new function
> with another name for that (any proposals?). Considering that most of the
> arraysetops functions are based on sort, and in particular here that an
> intersection array is (usually) smaller than each of the input arrays, it
> might be better just to call intersect1d repeatedly for each array and the
> result of the previous call, accumulating the intersection.
> 
> r.

Hi Robert,

Yeah, I suppose sorting will get progressively slower the more input arrays
there are, and the longer each one gets. There's probably some crossover point
where the cost of doing a Python loop over the input arrays to accumulate the
intersection is less than the cost of doing a big sort. That would take some
benchmarking...

I forgot to handle the cases where the number of arrays passed is 0 or 1. Here's
an updated version:


def intersect1d(arrays, assume_unique=False):
    """Find the intersection of any number of 1D arrays.
    Return the sorted, unique values that are in all of the input arrays.
    Adapted from numpy.lib.arraysetops.intersect1d"""
    N = len(arrays)
    if N == 0:
        return np.asarray(arrays)
    arrays = list(arrays) # allow assignment
    if not assume_unique:
        for i, arr in enumerate(arrays):
            arrays[i] = np.unique(arr)
    aux = np.concatenate(arrays) # one long 1D array
    aux.sort() # sorted
    if N == 1:
        return aux
    shift = N-1
    return aux[aux[shift:] == aux[:-shift]]
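A quick self-contained check of the approach, with one adjustment: current NumPy requires a boolean index to match the length of the array it indexes, so this sketch masks `aux[:-shift]` rather than `aux` (the name `intersect1d_many` is my own, chosen to avoid shadowing NumPy's function):

```python
import numpy as np

def intersect1d_many(arrays, assume_unique=False):
    """Sorted, unique values present in all input arrays."""
    N = len(arrays)
    if N == 0:
        return np.asarray(arrays)
    arrays = list(arrays)
    if not assume_unique:
        for i, arr in enumerate(arrays):
            arrays[i] = np.unique(arr)
    aux = np.concatenate(arrays)
    aux.sort()
    if N == 1:
        return aux
    shift = N - 1
    # A value common to all N (unique) arrays occurs N times in a row in the
    # sorted concatenation, so it equals the element `shift` positions ahead.
    return aux[:-shift][aux[shift:] == aux[:-shift]]

a = np.array([1, 2, 3, 4])
b = np.array([2, 3, 4, 5])
c = np.array([3, 4, 6])
print(intersect1d_many([a, b, c]))  # -> [3 4]
```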




Re: [Numpy-discussion] genfromtxt documentation : review needed

2009-10-16 Thread Pierre GM

On Oct 16, 2009, at 8:29 AM, Skipper Seabold wrote:

> Great work!  I am especially glad to see the better documentation on
> missing values, as I didn't fully understand how to do this.  A few
> small comments and a small attached diff with a few nitpicking
> grammatical changes and some of what's proposed below.

Thanks. I took your modifications into account.

> On the actual function, I am wondering if white space shouldn't be
> stripped by default, or at least if we have fixed width columns.

Well, I'd do the opposite: `autostrip=False` if we work with fixed-length
delimiters, `autostrip=True` if we work with character delimiters.
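For reference, a sketch of the behavior in question (adapted from the draft documentation's own example, using the modern `io.StringIO`; `dtype="|U5"` keeps the whitespace visible):

```python
import numpy as np
from io import StringIO

data = "1, abc , 2\n3, xxx, 4"

# Default (autostrip=False): fields keep their surrounding whitespace.
raw = np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5")
# autostrip=True: leading/trailing whitespace is stripped from each field.
clean = np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5",
                      autostrip=True)

print(repr(raw[0, 1]))    # ' abc '
print(repr(clean[0, 1]))  # 'abc'
```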

>  I also can't think of a case where I'd ever care about
> leading or trailing white space.

having `autostrip=False` when dealing with spaces as delimiters is a  
feature that was explicitly requested a while ago, when I started  
working on the function.

> I always get confused going back and forth from zero-indexed to non
> zero-indexed, which might not be a good enough reason to worry about
> this, but it might be helpful to explicitly say that skip_header is
> not zero-indexed, though it doesn't raise an exception if you try.

Took your comment into account, but I did state that `skip_header`  
expects a number of lines, not a line index.

> Also, I don't know if this is even something that should be worried
> about in the io, but recarray names also can't start with a number to
> preserve attribute names look up, but I thought I would bring it up
> anyway, since I ran across this recently.

Good point. I'll patch NameValidator for that.

> I didn't know about being able to specify the dtype as a dict.  That
> might be handy.  Is there any way to cross-link to the dtype
> documentation in rst?  I can't remember.  That might be helpful to
> have.

Hence my call to the doc specialists.


> I never did figure out what the loose keyword did, but I guess it's
> not that important to me if I've never needed it.

Oh yes, this one. Well, a StringConverter can either return the
default if it can't convert the string (loose=True) or raise an  
exception if it can't convert the string and the string is not part of  
the missing_values list of this StringConverter (loose=False). I need  
to add a couple of examples here.
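A sketch of that distinction as it surfaces through `genfromtxt`, which passes `loose` down to its StringConverters (with the default float dtype, the converter's fallback default is nan; treating "junk" this way is my reading of the behavior, not text from the thread):

```python
import numpy as np
from io import StringIO

data = "1.0, 2.0\n3.0, junk"

# loose=True (the default): "junk" cannot be converted to float and is not
# in missing_values, so the converter quietly returns its default, nan.
d = np.genfromtxt(StringIO(data), delimiter=",", loose=True)
print(d[1, 1])  # nan

# With loose=False the same call raises instead of substituting the default.
```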


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Sturla Molden
Francesc Alted skrev:
> The response is clear: avoid memcpy() if you can.  It is true that memcpy() 
> performance has improved quite a lot in latest gcc (it has been quite good in 
> Win versions since many years ago), but working with data in-place (i.e. 
> avoiding a memory copy) is always faster (and most especially for large 
> arrays that don't fit in the processor's cache).
>
> My own experiments say that, with an Intel Core2 processor, the typical
> speedups for avoiding memcpy() are 2x.
If the underlying array is strided, I have seen the opposite as well. 
"Copy-in copy-out" is a common optimization used by Fortran compilers 
when working with strided arrays. The catch is that the work array has 
to fit in cache for this to make any sense. Anyhow, you cannot use 
memcpy for this kind of optimization - it assumes both buffers are 
contiguous. But working with arrays directly instead of copies is not 
always the faster option.

S.M.
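Sturla's copy-in/copy-out point, restated in NumPy terms: a strided column cannot be handed to memcpy() directly, but it can be packed into a contiguous scratch buffer first, which is what a Fortran compiler would generate (a sketch of the idea, not a benchmark):

```python
import numpy as np

a = np.zeros((1000, 1000))
col = a[:, 0]                     # strided view: one element per 8000-byte row
work = np.ascontiguousarray(col)  # "copy-in": pack into a contiguous buffer

# work is now cache-friendly and memcpy-able; the strided view is neither.
assert not col.flags.c_contiguous
assert work.flags.c_contiguous

# ... operate on work, then "copy-out" if results must land back in a:
col[:] = work
```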




Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Michael Droettboom
On 10/16/2009 07:53 AM, Pauli Virtanen wrote:
> Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
> [clip]
>
>> IMO, NumPy can be improved for unaligned data handling.  For example,
>> Numexpr is using this small snippet:
>>
>> from cpuinfo import cpu
>> if cpu.is_AMD() or cpu.is_Intel():
>>     is_cpu_amd_intel = True
>> else:
>>     is_cpu_amd_intel = False
>>
>> for detecting AMD/Intel architectures and allowing the code to avoid
>> memcpy() calls for the unaligned arrays.
>>
>> The above code uses the excellent ``cpuinfo.py`` module from Pearu
>> Peterson, which is distributed under NumPy, so it should not be too
>> difficult to take advantage of this for avoiding unnecessary copies in
>> this scenario.
>>  
> I suppose this kind of check is easiest to do at compile-time, and
> defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for
> those architectures for which they are not necessary.
>
>
That's close to the solution I'm arriving at.

I'm thinking of adding a macro "DEREF_UNALIGNED_PYOBJECT_PTR" which 
would do the right thing depending on the type of architecture.  There 
should be no impact on architectures that handle unaligned pointers, and 
slightly slower (but correct) performance on other architectures.

Mike



Re: [Numpy-discussion] genfromtxt documentation : review needed

2009-10-16 Thread Skipper Seabold
On Thu, Oct 15, 2009 at 7:08 PM, Pierre GM  wrote:
> All,
> Here's a first draft for the documentation of np.genfromtxt.
> It took me longer than I thought, but that way I uncovered and fixed some
> bugs.
> Please send me your comments/reviews/etc
> I count especially on our documentation specialist to let me know where to
> put it.
> Thx in advance
> P.
>

Great work!  I am especially glad to see the better documentation on
missing values, as I didn't fully understand how to do this.  A few
small comments and a small attached diff with a few nitpicking
grammatical changes and some of what's proposed below.

On the actual function, I am wondering if white space shouldn't be
stripped by default, or at least if we have fixed width columns.  I
ran into a problem recently, where I was reading in a lot of strings
that were in a fixed width format and my 4 gb of memory were soon
consumed.  I also can't think of a case where I'd ever care about
leading or trailing white space.

I always get confused going back and forth from zero-indexed to non
zero-indexed, which might not be a good enough reason to worry about
this, but it might be helpful to explicitly say that skip_header is
not zero-indexed, though it doesn't raise an exception if you try.

data = "junk1,junk2,junk3\n1.2,1.5,1"
from StringIO import StringIO
import numpy as np
d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=0)

In [5]: d
Out[5]:
array([[ NaN,  NaN,  NaN],
   [ 1.2,  1.5,  1. ]])

d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=1)

In [7]: d
Out[7]: array([ 1.2,  1.5,  1. ])

d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=-1)

In [9]: d
Out[9]:
array([[ NaN,  NaN,  NaN],
   [ 1.2,  1.5,  1. ]])

Also, I don't know if this is even something that should be worried
about in the io, but recarray names also can't start with a number to
preserve attribute names look up, but I thought I would bring it up
anyway, since I ran across this recently.

data = "1var1,var2,var3\n1.2,1.5,1"
d = np.recfromtxt(StringIO(data), dtype=float, delimiter=",", names=True)

In [36]: d
Out[36]:
rec.array((1.2, 1.5, 1.0),
          dtype=[('1var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])

In [37]: d.1var1
   File "<ipython console>", line 1
     d.1var1
       ^
SyntaxError: invalid syntax


In [38]: d.var2
Out[38]: array(1.5)

In [39]: d['1var1']
Out[39]: array(1.2)

I didn't know about being able to specify the dtype as a dict.  That
might be handy.  Is there any way to cross-link to the dtype
documentation in rst?  I can't remember.  That might be helpful to
have.

I never did figure out what the loose keyword did, but I guess it's
not that important to me if I've never needed it.

Cheers,

Skipper
57c57
< By default, :func:`genfromtxt` assumes ``delimiter=None``, meaning that the 
line is splitted along white-spaces (including tabs) and that consecutive 
white-spaces are considered as a single white-space.
---
> By default, :func:`genfromtxt` assumes ``delimiter=None``, meaning that the 
> line is split along white spaces (including tabs) and that consecutive white 
> spaces are considered as a single white space.
76c76
< By default, when a line is decomposed into a series of strings, the 
individual entries are not stripped of leading or tailing white spaces.
---
> By default, when a line is decomposed into a series of strings, the 
> individual entries are not stripped of leading or trailing white spaces.
129c129
< The values of this argument must be an integer which corresponds to the 
number of lines to skip at the beginning of the file, before any other action 
is performed.
---
> The values of this argument must be an integer which corresponds to the 
> number of lines to skip at the beginning of the file, before any other action 
> is performed.  Note that this is not zero-indexed so that the first line is 1.
147c147
< Acceptable values for the argument are a single integer or a sequence of 
integers corresponding to the indices of the columns to import.
---
> An acceptable value for the argument is a single integer or a sequence of 
> integers corresponding to the indices of the columns to import.
195c195
< This behavior may be changed by modifying the default mapper of the 
:class:`~numpi.lib._iotools.StringConverter` class
---
> This behavior may be changed by modifying the default mapper of the 
> :class:`~numpy.lib._iotools.StringConverter` class
343c343
< .. However, user-defined converters may rapidly become cumbersome to manage 
when
---
> .. However, user-defined converters may rapidly become cumbersome to manage.
389c389
<   Each key can be a column index or a column name, and the corresponding 
value should eb a single object.
---
>   Each key can be a column index or a column name, and the corresponding 
> value should be a single object.


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Francesc Alted
On Friday 16 October 2009 14:02:03, David Cournapeau wrote:
> On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen  wrote:
> > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
> > [clip]
> >
> >> IMO, NumPy can be improved for unaligned data handling.  For example,
> >> Numexpr is using this small snippet:
> >>
> >> from cpuinfo import cpu
> >> if cpu.is_AMD() or cpu.is_Intel():
> >>     is_cpu_amd_intel = True
> >> else:
> >>     is_cpu_amd_intel = False
> >>
> >> for detecting AMD/Intel architectures and allowing the code to avoid
> >> memcpy() calls for the unaligned arrays.
> >>
> >> The above code uses the excellent ``cpuinfo.py`` module from Pearu
> >> Peterson, which is distributed under NumPy, so it should not be too
> >> difficult to take advantage of this for avoiding unnecessary copies in
> >> this scenario.
> >
> > I suppose this kind of check is easiest to do at compile-time, and
> > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for
> > those architectures for which they are not necessary.
>
> I wonder whether we could switch at runtime (import time) - it could
> be useful for testing.
>
> That being said, I agree that the cpu checks should be done at compile
> time - we had quite a few problems with cpuinfo in the past with new
> cpu/unhandled cpu, I think a compilation-based method is much more
> robust (and simpler) here. There are things where C is just much
> easier than python :)

Agreed.  I'm relying on ``cpuinfo.py`` just because it provides what I need 
in an easy way.  BTW, the detection of AMD/Intel (just the vendor) processors 
seems to work flawlessly for the platforms that I've checked (but I suppose 
that you are talking about other characteristics, like SSE version, etc).

-- 
Francesc Alted


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread David Cournapeau
On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen  wrote:
> Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
> [clip]
>> IMO, NumPy can be improved for unaligned data handling.  For example,
>> Numexpr is using this small snippet:
>>
>> from cpuinfo import cpu
>> if cpu.is_AMD() or cpu.is_Intel():
>>     is_cpu_amd_intel = True
>> else:
>>     is_cpu_amd_intel = False
>>
>> for detecting AMD/Intel architectures and allowing the code to avoid
>> memcpy() calls for the unaligned arrays.
>>
>> The above code uses the excellent ``cpuinfo.py`` module from Pearu
>> Peterson, which is distributed under NumPy, so it should not be too
>> difficult to take advantage of this for avoiding unnecessary copies in
>> this scenario.
>
> I suppose this kind of check is easiest to do at compile-time, and
> defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for
> those architectures for which they are not necessary.

I wonder whether we could switch at runtime (import time) - it could
be useful for testing.

That being said, I agree that the cpu checks should be done at compile
time - we had quite a few problems with cpuinfo in the past with new
cpu/unhandled cpu, I think a compilation-based method is much more
robust (and simpler) here. There are things where C is just much
easier than python :)

David


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Pauli Virtanen
Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
[clip]
> IMO, NumPy can be improved for unaligned data handling.  For example,
> Numexpr is using this small snippet:
> 
> from cpuinfo import cpu
> if cpu.is_AMD() or cpu.is_Intel():
>     is_cpu_amd_intel = True
> else:
>     is_cpu_amd_intel = False
> 
> for detecting AMD/Intel architectures and allowing the code to avoid
> memcpy() calls for the unaligned arrays.
> 
> The above code uses the excellent ``cpuinfo.py`` module from Pearu
> Peterson, which is distributed under NumPy, so it should not be too
> difficult to take advantage of this for avoiding unnecessary copies in
> this scenario.

I suppose this kind of check is easiest to do at compile-time, and 
defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for 
those architectures for which they are not necessary.

-- 
Pauli Virtanen



Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Francesc Alted
On Thursday 15 October 2009 19:00:04, Charles R Harris wrote:
> > So, how to fix this?
> >
> > One obvious workaround is for users to pass "align=True" to the dtype
> > constructor.  This works if the dtype descriptor is a dictionary or
> > comma-separated string.  Is there a reason it couldn't be made to work
> > with the string-of-tuples form that I'm missing?  It would be marginally
> > more convenient from my application, but that's just a finesse issue.
> >
> > However, perhaps we should try to fix the underlying alignment
> > problems?  Unfortunately, it's not clear to me how to resolve them
> > without at least some performance penalty.  You either do an alignment
> > check of the pointer, and then memcpy if unaligned, or just always use
> > memcpy.  Not sure which is faster, as memcpy may have a fast path
> > already. These are object arrays anyway, so there's plenty of overhead
> > already, and I don't think this would affect regular numerical arrays.

The response is clear: avoid memcpy() if you can.  It is true that memcpy() 
performance has improved quite a lot in latest gcc (it has been quite good in 
Win versions since many years ago), but working with data in-place (i.e. 
avoiding a memory copy) is always faster (and most especially for large arrays 
that don't fit in the processor's cache).

My own experiments say that, with an Intel Core2 processor, the typical
speedups for avoiding memcpy() are 2x.  And I've read somewhere that both AMD
and Intel are trying to make unaligned operations go even faster in upcoming
architectures (the goal is that there should be no speed difference in
accessing aligned or unaligned data).

> I believe the memcpy approach is used for other unaligned parts of void
> types. There is an inherent performance penalty there, but I don't see how
> it can be avoided when using what are essentially packed structures. As to
> memcpy, its performance seems to depend on the compiler/compiler version,
> old versions of gcc had *horrible* implementations of memcpy. I believe the
> situation has since improved. However, I'm not sure we should be coding to
> compiler issues unless it is unavoidable or the gain is huge.

IMO, NumPy can be improved for unaligned data handling.  For example, Numexpr 
is using this small snippet:

from cpuinfo import cpu
if cpu.is_AMD() or cpu.is_Intel():
    is_cpu_amd_intel = True
else:
    is_cpu_amd_intel = False

for detecting AMD/Intel architectures and allowing the code to avoid memcpy() 
calls for the unaligned arrays.

The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, 
which is distributed under NumPy, so it should not be too difficult to take 
advantage of this for avoiding unnecessary copies in this scenario.
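For illustration, a dependency-free approximation of that check using only the standard library. It tests the machine architecture rather than the CPU vendor, which is an assumption on my part, but the architecture is what the memcpy-avoidance decision actually hinges on:

```python
import platform

# x86 machines (Intel or AMD alike) tolerate unaligned loads/stores in
# hardware, so code may skip the copy-to-aligned-buffer dance on them.
is_cpu_amd_intel = platform.machine().lower() in (
    "x86", "i386", "i486", "i586", "i686", "x86_64", "amd64")
print(is_cpu_amd_intel)
```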

-- 
Francesc Alted


Re: [Numpy-discussion] intersect1d for N input arrays

2009-10-16 Thread Robert Cimrman
Hi Martin,

thanks for your ideas and contribution.

A few notes: I would leave intersect1d as it is, and create a new function with 
another name for that (any proposals?). Considering that most of the arraysetops 
functions are based on sort, and in particular here that an intersection array 
is (usually) smaller than each of the input arrays, it might be better just to 
call intersect1d repeatedly for each array and the result of the previous call, 
accumulating the intersection.

r.

Martin Spacek wrote:
> I have a list of many arrays (in my case each is unique, ie has no repeated
> elements), and I'd like to extract the intersection of all of them, all in one
> go. I'm running numpy 1.3.0, but looking at today's rev of 
> numpy.lib.arraysetops
> (http://svn.scipy.org/svn/numpy/trunk/numpy/lib/arraysetops.py), I see
> intersect1d has changed. Just a note: the example used in the docstring 
> implies
> that the two arrays need to be the same length, which isn't the case. Maybe it
> would be good to change the example to two arrays of different lengths.
> 
> intersect1d takes exactly 2 arrays. I've modified it a little to take the
> intersection of any number of 1D arrays (of any length), in a list or tuple. 
> It
> seems to work fine, but could use more testing. Here it is with most of the 
> docs
> stripped. Feel free to use it, although I suppose for symmetry, many of the
> other functions in arraysetops.py would also have to be modified to work with 
> N
> arrays:
> 
> 
> def intersect1d(arrays, assume_unique=False):
>     """Find the intersection of any number of 1D arrays.
>     Return the sorted, unique values that are in all of the input arrays.
>     Adapted from numpy.lib.arraysetops.intersect1d"""
>     N = len(arrays)
>     arrays = list(arrays) # allow assignment
>     if not assume_unique:
>         for i, arr in enumerate(arrays):
>             arrays[i] = np.unique(arr)
>     aux = np.concatenate(arrays) # one long 1D array
>     aux.sort() # sorted
>     shift = N-1
>     return aux[aux[shift:] == aux[:-shift]]
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 



Re: [Numpy-discussion] Google Groups archive?

2009-10-16 Thread Pauli Virtanen
Fri, 16 Oct 2009 00:23:34 -0400, josef.pktd wrote:
[clip]
>> This seems exceedingly odd. Does anyone know _how_ we violated the ToS?
> 
> adult material on front page
> 
> Who's the owner? Creating a new group would require a different name,
> since the old name is blocked, I tried.

Maybe it's best just not to use Google Groups. IMO, gmane.org offers an 
equivalent if not superior service.

-- 
Pauli Virtanen
