Re: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists

2021-03-14 Thread zoj613
The following seems to produce what you want using the data provided:

```
In [31]: dF = np.genfromtxt('/home/F.csv', delimiter=',').tolist()

In [32]: dS = np.genfromtxt('/home/S.csv', delimiter=',').tolist()

In [33]: r = [i in dS for i in dF]

In [34]: sum(r)

Out[34]: 300
```

I hope this helps.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists

2021-03-14 Thread Andras Deak
On Sun, Mar 14, 2021 at 8:35 PM Robert Kern  wrote:
>
> On Sun, Mar 14, 2021 at 3:06 PM Ali Sheikholeslam 
>  wrote:
>>
>> I have written a question in:
>> https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
>> It was recommended by numpy to send this subject to the mailing lists.
>>
>> The question is as follows. I would appreciate any advice on how to solve the problem:
>>
>> First, here is a small example with two lists:
>>
>> F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]]  # (1*5) 5 lists
>> S = [[1,3,7],[6,8,1],[3,2,7]]  # (1*3) 3 lists
>>
>> I want a Boolean array marking which lists occur in both F and S:
>>
>> [False, True, False, False, True]  # (1*5) 5 Booleans for the 5 lists of F
>>
>> Using IM = reduce(np.in1d, (F, S)) gives a result for each number in each
>> list of F:
>>
>> [ True  True  True  True  True  True False False  True False  True  True
>>   True  True  True]   # (1*15)
>>
>> Using IM = reduce(np.isin, (F, S)) also gives a result for each number in
>> each list of F, but in another shape:
>>
>> [[ True  True  True]
>>  [ True  True  True]
>>  [False False  True]
>>  [False  True  True]
>>  [ True  True  True]]   # (5*3)
>>
>> The correct result is produced by IM = [i in S for i in F] for the example
>> lists, but when I use this code on my two main, bigger numpy arrays of
>> lists:
>>
>> https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing
>>
>> numpy array: 3036 lists
>>
>> https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing
>>
>> numpy array: 300 lists
>>
>> It gives the wrong answer. For the main files it should give 3036 Booleans,
>> of which only 300 are 'True'. I don't understand why it gives wrong answers;
>> it seems to act only on the third element of each list of F. I would prefer
>> a reduce-based solution using np.in1d or np.isin rather than the list
>> comprehension. How can each of the three approaches above be fixed?
>
>
> Thank you for providing the data. Can you show a complete, runnable code 
> sample that fails? There are several things that could go wrong here, and we 
> can't be sure which is which without the exact code that you ran.
>
> In general, you may well have problems with the floating point data that you 
> are not seeing with your integer examples.
>
> FWIW, I would continue to use something like the `IM = [i in S for i in F]` 
> list comprehension for data of this size.

Although somewhat off-topic for the numpy aspect, for completeness'
sake let me add that you'll probably want to first turn your list of
lists `S` into a set of tuples, and then look up each list in `F`
converted to a tuple (`[tuple(lst) in setified_S for lst in F]`). That
would probably be a lot faster for large lists.
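A minimal sketch of that suggestion, using the example lists from the question (`setified_S` is just an illustrative name):

```python
F = [[1, 2, 3], [3, 2, 7], [4, 4, 1], [5, 6, 3], [1, 3, 7]]
S = [[1, 3, 7], [6, 8, 1], [3, 2, 7]]

# Build the set once: each membership test is then O(1) on average,
# instead of O(len(S)) for a `list in list-of-lists` check.
setified_S = {tuple(row) for row in S}
IM = [tuple(row) in setified_S for row in F]
print(IM)  # [False, True, False, False, True]
```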

András



> You aren't getting any benefit trying to convert to arrays and using
> our array set operations. They are written for 1D arrays of numbers,
> not 2D arrays (attempting to treat them as 1D arrays of lists) and
> won't really work on your data.
>
> --
> Robert Kern


Re: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists

2021-03-14 Thread Robert Kern
On Sun, Mar 14, 2021 at 3:06 PM Ali Sheikholeslam <
sheikholeslam@gmail.com> wrote:

> I have written a question in:
>
> https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
> It was recommended by numpy to send this subject to the mailing lists.
>
> The question is as follows. I would appreciate any advice on how to solve the problem:
>
> First, here is a small example with two lists:
>
> F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]]  # (1*5) 5 lists
> S = [[1,3,7],[6,8,1],[3,2,7]]  # (1*3) 3 lists
>
> I want a Boolean array marking which lists occur in both F and S:
>
> [False, True, False, False, True]  # (1*5) 5 Booleans for the 5 lists of F
>
> Using IM = reduce(np.in1d, (F, S)) gives a result for each number in each
> list of F:
>
> [ True  True  True  True  True  True False False  True False  True  True
>   True  True  True]   # (1*15)
>
> Using IM = reduce(np.isin, (F, S)) also gives a result for each number in
> each list of F, but in another shape:
>
> [[ True  True  True]
>  [ True  True  True]
>  [False False  True]
>  [False  True  True]
>  [ True  True  True]]   # (5*3)
>
> The correct result is produced by IM = [i in S for i in F] for the example
> lists, but when I use this code on my two main, bigger numpy arrays of
> lists:
>
>
> https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing
>
> numpy array: 3036 lists
>
>
> https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing
>
> numpy array: 300 lists
>
> It gives the wrong answer. For the main files it should give 3036 Booleans,
> of which only 300 are 'True'. I don't understand why it gives wrong answers;
> it seems to act only on the third element of each list of F. I would prefer
> a reduce-based solution using np.in1d or np.isin rather than the list
> comprehension. How can each of the three approaches above be fixed?
>

Thank you for providing the data. Can you show a complete, runnable code
sample that fails? There are several things that could go wrong here, and
we can't be sure which is which without the exact code that you ran.

In general, you may well have problems with the floating point data that
you are not seeing with your integer examples.
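As a hedged illustration of that floating-point pitfall (the choice of 9 decimals for the rounding workaround is an assumption, not something from the thread):

```python
# Two float values that print alike can differ in their last bits, so
# exact row matching on data read from CSV can silently fail.
a = 0.1 + 0.2
b = 0.3
print(a == b)  # False: a is actually 0.30000000000000004

# One possible workaround: round to a fixed number of decimals before
# building comparison keys.
key_a = round(a, 9)
key_b = round(b, 9)
print(key_a == key_b)  # True
```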

FWIW, I would continue to use something like the `IM = [i in S for i in F]`
list comprehension for data of this size. You aren't getting any benefit
trying to convert to arrays and using our array set operations. They are
written for 1D arrays of numbers, not 2D arrays (attempting to treat them
as 1D arrays of lists) and won't really work on your data.
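To make that element-wise (rather than row-wise) behaviour concrete, here is a small demonstration with the example arrays from the question:

```python
import numpy as np

F = np.array([[1, 2, 3], [3, 2, 7], [4, 4, 1], [5, 6, 3], [1, 3, 7]])
S = np.array([[1, 3, 7], [6, 8, 1], [3, 2, 7]])

# np.in1d flattens F and tests each *element* against the pool of all
# elements of S -- it never compares whole rows.
print(np.in1d(F, S).shape)  # (15,)

# np.isin keeps F's shape but still compares element by element.
print(np.isin(F, S).shape)  # (5, 3)
```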

-- 
Robert Kern


[Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists

2021-03-14 Thread Ali Sheikholeslam
I have written a question in:
https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
It was recommended by numpy to send this subject to the mailing lists.

The question is as follows. I would appreciate any advice on how to solve the problem:

First, here is a small example with two lists:

F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]]  # (1*5) 5 lists
S = [[1,3,7],[6,8,1],[3,2,7]]  # (1*3) 3 lists

I want a Boolean array marking which lists occur in both F and S:

[False, True, False, False, True]  # (1*5) 5 Booleans for the 5 lists of F

Using IM = reduce(np.in1d, (F, S)) gives a result for each number in each
list of F:

[ True  True  True  True  True  True False False  True False  True  True
  True  True  True]   # (1*15)

Using IM = reduce(np.isin, (F, S)) also gives a result for each number in
each list of F, but in another shape:

[[ True  True  True]
 [ True  True  True]
 [False False  True]
 [False  True  True]
 [ True  True  True]]   # (5*3)

The correct result is produced by IM = [i in S for i in F] for the example
lists, but when I use this code on my two main, bigger numpy arrays of
lists:

https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing

numpy array: 3036 lists

https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing

numpy array: 300 lists

It gives the wrong answer. For the main files it should give 3036 Booleans,
of which only 300 are 'True'. I don't understand why it gives wrong answers;
it seems to act only on the third element of each list of F. I would prefer
a reduce-based solution using np.in1d or np.isin rather than the list
comprehension. How can each of the three approaches above be fixed?
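For reference, one vectorized sketch of the intended row-wise comparison (a broadcasting approach, not taken from the thread; note that exact equality is fragile for float data):

```python
import numpy as np

F = np.array([[1, 2, 3], [3, 2, 7], [4, 4, 1], [5, 6, 3], [1, 3, 7]])
S = np.array([[1, 3, 7], [6, 8, 1], [3, 2, 7]])

# Compare every row of F with every row of S by broadcasting:
# (5, 1, 3) == (1, 3, 3) -> (5, 3, 3); all(-1) checks full-row equality,
# any(-1) asks whether any row of S matched.
IM = (F[:, None, :] == S[None, :, :]).all(-1).any(-1)
print(IM)  # [False  True False False  True]
```

For 3036 x 300 rows of length 3 this builds a (3036, 300, 3) intermediate, which is still modest; for much larger inputs a set-of-tuples lookup scales better.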


Re: [Numpy-discussion] Looking for a difference between Numpy 1.19.5 and 1.20 explaining a perf regression with Pythran

2021-03-14 Thread Sebastian Berg
On Sun, 2021-03-14 at 17:15 +1100, Juan Nunez-Iglesias wrote:
> Hi Pierre,
> 
> If you’re able to compile NumPy locally and you have reliable
> benchmarks, you can write a script that tests the runtime of your
> benchmark and reports it as a test pass/fail. You can then use “git
> bisect run” to automatically find the commit that caused the issue.
> That will help narrow down the discussion before it gets completely
> derailed a second time. 
> 
> https://lwn.net/Articles/317154/
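A hedged sketch of such a pass/fail wrapper (the threshold and the placeholder workload are assumptions; `git bisect run` treats exit status 0 as "good" and 1-124 as "bad"):

```python
import timeit

def check_benchmark(threshold_s=1.0):
    """Return 0 ("good" commit) if the benchmark beats threshold_s, else 1 ("bad")."""
    # Placeholder workload -- substitute the real benchmark body here and
    # pick a threshold between the fast and slow runtimes you observed.
    runtime = min(timeit.repeat("sum(range(1000))", repeat=5, number=100))
    return 0 if runtime < threshold_s else 1

# Saved as check_perf.py and ended with `raise SystemExit(check_benchmark())`,
# it can drive the search via:  git bisect run python check_perf.py
print(check_benchmark())
```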


Let me share this partial benchmark result for a branch I just worked on
in NumPy:

       before        after   ratio
  [c5de5b5c]   [2d9e11ea]
+ 2.12±0.01μs  3.69±0.02μs   1.74  bench_io.Copy.time_cont_assign('float32')
+ 22.6±0.08μs   36.0±0.2μs   1.59  bench_io.CopyTo.time_copyto_sparse
+  49.4±0.8μs   55.2±0.1μs   1.12  bench_io.CopyTo.time_copyto_8_sparse
- 7.40±0.06μs  4.11±0.01μs   0.56  bench_io.CopyTo.time_copyto_dense
- 6.99±0.05μs     3.77±0μs   0.54  bench_io.Copy.time_cont_assign('float64')
- 6.94±0.02μs  3.73±0.01μs   0.54  bench_io.Copy.time_cont_assign('complex64')


That looks weird!  The benchmark sometimes speeds up by a factor of
almost 2, and sometimes the (de-facto) same code slows down by just as
much? (Focus on the `time_cont_assign` with float64 vs. float32).

Even better: I know 100% that no related code is touched!  The core of
that benchmark is just:

 array[...] = 1

and I did not even come close to any code related to that operation.


I have, as I did before, tried quite a few things (though not as many as
in Victor Stinner's blog when it comes to compiler flags), such as
enabling/disabling huge pages, disabling address-space randomization,
and disabling the NumPy small-array cache.

Note that the results are *stable*, as in: On this branch, I get
extremely reliable results for the benchmark [1]!


As you noticed, I have also seen these (or similar) changes "toggle"
e.g. when copying the array multiple times.  And I have dug down into
profiling one instance on the instruction level with `perf` so I know
for a fact that it is memory access speed.  (Which is a no-brainer
here, the operations are obviously memory or even cache speed bound.)


The point I was hoping to make is: it's complicated, and I am not
holding my breath that you can find an answer without digging much
deeper.
The blog post from Victor Stinner gave me the thought that
profile-guided optimization *might* be a way to avoid some random
fluctuations, but I have not checked whether the inner loop actually
compiles to different machine code.


I would hope that someone comes along and "just knows" what is going
on. But, I don't know where to ask or what to google for.

My best bets right now (they may be terrible!) are:

* Profile-guided optimization might help (as in, stabilize compiler
output against *random* changes in code), which is probably involved in
some way or another. But Victor Stinner timed Python, and that may not
have any massively memory-bound operations (which are the "big" things
here).

* Maybe try to make the NumPy allocator align all its allocation to
much larger boundaries, such as the CPU cache-line size.  But I think I
tried to check whether alignment seems to matter, and it didn't.  Also,
the arrays feel large enough that it shouldn't matter?
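For instance, one quick way to inspect whether an allocation happens to start on a 64-byte cache-line boundary (the 64-byte line size is an assumption about the CPU) is:

```python
import numpy as np

arr = np.empty(10**6, dtype=np.float64)
# Offset of the data pointer within a 64-byte cache line; 0 means the
# buffer begins exactly on a cache-line boundary.
offset = arr.ctypes.data % 64
print(offset)
```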

* CPU L1/L2 caching uses a lot of fancy heuristics these days. Maybe to
really understand what's going on, you would have to drill into what the
CPU caches are doing here?


The only thing I do know for sure currently, is that it is a rabbit
hole that I would love to understand, but don't really want to spend
days just to get nowhere.

Cheers,

Sebastian



[1] That run above is without address-space randomization; it feels
even more stable than the others. But that shouldn't matter, since we
average in any case, so ASLR is probably useless and maybe even
detrimental.


> 
> Juan. 
> 
> > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER <
> > pierre.aug...@univ-grenoble-alpes.fr> wrote:
> > 
> > Hi,
> > 
> > I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary
> > numpy --force-reinstall` and I can reproduce the regression.
> > 
> > Good news, I was able to reproduce the difference with only Numpy
> > 1.20.1. 
> > 
> > Arrays prepared with (`df` is a Pandas dataframe)
> > 
> > arr = df.values.copy()
> > 
> > or 
> > 
> > arr = np.ascontiguousarray(df.values)
> > 
> > lead to "slow" execution while arrays prepared with
> > 
> > arr = np.copy(df.values)
> > 
> > lead to faster execution.
> > 
> > arr.copy() and np.copy(arr) do not give the same result, with arr
> > obtained from a Pandas dataframe with arr = df.values. It's strange
> > because type(df.values) gives numpy.ndarray, so I would
> > expect arr.copy() and np.copy(arr) to give exactly the same result.
> > 
> > Note that I think I'm doing quite serious and reproducible
> > benchmarks. I also checked that this 

[Numpy-discussion] Pi day easter egg

2021-03-14 Thread Neal Becker
There's a little pi day easter egg for all math fans.  Google for pi to
find it.


Re: [Numpy-discussion] Numpy 1.20.1 availability

2021-03-14 Thread Ralf Gommers
On Sun, Mar 14, 2021 at 11:14 AM Peter Cock 
wrote:

> I would recommend using the community-run conda-forge as one of your
> default conda channels. They have a very slick, largely automated system
> to update recipes when upstream makes a release. The default Anaconda
> channel from Anaconda, Inc. (formerly Continuum Analytics, Inc.) is
> comparatively slow.
>

Agreed. I know the goal of the maintainers of the defaults channel is to
make the latest version available quickly. However, `defaults` requires
more integration testing than conda-forge/PyPI, and work tends to happen in
batches - in the past we've seen update times ranging from days to several
months.

We have some guidance at https://numpy.org/install/. Basically the two main
reasons to use `defaults`: for beginning users with modest needs, the
easiest thing to get started is just installing the Anaconda distribution
(which gives you `defaults`). Or you have corporate policies to use
`defaults` - you can pay Anaconda and it does come with things companies
and institutions may need, like guarantees around uptime and security.

Cheers,
Ralf


> You may recognise some of the maintainers of the conda-forge numpy
> recipe? https://github.com/conda-forge/numpy-feedstock/
>
> I'm impressed to see 17 million conda-forge numpy downloads, vs
> 'just' 2.5 million downloads of the default channel's package:
>
> https://anaconda.org/conda-forge/numpy
> https://anaconda.org/anaconda/numpy
>
> Regards,
>
> Peter
>
> On Sun, Mar 14, 2021 at 8:06 AM Matti Picus  wrote:
> >
> > On 3/14/21 6:12 AM, dan_patterson wrote:
> >
> > Any idea why the most recent version isn't available on the main anaconda
> > channel? conda-forge and building are not options for a number of reasons.
> > I posted a package request there, but double-digit days have gone by and
> > it just got a thumbs up and a package-request tag:
> > https://github.com/ContinuumIO/anaconda-issues/issues/12309
> > I realize it could be the "times", or maybe no one is aware of its absence.
> >
> >
> > NumPy does not control the packages on the main anaconda channel, so a
> > request here is likely to go unanswered. The package has been updated in
> > the conda-forge channel.
> >
> >
> > Matti
> >


Re: [Numpy-discussion] Numpy 1.20.1 availability

2021-03-14 Thread dan_patterson
Thanks, glad to hear that people are aware of the delay.
As I said, there are other reasons beyond my control, for the limitations.
The wait is on.





Re: [Numpy-discussion] Numpy 1.20.1 availability

2021-03-14 Thread Peter Cock
I would recommend using the community-run conda-forge as one of your
default conda channels. They have a very slick, largely automated system
to update recipes when upstream makes a release. The default Anaconda
channel from Anaconda, Inc. (formerly Continuum Analytics, Inc.) is
comparatively slow.

You may recognise some of the maintainers of the conda-forge numpy
recipe? https://github.com/conda-forge/numpy-feedstock/

I'm impressed to see 17 million conda-forge numpy downloads, vs
'just' 2.5 million downloads of the default channel's package:

https://anaconda.org/conda-forge/numpy
https://anaconda.org/anaconda/numpy

Regards,

Peter

On Sun, Mar 14, 2021 at 8:06 AM Matti Picus  wrote:
>
> On 3/14/21 6:12 AM, dan_patterson wrote:
>
> Any idea why the most recent version isn't available on the main anaconda
> channel? conda-forge and building are not options for a number of reasons.
> I posted a package request there, but double-digit days have gone by and it
> just got a thumbs up and a package-request tag:
> https://github.com/ContinuumIO/anaconda-issues/issues/12309
> I realize it could be the "times", or maybe no one is aware of its absence.
>
>
> NumPy does not control the packages on the main anaconda channel, so a 
> request here is likely to go unanswered. The package has been updated in the 
> conda-forge channel.
>
>
> Matti
>


Re: [Numpy-discussion] Numpy 1.20.1 availability

2021-03-14 Thread Matti Picus

On 3/14/21 6:12 AM, dan_patterson wrote:

> Any idea why the most recent version isn't available on the main anaconda
> channel? conda-forge and building are not options for a number of reasons.
> I posted a package request there, but double-digit days have gone by and it
> just got a thumbs up and a package-request tag:
> https://github.com/ContinuumIO/anaconda-issues/issues/12309
> I realize it could be the "times", or maybe no one is aware of its absence.

NumPy does not control the packages on the main anaconda channel, so a
request here is likely to go unanswered. The package has been updated in
the conda-forge channel.

Matti
