Re: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization

2017-01-18 Thread Max Linke



On 01/18/2017 09:28 AM, Ralf Gommers wrote:
> Hi Max,
>
> On Tue, Jan 17, 2017 at 2:38 AM, Max Linke wrote:
>
>> Hi
>>
>> Organizations can start submitting applications for Google Summer of
>> Code 2017 on January 19 (and the deadline is February 9)
>>
>> https://developers.google.com/open-source/gsoc/timeline?hl=en
>
> Thanks for bringing this up, and for organizing the NumFOCUS
> participation!
>
>> NumFOCUS will be applying again this year. If you want to work with
>> us please let me know, and if you apply as an organization yourself
>> or under a different umbrella organization please tell me as well.
>
> I suspect we won't participate at all, but if we do then it's likely
> under the PSF umbrella as we have done previously.

Thanks for letting me know. If you decide to participate with the PSF
please write me a private mail so that I can update the NumFOCUS GSoC
page accordingly.

> @all: in practice working on NumPy is just far too hard for most
> GSoC students. Previous years we've registered and generated ideas,
> but not gotten any students. We're also short on maintainer capacity.
> So I propose to not participate this year.
>
> Ralf





Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread Nadav Har'El
On Wed, Jan 18, 2017 at 4:30 PM,  wrote:

>
>
>> Having more sampling schemes would be useful, but it's not possible to
>> implement sampling schemes with impossible properties.
>>
>>
>
> BTW: sampling 3 out of 3 without replacement is even worse
>
> No matter what sampling scheme and what selection probabilities we use, we
> always have every element with probability 1 in the sample.
>

I agree. The random-sample function of the type I envisioned will be able
to reproduce the desired probabilities in some cases (like the example I
gave) but not in others. Because doing this correctly involves a set of n
linear equations in comb(n,k) variables, it can have no solution, or many
solutions, depending on n and k and the desired probabilities. A function
of this sort could return an error if it can't achieve the desired
probabilities.

But in many cases (the 0.2, 0.4, 0.4 example I gave was just something
random I tried) there will be a way to achieve exactly the desired
distribution.

I guess I'll need to write this new function myself :-), because my use
case definitely requires that the items produced match the required
probabilities (when possible).
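
For concreteness, a minimal sketch of such a function (the name
choice_with_marginals and the error handling are made up for illustration;
it just sets up the linear system described above and samples whole
k-subsets):

    import itertools
    import numpy as np

    def choice_with_marginals(n, k, p, size=None):
        # Sample k-subsets of range(n) so that item i makes up a fraction
        # p[i] of all sampled items, whenever such a scheme exists.
        subsets = list(itertools.combinations(range(n), k))
        # One equation per item i: the probabilities of the subsets
        # containing i must sum to k * p[i]; a last equation makes the
        # subset probabilities sum to 1.
        A = np.array([[i in s for s in subsets] for i in range(n)], float)
        A = np.vstack([A, np.ones(len(subsets))])
        b = np.append(k * np.asarray(p, float), 1.0)
        q = np.linalg.lstsq(A, b, rcond=-1)[0]
        if not np.allclose(A.dot(q), b) or np.any(q < -1e-9):
            raise ValueError("no subset distribution has these marginals")
        q = np.clip(q, 0, None)
        q /= q.sum()
        idx = np.random.choice(len(subsets), size=size, p=q)
        return [subsets[i] for i in np.atleast_1d(idx)]

    # For p = (0.2, 0.4, 0.4) and k = 2 the solution is q = (0.2, 0.2, 0.6)
    # over the pairs (0, 1), (0, 2), (1, 2), which reproduces the desired
    # item frequencies exactly.
    print(choice_with_marginals(3, 2, [0.2, 0.4, 0.4], size=5))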

Thanks,
Nadav.


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread josef . pktd
On Wed, Jan 18, 2017 at 8:53 AM,  wrote:

>
>
> On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El  wrote:
>
>>
>> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
>> wrote:
>>
>>> Let's look at what the user asked this function, and what it returns:
>>>

>>>> User asks: please give me random pairs of the three items, where item 1
>>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>>
>>>> Function returns: random pairs where, if you generate many returned
>>>> results (as in the law of large numbers) and look at the items they
>>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>>> 0.38333.
>>>> These are not (quite) the probabilities the user asked for...
>>>>
>>>> Can you explain a sense where the user's requested probabilities (0.2,
>>>> 0.4, 0.4) are actually adhered to in the results which random.choice
>>>> returns?

>>>
>>> I think that the question the user is asking by specifying p is a
>>> slightly different one:
>>>  "please give me random pairs of the three items extracted from a
>>> population of 3 items where item 1 has probability of being extracted of
>>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>>> extracted."
>>>
>>
>> You are right, if that is what the user wants, numpy.random.choice does
>> the right thing.
>>
>> I'm just wondering whether this is actually what users want, and whether
>> they understand this is what they are getting.
>>
>> As I said, I expected it to generate pairs with, empirically, the desired
>> distribution of individual items. The documentation of numpy.random.choice
>> seemed to me (wrongly) to imply that that's what it does. So I was
>> surprised to realize that it does not.
>>
>
> As Alessandro and you showed, the function returns something that makes
> sense. If the user wants something different, then they need to look for a
> different function, which is, however, difficult if the problem doesn't
> have a solution in general.
>
> Sounds to me a bit like a Monty Hall problem. Whether we like it or not,
> or find it counterintuitive, it is what it is given the sampling scheme.
>
> Having more sampling schemes would be useful, but it's not possible to
> implement sampling schemes with impossible properties.
>

BTW: sampling 3 out of 3 without replacement is even worse

No matter what sampling scheme and what selection probabilities we use, we
always have every element with probability 1 in the sample.


(Which in survey statistics implies that the sampling error or standard
deviation of any estimate of a population mean or total is zero. Which I
found weird. How can you do statistics and get an estimate that doesn't
have any uncertainty associated with it?)
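
A quick way to see this with numpy.random.choice itself:

    import numpy as np

    rng = np.random.RandomState(0)
    # With k = n = 3, a draw without replacement always contains all three
    # items, whatever p is: each element is in the sample with probability 1.
    draws = [sorted(rng.choice(3, size=3, replace=False, p=[0.2, 0.4, 0.4]))
             for _ in range(5)]
    print(draws)  # every entry is [0, 1, 2]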

Josef



>
> Josef
>
>
>
>>
>> Nadav.
>>


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread josef . pktd
On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El  wrote:

>
> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
> wrote:
>
>> Let's look at what the user asked this function, and what it returns:
>>
>>>
>>> User asks: please give me random pairs of the three items, where item 1
>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>
>>> Function returns: random pairs where, if you generate many returned
>>> results (as in the law of large numbers) and look at the items they
>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>> 0.38333.
>>> These are not (quite) the probabilities the user asked for...
>>>
>>> Can you explain a sense where the user's requested probabilities (0.2,
>>> 0.4, 0.4) are actually adhered to in the results which random.choice
>>> returns?
>>>
>>
>> I think that the question the user is asking by specifying p is a
>> slightly different one:
>>  "please give me random pairs of the three items extracted from a
>> population of 3 items where item 1 has probability of being extracted of
>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>> extracted."
>>
>
> You are right, if that is what the user wants, numpy.random.choice does
> the right thing.
>
> I'm just wondering whether this is actually what users want, and whether
> they understand this is what they are getting.
>
> As I said, I expected it to generate pairs with, empirically, the desired
> distribution of individual items. The documentation of numpy.random.choice
> seemed to me (wrongly) to imply that that's what it does. So I was
> surprised to realize that it does not.
>

As Alessandro and you showed, the function returns something that makes
sense. If the user wants something different, then they need to look for a
different function, which is, however, difficult if the problem doesn't
have a solution in general.

Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or
find it counterintuitive, it is what it is given the sampling scheme.

Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties.

Josef



>
> Nadav.
>


Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-18 Thread Nathaniel Smith
On Wed, Jan 18, 2017 at 3:43 AM, Julian Taylor
 wrote:
> The version of gcc used will make a large difference in some places.
> E.g. the AVX2 integer ufuncs require something around 4.5 to work and in
> general the optimization level of gcc has improved greatly since the
> clang competition showed up around that time. centos 5 has 4.1 which is
> really ancient.
> I thought the wheels used newer gccs also on centos 5?

The wheels are built with gcc 4.8, which is the last version that you
can get to build for centos 5.

When we bump to centos 6 as the minimum supported, we'll be able to
switch to gcc 5.3.1.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-18 Thread David Cournapeau
On Wed, Jan 18, 2017 at 11:43 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> The version of gcc used will make a large difference in some places.
> E.g. the AVX2 integer ufuncs require something around 4.5 to work and in
> general the optimization level of gcc has improved greatly since the
> clang competition showed up around that time. centos 5 has 4.1 which is
> really ancient.
> I thought the wheels used newer gccs also on centos 5?
>

I don't know if it is mandatory for many wheels, but it is possible to
build with gcc 4.8 at least and still keep binary compatibility with centos
5.X and above, though I am not sure about the impact on speed.

For quite some time already, building numpy/scipy with gcc 4.1 has caused
trouble, with errors and even crashes, so you definitely want to use a more
recent compiler in any case.

David


> On 18.01.2017 08:27, Nathan Goldbaum wrote:
> > I've seen reports on the anaconda mailing list of people seeing similar
> > speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has
> > the same issue as manylinux in that they need to use versions of GCC
> > available on CentOS 5.
> >
> > Given the upcoming official EOL for CentOS5, it might make sense to
> > think about making a PEP for a CentOS 6-based manylinux2 docker image,
> > which will allow compiling with a newer GCC.
> >
> > On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer wrote:
> >
> > On Tue, 17 Jan 2017 08:56:42 -0500
> > Neal Becker wrote:
> >
> > > I've installed via pip3 on linux x86_64, which gives me a wheel.  My
> > > question is, am I losing significant performance choosing this
> > > pre-built binary vs. compiling myself?  For example, my processor
> > > might have some more features than the base version used to build
> > > wheels.
> >
> > Hi,
> >
> > I have done some benchmarking (%timeit) for my code running in a
> > jupyter-notebook within a venv installed with pip+manylinux wheels
> > versus ipython and debian packages (on the same computer).
> > I noticed the debian installation was ~20% faster.
> >
> > I did not investigate further if those 20% came from the manylinux (I
> > suspect) or from the notebook infrastructure.
> >
> > HTH,
> > --
> > Jérôme Kieffer


Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-18 Thread Neal Becker
Matthew Brett wrote:

> On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker  wrote:
>> Matthew Brett wrote:
>>
>>> Hi,
>>>
>>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker 
>>> wrote:
 Charles R Harris wrote:

> Hi All,
>
> I'm pleased to announce the NumPy 1.12.0 release. This release
> supports Python 2.7 and 3.4-3.6. Wheels for all supported Python
> versions may be downloaded from PyPI, the tarball and zip files may be
> downloaded from Github. The release notes and file hashes may also be
> found at Github.
>
> NumPy 1.12.0 is the result of 418 pull requests submitted by 139
> contributors and comprises a large number of fixes and improvements.
> Among the many improvements it is difficult to pick out just a few as
> standing above the others, but the following may be of particular
> interest or indicate areas likely to have future consequences.
>
> * Order of operations in ``np.einsum`` can now be optimized for large
> speed improvements.
> * New ``signature`` argument to ``np.vectorize`` for vectorizing with
> core dimensions.
> * The ``keepdims`` argument was added to many functions.
> * New context manager for testing warnings
> * Support for BLIS in numpy.distutils
> * Much improved support for PyPy (not yet finished)
>
> Enjoy,
>
> Chuck

 I've installed via pip3 on linux x86_64, which gives me a wheel.  My
 question is, am I losing significant performance choosing this pre-built
 binary vs. compiling myself?  For example, my processor might have some
 more features than the base version used to build wheels.
>>>
>>> I guess you are thinking about using this built wheel on some other
>>> machine?   You'd have to be lucky for that to work; the wheel depends
>>> on the symbols it found at build time, which may not exist in the same
>>> places on your other machine.
>>>
>>> If it does work, the speed will primarily depend on your BLAS library.
>>>
>>> The pypi wheels should be pretty fast; they are built with OpenBLAS,
>>> which is at or near top of range for speed, across a range of
>>> platforms.
>>>
>>> Cheers,
>>>
>>> Matthew
>>
>> I installed using pip3 install, and it installed a wheel package.  I did
>> not build it - aren't wheels already compiled packages?  So isn't it
>> built for the common denominator architecture, not necessarily as fast
>> as one I built myself on my own machine?  My question is, on x86_64, is
>> this potential difference large enough to bother with not using
>> precompiled wheel packages?
> 
> Ah - my guess is that you'd be hard pressed to make a numpy that is as
> fast as the precompiled wheel.   The OpenBLAS library included in
> numpy selects the routines for your CPU at run-time, so they will
> generally be fast on your CPU.   You might be able to get equivalent
> or even better performance with an ATLAS BLAS library recompiled on
> your exact machine, but that's quite a serious investment of time to
> get working, and you'd have to benchmark to find if you were really
> doing any better.
> 
> Cheers,
> 
> Matthew

OK, so at least for BLAS things should be pretty well optimized.
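
(One quick way to confirm which BLAS/LAPACK an installed numpy was built
against is its recorded build configuration; the pypi wheels should report
OpenBLAS here:)

    import numpy as np

    # Prints the BLAS/LAPACK build-time configuration.
    np.__config__.show()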



Re: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release

2017-01-18 Thread Neal Becker
Nathaniel Smith wrote:

> On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker  wrote:
>> Matthew Brett wrote:
>>
>>> Hi,
>>>
>>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker 
>>> wrote:
 Charles R Harris wrote:

> Hi All,
>
> I'm pleased to announce the NumPy 1.12.0 release. This release
> supports Python 2.7 and 3.4-3.6. Wheels for all supported Python
> versions may be downloaded from PyPI, the tarball and zip files may be
> downloaded from Github. The release notes and file hashes may also be
> found at Github.
>
> NumPy 1.12.0 is the result of 418 pull requests submitted by 139
> contributors and comprises a large number of fixes and improvements.
> Among the many improvements it is difficult to pick out just a few as
> standing above the others, but the following may be of particular
> interest or indicate areas likely to have future consequences.
>
> * Order of operations in ``np.einsum`` can now be optimized for large
> speed improvements.
> * New ``signature`` argument to ``np.vectorize`` for vectorizing with
> core dimensions.
> * The ``keepdims`` argument was added to many functions.
> * New context manager for testing warnings
> * Support for BLIS in numpy.distutils
> * Much improved support for PyPy (not yet finished)
>
> Enjoy,
>
> Chuck

 I've installed via pip3 on linux x86_64, which gives me a wheel.  My
 question is, am I losing significant performance choosing this pre-built
 binary vs. compiling myself?  For example, my processor might have some
 more features than the base version used to build wheels.
>>>
>>> I guess you are thinking about using this built wheel on some other
>>> machine?   You'd have to be lucky for that to work; the wheel depends
>>> on the symbols it found at build time, which may not exist in the same
>>> places on your other machine.
>>>
>>> If it does work, the speed will primarily depend on your BLAS library.
>>>
>>> The pypi wheels should be pretty fast; they are built with OpenBLAS,
>>> which is at or near top of range for speed, across a range of
>>> platforms.
>>>
>>> Cheers,
>>>
>>> Matthew
>>
>> I installed using pip3 install, and it installed a wheel package.  I did
>> not build it - aren't wheels already compiled packages?  So isn't it
>> built for the common denominator architecture, not necessarily as fast
>> as one I built myself on my own machine?  My question is, on x86_64, is
>> this potential difference large enough to bother with not using
>> precompiled wheel packages?
> 
> Ultimately, it's going to depend on all sorts of things, including
> most importantly your actual code. Like most speed questions, the only
> real way to know is to try it and measure the difference.
> 
> The wheels do ship with a fast BLAS (OpenBLAS configured to
> automatically adapt to your CPU at runtime), so the performance will
> at least be reasonable. Possible improvements would include using a
> different and somehow better BLAS (MKL might be faster in some cases),
> tweaking your compiler options to take advantage of whatever SIMD ISAs
> your particular CPU supports (numpy's build system doesn't do this
> automatically but in principle you could do it by hand -- were you
> bothering before? does it even make a difference in practice? I
> dunno), and using a new compiler (the linux wheels use a somewhat
> ancient version of gcc for Reasons; newer compilers are better at
> optimizing -- how much does it matter? again I dunno).
> 
> Basically: if you want to experiment and report back then I think we'd
> all be interested to hear; OTOH if you aren't feeling particularly
> curious/ambitious then I wouldn't worry about it :-).
> 
> -n
> 

Yes, I always add -march=native, which should pick up whatever SIMD is
available.  So my question was primarily whether I should bother.  Thanks
for the detailed answer.
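
For reference, a source build that picks up the local CPU's SIMD features
can be forced with something like
CFLAGS="-O2 -march=native" pip3 install --no-binary :all: numpy
(a sketch; numpy's distutils may merge these flags with its own), and the
result can then be benchmarked against the prebuilt wheel.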



Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-18 Thread Julian Taylor
The version of gcc used will make a large difference in some places.
E.g. the AVX2 integer ufuncs require something around 4.5 to work and in
general the optimization level of gcc has improved greatly since the
clang competition showed up around that time. centos 5 has 4.1 which is
really ancient.
I thought the wheels used newer gccs also on centos 5?

On 18.01.2017 08:27, Nathan Goldbaum wrote:
> I've seen reports on the anaconda mailing list of people seeing similar
> speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has
> the same issue as manylinux in that they need to use versions of GCC
> available on CentOS 5.
> 
> Given the upcoming official EOL for CentOS5, it might make sense to
> think about making a PEP for a CentOS 6-based manylinux2 docker image,
> which will allow compiling with a newer GCC.
> 
> On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer wrote:
>
> On Tue, 17 Jan 2017 08:56:42 -0500
> Neal Becker wrote:
>
> > I've installed via pip3 on linux x86_64, which gives me a wheel.  My
> > question is, am I losing significant performance choosing this
> > pre-built binary vs. compiling myself?  For example, my processor
> > might have some more features than the base version used to build
> > wheels.
>
> Hi,
>
> I have done some benchmarking (%timeit) for my code running in a
> jupyter-notebook within a venv installed with pip+manylinux wheels
> versus ipython and debian packages (on the same computer).
> I noticed the debian installation was ~20% faster.
>
> I did not investigate further if those 20% came from the manylinux (I
> suspect) or from the notebook infrastructure.
>
> HTH,
> --
> Jérôme Kieffer


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread Nadav Har'El
On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
wrote:

> Let's look at what the user asked this function, and what it returns:
>
>>
>> User asks: please give me random pairs of the three items, where item 1
>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>
>> Function returns: random pairs where, if you generate many returned
>> results (as in the law of large numbers) and look at the items they
>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>> 0.38333.
>> These are not (quite) the probabilities the user asked for...
>>
>> Can you explain a sense where the user's requested probabilities (0.2,
>> 0.4, 0.4) are actually adhered to in the results which random.choice
>> returns?
>>
>
> I think that the question the user is asking by specifying p is a slightly
> different one:
>  "please give me random pairs of the three items extracted from a
> population of 3 items where item 1 has probability of being extracted of
> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
> extracted."
>

You are right, if that is what the user wants, numpy.random.choice does the
right thing.

I'm just wondering whether this is actually what users want, and whether
they understand this is what they are getting.

As I said, I expected it to generate pairs with, empirically, the desired
distribution of individual items. The documentation of numpy.random.choice
seemed to me (wrongly) to imply that that's what it does. So I was
surprised to realize that it does not.

Nadav.


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread aleba...@gmail.com
2017-01-18 9:35 GMT+01:00 Nadav Har'El :

>
> On Wed, Jan 18, 2017 at 1:58 AM, aleba...@gmail.com 
> wrote:
>
>>
>>
>> 2017-01-17 22:13 GMT+01:00 Nadav Har'El :
>>
>>>
>>> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com 
>>> wrote:
>>>
 Hi Nadav,

 I may be wrong, but I think that the result of the current
 implementation is actually the expected one.
 Using your example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and
 0.4

 P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])

>>>
>>> Yes, this formula does fit well with the actual algorithm in the code.
>>> But, my question is *why* we want this formula to be correct:
>>>
>> Just a note: this formula is correct and it is one of the fundamental
>> laws of statistics: https://en.wikipedia.org/wiki/Law_of_total_probability
>> + https://en.wikipedia.org/wiki/Bayes%27_theorem
>>
>
> Hi,
>
> Yes, of course the formula is correct, but it doesn't mean we're not
> applying it in the wrong context.
>
> I'll be honest here: I came to numpy.random.choice after I actually coded
> a similar algorithm (with the same results) myself, because like you I
> thought this was the "obvious" and correct algorithm. Only then I realized
> that its output doesn't actually produce the desired probabilities
> specified by the user - even in the cases where that is possible. And I
> started wondering if existing libraries - like numpy - do this differently.
> And it turns out, numpy does it (basically) in the same way as my algorithm.
>
>
>>
>> Thus, the result we get from random.choice IMHO definitely makes sense.
>>
>
> Let's look at what the user asked this function, and what it returns:
>
> User asks: please give me random pairs of the three items, where item 1
> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>
> Function returns: random pairs where, if you generate many returned
> results (as in the law of large numbers) and look at the items they
> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
> 0.38333.
> These are not (quite) the probabilities the user asked for...
>
> Can you explain a sense where the user's requested probabilities (0.2,
> 0.4, 0.4) are actually adhered to in the results which random.choice
> returns?
>

I think that the question the user is asking by specifying p is a slightly
different one:
 "please give me random pairs of the three items extracted from a
population of 3 items where item 1 has probability of being extracted of
0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
extracted."


> Thanks,
> Nadav Har'El.
>
>




Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread Nadav Har'El
On Wed, Jan 18, 2017 at 1:58 AM, aleba...@gmail.com 
wrote:

>
>
> 2017-01-17 22:13 GMT+01:00 Nadav Har'El :
>
>>
>> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com 
>> wrote:
>>
>>> Hi Nadav,
>>>
>>> I may be wrong, but I think that the result of the current
>>> implementation is actually the expected one.
>>> Using your example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and
>>> 0.4
>>>
>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>>>
>>
>> Yes, this formula does fit well with the actual algorithm in the code.
>> But, my question is *why* we want this formula to be correct:
>>
> Just a note: this formula is correct and it is one of the fundamental
> laws of statistics: https://en.wikipedia.org/wiki/Law_of_total_probability
> + https://en.wikipedia.org/wiki/Bayes%27_theorem
>

Hi,

Yes, of course the formula is correct, but it doesn't mean we're not
applying it in the wrong context.

I'll be honest here: I came to numpy.random.choice after I actually coded a
similar algorithm (with the same results) myself, because like you I
thought this was the "obvious" and correct algorithm. Only then I realized
that its output doesn't actually produce the desired probabilities
specified by the user - even in the cases where that is possible. And I
started wondering if existing libraries - like numpy - do this differently.
And it turns out, numpy does it (basically) in the same way as my algorithm.
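
(Plugging the example numbers into the formula quoted above shows exactly
where the figures below come from:)

    # Sequential sampling without replacement with p = (0.2, 0.4, 0.4);
    # the second draw renormalizes p over the two remaining items.
    p1, p2, p3 = 0.2, 0.4, 0.4
    P12 = p1 * p2 / (1 - p1) + p2 * p1 / (1 - p2)  # 0.23333...
    P13 = p1 * p3 / (1 - p1) + p3 * p1 / (1 - p3)  # 0.23333...
    P23 = p2 * p3 / (1 - p2) + p3 * p2 / (1 - p3)  # 0.53333...
    # Item 1's share of all sampled items: (P12 + P13) / 2 = 0.23333...
    # Items 2 and 3: (P12 + P23) / 2 = (P13 + P23) / 2 = 0.38333...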


>
> Thus, the result we get from random.choice IMHO definitely makes sense.
>

Let's look at what the user asked this function, and what it returns:

User asks: please give me random pairs of the three items, where item 1 has
probability 0.2, item 2 has 0.4, and 3 has 0.4.

Function returns: random pairs where, if you generate many returned results
(as in the law of large numbers) and look at the items they contain, item 1
is 0.2333 of the items, item 2 is 0.38333, and item 3 is 0.38333.
These are not (quite) the probabilities the user asked for...

Can you explain a sense where the user's requested probabilities (0.2, 0.4,
0.4) are actually adhered to in the results which random.choice returns?
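
(These figures are easy to reproduce empirically:)

    import numpy as np

    rng = np.random.RandomState(42)
    draws = np.array([rng.choice(3, size=2, replace=False, p=[0.2, 0.4, 0.4])
                      for _ in range(100000)])
    # Fraction of all sampled items accounted for by each of the 3 items:
    print(np.bincount(draws.ravel(), minlength=3) / float(draws.size))
    # -> roughly [0.2333, 0.3833, 0.3833], not [0.2, 0.4, 0.4]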

Thanks,
Nadav Har'El.


Re: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization

2017-01-18 Thread Ralf Gommers
Hi Max,

On Tue, Jan 17, 2017 at 2:38 AM, Max Linke  wrote:

> Hi
>
> Organizations can start submitting applications for Google Summer of Code
> 2017 on January 19 (and the deadline is February 9)
>
> https://developers.google.com/open-source/gsoc/timeline?hl=en


Thanks for bringing this up, and for organizing the NumFOCUS participation!


> NumFOCUS will be applying again this year. If you want to work with us
> please let me know and if you apply as an organization yourself or under a
> different umbrella organization please tell me as well.


I suspect we won't participate at all, but if we do then it's likely under
the PSF umbrella as we have done previously.

@all: in practice working on NumPy is just far too hard for most GSoC
students. Previous years we've registered and generated ideas, but not
gotten any students. We're also short on maintainer capacity. So I propose
to not participate this year.

Ralf