Re: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)

2011-08-31 Thread Christopher Jordan-Squire
On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau sh...@keba.be wrote:
 You can use:
 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))

 For your real application you'll probably want to use a value greater than 1
 for the first parameter (equal to your sample size), instead of calling it
 multiple times.

 -=- Olivier

Thanks. Warren (Weckesser) mentioned this possibility to me yesterday
and I forgot to put it in my post. I assume you mean something like

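import numpy as np

x = np.arange(3)                                # candidate values 0, 1, 2
y = np.random.multinomial(30, [0.1, 0.2, 0.7])  # how often each value occurs in 30 draws
z = np.repeat(x, y)                             # expand the counts into individual samples
np.random.shuffle(z)                            # shuffle in place so the order is random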

That look right?

-Chris JS
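
For readers who want the multinomial approach above as a reusable helper, here is a minimal sketch. The function name `sample_with_replacement` and its `rng` argument are made up for illustration and are not part of NumPy; the body uses only the calls already shown in the thread.

```python
import numpy as np

def sample_with_replacement(values, probs, size, rng=np.random):
    """Draw `size` items from `values` with probabilities `probs`, with replacement.

    Hypothetical helper, not a NumPy function.
    """
    values = np.asarray(values)
    counts = rng.multinomial(size, probs)  # number of times each value is drawn
    draws = np.repeat(values, counts)      # expand the counts into individual draws
    rng.shuffle(draws)                     # in-place shuffle so the order is random too
    return draws

# Integers 1..3 with probabilities 0.1, 0.2, 0.7, sample size 30.
draws = sample_with_replacement([1, 2, 3], [0.1, 0.2, 0.7], size=30)
```

Passing the values directly avoids the separate `1 +` offset used with argmax, and a single multinomial call covers the whole sample.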


 2011/8/31 Christopher Jordan-Squire cjord...@uw.edu

 In numpy, is there a way of generating a random integer in a specified
 range where the integers in that range have given probabilities? So,
 for example, generating a random integer between 1 and 3 with
 probabilities [0.1, 0.2, 0.7] for the three integers?

 I'd like to know how to do this without replacement, as well. If the
 probabilities are uniform, there are a number of ways, including just
 shuffling the data and taking the first however-many elements of the
 shuffle. But this doesn't apply with non-uniform probabilities.
 Similarly, one could try arbitrary-sampling-method X (such as
 inverse-cdf sampling) and then rejecting repeats. But that is clearly
 sub-optimal if the number of samples desired is near the same order of
 magnitude as the total population, or if the probabilities are very
 skewed. (E.g. a weighted sample of size 2 without replacement from
 [0,1,2] with probabilities [0.999, 0.0005, 0.0005] will take a long
 time if you just sample repeatedly until you have two distinct
 samples.)
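
One way to avoid rejecting repeats is to draw one item at a time and zero out its weight before the next draw. This is not a method proposed in the thread, only a sketch of one common definition of weighted sampling without replacement (successive draws proportional to the remaining weights). The name `sample_without_replacement` and the `rng` argument are hypothetical.

```python
import numpy as np

def sample_without_replacement(probs, size, rng=np.random):
    """Return `size` distinct indices, each drawn with probability
    proportional to the weights that remain. Hypothetical helper."""
    weights = np.array(probs, dtype=float)  # copy, so the caller's data is untouched
    if size > np.count_nonzero(weights):
        raise ValueError("not enough items with nonzero probability")
    chosen = []
    for _ in range(size):
        cdf = np.cumsum(weights)            # unnormalized cdf of the remaining items
        u = rng.random_sample() * cdf[-1]   # uniform draw in [0, total remaining weight)
        # First index whose cumulative weight exceeds u; zero-weight items are skipped.
        idx = int(np.searchsorted(cdf, u, side="right"))
        chosen.append(idx)
        weights[idx] = 0.0                  # remove the item from later draws
    return np.array(chosen)

# A heavily skewed case: exactly two draws are needed, with no rejections.
picked = sample_without_replacement([0.999, 0.0005, 0.0005], 2)
```

Each draw costs one cumulative sum, so the total work is proportional to the sample size times the population size, regardless of how skewed the probabilities are. Later NumPy releases also provide numpy.random.choice, which accepts `replace=False` together with a `p` argument.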

 I know parts of what I want can be done in scipy.stats using
 rv_discrete or with the Python standard library's random module. I
 would much prefer to do it only using numpy because the eventual
 application shouldn't have a scipy dependency and should use the same
 random seed as numpy.random.

 (For more background, what I want is to create a function like sample
 in R, where I can give it an array-like of doo-hickeys and another
 array-like of probabilities associated with each doo-hickey, and then
 generate a random sample of doo-hickeys with those probabilities. One
 step for that is generating ints, to use as indices, with the same
 probabilities. I'd like a version of this to be in numpy/scipy, but it
 doesn't really belong in scipy since it doesn't

 -Chris JS
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)

2011-08-31 Thread Olivier Delalleau
2011/8/31 Christopher Jordan-Squire cjord...@uw.edu

 That look right?

 -Chris JS


Yes, exactly.

-=- Olivier


Re: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)

2011-08-31 Thread josef . pktd
On Wed, Aug 31, 2011 at 3:22 PM, Olivier Delalleau sh...@keba.be wrote:
 Yes, exactly.

Chuck's answer to the same question, when I asked it on the list, used
searchsorted and is fast:

cdfvalues.searchsorted(np.random.random(size))

My recent version of it, for FiniteLatticeDistribution:

def rvs(self, size=1):
    '''draw random variables with shape given by size

    '''
    #w = self.pdfvalues
    #p = cumsum(w)/float(w.sum())
    #p.searchsorted(np.random.random(size))
    return self.support[self.cdfvalues.searchsorted(np.random.random(size))]
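
The method above relies on attributes of its class. As a standalone sketch of the same searchsorted idea, the following builds `support` and `cdfvalues` from the example in the original question. The variable values are assumptions chosen for illustration, and `side="right"` is used here instead of the default so that an item with zero probability can never be selected.

```python
import numpy as np

support = np.array([1, 2, 3])      # values to return
probs = np.array([0.1, 0.2, 0.7])  # their probabilities

cdfvalues = np.cumsum(probs)
cdfvalues /= cdfvalues[-1]         # normalize so the last entry is exactly 1

def rvs(size=1):
    """Inverse-cdf sampling: map uniform draws to indices into `support`."""
    u = np.random.random(size)     # uniform draws in [0, 1)
    # Index of the first cdf value strictly greater than each u.
    return support[cdfvalues.searchsorted(u, side="right")]

draws = rvs(10000)
```

Because searchsorted is a vectorized binary search, the cost per draw grows only logarithmically with the number of support points.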

Josef