Re: [Pdl-general] Generating a vector of random integers

```Hi, Will!

```
Cheers,
Craig

> On Apr 8, 2018, at 11:33 AM, William Schmidt <t.william.schm...@gmail.com>
> wrote:
>
> Craig,
>
> Thank you, it works but first I had to pick those method calls apart to see
> what each is doing and rtfm about sever in PDL::Core.
>
> So, I am left with a set of questions best depicted with some examples. As I
> understand the ::Core pod, that random call and floor are creating a virtual
> piddle that functions to slice or index into \$vec. And since the virtual
> piddle is longer than \$vec, it loops until exactly the number of items in the
> virtual piddle are pushed from \$vec into \$rand.

Almost — but not quite.  It works because the *value* of each of the million
elements of floor(random(1e6)*8) is between 0 and 7.  So on average you get
about 125,000 copies of each value in \$vec in the output vector, in this
example.

> So, what is the difference in the following invocations?
>
> \$rand = \$vec->(floor(random(1e6) * 8))->sever;      # yours
> \$rand = \$vec->(floor(random(1e6) * 8));
> \$rand = \$vec->index(floor(random(1e6) * 8))->sever;
> \$rand = \$vec->index(floor(random(1e6) * 8));
>
> When I inspect \$rand, all are different, since random() probably generates a
> fresh seed, but they all look to be examples of my weighted probability
> distribution.
>
> To be precise, is there an implied or default method call after \$rand =
> \$vec->... in your example, and what is lost when sever is not invoked?

Hmmm,  all those should give the same answer.  The “sever” breaks an implicit
connection between piddles.  A quirk of PDL is that all indexing operations
keep their output connected to the input.  The final operation in the
expressions in lines 2 and 4 is an index of some kind (either via “index” or
via “niceslice”, which implements the slicing syntax).  The output remains
connected to \$vec, which costs some memory (PDL has to store the complete
mapping between elements, since it is arbitrary rather than affine).  The
“sever” makes sure that you get back a standalone PDL that is no longer
connected to anything.

The slicing and indexing operations do the same thing in 1-D, but they
generalize slightly differently.  Niceslice doesn’t allow multidimensional
indices.  Index does, but they have to agree (in a threading matchup sense)
with the indices of the source array.

>
> I think I can visualize how to generalize this algorithm but it is much less
> intuitive than the way one can do it in R and building \$vec can be
> problematic, or am I missing something? A complex set of weights using reals,
> for example: [0.18, 0.31, 0.05, 0.07,...] summing to 1 would require a lot of
> fiddling for it to be used as a slice.

The easiest way to generalize it is to just let \$vec get huge.  If you want six
significant digits of precision in the probability, you can just create a
million-bin \$vec.  That’s independent of the number of actual output values you
want.  The whole idea of languages like PDL (or, to some extent, R) is to trade
computer effort for yours — so brute-forcing a lookup to moderate size is no
big deal.  If you’re doing a *lot* of computations (e.g. sampling a million
different distributions) then you want something more elegant — but if you’re
doing under, say, a thousand different ones then brute force is just the trick.
One advantage of doing it that way is that if you accumulate the bins in your
\$vec algorithmically you don’t have to normalize — just track out how many
elements you use as you accumulate,and divide by that number in the end.

For heavier firepower, you can use PDL::GSL, which includes the whole suite of
random number generators from the Gnu Scientific Library.  If you want an
arbitrary threshold selector, you can
use the following one-level-less-brute-force code:

###########
# rand_i_from_histogram
# Pass in a 1-D histogram of probabilities for the index integer value,
and
# a set of parameters to random(), and get back an appropriately sized
# array of samples of the random variable with that histogram’s
distribution.
# Probabilities are auto-normalized.
use strict;
sub rand_i_from_histogram {
my \$cumudist = pdl(shift)->cumusumover;   # invoke pdl() to allow
array refs.
my \$rand = random( @_ ) * \$cumudist->max; # pass all remaining
parameters to random()
my \$out = zeroes(\$rand);                  # Make space for the
output
for my \$i(0..\$cumudist->nelem-2){        # Implement threshold
selection
\$out += ( \$rand >= \$cumudist->((\$i)) );
}
return \$out;
}
\$samples = rand_i_from_histogram([0.18,0.31,0.05,0.07], 10);

>
> Regards,
> Will
> t.william.schm...@gmail.com <mailto:t.william.schm...@gmail.com>
>
>
>
>
> On Fri, Apr 6, 2018 at 7:58 PM, Craig DeForest <defor...@boulder.swri.edu
> <mailto:defor...@boulder.swri.edu>> wrote:
> Welcome, William!
>
> You are probably looking for “random()”, which has the same syntax as
> “zeroes()” but returns a vector of pseudorandom values on [0,1).
> To make a vector of a million of those, use “\$a = random(1e6)”.
>
> To make random integers based on a histogram that you already have in-hand, a
> simple way is:
>
> \$vec = pdl(0,1,1,1,2,2,2,3); # note 8 values
> \$rand = \$vec->(floor(random(1e6) * 8))->sever;
>
> The “random(1e6”) makes a million elements on [0,1).  Multiply by 8 and take
> the floor to get integers on [0,7].  The outermost operation indexes \$vec
> with the corresponding random value.  Since there are three 1’s, 1 is three
> times  as likely in the output.
>
> Does that work?
>
> Best,
> Craig
>
>
>
>> On Apr 6, 2018, at 5:16 PM, William Schmidt <t.william.schm...@gmail.com
>> <mailto:t.william.schm...@gmail.com>> wrote:
>>
>> Hello Piddlers,
>>
>> I am moving from R to PDL, with tons of experience with Perl, lots with R
>> but zero with PDL,
>> so this is a pretty basic question. I can see from the PDL Book that PDL is
>> very
>> sophisticated, with much more functionality than I will ever use, but I want
>> to master basic PDL to leverage my Perl. My focus is on probability in two
>> dimensions so
>> I will be working mostly with 1-dimensional vectors. Here is an example from
>> R that
>> I would like to learn how to do in PDL. It is a small example but once I
>> master
>> the construction of this data I will extend it to much larger vectors.
>>
>> Suppose I have random variable X whose values and probabilities are as
>> follows:
>>
>> x   p(x)
>> 0   1/8
>> 1   3/8
>> 2   3/8
>> 3   1/8
>>
>> To get a sample of 50 random values drawn from this population with
>> replacement in R I
>> would say:
>>
>> x <- seq.int <http://seq.int/>(0,3)       # Concatenate a sequence of ints
>> from 0 to 3.
>> x                              # print x.
>> [1]  0 1 2 3
>>
>> weights <- c(1/8, 3/8, 3/8, 1/8)    # Another form of concatenation.
>> weights
>> [1]  0.125 0.375 0.375 0.125
>>
>> s <- sample(x, 50, replace=TRUE, prob=weights)
>> s
>>  [1] 0 1 1 3 2 2 2 3 2 0 0 1 3 1 1 3 0 2 1 2 2 1 3 1 2 2 0 2 2 2 3 2
>> [33] 1 1 3 1 2 2 1 1 0 1 3 2 2 1 3 0 1 1
>>
>> I can now manipulate s, calculate its statistical properties and graph its
>> probability distribution. Fifty integer values is not very interesting but
>> the problems I am studying have thousands of values and very different
>> weights. How do I do this in PDL? I have PDL::Stats::Basic and
>> PDL::Stats::Distr installed along with PGPLOT but it's generating this basic
>> data that has me stumped.
>>
>> Thanks and regards,
>>
>> Will Schmidt
>> t.william.schm...@gmail.com <mailto:t.william.schm...@gmail.com>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org <http://slashdot.org/>!
>> pdl-general mailing list
>> pdl-general@lists.sourceforge.net <mailto:pdl-general@lists.sourceforge.net>
>> https://lists.sourceforge.net/lists/listinfo/pdl-general
>> <https://lists.sourceforge.net/lists/listinfo/pdl-general>
>
>

```
```------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
```_______________________________________________