Re: [Pdl-general] Generating a vector of random integers

Craig DeForest Sun, 08 Apr 2018 16:51:59 -0700

Hi, Will!

Answers in line.


Cheers,
Craig


> On Apr 8, 2018, at 11:33 AM, William Schmidt <[email protected]> 
> wrote:
> 
> Craig,
> 
> Thank you, it works but first I had to pick those method calls apart to see 
> what each is doing and rtfm about sever in PDL::Core.
> 
> So, I am left with a set of questions best depicted with some examples. As I 
> understand the ::Core pod, that random call and floor are creating a virtual 
> piddle that functions to slice or index into $vec. And since the virtual 
> piddle is longer than $vec, it loops until exactly the number of items in the 
> virtual piddle are pushed from $vec into $rand.

Almost — but not quite.  It works because the *value* of each of the million 
elements of floor(random(1e6)*8) is between 0 and 7.  So on average you get 
about 125,000 copies of each value in $vec in the output vector, in this 
example.


> So, what is the difference in the following invocations?
> 
> $rand = $vec->(floor(random(1e6) * 8))->sever;      # yours
> $rand = $vec->(floor(random(1e6) * 8));
> $rand = $vec->index(floor(random(1e6) * 8))->sever;
> $rand = $vec->index(floor(random(1e6) * 8));
> 
> When I inspect $rand, all are different, since random() probably generates a 
> fresh seed, but they all look to be examples of my weighted probability 
> distribution.
> 
> To be precise, is there an implied or default method call after $rand = 
> $vec->... in your example, and what is lost when sever is not invoked?

Hmmm,  all those should give the same answer.  The “sever” breaks an implicit 
connection between piddles.  A quirk of PDL is that all indexing operations 
keep their output connected to the input.  The final operation in the 
expressions in lines 2 and 4 is an index of some kind (either via “index” or 
via “niceslice”, which implements the slicing syntax).  The output remains 
connected to $vec, which costs some memory (PDL has to store the complete 
mapping between elements, since it is arbitrary rather than affine).  The 
“sever” makes sure that you get back a standalone PDL that is no longer 
connected to anything.

The slicing and indexing operations do the same thing in 1-D, but they 
generalize slightly differently.  Niceslice doesn’t allow multidimensional 
indices.  Index does, but they have to agree (in a threading matchup sense) 
with the indices of the source array.

> 
> I think I can visualize how to generalize this algorithm but it is much less 
> intuitive than the way one can do it in R and building $vec can be 
> problematic, or am I missing something? A complex set of weights using reals, 
> for example: [0.18, 0.31, 0.05, 0.07,...] summing to 1 would require a lot of 
> fiddling for it to be used as a slice.

The easiest way to generalize it is to just let $vec get huge.  If you want six 
significant digits of precision in the probability, you can just create a 
million-bin $vec.  That’s independent of the number of actual output values you 
want.  The whole idea of languages like PDL (or, to some extent, R) is to trade 
computer effort for yours — so brute-forcing a lookup to moderate size is no 
big deal.  If you’re doing a *lot* of computations (e.g. sampling a million 
different distributions) then you want something more elegant — but if you’re 
doing under, say, a thousand different ones then brute force is just the trick. 
 One advantage of doing it that way is that if you accumulate the bins in your 
$vec algorithmically you don’t have to normalize — just track out how many 
elements you use as you accumulate,and divide by that number in the end.

For heavier firepower, you can use PDL::GSL, which includes the whole suite of 
random number generators from the Gnu Scientific Library.  If you want an 
arbitrary threshold selector, you can
use the following one-level-less-brute-force code:

        ###########
        # rand_i_from_histogram
        # Pass in a 1-D histogram of probabilities for the index integer value, 
and
        # a set of parameters to random(), and get back an appropriately sized 
        # array of samples of the random variable with that histogram’s 
distribution.
        # Probabilities are auto-normalized.
        use strict;
        sub rand_i_from_histogram {
            my $cumudist = pdl(shift)->cumusumover;   # invoke pdl() to allow 
array refs. 
            my $rand = random( @_ ) * $cumudist->max; # pass all remaining 
parameters to random()
            my $out = zeroes($rand);                  # Make space for the 
output
          for my $i(0..$cumudist->nelem-2){        # Implement threshold 
selection
            $out += ( $rand >= $cumudist->(($i)) );   
          }
          return $out;                      
        }
        $samples = rand_i_from_histogram([0.18,0.31,0.05,0.07], 10);

> 
> Regards,
> Will
> [email protected] <mailto:[email protected]>
> 
> 
> 
> 
> On Fri, Apr 6, 2018 at 7:58 PM, Craig DeForest <[email protected] 
> <mailto:[email protected]>> wrote:
> Welcome, William!
> 
> You are probably looking for “random()”, which has the same syntax as 
> “zeroes()” but returns a vector of pseudorandom values on [0,1).
> To make a vector of a million of those, use “$a = random(1e6)”.
> 
> To make random integers based on a histogram that you already have in-hand, a 
> simple way is:
> 
> $vec = pdl(0,1,1,1,2,2,2,3); # note 8 values
> $rand = $vec->(floor(random(1e6) * 8))->sever;
> 
> The “random(1e6”) makes a million elements on [0,1).  Multiply by 8 and take 
> the floor to get integers on [0,7].  The outermost operation indexes $vec 
> with the corresponding random value.  Since there are three 1’s, 1 is three 
> times  as likely in the output.
> 
> Does that work?
> 
> Best,
> Craig
> 
> 
> 
>> On Apr 6, 2018, at 5:16 PM, William Schmidt <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hello Piddlers,
>> 
>> I am moving from R to PDL, with tons of experience with Perl, lots with R 
>> but zero with PDL,
>> so this is a pretty basic question. I can see from the PDL Book that PDL is 
>> very
>> sophisticated, with much more functionality than I will ever use, but I want
>> to master basic PDL to leverage my Perl. My focus is on probability in two 
>> dimensions so
>> I will be working mostly with 1-dimensional vectors. Here is an example from 
>> R that 
>> I would like to learn how to do in PDL. It is a small example but once I 
>> master
>> the construction of this data I will extend it to much larger vectors.
>>  
>> Suppose I have random variable X whose values and probabilities are as 
>> follows:
>> 
>> x   p(x)
>> 0   1/8
>> 1   3/8
>> 2   3/8
>> 3   1/8
>> 
>> To get a sample of 50 random values drawn from this population with 
>> replacement in R I
>> would say:
>> 
>> x <- seq.int <http://seq.int/>(0,3)       # Concatenate a sequence of ints 
>> from 0 to 3.
>> x                              # print x.
>> [1]  0 1 2 3
>> 
>> weights <- c(1/8, 3/8, 3/8, 1/8)    # Another form of concatenation.
>> weights
>> [1]  0.125 0.375 0.375 0.125
>> 
>> s <- sample(x, 50, replace=TRUE, prob=weights)
>> s
>>  [1] 0 1 1 3 2 2 2 3 2 0 0 1 3 1 1 3 0 2 1 2 2 1 3 1 2 2 0 2 2 2 3 2
>> [33] 1 1 3 1 2 2 1 1 0 1 3 2 2 1 3 0 1 1
>> 
>> I can now manipulate s, calculate its statistical properties and graph its 
>> probability distribution. Fifty integer values is not very interesting but 
>> the problems I am studying have thousands of values and very different 
>> weights. How do I do this in PDL? I have PDL::Stats::Basic and 
>> PDL::Stats::Distr installed along with PGPLOT but it's generating this basic 
>> data that has me stumped.
>> 
>> Thanks and regards,
>> 
>> Will Schmidt
>> [email protected] <mailto:[email protected]>
>> 
>> 
>> 
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org <http://slashdot.org/>! 
>> http://sdm.link/slashdot_______________________________________________ 
>> <http://sdm.link/slashdot_______________________________________________>
>> pdl-general mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.sourceforge.net/lists/listinfo/pdl-general 
>> <https://lists.sourceforge.net/lists/listinfo/pdl-general>
> 
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

_______________________________________________
pdl-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pdl-general

Re: [Pdl-general] Generating a vector of random integers

Reply via email to