Hi, Will! Answers in line.

Cheers, Craig > On Apr 8, 2018, at 11:33 AM, William Schmidt <t.william.schm...@gmail.com> > wrote: > > Craig, > > Thank you, it works but first I had to pick those method calls apart to see > what each is doing and rtfm about sever in PDL::Core. > > So, I am left with a set of questions best depicted with some examples. As I > understand the ::Core pod, that random call and floor are creating a virtual > piddle that functions to slice or index into $vec. And since the virtual > piddle is longer than $vec, it loops until exactly the number of items in the > virtual piddle are pushed from $vec into $rand. Almost — but not quite. It works because the *value* of each of the million elements of floor(random(1e6)*8) is between 0 and 7. So on average you get about 125,000 copies of each value in $vec in the output vector, in this example. > So, what is the difference in the following invocations? > > $rand = $vec->(floor(random(1e6) * 8))->sever; # yours > $rand = $vec->(floor(random(1e6) * 8)); > $rand = $vec->index(floor(random(1e6) * 8))->sever; > $rand = $vec->index(floor(random(1e6) * 8)); > > When I inspect $rand, all are different, since random() probably generates a > fresh seed, but they all look to be examples of my weighted probability > distribution. > > To be precise, is there an implied or default method call after $rand = > $vec->... in your example, and what is lost when sever is not invoked? Hmmm, all those should give the same answer. The “sever” breaks an implicit connection between piddles. A quirk of PDL is that all indexing operations keep their output connected to the input. The final operation in the expressions in lines 2 and 4 is an index of some kind (either via “index” or via “niceslice”, which implements the slicing syntax). The output remains connected to $vec, which costs some memory (PDL has to store the complete mapping between elements, since it is arbitrary rather than affine). The “sever” makes sure that you get back a standalone PDL that is no longer connected to anything. The slicing and indexing operations do the same thing in 1-D, but they generalize slightly differently. Niceslice doesn’t allow multidimensional indices. Index does, but they have to agree (in a threading matchup sense) with the indices of the source array. > > I think I can visualize how to generalize this algorithm but it is much less > intuitive than the way one can do it in R and building $vec can be > problematic, or am I missing something? A complex set of weights using reals, > for example: [0.18, 0.31, 0.05, 0.07,...] summing to 1 would require a lot of > fiddling for it to be used as a slice. The easiest way to generalize it is to just let $vec get huge. If you want six significant digits of precision in the probability, you can just create a million-bin $vec. That’s independent of the number of actual output values you want. The whole idea of languages like PDL (or, to some extent, R) is to trade computer effort for yours — so brute-forcing a lookup to moderate size is no big deal. If you’re doing a *lot* of computations (e.g. sampling a million different distributions) then you want something more elegant — but if you’re doing under, say, a thousand different ones then brute force is just the trick. One advantage of doing it that way is that if you accumulate the bins in your $vec algorithmically you don’t have to normalize — just track out how many elements you use as you accumulate,and divide by that number in the end. For heavier firepower, you can use PDL::GSL, which includes the whole suite of random number generators from the Gnu Scientific Library. If you want an arbitrary threshold selector, you can use the following one-level-less-brute-force code: ########### # rand_i_from_histogram # Pass in a 1-D histogram of probabilities for the index integer value, and # a set of parameters to random(), and get back an appropriately sized # array of samples of the random variable with that histogram’s distribution. # Probabilities are auto-normalized. use strict; sub rand_i_from_histogram { my $cumudist = pdl(shift)->cumusumover; # invoke pdl() to allow array refs. my $rand = random( @_ ) * $cumudist->max; # pass all remaining parameters to random() my $out = zeroes($rand); # Make space for the output for my $i(0..$cumudist->nelem-2){ # Implement threshold selection $out += ( $rand >= $cumudist->(($i)) ); } return $out; } $samples = rand_i_from_histogram([0.18,0.31,0.05,0.07], 10); > > Regards, > Will > t.william.schm...@gmail.com <mailto:t.william.schm...@gmail.com> > > > > > On Fri, Apr 6, 2018 at 7:58 PM, Craig DeForest <defor...@boulder.swri.edu > <mailto:defor...@boulder.swri.edu>> wrote: > Welcome, William! > > You are probably looking for “random()”, which has the same syntax as > “zeroes()” but returns a vector of pseudorandom values on [0,1). > To make a vector of a million of those, use “$a = random(1e6)”. > > To make random integers based on a histogram that you already have in-hand, a > simple way is: > > $vec = pdl(0,1,1,1,2,2,2,3); # note 8 values > $rand = $vec->(floor(random(1e6) * 8))->sever; > > The “random(1e6”) makes a million elements on [0,1). Multiply by 8 and take > the floor to get integers on [0,7]. The outermost operation indexes $vec > with the corresponding random value. Since there are three 1’s, 1 is three > times as likely in the output. > > Does that work? > > Best, > Craig > > > >> On Apr 6, 2018, at 5:16 PM, William Schmidt <t.william.schm...@gmail.com >> <mailto:t.william.schm...@gmail.com>> wrote: >> >> Hello Piddlers, >> >> I am moving from R to PDL, with tons of experience with Perl, lots with R >> but zero with PDL, >> so this is a pretty basic question. I can see from the PDL Book that PDL is >> very >> sophisticated, with much more functionality than I will ever use, but I want >> to master basic PDL to leverage my Perl. My focus is on probability in two >> dimensions so >> I will be working mostly with 1-dimensional vectors. Here is an example from >> R that >> I would like to learn how to do in PDL. It is a small example but once I >> master >> the construction of this data I will extend it to much larger vectors. >> >> Suppose I have random variable X whose values and probabilities are as >> follows: >> >> x p(x) >> 0 1/8 >> 1 3/8 >> 2 3/8 >> 3 1/8 >> >> To get a sample of 50 random values drawn from this population with >> replacement in R I >> would say: >> >> x <- seq.int <http://seq.int/>(0,3) # Concatenate a sequence of ints >> from 0 to 3. >> x # print x. >> [1] 0 1 2 3 >> >> weights <- c(1/8, 3/8, 3/8, 1/8) # Another form of concatenation. >> weights >> [1] 0.125 0.375 0.375 0.125 >> >> s <- sample(x, 50, replace=TRUE, prob=weights) >> s >> [1] 0 1 1 3 2 2 2 3 2 0 0 1 3 1 1 3 0 2 1 2 2 1 3 1 2 2 0 2 2 2 3 2 >> [33] 1 1 3 1 2 2 1 1 0 1 3 2 2 1 3 0 1 1 >> >> I can now manipulate s, calculate its statistical properties and graph its >> probability distribution. Fifty integer values is not very interesting but >> the problems I am studying have thousands of values and very different >> weights. How do I do this in PDL? I have PDL::Stats::Basic and >> PDL::Stats::Distr installed along with PGPLOT but it's generating this basic >> data that has me stumped. >> >> Thanks and regards, >> >> Will Schmidt >> t.william.schm...@gmail.com <mailto:t.william.schm...@gmail.com> >> >> >> >> ------------------------------------------------------------------------------ >> Check out the vibrant tech community on one of the world's most >> engaging tech sites, Slashdot.org <http://slashdot.org/>! >> http://sdm.link/slashdot_______________________________________________ >> <http://sdm.link/slashdot_______________________________________________> >> pdl-general mailing list >> pdl-general@lists.sourceforge.net <mailto:pdl-general@lists.sourceforge.net> >> https://lists.sourceforge.net/lists/listinfo/pdl-general >> <https://lists.sourceforge.net/lists/listinfo/pdl-general> > >

------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot

_______________________________________________ pdl-general mailing list pdl-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/pdl-general