Hi, Will!
Answers in line.
Cheers,
Craig
> On Apr 8, 2018, at 11:33 AM, William Schmidt <t.william.schm...@gmail.com>
> wrote:
>
> Craig,
>
> Thank you, it works but first I had to pick those method calls apart to see
> what each is doing and rtfm about sever in PDL::Core.
>
> So, I am left with a set of questions best depicted with some examples. As I
> understand the ::Core pod, that random call and floor are creating a virtual
> piddle that functions to slice or index into $vec. And since the virtual
> piddle is longer than $vec, it loops until exactly the number of items in the
> virtual piddle are pushed from $vec into $rand.
Almost — but not quite. It works because the *value* of each of the million
elements of floor(random(1e6)*8) is between 0 and 7. So on average you get
about 125,000 copies of each value in $vec in the output vector, in this
example.
> So, what is the difference in the following invocations?
>
> $rand = $vec->(floor(random(1e6) * 8))->sever; # yours
> $rand = $vec->(floor(random(1e6) * 8));
> $rand = $vec->index(floor(random(1e6) * 8))->sever;
> $rand = $vec->index(floor(random(1e6) * 8));
>
> When I inspect $rand, all are different, since random() probably generates a
> fresh seed, but they all look to be examples of my weighted probability
> distribution.
>
> To be precise, is there an implied or default method call after $rand =
> $vec->... in your example, and what is lost when sever is not invoked?
Hmmm, all those should give the same answer. The “sever” breaks an implicit
connection between piddles. A quirk of PDL is that all indexing operations
keep their output connected to the input. The final operation in the
expressions in lines 2 and 4 is an index of some kind (either via “index” or
via “niceslice”, which implements the slicing syntax). The output remains
connected to $vec, which costs some memory (PDL has to store the complete
mapping between elements, since it is arbitrary rather than affine). The
“sever” makes sure that you get back a standalone PDL that is no longer
connected to anything.
The slicing and indexing operations do the same thing in 1-D, but they
generalize slightly differently. Niceslice doesn’t allow multidimensional
indices. Index does, but they have to agree (in a threading matchup sense)
with the indices of the source array.
>
> I think I can visualize how to generalize this algorithm but it is much less
> intuitive than the way one can do it in R and building $vec can be
> problematic, or am I missing something? A complex set of weights using reals,
> for example: [0.18, 0.31, 0.05, 0.07,...] summing to 1 would require a lot of
> fiddling for it to be used as a slice.
The easiest way to generalize it is to just let $vec get huge. If you want six
significant digits of precision in the probability, you can just create a
million-bin $vec. That’s independent of the number of actual output values you
want. The whole idea of languages like PDL (or, to some extent, R) is to trade
computer effort for yours — so brute-forcing a lookup to moderate size is no
big deal. If you’re doing a *lot* of computations (e.g. sampling a million
different distributions) then you want something more elegant — but if you’re
doing under, say, a thousand different ones then brute force is just the trick.
One advantage of doing it that way is that if you accumulate the bins in your
$vec algorithmically you don’t have to normalize — just track out how many
elements you use as you accumulate,and divide by that number in the end.
For heavier firepower, you can use PDL::GSL, which includes the whole suite of
random number generators from the Gnu Scientific Library. If you want an
arbitrary threshold selector, you can
use the following one-level-less-brute-force code:
###########
# rand_i_from_histogram
# Pass in a 1-D histogram of probabilities for the index integer value,
and
# a set of parameters to random(), and get back an appropriately sized
# array of samples of the random variable with that histogram’s
distribution.
# Probabilities are auto-normalized.
use strict;
sub rand_i_from_histogram {
my $cumudist = pdl(shift)->cumusumover; # invoke pdl() to allow
array refs.
my $rand = random( @_ ) * $cumudist->max; # pass all remaining
parameters to random()
my $out = zeroes($rand); # Make space for the
output
for my $i(0..$cumudist->nelem-2){ # Implement threshold
selection
$out += ( $rand >= $cumudist->(($i)) );
}
return $out;
}
$samples = rand_i_from_histogram([0.18,0.31,0.05,0.07], 10);
>
> Regards,
> Will
> t.william.schm...@gmail.com <mailto:t.william.schm...@gmail.com>
>
>
>
>
> On Fri, Apr 6, 2018 at 7:58 PM, Craig DeForest <defor...@boulder.swri.edu
> <mailto:defor...@boulder.swri.edu>> wrote:
> Welcome, William!
>
> You are probably looking for “random()”, which has the same syntax as
> “zeroes()” but returns a vector of pseudorandom values on [0,1).
> To make a vector of a million of those, use “$a = random(1e6)”.
>
> To make random integers based on a histogram that you already have in-hand, a
> simple way is:
>
> $vec = pdl(0,1,1,1,2,2,2,3); # note 8 values
> $rand = $vec->(floor(random(1e6) * 8))->sever;
>
> The “random(1e6”) makes a million elements on [0,1). Multiply by 8 and take
> the floor to get integers on [0,7]. The outermost operation indexes $vec
> with the corresponding random value. Since there are three 1’s, 1 is three
> times as likely in the output.
>
> Does that work?
>
> Best,
> Craig
>
>
>
>> On Apr 6, 2018, at 5:16 PM, William Schmidt <t.william.schm...@gmail.com
>> <mailto:t.william.schm...@gmail.com>> wrote:
>>
>> Hello Piddlers,
>>
>> I am moving from R to PDL, with tons of experience with Perl, lots with R
>> but zero with PDL,
>> so this is a pretty basic question. I can see from the PDL Book that PDL is
>> very
>> sophisticated, with much more functionality than I will ever use, but I want
>> to master basic PDL to leverage my Perl. My focus is on probability in two
>> dimensions so
>> I will be working mostly with 1-dimensional vectors. Here is an example from
>> R that
>> I would like to learn how to do in PDL. It is a small example but once I
>> master
>> the construction of this data I will extend it to much larger vectors.
>>
>> Suppose I have random variable X whose values and probabilities are as
>> follows:
>>
>> x p(x)
>> 0 1/8
>> 1 3/8
>> 2 3/8
>> 3 1/8
>>
>> To get a sample of 50 random values drawn from this population with
>> replacement in R I
>> would say:
>>
>> x <- seq.int <http://seq.int/>(0,3) # Concatenate a sequence of ints
>> from 0 to 3.
>> x # print x.
>> [1] 0 1 2 3
>>
>> weights <- c(1/8, 3/8, 3/8, 1/8) # Another form of concatenation.
>> weights
>> [1] 0.125 0.375 0.375 0.125
>>
>> s <- sample(x, 50, replace=TRUE, prob=weights)
>> s
>> [1] 0 1 1 3 2 2 2 3 2 0 0 1 3 1 1 3 0 2 1 2 2 1 3 1 2 2 0 2 2 2 3 2
>> [33] 1 1 3 1 2 2 1 1 0 1 3 2 2 1 3 0 1 1
>>
>> I can now manipulate s, calculate its statistical properties and graph its
>> probability distribution. Fifty integer values is not very interesting but
>> the problems I am studying have thousands of values and very different
>> weights. How do I do this in PDL? I have PDL::Stats::Basic and
>> PDL::Stats::Distr installed along with PGPLOT but it's generating this basic
>> data that has me stumped.
>>
>> Thanks and regards,
>>
>> Will Schmidt
>> t.william.schm...@gmail.com <mailto:t.william.schm...@gmail.com>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org <http://slashdot.org/>!
>> http://sdm.link/slashdot_______________________________________________
>> <http://sdm.link/slashdot_______________________________________________>
>> pdl-general mailing list
>> pdl-general@lists.sourceforge.net <mailto:pdl-general@lists.sourceforge.net>
>> https://lists.sourceforge.net/lists/listinfo/pdl-general
>> <https://lists.sourceforge.net/lists/listinfo/pdl-general>
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
pdl-general mailing list
pdl-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pdl-general