Hello Konrad,
> In case I need not a specific number of triples but all triples for a
> specific number of resources, I would just use bif:rnd(10,?s) instead of
> bif:rnd(10,?s, ?p, ?o), is that correct?
Arguments of bif:rnd() listed after the first one are there only to fool the
SQL optimizer; they do not affect the logic of the pattern match, so
bif:rnd(10, ?s) would still decimate individual triples, not whole subjects.
You will need a subquery that produces a decimated collection of
subjects and then a basic graph pattern to select their properties. I
guess, something like
select ?s1 ?p1 ?o1 where {
    { select ?s2 as ?s1 where {
        { select distinct ?s3 as ?s2 where { ?s3 ?p3 ?o3 } }
        FILTER (1 > bif:rnd (10, ?s2)) } }
    ?s1 ?p1 ?o1 . }
(That is: select the distinct subjects ?s2 from all subjects ?s3, then decimate
them to get ?s1, then select all properties of each ?s1.)
> I'm also unsure what value to assign to the "decimation factor" f for a
> given sample size of n and instance count m.
> A naive way would be to use f = m/n (and thus having a probability for
> each element to be chosen of p = n/m), however due to the variance of
> the normal distribution there would be a variety of result sizes.
> Would the following approach be good?
>
> 1. Calculating the standard deviation sigma, which is (m*p*(1-p))^(1/2)
> for a normal distribution, simplified to (n*(1-n/m))^(1/2), which is close
> to n^(1/2) for small sample sizes in relation to the whole instance count.
> Example: n = 100, m = 10^6 -> sigma ~ 10, p = 10^-4
> 2. setting p' = p * (1 + c*sigma/n), where c is the safety level, e.g. 3
> for a 99.7 % chance of getting enough instances.
> In the example p' = 10^-4 + 3*10^-5 = 1.3 * 10^-4
> 3. Using a "decimation factor" of 1/p', which is 10^4/1.3 in this case
> and querying the endpoint.
> 4. Randomly choosing n elements out of the result set.
That's indeed the best variant for a fixed number of rows, because randomly
choosing N elements from M known elements is cheap: loop over these
M elements, and if I elements have been tried already and J of them were
selected, then the probability of selecting the current element should be
(N-J)/(M-I).
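
A minimal Python sketch of the client-side part, in case it helps. It is only
an illustration under my own assumptions: the function names
decimation_factor and pick_exactly are made up for this mail, the rows are
assumed to be already fetched into a list, and p' is computed as
p * (1 + c*sigma/n), which is the form that matches the numbers in your
example. Note that M in the loop is the size of the fetched result set, not
the instance count m.

```python
import math
import random


def decimation_factor(n, m, c=3.0):
    """Return 1/p' for a wanted sample size n out of m instances.

    Assumes p' = p * (1 + c*sigma/n) with sigma = sqrt(n * (1 - n/m)),
    i.e. the expected result size is padded by c standard deviations.
    """
    p = n / m
    sigma = math.sqrt(n * (1.0 - p))
    p_safe = min(1.0, p * (1.0 + c * sigma / n))
    return 1.0 / p_safe


def pick_exactly(rows, n, rng=random):
    """Choose exactly n of the given rows, each subset equally likely.

    Walks the rows once; when i rows have been tried and j selected,
    the current row is taken with probability (n - j) / (M - i).
    """
    total = len(rows)
    if n > total:
        raise ValueError("result set is smaller than the wanted sample")
    chosen = []
    for tried, row in enumerate(rows):
        still_needed = n - len(chosen)
        # rng.random() is in [0, 1), so this is true with probability
        # still_needed / (total - tried); it is always true once every
        # remaining row is needed and never true once n rows are chosen.
        if rng.random() * (total - tried) < still_needed:
            chosen.append(row)
    return chosen


if __name__ == "__main__":
    # Numbers from the example: n = 100, m = 10^6, c = 3.
    print(decimation_factor(100, 10**6))
    # Stand-in for the rows returned by the decimated query.
    fetched = list(range(130))
    print(len(pick_exactly(fetched, 100)))
```

The loop never needs a second pass or a shuffle, and it keeps the rows in
their original order, which is why it is cheap once the M rows are at hand.
If the endpoint happens to return fewer than n rows, the sketch raises an
error and the query has to be repeated with a smaller decimation factor.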
Best Regards,
Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com