Re: [ot-users] Grouped Sobol indices

BAUDIN Michael Thu, 05 Jul 2018 01:52:01 -0700

Hi Régis,

Considering a group of output variables would be interesting indeed. I cannot 
however clearly see how this can be done or used. Would it allow to select the 
variables having influence on several outputs ?


By “produce the conditional expectations as functions”, you mean create a 
function which takes a group of variables as input and returns the conditional 
expectation given the values of the variables in the group as output ? That is, 
if the group is denoted by Xu, the function is E(Y|Xu). In other words, this 
produces a function which averages the output Y for the variables not in the 
group Xu. Ok : this reduces the dimension of the function, because this 
function has only Xu as inputs, instead of all the input vector X. Was it your 
purpose ? This could be very useful, as a post-processing of the chaos : this 
would combine the chaos, the sensitivity analysis step and the dimension 
reduction in one single step.

Best regards,

Michaël

De : [email protected] 
[mailto:[email protected]]
Envoyé : mardi 3 juillet 2018 18:23
À : [email protected]; BAUDIN Michael <[email protected]>
Objet : Re: RE: [ot-users] Grouped Sobol indices

Hi Michael,

You may also want to combine the contribution of a group of input variables on 
a group of output variables, eg when the output is the vector of coefficients 
of a Karhunen Loeve expansion... At the end it produces a lot of combinations!
It may also be interesting to produce the conditional expectations as functions 
as a post-processing of a chaos expansion. Your opinion on this point?

Cheers

Régis

Le mardi 3 juillet 2018 à 17:49:31 UTC+2, BAUDIN Michael 
<[email protected]<mailto:[email protected]>> a écrit :



Hi Régis,



Indeed, my mail is back, but not the data… yet. Hence, I would be happy to show 
a basic Python implementation of the total group Sobol’ indice, but the most 
efficient way is to get a template script … which is in the encrypted hard 
drive of the failed computer. ☹ This should be solved within a couple of days, 
I hope.



Indeed a generic regular expression engine could to the trick, but it is much 
more complicated than required, I think. After all, here is all the choices we 
have :

•         First or total indice

•         By variable or by group

This generates only 4 indices that can be required, to my knowledge :

•         first order indice by variable

•         total order indice by variable

•         first order indice by group

•         total order indice by group

and this is it.



In attachment, I put three files in order to show a basic test case for what I 
would like to do

•         crue-generate-with-dependency.py : generates a sample with 8 
variables in input and 1 variable in output

•         crue-8var-copula.csv

•         chaos-by-group.py : creates a chaos from the sample and explores the 
indices from it.

This last script ends with the lines :



# Groups of variables (known)

allgroups = [[0],[1],[2],[3,4,5],[6,7]]

# Create a functional chaos model

algo = ot.FunctionalChaosAlgorithm(x, y)

print(algo.getAdaptiveStrategy())

print(algo.getProjectionStrategy())

algo.run()

result = algo.getResult()

fccr = ot.FunctionalChaosRandomVector(result)

for group in allgroups:

    s = fccr.getSobolGroupedIndex(group)

    # TODO : get the total order indice of each group

    print("Group=%s, S=%.2f" % (str(group),s))





The script I have in my hard drive allow to loop over the multi-indices, 
computing the required indices within the loop : I will complete my message 
when possible. Meanwhile, I think that there is an issue within the API since 
it does not seem possible to get the enumeration rule from the functional chaos 
“algo” : am I wrong ?



Your analysis of the estimate of these grouped indices by sampling make it 
clear that this is an advantage of the chaos : estimating the group indices is 
much easier with chaos than by sampling.



Thank you very much with the copula decomposition script : I think that I can 
use almost as is !



Best regards,



Michaël





De : 
[email protected]<mailto:[email protected]> 
[mailto:[email protected]]
Envoyé : mardi 19 juin 2018 10:40
À : [email protected]<mailto:[email protected]>; BAUDIN Michael 
<[email protected]<mailto:[email protected]>>
Objet : Re: [ot-users] Grouped Sobol indices



Hi Michael,



Nice to see that your email is back ;-)



Concerning your first point and a part of your post-scriptum, an idea could be 
to provide a kind of regular expression based filter to compute the indices you 
want. Given a set of input variables, you may want to collect all the 
coefficients involving each of the variables within this set with a positive 
exponent, or all the coefficients involving at least one of these variables 
with a positive exponent, or at least all these variables with a positive 
exponent, or all the coefficients involving none of these variables with a 
positive exponent and so on. And you may want to collect these coefficients 
output component by output component, or for a given set of output component, 
or... You see what I mean? At one point we have to bound the number of methods, 
so a generic extraction mechanism and a dedicated set of filters may be the 
most efficient way to go. Do you have a list of indices of interest, with their 
formal definition in terms of conditional expectation? It would help to see how 
to implement this method.



Concerning the estimation of group indices by sampling, as soon as you are able 
to express them in terms of variance of conditional expectations, it is quite 
straightforward to implement a scheme similar to the Saltelli method. The main 
difficulties are the potentially large number of indices to compute, and the 
huge number of simulations required to sample conditional expectations with a 
large conditioning dimension.



For the last point, there is currently no way to inspect a given multivariate 
distribution to see how to factor it into smaller pieces. The point of view in 
OpenTURNS is the opposite: you can build such a product multivariate 
distribution using the ComposedCopula class (each sub-copula describes the 
dependence within a group of dependent variables), but there is no easy way to 
perform the inverse computation. From a C++ perspective, you can ask for the 
copula of your distribution and try to dynamic_cast it into a ComposedCopula to 
get the associated ComposedCopula, but if it is the case, odds are large that 
you could have access to this information by looking at your Python script a 
few lines above!

Nevertheless, if you really need this kind of service, it can be implemented 
quite easily using Csiszar divergences. In the following script, I use the 
(squared) Hellinger divergence to check if a given distribution is a product of 
two lower dimensional distributions at a given position. The computation of the 
divergence is done by sampling (Monte Carlo). If you have *exact* independence, 
the sampling size can be very low (less than 100) as the divergence is 
*exactly* zero and the kernel function drops to nearly zero at each point of 
the sample. From this script, it is straightforward to get the associated block 
distributions.



import openturns as ot

import numpy as np

from time import time



def isProduct(distribution, cut, N = 10000, epsilon = 1.0e-16):

    dim = distribution.getDimension()

    distribution1 = distribution.getMarginal([i for i in range(cut)])

    distribution2 = distribution.getMarginal([cut+i for i in range(dim-cut)])

    def kernel(x):

        x = np.array(x)

        s = x.shape

        x1 = x[:, 0:cut]

        pdf1 = np.array(distribution1.computePDF(x1))

        x2 = x[:, cut:dim]

        pdf2 = np.array(distribution2.computePDF(x2))

        pdf = np.array(distribution.computePDF(x))

        return ((1.0 - np.sqrt(pdf1 * pdf2 / pdf))**2 * pdf).reshape(s[0], 1)



    value = ot.PythonFunction(dim, 1, 
func_sample=kernel)(distribution.getSample(N)).computeMean()[0]

    return (value < epsilon)



def decompose(distribution, N = 10000, epsilon = 1.0e-16):

    dim = distribution.getDimension()

    blocks = list()

    currentBlock = [0]

    for i in range(1, dim-1):

        if isProduct(distribution, i, N, epsilon):

            blocks.append(currentBlock)

            currentBlock = [i]

        else:

            currentBlock.append(i)

    currentBlock.append(dim-1)

    blocks.append(currentBlock)

    return blocks



if __name__ == "__main__":

    d1 = 2

    d2 = 1

    d3 = 4

    dimension = d1 + d2 + d3

    R = ot.CorrelationMatrix(dimension)

    for i in range(d1):

        for j in range(i):

            R[i, j] = 0.1 / (1 + i + j)

    for i in range(d2):

        for j in range(i):

            R[i + d1, j + d1] = 0.1 / (1 + i + j)

    for i in range(d3):

        for j in range(i):

            R[i + d1 + d2, j + d1 + d2] = 0.1 / (1 + i + j)

    print(R)



    t0 = time()

    print("Test 1: Normal distribution")

    distribution = ot.Normal([0.0]*dimension, [1.0]*dimension, R)

    print("blocks=", decompose(distribution))

    print("Test 2: Composed copula")

    distribution = ot.ComposedCopula([ot.ClaytonCopula(5.0)]*5 + 
[ot.NormalCopula(R)])

    print("blocks=", decompose(distribution))

    print("t=", time() - t0, "s")





Cheers



Régis LEBRUN









Le lundi 18 juin 2018 à 16:53:20 UTC+2, BAUDIN Michael 
<[email protected]<mailto:[email protected]>> a écrit :





Hi !



I have a computer code with dependent input vector and I would like to perform 
sensitivity analysis on the vector output. To do this, I would like to estimate 
the Sobol’ indices. In order to workaround the dependent inputs, I would like 
to gather dependent input variables into independent groups and to estimate the 
Sobol’ indices on these groups.



•         My first idea is to use a chaos expansion in order to estimate Sobol’ 
indices. It is easy to estimate the first order Sobol’ indices with the 
getSobolGroupedIndex method of the FunctionnalChaosSobolIndices class:



http://openturns.github.io/openturns/master/user_manual/response_surface/_generated/openturns.FunctionalChaosSobolIndices.html



However, the corresponding total order indices method seem to be unavailable: 
there is no getSobolGroupedTotalIndex ?



If so, this is a pity, since this is just another “simple” arrangement of the 
coefficients from the chaos expansion. It would allow to estimate the total 
effect of a group. Compared to the group Sobol first indice, it would allow to 
see if there are interactions with this group and other variables outside from 
the group. Also, if the group Sobol total indice is close to zero, this means 
that this group has no effect of the variability of the output.



•         The second topic of my post is the SobolIndicesAlgorithm class :



http://openturns.github.io/openturns/master/user_manual/_generated/openturns.SobolIndicesAlgorithm.html



I can see that there is no “group” method here. Is there a theoretical way of 
computing grouped Sobol’ indices with these estimators?



•         Finally, I would like to know if there is a way of using the joint 
distribution in order to generate the independent groups of dependent input 
variables? In other words, with a given multivariate joint distribution, is 
there a way of generating a list of sub-lists of indices such that:

o   Each sub-list contain dependent variables,

o   Two sub-lists are independent.

With such a goal, what classes / methods to use ?



Best regards,



Michaël



PS

By the way, the total Sobol’ indices of a group of variables lead to a way of 
removing input variables from the model. Indeed, we could compute the largest 
group having a total Sobol’ indice lower than a given small threshold (says 
0.01 for example). By design, all variables from this group could be replaced 
by fixed inputs without changing the variability of the output. Furthermore, if 
we had the distribution of this estimator, we could guarantee this property 
with, say, 95% confidence (i.e. including the variability from the estimate).

In order to compute this group, the variables could first be ordered by 
decreasing total Sobol’ indices. Then, we could add the variables into the 
group until the total Sobol’ indice exceeds the threshold: the final group is 
the one just before this step.

With this raw algorithm, the chaos expansion has a great advantage: given that 
the decomposition is known, the estimate is almost CPU-free.

Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à 
l'intention exclusive des destinataires et les informations qui y figurent sont 
strictement confidentielles. Toute utilisation de ce Message non conforme à sa 
destination, toute diffusion ou toute publication totale ou partielle, est 
interdite sauf autorisation expresse.

Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le 
copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si 
vous avez reçu ce Message par erreur, merci de le supprimer de votre système, 
ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support 
que ce soit. Nous vous remercions également d'en avertir immédiatement 
l'expéditeur par retour du message.

Il est impossible de garantir que les communications par messagerie 
électronique arrivent en temps utile, sont sécurisées ou dénuées de toute 
erreur ou virus.
____________________________________________________

This message and any attachments (the 'Message') are intended solely for the 
addressees. The information contained in this Message is confidential. Any use 
of information contained in this Message not in accord with its purpose, any 
dissemination or disclosure, either whole or partial, is prohibited except 
formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any 
part of it. If you have received this message in error, please delete it and 
all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely secure, error or 
virus-free.

_______________________________________________
OpenTURNS users mailing list
[email protected]<mailto:[email protected]>
http://openturns.org/mailman/listinfo/users

Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à 
l'intention exclusive des destinataires et les informations qui y figurent sont 
strictement confidentielles. Toute utilisation de ce Message non conforme à sa 
destination, toute diffusion ou toute publication totale ou partielle, est 
interdite sauf autorisation expresse.

Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le 
copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si 
vous avez reçu ce Message par erreur, merci de le supprimer de votre système, 
ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support 
que ce soit. Nous vous remercions également d'en avertir immédiatement 
l'expéditeur par retour du message.

Il est impossible de garantir que les communications par messagerie 
électronique arrivent en temps utile, sont sécurisées ou dénuées de toute 
erreur ou virus.
____________________________________________________

This message and any attachments (the 'Message') are intended solely for the 
addressees. The information contained in this Message is confidential. Any use 
of information contained in this Message not in accord with its purpose, any 
dissemination or disclosure, either whole or partial, is prohibited except 
formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any 
part of it. If you have received this message in error, please delete it and 
all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely secure, error or 
virus-free.



Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à 
l'intention exclusive des destinataires et les informations qui y figurent sont 
strictement confidentielles. Toute utilisation de ce Message non conforme à sa 
destination, toute diffusion ou toute publication totale ou partielle, est 
interdite sauf autorisation expresse.

Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le 
copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si 
vous avez reçu ce Message par erreur, merci de le supprimer de votre système, 
ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support 
que ce soit. Nous vous remercions également d'en avertir immédiatement 
l'expéditeur par retour du message.

Il est impossible de garantir que les communications par messagerie 
électronique arrivent en temps utile, sont sécurisées ou dénuées de toute 
erreur ou virus.
____________________________________________________

This message and any attachments (the 'Message') are intended solely for the 
addressees. The information contained in this Message is confidential. Any use 
of information contained in this Message not in accord with its purpose, any 
dissemination or disclosure, either whole or partial, is prohibited except 
formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any 
part of it. If you have received this message in error, please delete it and 
all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely secure, error or 
virus-free.

_______________________________________________
OpenTURNS users mailing list
[email protected]
http://openturns.org/mailman/listinfo/users

Re: [ot-users] Grouped Sobol indices

Reply via email to