Re: [ot-users] Sample transformation

roy Sun, 29 Oct 2017 05:16:45 -0700

Hi Regis,

This is great thanks. It is now working as expected.
Maybe this can be clarified in the documentation.



Pamphile ROY
Chercheur doctorant en Quantification d’Incertitudes
CERFACS - Toulouse (31) - France
+33 (0) 5 61 19 31 57
+33 (0) 7 86 43 24 22



> Le 12 oct. 2017 à 00:11, regis lebrun <[email protected]> a 
> écrit :
> 
> I found it. In order to speed up the computation of the coefficients of the 
> polynomial expansion, we developed a class named DesignProxy, which acts like 
> a cache for the evaluation of the multivariate basis oven the input sample. 
> Essentially, it contains a large matrix, with a default size given by 
> ResourceMap.GetAsUnsignedInteger("DesignProxy-DefaultCacheSize") and equals 
> to 16777216, it means 128Mo.
> 
> 
> So if you add 
> ot.ResourceMap.SetAsUnsignedInteger("DesignProxy-DefaultCacheSize", 
> smallSize) with smallSize adapted to your memory budget (eg. smallSize=0), 
> then everything should be ok.
> 
> You can also run the algorithm on the whole output sample. The DesignProxy 
> instance is built once and shared among the different marginals. You can see 
> that the memory cost of the algorithm is essentially the same for an output 
> sample of dimension 1 or 14. Concerning the computation time, a part of the 
> computation is shared between the marginals so the total cost is not 
> proportional to the output dimension, even if no parallelization is 
> implemented here (but the linear algebra is already parallelized using 
> threads).
> 
> Tell me if it solved your problem!
> 
> Régis
> 
> ________________________________
> De : roy <[email protected]>
> À : regis lebrun <[email protected]> 
> Envoyé le : Mercredi 11 octobre 2017 10h40
> Objet : Re: [ot-users] Sample transformation
> 
> 
> 
> I was able to make an extract.
> 
> I am fitting a case with functional output. So to parallelize the fitting I 
> use a function that independently construct a model per feature.
> The memory consumption is coming from every call to run() with a bump of ~130 
> Mo each time. Maybe OT can handle itself the parallelization?
> I saw that it was working without needing the loop, so maybe I should do that 
> instead.
> 
> But still, 130 Mo for a model is quite a lot.
> 
> 
> Cheers,
> 
> Pamphile ROY
> Chercheur doctorant en Quantification d’Incertitudes
> CERFACS - Toulouse (31) - France
> +33 (0) 5 61 19 31 57
> +33 (0) 7 86 43 24 22
> 
> 
> 
> Le 11 oct. 2017 à 08:44, regis lebrun <[email protected]> a 
> écrit :
>> 
>> ouch! 3-4 Go is crazy! Do you have any script to share in order to help us 
>> catching the bug? I use the FunctionalChaosAlgorithm class more than often 
>> and I never faced this kind of behavior. If there is a bug it should be a 
>> good thing to catch it asap: we enter the 1.10 release candidate phase, a 
>> good slot to fix this kind of bugs.
>> 
>> Cheers
>> 
>> Régis
>> 
>> 
>> 
>> ________________________________
>> De : roy <[email protected]>
>> À : regis lebrun <[email protected]> 
>> Cc : users <[email protected]>
>> Envoyé le : Mercredi 11 octobre 2017 0h21
>> Objet : Re: [ot-users] Sample transformation
>> 
>> 
>> 
>> Hi Régis,
>> 
>> Not sure about the leak as I only do python.
>> But using the tool I know, I was not able to free the memory(using some del 
>> and gc.collect()).
>> 
>> I saw the issue when constructing a model on a cluster (Quadrature with 121 
>> points, degree 10 in 2d) and the batch manager killed the job
>> due to memory consumption. On my Mac the memory goes to 3-4 Go for this but 
>> on the cluster it explodes.
>> 
>> As always, thanks for the quick reply :)
>> 
>> 
>> Pamphile ROY
>> Chercheur doctorant en Quantification d’Incertitudes
>> CERFACS - Toulouse (31) - France
>> +33 (0) 5 61 19 31 57
>> +33 (0) 7 86 43 24 22
>> 
>> 
>> 
>> Le 10 oct. 2017 à 23:13, regis lebrun <[email protected]> a 
>> écrit :
>> 
>> 
>>> Hi Pamphil,
>>> 
>>> Nice to know that the code *seems* to work ;-)
>>> 
>>> Are you sure that there is a memory leak? The algorithm creates potentially 
>>> large objects, which are stored into the FunctionalChaosResult member of 
>>> the algorithm. If there is a pending reference to this object, the memory 
>>> will not be released. Maybe Denis, Julien or Sofiane have more insight on 
>>> this point?
>>> 
>>> Cheers
>>> 
>>> Régis
>>> 
>>> 
>>> 
>>> ________________________________
>>> De : roy <[email protected]>
>>> À : regis lebrun <[email protected]> 
>>> Cc : users <[email protected]>
>>> Envoyé le : Mardi 10 octobre 2017 6h35
>>> Objet : Re: [ot-users] Sample transformation
>>> 
>>> 
>>> 
>>> Hi Regis,
>>> 
>>> Thanks for this long and well detailed answer!
>>> The code you provided seems to work as expected.
>>> 
>>> However during my tests I noticed that the memory was not freed correctly.
>>> Once the class FunctionalChaosAlgorithm is called, there is a memory bump 
>>> and even after calling del
>>> and gc.collect(), memory is still not freed (using memory_profiler for 
>>> that). Might be a memory leak?
>>> 
>>> Kind regards,
>>> 
>>> Pamphile ROY
>>> Chercheur doctorant en Quantification d’Incertitudes
>>> CERFACS - Toulouse (31) - France
>>> +33 (0) 5 61 19 31 57
>>> +33 (0) 7 86 43 24 22
>>> 
>>> 
>>> 
>>> Le 7 oct. 2017 à 19:59, regis lebrun <[email protected]> a 
>>> écrit :
>>> 
>>> 
>>> 
>>> Hi Pamphil,
>>>> 
>>>> You were almost right: the AdaptiveStieltjesAlgorithm is very close to 
>>>> what you are looking for, but not exactly what you need. It is the 
>>>> algorithmic part of the factory of orthonormal polynomials, the class you 
>>>> have to use is StandardDistributionPolynomialFactory, ie a factory (=able 
>>>> to build something) and not an algorithm (=something able to compute 
>>>> something). You have all the details here:
>>>> 
>>>> http://openturns.github.io/openturns/master/user_manual/_generated/openturns.StandardDistributionPolynomialFactory.html
>>>> 
>>>> I agree on the fact that the difference is quite subtle, as it can be seen 
>>>> by comparing the API of the two classes. The distinction was made at a 
>>>> time were several algorithms were competing for the task 
>>>> (GramSchmidtAlgorithm, ChebychevAlgorithm) but in fact the 
>>>> AdaptiveStieltjesAlgorithm proved to be much more accurate and reliable 
>>>> than the other algorithms, and now it is the only orthonormalization 
>>>> algorithm available.
>>>> 
>>>> Another subtle trick is the following.
>>>> 
>>>> If you create a basis this way:
>>>> basis = ot.StandardDistributionPolynomialFactory(dist)
>>>> you will get the basis associated to the *standard representative* 
>>>> distribution in the parametric family to which dist belongs. It means the 
>>>> distribution with zero mean and unit variance, or with support equals to 
>>>> [-1, 1], or dist itself if no affine transformation is able to reduce the 
>>>> number of parameters of the distribution. 
>>>> It is managed automatically within the FunctionalChaosAlgorithm, but can 
>>>> be disturbing if you do things by hand.
>>>> 
>>>> If you create a basis this way:
>>>> basis = 
>>>> ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist))
>>>> then the distribution is preserved, and you get the orthonormal 
>>>> polynomials corresponding to dist. Be aware of the fact that the algorithm 
>>>> may have hard time to build the polynomials if your distribution is far 
>>>> away from its standard representative, as it may involves the computation 
>>>> of recurrence coefficients with a much wider range of variation. The 
>>>> benefit is that the orthonormality measure is exactly your distribution, 
>>>> assuming that its copula is the independent one, so you don't have to 
>>>> introduce a marginal transformation between both measures.
>>>> 
>>>> Some additional remarks:
>>>> + it looks like dist has dimension>1, as you extract its marginal 
>>>> distributions later on. AdaptiveStieltjesAlgorithm and 
>>>> StandardDistributionPolynomialFactory only work with 1D distributions (it 
>>>> is not checked by the library, my shame). What you have to do is:
>>>> 
>>>> basis = 
>>>> ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist.getMarginal(i)))
>>>>  for i in range(dist.getDimension())])
>>>> Quite a long line, I know...
>>>> It will build a multivariate polynomial basis orthonormal with respect to 
>>>> the product distribution (ie with independent copula) sharing the same 1D 
>>>> marginal distributions as dist.
>>>> 
>>>> 
>>>> After that, everything will work as expected and you will NOT have to 
>>>> build the transformation (if you build it it will coincide with the 
>>>> identity function). If you encounter performance issues (the polynomials 
>>>> of high degrees take ages to be built as in 
>>>> http://trac.openturns.org/ticket/885, or there is an overflow, or the 
>>>> numerical precision is bad) then use:
>>>> basis = 
>>>> ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(dist.getMarginal(i))
>>>>  for i in range(dist.getDimension())])
>>>> and build the transformation the way you do it.
>>>> 
>>>> + if you use the FunctionalChaosAlgorithm class by providing an input 
>>>> sample and an output sample, you also have to provide the weights of the 
>>>> input sample EVEN IF the experiment given in the projection strategy would 
>>>> allow to recompute them. It is because the fact that you provide the input 
>>>> sample overwrite the weighted experiment of the projection stratey by a 
>>>> FixedExperiment doe.
>>>> 
>>>> I attached two complete examples: one using the exact marginal 
>>>> distributions and the other using the standard representatives.
>>>> 
>>>> Best regards
>>>> 
>>>> Régis
>>>> 
>>>> ________________________________
>>>> De : roy <[email protected]>
>>>> À : regis lebrun <[email protected]> 
>>>> Cc : users <[email protected]>
>>>> Envoyé le : Vendredi 6 octobre 2017 14h22
>>>> Objet : Re: [ot-users] Sample transformation
>>>> 
>>>> 
>>>> 
>>>> Hi Regis,
>>>> 
>>>> Thank you for this detailed answer.
>>>> 
>>>> - I am using the latest release from conda (OT 1.9, python 3.6.2, latest 
>>>> numpy, etc.) ,
>>>> - For the sample, I need it to generate externally the output (cost code 
>>>> that cannot be integrated into OT as model),
>>>> - I have to convert ot.Sample into np.array because it is then used by 
>>>> other functions to create the simulations, etc.
>>>> 
>>>> If I understood correctly, I can create the projection strategy using this 
>>>> snippet:
>>>> 
>>>> basis = ot.AdaptiveStieltjesAlgorithm(dist)
>>>> measure = basis.getMeasure()
>>>> quad = ot.Indices(in_dim)
>>>> for i in range(in_dim):
>>>> quad[i] = degree + 1
>>>> 
>>>> comp_dist = ot.GaussProductExperiment(measure, quad)
>>>> proj_strategy = ot.IntegrationStrategy(comp_dist)
>>>> 
>>>> inv_trans = 
>>>> ot.Function(ot.MarginalTransformationEvaluation([measure.getMarginal(i) 
>>>> for i in range(in_dim)], distributions))
>>>> sample = np.array(inv_trans(comp_dist.generate()))
>>>> 
>>>> 
>>>> It seems to work. Except that the basis does not work with 
>>>> ot.FixedStrategy(basis, dim_basis). I get a non implemented method error.
>>>> 
>>>> After I get the sample and the corresponding output, what is the way to 
>>>> go? Which arguments do I need to use for the
>>>> ot.FunctionalChaosAlgorithm? 
>>>> 
>>>> I am comparing the Q2 and on Ishigami and I was only able to get correct 
>>>> results using:
>>>> 
>>>> pc_algo = ot.FunctionalChaosAlgorithm(sample, output, dist, trunc_strategy)
>>>> 
>>>> But for least square strategy I had to use this:
>>>> 
>>>> pc_algo = ot.FunctionalChaosAlgorithm(sample, output)
>>>> 
>>>> 
>>>> Is it normal?
>>>> 
>>>> 
>>>> Pamphile ROY
>>>> Chercheur doctorant en Quantification d’Incertitudes
>>>> CERFACS - Toulouse (31) - France
>>>> +33 (0) 5 61 19 31 57
>>>> +33 (0) 7 86 43 24 22
>>>> 
>>>> 
>>>> 
>>>> Le 5 oct. 2017 à 15:40, regis lebrun <[email protected]> a 
>>>> écrit :
>>>> 
>>>> 
>>>> 
>>>> Hi Pamphile,
>>>> 
>>>> 
>>>>> 
>>>>> 1) The problem:
>>>>> The problem you get is due to the fact that in your version of OpenTURNS 
>>>>> (1.7 I suppose), the GaussProductExperiment class has a different way to 
>>>>> handle the input distribution than the other WeightedExperiment classes: 
>>>>> it generates the quadrature rule of the *standard representatives* of the 
>>>>> marginal distributions instead of the marginal distributions. It does not 
>>>>> change the rate of convergence of the PCE algorithm and allows to use 
>>>>> specific algorithms for distributions with known orthonormal polynomials. 
>>>>> It is not explained in the documentation and if you ask the doe for its 
>>>>> distribution it will give you the initial distribution instead of the 
>>>>> standardized one.
>>>>> 
>>>>> 2) The mathematical background:
>>>>> The generation of quadrature rules for arbitrary 1D distributions is a 
>>>>> badly conditioned problem. Even if the quadrature rule is well-defined 
>>>>> (existence of moments of any order, distribution characterized by these 
>>>>> moments), the application that maps the recurrence coefficients of the 
>>>>> orthogonal polynomials to their value can have a very large condition 
>>>>> number. As a result, the adaptive integration used to compute the 
>>>>> recurrence coefficients of order n, based on the values of the 
>>>>> polynomials of degree n-1 and n-2, can lead to wrong values and all the 
>>>>> process falls down.
>>>>> 
>>>>> 3) The current state of the software:
>>>>> Since version 1.8, OpenTURNS no more generates the quadrature rule of the 
>>>>> standard representatives, but the quadrature rule of the actual marginal 
>>>>> distributions. The AdaptiveStieltjesAlgorithm class, introduced in 
>>>>> release 1.8, is much more robust than the previous orthonormalization 
>>>>> algorithms and is able to handle even stiff problems. There are still 
>>>>> difficult situations (distributions with discontinuous PDF inside of the 
>>>>> range, fixed in OT 1.9, or really badly conditioned distributions, 
>>>>> hopefully fixed when ticket#885 will be solved) but most usual situations 
>>>>> are under control even with marginal degrees of order 20.
>>>>> 
>>>>> 4) The (probable) bug in your code and the way to solve it
>>>>> You must be aware of the fact that the distribution you put into your 
>>>>> WeightedExperiment object will be superseded by the distribution 
>>>>> corresponding to your OrthogonalBasisFactory inside of the 
>>>>> FunctionalChaosAlgorithm. If you need to have the input sample before to 
>>>>> run the functional chaos algorithm, then you have to build your 
>>>>> transformation by hand. Assuming that you already defined your projection 
>>>>> basis called 'myBasis', your marginal integration degrees 'myDegrees' and 
>>>>> your marginal distributions 'myMarginals', you have to write (in OT 1.7):
>>>>> 
>>>>> # Here the explicit cast into a NumericalMathFunction is to be able to 
>>>>> evaluate the transformation over a sample
>>>>> myTransformation = 
>>>>> ot.NumericalMathFunction(ot.MarginalTransformationEvaluation([myBasis.getDistribution().getMarginal(i)
>>>>>  for i in range(dimension), myMarginals))
>>>>> sample = 
>>>>> myTransformation(ot.GaussProductExperiment(myBasis.getDistribution(), 
>>>>> myDegrees).generate())
>>>>> 
>>>>> 
>>>>> You should avoid to cast OT objects into np objects as much as possible, 
>>>>> and if you cannot avoid these casts you should do them only in the 
>>>>> sections where they are needed. They can be expansive for large objects, 
>>>>> and if the sample you get from generate() is used only as an argument of 
>>>>> a NumericalMathFunction, then it will be converted back into a 
>>>>> NumericalSample!
>>>>> 
>>>>> Best regards
>>>>> 
>>>>> Régis
>>>>> ________________________________
>>>>> De : roy <[email protected]>
>>>>> À : users <[email protected]> 
>>>>> Envoyé le : Jeudi 5 octobre 2017 11h13
>>>>> Objet : [ot-users] Sample transformation
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am facing consistency concerns in the API regarding distributions and 
>>>>> sampling.
>>>>> 
>>>>> The initial goal was to get the sampling for Polynomial Chaos as I must 
>>>>> not use the model variable.
>>>>> So for least square strategy I do something like this:
>>>>> 
>>>>> proj_strategy = ot.LeastSquaresStrategy(montecarlo_design)
>>>>> sample = np.array(proj_strategy.getExperiment().generate())
>>>>> 
>>>>> sample is correct as the bounds of each feature lie in the corresponding 
>>>>> ranges.
>>>>> 
>>>>> But now if I want to use IntegrationStrategy:
>>>>> 
>>>>> ot.IntegrationStrategy(ot.GaussProductExperiment(dists, list))
>>>>> sample = np.array(proj_strategy.getExperiment().generate())
>>>>> 
>>>>> sample’s outputs lie between [-1, 1] which does not corresponds to the 
>>>>> distribution I have initially.
>>>>> 
>>>>> So I used the conversion class but it does not work well with 
>>>>> GaussProductExperiment as it requires [0, 1] instead of [-1, 1].
>>>>> 
>>>>> Thus I use this hack:
>>>>> 
>>>>> # Convert from [-1, 1] -> input distributions
>>>>> marg_inv_transf = ot.MarginalTransformationEvaluation(distributions, 1)
>>>>> sample = (proj_strategy.getExperiment().generate() + 1) / 2.
>>>>> 
>>>>> 
>>>>> Is it normal that the distribution classes are not returning in the same 
>>>>> intervals?
>>>>> 
>>>>> 
>>>>> Thanks for your support!
>>>>> 
>>>>> 
>>>>> Pamphile ROY
>>>>> Chercheur doctorant en Quantification d’Incertitudes
>>>>> CERFACS - Toulouse (31) - France
>>>>> +33 (0) 5 61 19 31 57
>>>>> +33 (0) 7 86 43 24 22
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> OpenTURNS users mailing list
>>>>> [email protected]
>>>>> http://openturns.org/mailman/listinfo/users
>>>>> <example.py><example_standard.py>
>>>>> 
>>>>> _______________________________________________
>>> OpenTURNS users mailing list
>>> [email protected]
>>> http://openturns.org/mailman/listinfo/users
>>> 
>>> 
>> _______________________________________________
>> OpenTURNS users mailing list
>> [email protected]
>> http://openturns.org/mailman/listinfo/users
>>

_______________________________________________
OpenTURNS users mailing list
[email protected]
http://openturns.org/mailman/listinfo/users

Re: [ot-users] Sample transformation

Reply via email to