Hi Pamphile,

First of all, the SensitivityAnalysis class allows you to compute first and 
second order Sobol' indices, in addition to the total order indices. These 
indices are computed by the computeSobolIndices() method, which is protected 
and therefore not exposed in the Python interface.
The behaviour of the class is the following:
+ At the first call to either getFirstOrderIndices() or getTotalOrderIndices(), 
if no previous call to getSecondOrderIndices() has been made, the 
computeSobolIndices() method is called and both the first order and total order 
indices are computed. THIS COMPUTATION IS DONE FOR ALL THE OUTPUTS IN ONE RUN.
+ At a second call to either getFirstOrderIndices() or getTotalOrderIndices(), 
no new computation is done: you get a value that was already computed and 
stored in the relevant data structure.
+ At the first call to getSecondOrderIndices(), all the indices (first order, 
total order and second order) are computed. THIS COMPUTATION IS DONE FOR ALL 
THE OUTPUTS IN ONE RUN.
+ At a second call to either getFirstOrderIndices(), getTotalOrderIndices() or 
getSecondOrderIndices(), no new computation is done.

The good practice is then to call getSecondOrderIndices() first if you need 
both first and second order indices, otherwise the first order and total order 
indices will be computed twice (see the sketch below).
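
To make the call order concrete, here is a minimal, self-contained sketch based 
on the OT 1.6 API as used in your snippets below; the toy function, the 
distribution and the sample sizes are placeholders, not a recommendation:

import openturns as ot

# Toy model with 2 inputs and 3 outputs, only to illustrate the call order.
def func(x):
    return [x[0] + x[1], x[0] * x[1], x[0] - x[1]]

model = ot.PythonFunction(2, 3, func)
distribution = ot.ComposedDistribution([ot.Uniform(0.0, 1.0)] * 2)
sample1 = distribution.getSample(1000)
sample2 = distribution.getSample(1000)

sobol = ot.SensitivityAnalysis(sample1, sample2, model)
sobol.setBlockSize(int(ot.ResourceMap.Get("parallel-threads")))

# First call: triggers the single run that computes the first, second and
# total order indices for ALL the outputs at once.
second = sobol.getSecondOrderIndices(0)

# These calls only read the values already computed and stored above.
first = [sobol.getFirstOrderIndices(i) for i in range(3)]
total = [sobol.getTotalOrderIndices(i) for i in range(3)]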

Concerning the loop over the outputs, you will not get any acceleration by 
parallelizing the loop, as the FIRST call to any of the getXXXIndices(int) 
methods starts the computation of ALL the indices for ALL the outputs. The key 
point is thus to compute all these indices efficiently, which is done by 
providing an efficient implementation of the _exec_sample() method in your 
wrapper.
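
Here is a rough sketch of such a wrapper using the standard multiprocessing 
module (otwrapy essentially automates this pattern); my_model stands for your 
actual simulation code and must be defined at module level so that 
multiprocessing can pickle it:

import openturns as ot
from multiprocessing import Pool

def my_model(x):
    # Placeholder for your actual simulation returning 400 output values.
    return [sum(x)] * 400

class ParallelWrapper(ot.OpenTURNSPythonFunction):
    def __init__(self, n_cpus=10):
        ot.OpenTURNSPythonFunction.__init__(self, 2, 400)
        self.n_cpus = n_cpus

    def _exec(self, x):
        # Single-point evaluation.
        return my_model(x)

    def _exec_sample(self, xs):
        # Evaluate a whole sample at once, spreading the points over processes.
        pool = Pool(processes=self.n_cpus)
        try:
            return pool.map(my_model, [list(x) for x in xs])
        finally:
            pool.close()
            pool.join()

model = ot.NumericalMathFunction(ParallelWrapper())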

I checked the source code of SensitivityAnalysis::computeSobolIndices() in 
OT 1.6, and the conclusions are:

+ there is no use of TBB here
+ the model you provide to the constructor is called on samples (hence the 
interest of providing an _exec_sample method in your wrapper)
+ the model is called on the two samples you provide to the constructor; each 
call is N evaluations
+ the model *would* be called on samples of much larger size if all the 
scrambled inputs were pre-computed, which could exhaust the available memory. 
To avoid this, the evaluations of the scrambled inputs are partitioned into 
chunks of size blockSize (except the last chunk, which can be smaller as it 
contains the remaining computations), which is precisely what you tune with 
the setBlockSize() method (see the sketch below).
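
Conceptually, the chunking works like this (a hypothetical Python sketch of the 
logic described above, not the actual C++ code):

def evaluate_by_blocks(model, scrambled_inputs, block_size):
    # model: a callable evaluating a whole list of input points at once
    # scrambled_inputs: the full list of pre-computed input points
    outputs = []
    for start in range(0, len(scrambled_inputs), block_size):
        # The last block may be smaller than block_size.
        block = scrambled_inputs[start:start + block_size]
        outputs.extend(model(block))  # one sample evaluation per block
    return outputs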

In your case, depending on the amount of memory you have and the distribution 
scheduler you use, you should set the block size to the largest possible value. 
If you need second order indices, call getSecondOrderIndices() first (the 
argument can be any valid index value), as it computes all the indices at once; 
calling getFirstOrderIndices() first and getSecondOrderIndices() afterwards 
would compute the first order and total order indices twice.

The technology you use to implement _exec_sample is up to you; here otwrapy 
could help.

The key point is that you don't need to parallelize the loop over the output 
dimension, but rather the evaluation of your model over a sample.
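
For instance, reusing the otwrapy line and the variable names (func, sample1, 
sample2) from your own message; the number of CPUs and the block size value are 
only indicative and must be adapted to your setup:

import openturns as ot
import otwrapy as otw

# func: your existing Python function with 2 inputs and 400 outputs
model = otw.Parallelizer(ot.PythonFunction(2, 400, func), n_cpus=10)
sobol = ot.SensitivityAnalysis(sample1, sample2, model)
# A large block size means the parallelized model sees large samples.
sobol.setBlockSize(10000)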

Cheers

Régis LEBRUN

----- Original Message -----
> From: Pamphile ROY <[email protected]>
> To: regis lebrun <[email protected]>
> Cc: users <[email protected]>
> Sent: Saturday 22 October 2016, 13:26
> Subject: Re: [ot-users] Pickling and parallelism
> 
> Hi Régis, 
> 
> Thanks for your fast reply.
> 
> From what I understand, otwrapy would do the same thing as 
> sobol.setBlockSize(int(ot.ResourceMap.Get("parallel-threads"))).
> So you're suggesting to do this instead:
> import otwrapy as otw
> model = otw.Parallelizer(ot.PythonFunction(2, 400, func), n_cpus=10)
> If so, what is the benefit of doing it?
> 
> This will only be useful when doing sobol.getFirstOrderIndices(0).
> 
> What I am trying to do is to perform several analyses, as I have a functional 
> output:
> 
> model = ot.PythonFunction(2, 400, func)
> sobol = ot.SensitivityAnalysis(sample1, sample2, model)
> sobol.setBlockSize(int(ot.ResourceMap.Get("parallel-threads")))
> indices = [[], [], []]
> for i in range(400):
>     indices[1].append(np.array(sobol.getFirstOrderIndices(i)))
>     indices[2].append(np.array(sobol.getTotalOrderIndices(i)))
> 
> This works but I want the for loop to be parallel, so I tried:
> 
> from pathos.multiprocessing import ProcessingPool, cpu_count
> 
> model = ot.PythonFunction(2, 400, func)
> sobol = ot.SensitivityAnalysis(sample1, sample2, model)
> sobol.setBlockSize(int(ot.ResourceMap.Get("parallel-threads")))
> 
> def map_indices(i):
>     first = np.array(sobol.getFirstOrderIndices(i))
>     total = np.array(sobol.getTotalOrderIndices(i))
>     return first, total
> 
> pool = ProcessingPool(cpu_count())
> results = pool.imap(map_indices, range(400))
> first = np.empty(400)
> total = np.empty(400)
> 
> for i in range(400):
>     first[i], total[i] = results.next()
> 
> But in order for this to work, the map_indices function has to be pickled 
> (here it uses dill, which can serialize close to everything).
> Hence the error I get.
> 
> 
> Thanks again,
> 
> Pamphile ROY
> 
> 
> 
> ----- Original Message -----
> From: "regis lebrun" <[email protected]>
> To: "Pamphile ROY" <[email protected]>, "users" <[email protected]>
> Sent: Saturday 22 October 2016, 12:48:29
> Subject: Re: [ot-users] Pickling and parallelism
> 
> Hi,
> 
> I understand that you have a model that has been interfaced with OpenTURNS 
> using the OpenTURNSPythonFunction class, that you want to perform sensitivity 
> analysis using the SensitivityAnalysis class, and that you would like to 
> benefit from some parallelism in the execution of the analysis.
> 
> The correct way to do that is to provide the _exec_sample method. In this 
> method, you are free to use any multithreading/multiprocessing capability you 
> want. You may consider either otwrapy (http://felipeam86.github.io/otwrapy/), 
> one of the solutions proposed here: 
> http://openturns.github.io/developer_guide/wrapper_development.html 
> or your favorite tool. Then, you will get rid of the GIL.
> 
> I cannot help for the second point, as it is far beyond my knowledge of 
> Python and serialization.
> 
> Cheers
> 
> Régis LEBRUN
> 
>> ________________________________
>> From: Pamphile ROY <[email protected]>
>> To: [email protected] 
>> Sent: Friday 21 October 2016, 21:56
>> Subject: [ot-users] Pickling and parallelism
>> 
>> 
>> 
>> Hi, 
>> 
>> 
>> I have two questions:
>> 
>> 
>> 1. I have seen that the class (and others) allows some multithreading. From 
>> my understanding, it is based on TBB and only multithreads the tasks.
>> Thus it is affected by the GIL. Is there an automatic set up like the one 
>> for multithreading, but for multiprocessing instead? Any advice?
>> 
>> 
>> 2. Using pathos for multiprocessing, I am trying to dump an instance of 
>> SensitivityAnalysis but I cannot get it to work, even with dill.
>> For information, I am running under macOS Sierra and this is OT 1.6 (maybe 
>> it is coming from here... I am going to upgrade but it means a refactoring 
>> on my side).
>> Here is the traceback:
>> 
>> 
>> sobol = ot.SensitivityAnalysis(sample1, sample2, sobol_model)
>> _f = dill.dumps(sobol)
>> 
>> File "/Users/Pamphile/.virtualenvs/jpod/lib/python2.7/site-packages/dill/dill.py", line 243, in dumps
>>   dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
>> File "/Users/Pamphile/.virtualenvs/jpod/lib/python2.7/site-packages/dill/dill.py", line 236, in dump
>>   pik.dump(obj)
>> File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
>>   self.save(obj)
>> File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
>>   rv = reduce(self.proto)
>> File "/Applications/OpenTURNS/openturns/lib/python2.7/site-packages/openturns/common.py", line 258, in Object___getstate__
>>   study.add('instance', self)
>> File "/Applications/OpenTURNS/openturns/lib/python2.7/site-packages/openturns/common.py", line 688, in add
>>   return _common.Study_add(self, *args)
>> NotImplementedError: Wrong number or type of arguments for overloaded function 'Study_add'.
>> Possible C/C++ prototypes are:
>>   OT::Study::add(OT::InterfaceObject const &)
>>   OT::Study::add(OT::String const &, OT::InterfaceObject const &, OT::Bool)
>>   OT::Study::add(OT::String const &, OT::InterfaceObject const &)
>>   OT::Study::add(OT::PersistentObject const &)
>>   OT::Study::add(OT::String const &, OT::PersistentObject const &, OT::Bool)
>>   OT::Study::add(OT::String const &, OT::PersistentObject const &)
>> 
>> 
>> Thank you for your help!
>> 
>> 
>> 
>> 
>> Pamphile
>> _______________________________________________
>> OpenTURNS users mailing list
>> [email protected]
>> http://openturns.org/mailman/listinfo/users
>> 
>> 
>> 
> 
