Hi Andy,

I'll take a look at the in transit approach. I'm essentially extracting a slice 
every N steps and gathering it as a single slice onto one processor each time, 
accumulating the slices as I go. Then every M*N steps, I want that single 
processor to do an expensive operation and save the output.


So if the in transit approach you mentioned would work well for that, I'll give 
it a shot. I'm running on some SGI and Cray machines; I don't know whether they 
have special mechanisms for this like the ones you mentioned exist at NERSC.


Thanks,


Tim


________________________________
From: Andy Bauer <[email protected]>
Sent: Tuesday, October 25, 2016 4:43 PM
To: Ufuk Utku Turuncoglu (BE)
Cc: Gallagher, Timothy P; [email protected]
Subject: Re: [Paraview] Non-blocking coprocessing

Hi Tim,

This may be better to do as an in transit setup. That way the processes would 
be independent. With Catalyst I'd worry about all of the processes waiting on 
the global rank 0 to finish its work before the other Catalyst ranks return 
control to the simulation. Depending on the system you're on, you could do this 
communication through file IO, e.g. on Cori@NERSC with its burst buffers.

If you want to go down the in transit path, let me know and I can see about 
digging up some scripts that I had for that.

Best,
Andy

On Tue, Oct 25, 2016 at 3:57 AM, Ufuk Utku Turuncoglu (BE) 
<[email protected]<mailto:[email protected]>> wrote:
Hi Tim,

I am not sure whether non-blocking communication is supported by ParaView / 
Catalyst, but I think assigning an extra core for the global reduction is 
possible. You could use MPI communication for this purpose. Take a look at the 
following code of mine for an overloaded coprocessorinitializewithpython 
function. As you can see, it also takes the MPI communicator, which allows 
using a pool of processors (or cores) for co-processing. In my case it runs 
smoothly without any problems. I hope it helps.

--ufuk

extern "C" void my_coprocessorinitializewithpython_(int *fcomm, const char *pythonScriptName,
                                                    const char strarr[][255], int *size) {
  if (pythonScriptName != NULL) {
    if (!g_coprocessor) {
      g_coprocessor = vtkCPProcessor::New();
      // Convert the Fortran communicator handle to a C MPI_Comm so Catalyst
      // runs on the communicator the model passes in, not MPI_COMM_WORLD.
      MPI_Comm handle = MPI_Comm_f2c(*fcomm);
      vtkMPICommunicatorOpaqueComm *Comm = new vtkMPICommunicatorOpaqueComm(&handle);
      g_coprocessor->Initialize(*Comm);
      vtkSmartPointer<vtkCPPythonScriptPipeline> pipeline =
          vtkSmartPointer<vtkCPPythonScriptPipeline>::New();
      pipeline->Initialize(pythonScriptName);
      g_coprocessor->AddPipeline(pipeline);
    }

    if (!g_coprocessorData) {
      g_coprocessorData = vtkCPDataDescription::New();
      // One input port per model component and per dimension.
      for (int i = 0; i < *size; i++) {
        g_coprocessorData->AddInput(strarr[i]);
        std::cout << "adding input port [" << i << "] = " << strarr[i] << std::endl;
      }
    }
  }
}

On 25/10/16 01:56, Gallagher, Timothy P wrote:

Hello again!


I'm looking at using coprocessing for something that may take a while to 
actually compute, so I would like to do it in a non-blocking fashion. 
Essentially I am going to be extracting data from the simulation into some 
numpy arrays (so once copied, the original data in the pipeline can change) and 
then sending it to the root processor to do some global operations.


The global operations may take some time (not minutes, but longer than I want 
my simulation to wait). Is there a way to run part of the pipeline in a 
non-blocking fashion, where the script calls a function that writes out a data 
file when the processing finishes, but control returns to the simulation before 
the function completes? Will I have to do something in native Python, like 
spawning a new thread to do the function call, or is there a way to do it 
within how ParaView operates?
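The thread idea I have in mind would look roughly like this (a hypothetical 
sketch, not ParaView API; expensive_write, coprocess_nonblocking, and g_pending 
are illustrative names):

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical sketch: snapshot the data, then hand the expensive write to a
// detached background thread so control returns to the simulation immediately.
std::atomic<int> g_pending{0};   // number of writes still in flight

void expensive_write(std::vector<double> data) {
  // ... crunch the numbers and write the output file here ...
  g_pending.fetch_sub(1);        // mark this write as finished
}

void coprocess_nonblocking(const std::vector<double> &live) {
  std::vector<double> copy(live);  // copy so the pipeline's buffers can change
  g_pending.fetch_add(1);
  std::thread(expensive_write, std::move(copy)).detach();  // fire and forget
}
```

In a real run you'd want to wait for g_pending to drain before the simulation 
exits, since work on a detached thread is lost when the process ends.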


On a related note, I may not want the root processor of the coprocessing to 
run any simulation code at all. If I am running my simulation on N cores, is it 
possible to have N+1 cores running the coprocessing pipeline, where the extra 
core receives the global data reduction from the N cores and does the 
crunching? Or am I starting to ask for too much there?


Thanks as always,


Tim



_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: 
http://paraview.org/Wiki/ParaView

Search the list archives at: http://markmail.org/search/?q=ParaView

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/paraview


