Hi,
I guess I forgot about the data duplication issue. For most of the statistics
that I calculate in parallel (e.g. mean, standard deviation), the results are
not affected much. But for the Pearson correlation coefficient, which is a bit
more complicated, there is a noticeable difference (0.833 in parallel versus
0.99 in serial).
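To illustrate the effect, here is a minimal sketch in plain NumPy (not ParaView
code). The data, the two-piece split, and the global point ids are all made up
for the example, and it does not try to reproduce the 0.833 and 0.99 figures
above. It only shows that counting shared points twice changes the Pearson
coefficient, and that dropping duplicates by a (hypothetical) global id
recovers the serial value.

```python
import numpy as np

def pearson_from_sums(x, y):
    """Pearson r computed from raw sums, the kind of totals pieces could exchange."""
    n = x.size
    sx, sy = x.sum(), y.sum()
    sxx, syy, sxy = (x * x).sum(), (y * y).sum(), (x * y).sum()
    return (n * sxy - sx * sy) / np.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

# Made-up point data: two correlated arrays with 1000 values each.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.9 * x + rng.normal(scale=0.3, size=1000)

# Hypothetical global point ids, split into two pieces that share 40 points.
ids = np.arange(1000)
piece0, piece1 = ids[:520], ids[480:]
gathered = np.concatenate([piece0, piece1])  # shared ids appear twice

r_serial = pearson_from_sums(x, y)
r_dup = pearson_from_sums(x[gathered], y[gathered])

# Keep only the first occurrence of each global id before computing r.
_, first = np.unique(gathered, return_index=True)
keep = gathered[first]
r_dedup = pearson_from_sums(x[keep], y[keep])

print(r_serial, r_dup, r_dedup)
```

The de-duplication step assumes every point carries an id that is unique across
processes, which is exactly the information that is missing today.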
Will this point-sharing information become available in later versions of
ParaView? That is, will it ever be easy to identify and count duplicates?
Thanks,
Sohail
________________________________
From: David Thompson <[email protected]>
To: Sohail Shafii <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Friday, August 17, 2012 3:40 PM
Subject: Re: [Paraview] Numpy masking (via programm filter) not quite working
in parallel
Hi Sohail,
This is likely caused by points that are shared by several processes. ParaView
splits the cells of a mesh across processes, but cells on the boundary between
processes share vertices. Thus, if a vertex bounds cells split across 3
processes, that vertex will appear in 3 different lists of local vertices and
be counted 3 times instead of once. There is nothing in ParaView for
determining which vertices are shared (and by how many processes).
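As a rough illustration of the counts reported in the original message, here is
a sketch in plain NumPy (not ParaView code). It assumes the stock wavelet has
21 points per axis, which matches the 9261 points seen in serial, and it
assumes the two processes split the grid along one axis and share a single
plane of points. The global point ids are hypothetical.

```python
import numpy as np

# Assumption: 21 points per axis (21**3 == 9261, the serial count).
n = 21
ids = np.arange(n ** 3).reshape(n, n, n)  # hypothetical global point ids

# Assumed split along the first axis; the plane at index 10 is in both pieces.
piece0 = ids[:11].ravel()
piece1 = ids[10:].ravel()

local_total = piece0.size + piece1.size           # sum of per-process counts
unique_total = np.union1d(piece0, piece1).size    # each point counted once
shared = np.intersect1d(piece0, piece1).size      # points on the shared plane

print(local_total, unique_total, shared)  # prints: 9702 9261 441
```

Under these assumptions each piece holds 4851 points, and the 441 extra points
are the 21 x 21 plane that both pieces contain.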
This also affects the statistics filters when run in parallel on point-centered
data. For most large data sets, the number of vertices on inter-process
boundaries is small compared to the total, so we documented the behavior for the
statistics filters but did not implement a solution (because the expected skew
is small).
David
On Aug 17, 2012, at 5:05 PM, Sohail Shafii <[email protected]> wrote:
> Hi,
>
> I need to use Numpy in a lot of the programmable filters that I write, and
> I've run into differences in how its masking feature works in serial and
> parallel. Masking allows one to filter out portions of an array that do not
> pass some condition.
>
> As an example, I've created a stock ParaView wavelet and saved it as a pvd
> file. I then load it and run this inside a programmable filter:
> ---
> import numpy
>
> data = inputs[0].PointData['RTData']
> # create a mask that tells us which points are equal to one
> mask = numpy.ma.masked_equal(data, 1)
> # filter data array by the mask conditions (so that other points are excluded)
> maskedPnts = numpy.extract(mask, data)
>
> print len(maskedPnts)
>
> ---
> In serial mode, I get 9261 points. With two processes, I get 2 x 4851, or
> 9702. So masking produces more points in parallel than in serial.
>
> Any idea why that is? Is there anything I can do or print out to see why
> masking doesn't quite work in parallel?
>
> Thanks, Sohail
> _______________________________________________
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
>
> Please keep messages on-topic and check the ParaView Wiki at:
> http://paraview.org/Wiki/ParaView
>
> Follow this link to subscribe/unsubscribe:
> http://www.paraview.org/mailman/listinfo/paraview