Re: [opendx-dev] Adding a new module to CVS?

Jeff Braun Thu, 11 Jan 2001 09:38:03 -0800 (PST)

On Thu, 11 Jan 2001, Chris Pelkie wrote:

> Does your routine work "only" in the case where you essentially properly
> define the original regular grid to your routine? If so, I presume one
> acquires the special knowledge of this original grid in some way from the
> researcher (experience shows THEY never know what it is cause some grad
> student 3 years ago did the actual work (:-) So in some sense, I predict I
> will end up trying the following until they miraculously remember what the
> correct grid was...).


If the original data were on a grid, it works best to supply that same
regular grid. On several occasions I have had to derive the original grid
for data sets that are not my own. Fortunately, in those cases the
scattered data were organized in a way that made it not too difficult.
The hardest was a grid with x,y rotated by, say, 30 degrees, and the file
written out as scattered points from ymax to ymin, which makes it hard to
infer the grid from the file; but by looking at a plot of the scattered
points, one can usually figure it out. I have never tried it with z
rotated as well. In the little documentation that I put in the code, I did
say the next step is to add a routine to derive the grid. But since Regrid
requires a grid input, I think this would probably be best added to
AutoRegrid.
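For the easy, axis-aligned case, deriving the grid from the scattered points can be sketched as below. This is a minimal sketch, not the routine mentioned above; `infer_axis`, `infer_grid` and the `tol` parameter are hypothetical names. It assumes the grid axes are aligned with x, y and z, and takes the smallest gap between distinct coordinate values on each axis as the spacing. It does not handle the rotated case.

```python
from typing import Iterable, List, Sequence, Tuple

Axis = Tuple[float, float, int]  # (origin, delta, count)


def infer_axis(values: Sequence[float], tol: float = 1e-6) -> Axis:
    """Guess (origin, delta, count) for one axis of an axis-aligned grid."""
    # Merge coordinates that differ by less than tol, so rounding noise in
    # the file does not create spurious extra levels.
    levels: List[float] = []
    for v in sorted(values):
        if not levels or v - levels[-1] > tol:
            levels.append(v)
    if len(levels) == 1:
        return levels[0], 0.0, 1
    origin = levels[0]
    # Missing planes of points only widen gaps, so the smallest gap is
    # taken as the spacing.
    delta = min(b - a for a, b in zip(levels, levels[1:]))
    count = int(round((levels[-1] - origin) / delta)) + 1
    return origin, delta, count


def infer_grid(points: Iterable[Tuple[float, float, float]]) -> List[Axis]:
    """Apply infer_axis to the x, y and z coordinates separately."""
    pts = list(points)
    return [infer_axis([p[axis] for p in pts]) for axis in range(3)]
```

The result is one (origin, delta, count) triple per axis, which is the form a regular grid takes in the examples below.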
 
> Does it work in the following cases:
> 
> 1) original grid was, say  origin 0,0,0; delta 1,1,1; counts 20,20,20
> and you define the output grid as 0,0,0; delta 1,1,1; counts 10,10,10
> (should fall on every other point exactly but does it average the
> in-between values too? or discard them?)

Currently it would discard 7/8 of the data, because those points would
fall outside the bounds of the output grid (it covers only 0 to 9 along
each axis, so (10/20)^3 = 1/8 of the input points land on it). What you
might really be asking about is one of these:

a) Output grid as 0,0,0; delta 2,2,2; counts 10,10,10; in this case each
output point would be the unweighted average of 8 input points. The better
solution would obviously be to weight the average to favor the input point
that coincides with the output grid point.
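One way to do that weighting, shown for a single output node, is a Gaussian falloff with distance. The Gaussian is my assumption (any weight that decreases with distance would do), and `weighted_node_value` and `sigma` are hypothetical names, not part of Regrid or the new module. With `sigma = 0.5` and the 8 input points binned to the node at 2,2,2, the coincident point gets weight 1, the 3 face neighbors about 0.14 each, and the farther ones much less.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float, float]  # (x, y, z, value)


def weighted_node_value(node: Tuple[float, float, float],
                        samples: Sequence[Sample],
                        sigma: float = 0.5) -> float:
    """Gaussian-weighted average of the samples binned to one output node.

    A sample sitting exactly on the node gets weight 1; a sample at
    distance d gets exp(-d^2 / (2 sigma^2)), so the coincident input
    point dominates the average.
    """
    num = den = 0.0
    for x, y, z, v in samples:
        d2 = (x - node[0]) ** 2 + (y - node[1]) ** 2 + (z - node[2]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * v
        den += w
    return num / den
```

Smaller `sigma` pushes the result toward plain resampling at the coincident point; very large `sigma` falls back to the unweighted average.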

b) Output grid as 0,0,0; delta .5,.5,.5; counts 40,40,40; now only the
output points that coincide with input points get a value, and the points
in between (every other point along each axis) would be assigned a null
value (or marked missing, as with Regrid). It might be easy to go through
the grid and fill each missing value with the average of the values around
it. But consider the point .5,.5,.5: the grid points above, below, left,
right, etc. are also missing values. I guess two passes through the grid
to find average values would work, though the real solution is to go back
and redefine the grid.
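The multi-pass fill can be sketched as follows. This is a sketch of the idea only, not something Regrid or the new module does; `fill_missing` is a hypothetical name, the grid is assumed to be a dict from (i, j, k) to a value or None, and it averages only the six face neighbors (above, below, left, right, front, back). With that neighborhood, a 3-D grid like case b needs three passes: the first fills points offset by .5 along one axis, the second those offset along two axes, and the third points like .5,.5,.5.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Index = Tuple[int, int, int]
Grid = Dict[Index, Optional[float]]  # None marks a missing value


def fill_missing(grid: Grid, counts: Tuple[int, int, int]) -> Tuple[Grid, int]:
    """Repeatedly replace missing nodes by the mean of their known face neighbors.

    Each pass reads only values known at the start of that pass, so a value
    filled in one pass can help fill its neighbors in the next. Stops when
    nothing is missing or a pass makes no progress. Returns (grid, passes).
    """
    grid = dict(grid)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    passes = 0
    while True:
        updates: Grid = {}
        for node in product(*(range(n) for n in counts)):
            if grid.get(node) is not None:
                continue
            # Neighbors outside the grid are absent from the dict, so get() gives None.
            around = [grid.get((node[0] + a, node[1] + b, node[2] + c))
                      for a, b, c in steps]
            known = [v for v in around if v is not None]
            if known:
                updates[node] = sum(known) / len(known)
        if not updates:
            return grid, passes
        grid.update(updates)
        passes += 1
```

For a linear field such as i + j + k sampled only at the even nodes, these averages reproduce the field exactly at the filled nodes. Including diagonal neighbors would fill case b in a single pass, at the cost of averaging over farther points.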

> 2) original grid was, say  origin 0,0,0; delta 1,1,1; counts 20,20,20
> and you define the output grid as 0,0,0; delta 1,1,1; counts 13,12,11
> (interpolation or bin assignment would be involved)

Same as above: every output grid point coincides with an input point and
gets a value, and the input points beyond the output bounds are discarded.
What you might be thinking about is an output grid with origin 0,0,0;
delta 1.5,1.5,1.5; counts 13,12,11. Now some binning or averaging would
occur.
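How the binning falls out for the delta 1.5 grid can be checked one axis at a time. This sketch assumes both grids start at 0 on the axis, that each input point goes to its nearest output index with halfway ties rounded up, and that points past the last output index are dropped; `hits_per_index` is a hypothetical helper.

```python
import math
from typing import List


def hits_per_index(n_in: int, delta_in: float, n_out: int, delta_out: float) -> List[int]:
    """Count input points that land on each output index along one axis.

    Both grids are assumed to start at 0 on this axis.
    """
    hits = [0] * n_out
    for m in range(n_in):
        x = m * delta_in
        i = math.floor(x / delta_out + 0.5)  # nearest output index
        if 0 <= i < n_out:
            hits[i] += 1  # points beyond the output grid are discarded
    return hits
```

Along x (20 input points, 13 output points) the counts alternate 1, 2, 1, 2, ..., 1, and the last input point at x = 19 falls off the end. In 3-D the number of points averaged into a node is the product of the three per-axis counts, so nodes receive 1, 2, 4 or 8 points.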

> 3) most of the input data falls on a regular grid but some schmutz around
> the edges or wherever does not? (i.e., do you get the speedup when things
> work as you expect, and the slowdown in the more general Regrid cases? or
> does your routine become godawful slow when its expectations are not met?)

Missing data around the edges does not affect the routine. In fact, that
is probably why some programs output grid data as scattered data. The
case I mentioned with 1 million grid points would require writing 1
million values on a regular grid. However, since only 150,000 grid points
had data, the program could instead write x, y, z, and the data value for
each of those points: 4 x 150,000 = 600,000 values instead of 1 million.
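Scaled down to a 10 x 10 x 10 grid with 150 filled nodes, that trade-off looks like this. It is only a sketch of what such a program might do; `grid_to_scattered` is a hypothetical name, and the grid is assumed to be a dict from (i, j, k) index to a value, with None for nodes without data.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Index = Tuple[int, int, int]
Record = Tuple[float, float, float, float]  # (x, y, z, value)


def grid_to_scattered(grid: Dict[Index, Optional[float]],
                      origin: Sequence[float],
                      delta: Sequence[float]) -> List[Record]:
    """Write only the grid nodes that have data, as x, y, z, value records."""
    records: List[Record] = []
    for (i, j, k), v in sorted(grid.items()):
        if v is None:
            continue  # empty node: nothing is written
        records.append((origin[0] + i * delta[0],
                        origin[1] + j * delta[1],
                        origin[2] + k * delta[2], v))
    return records
```

With 150 filled nodes this writes 150 records of 4 values, 600 numbers against 1,000 for the full grid, the same ratio as the case above. Reading such a file back is exactly the job of regridding with the original grid.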

For scattered missing data in the interior, I have often thought a pass
through the grid to average the surrounding values would not be a bad
idea. But I think the default should be the same as Regrid and simply
assign missing.

The routine always takes the same amount of time, whether or not the data
fit the grid: it is simply one pass through the scattered data set,
assigning each scattered point to the nearest grid point. Whereas (I
believe) Regrid makes one pass through the scattered data set for each grid
point, so the cost is O(n) vs. O(n^2) when the number of grid points is on
the order of the number of scattered points.
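The single-pass binning described above can be sketched like this. It is a minimal sketch under stated assumptions, not the module's actual code: `regrid_nearest` is a hypothetical name, the output grid is given by origin, delta and counts as in the examples above, ties halfway between two nodes go to the higher index, and the result is a dict from (i, j, k) to a value, or None for missing. Finding a point's node is O(1), so the work is one pass over the n scattered points plus one sweep over the output nodes to form the means; there is no per-grid-point search through the data.

```python
import math
from itertools import product
from typing import Dict, Iterable, Optional, Sequence, Tuple

Index = Tuple[int, int, int]


def regrid_nearest(points: Iterable[Tuple[float, float, float, float]],
                   origin: Sequence[float], delta: Sequence[float],
                   counts: Sequence[int]) -> Dict[Index, Optional[float]]:
    """Bin scattered (x, y, z, value) points onto a regular grid.

    One pass over the points: each goes to its nearest grid node, and nodes
    hit more than once get the unweighted mean. Nodes never hit are None.
    """
    sums: Dict[Index, float] = {}
    hits: Dict[Index, int] = {}
    for x, y, z, value in points:
        idx = []
        for c, o, d, n in zip((x, y, z), origin, delta, counts):
            i = math.floor((c - o) / d + 0.5)  # nearest index, halves round up
            if not 0 <= i < n:
                break  # outside the output grid: discard the point
            idx.append(i)
        else:
            key = (idx[0], idx[1], idx[2])
            sums[key] = sums.get(key, 0.0) + value
            hits[key] = hits.get(key, 0) + 1
    # One sweep over the output nodes to form means and mark missing ones.
    result: Dict[Index, Optional[float]] = {}
    for key in product(range(counts[0]), range(counts[1]), range(counts[2])):
        n = hits.get(key, 0)
        result[key] = sums[key] / n if n else None
    return result
```

Points lying almost exactly halfway between two nodes may round either way because of floating-point error, which is one more reason the output grid should match the original grid.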

> I think adding your stuff as a method to Regrid sounds elegant: just be
> sure to warn the prospective user about the incredible speedup he'll get
> only if he plays by the rules (unless, for example, you get the speedup
> (over Regrid) in version 2 above which I suspect you may not).

Definitely my concern too. That's why it would be necessary to edit the
documentation and context-sensitive help to explain the radius=0 option and
what it does and does not do.

Jeff
