> Hello,
>
> I have recently started using the OpenDX data model for our software
> development. One thing I am not clear on is how I can use this model
> for extremely huge data sets (> 100 GB). Is it necessary to load
> series members at the beginning of calculations?
>
> For one CFD data set of 10 GB (one file), how can I use this model?
>
> I would like some information about OpenDX used in a co-processing
> environment, if it is available.
>
> Thanks
>
> Chaman Singh
I'll start this off and I'm sure others will add their thoughts. First, welcome to the list!

The DX data model is quite distinct from the current implementation of the openDX visualization program that makes images from data. This may simply be a fine point of language, but it will help you understand us if I make it clear. The data model is a conceptual organization scheme for describing a mapping of sampled data onto a spatial coordinate system. As such, there is no inherent limit on the size of object you can describe with it. In practice, of course, your machine's MAXINT and MAXFLOAT might place limits on your object description.

The software you have downloaded places very real-world limits on the data you can process, by virtue of the pesky fact that you have to run it on real hardware. The single biggest limitation is RAM, since openDX's memory management is entirely RAM based (discounting swap as a mere artifact of machine implementation, not really "extra" RAM). This differs from a limited number of other programs which can buffer large objects on disk and load them apparently "seamlessly" as you pan around, say, an image. The only example that comes to mind is (or was) ER-Mapper, which could only deal with 2D images anyway. I don't know of any examples that can buffer 3D complex objects from disk to the frame buffer. Anyone else?

But you are on the right track. If you have a very large data set which is in fact made up of a lot of smaller data samples, say a series of measurements which are collected together into what you call (and we call) a "series" object, you can in fact load one or two or ten or whatever your system can physically hold at a time, make images from them, save the images to disk, then load the next one or two or ten objects, and so on. I doubt very much that you have 10 GB of RAM, so you probably will not be loading one 10 GB object, though. Even if you could load it, you'd need another couple of gigabytes of RAM to do any useful manipulations on it, since openDX's operational model is to duplicate arrays that are going to be modified, while pointing to arrays that are not modified but need to be shared (or reused, if you will).

Since you cannot operate at all with openDX on an object you can't load into RAM, you can't use its structural functions to decompose, reduce, sample, slice, slab, etc. the large object into smaller objects. You will have to do this externally to openDX in a pre-processing operation you create. In some cases the solution is to store the data object in a database that permits you to subselect a portion, or a sampled, reduced-resolution version, of the large object. In other cases you simply hack up some code of your own to do it.

Once you do get an object loaded into openDX, you can further subsample it (depending on its topology, different operations are permitted or excluded), then either immediately operate on the new object if you still have enough RAM left, or decompose and Export the new object, close that 'net', and open the new object from the disk file using a different net that doesn't incur the RAM overhead of first loading the parent (very large) object.

Be aware also that the openDX script language may permit you to perform an operation like "open each series member, find and store its max and min values, compare them to the previous max and min, keep the highest and lowest, close this object, repeat".
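To make the "hack up some code of your own" idea concrete, here is a minimal sketch of such an external pre-processor in Python with NumPy. Everything specific in it is hypothetical: the file names, the grid dimensions, and the assumption that the big file is (or can be dumped as) a raw float32 scalar field on a regular grid. It memory-maps the big file so only the selected portion is ever read, then writes a much smaller file. The same trick covers the "next adjacent, overlapping subset" case I mention further down: shift the slab origin and keep a few planes of overlap.

```python
# subsample_slab.py -- sketch of an external pre-processor (hypothetical names
# and sizes).  Assumes a raw little-endian float32 scalar field stored in
# C order on a regular NX x NY x NZ grid; adjust to your actual layout.
import numpy as np

NX, NY, NZ = 512, 512, 2048          # full grid dimensions (hypothetical)
IN_FILE = "big_field.raw"            # the huge raw file (hypothetical)
OUT_FILE = "slab_sub4.raw"           # the small file openDX will import

# Memory-map the file: nothing is read from disk until we slice it.
full = np.memmap(IN_FILE, dtype=np.float32, mode="r", shape=(NX, NY, NZ))

# Pick a slab in Z and reduce resolution 4:1 in every direction.
# For the "pan right" case, move z0/z1 over by (slab width - overlap).
z0, z1, stride = 0, 512, 4
slab = np.ascontiguousarray(full[::stride, ::stride, z0:z1:stride])

slab.tofile(OUT_FILE)
print("wrote", OUT_FILE, "shape", slab.shape, "bytes", slab.nbytes)
```

From there, a short header describing the reduced grid to openDX's General Array Importer (or a hand-written native header) should let you Import just this manageable piece.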
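The bookkeeping for that max/min pass is simple enough that you can also do it outside DX entirely, as another little pre-pass. Here is a rough Python sketch under the (hypothetical) assumption of one raw float32 file per series member, read in chunks so no member ever has to fit in memory at once; the file naming pattern is made up.

```python
# series_minmax.py -- walk every series member, track the global min/max
# (hypothetical layout: one raw float32 file per time step).
import glob
import numpy as np

CHUNK = 4_000_000                      # floats per read, about 16 MB at a time
global_min, global_max = np.inf, -np.inf

for member in sorted(glob.glob("member_*.raw")):
    with open(member, "rb") as f:
        while True:
            chunk = np.fromfile(f, dtype=np.float32, count=CHUNK)
            if chunk.size == 0:
                break
            global_min = min(global_min, float(chunk.min()))
            global_max = max(global_max, float(chunk.max()))

print("series min/max:", global_min, global_max)
# Feed these two numbers into the net as fixed bounds (e.g. for AutoColor or
# your colormap) so every member of the image series is scaled consistently.
```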
Such a pass can walk through an enormous object (series) and yield a useful result, since the series-wide max and min can then control the scale or colormap bounds for each member when you generate the series of images one member at a time. (You could do this with the visual interface as well, but it would take a lot longer to run.)

One more point that analysts new to visualization (I'm not saying you are) tend to forget. How many nodes or vertices are in your 10 GB or 100 GB object? How many pixels are there on the display screen or printed page that you are going to generate an image on? Not a very close match, is it? So why go to the trouble of generating 100-1000 times as many "pixels" as you can actually see? In other words, you need all the detail in your mesh for the sake of accuracy in calculating your model, but you need vastly fewer representational points for your eye to comprehend the overall pattern. By the same token, if you zoom in on an area of interest, you inherently discard from the view buffer all the nodes you can't see, so why load them into memory only to leave them out of the image? You may say, well, I plan to pan to the right in the next frame to see more of them. True, but that's where you may need a pre-processor that can pick out the next adjacent, overlapping subset of your large mesh and load only that portion.

Visualization has always stressed any hardware we've had. I've been at a supercomputer center for over 10 years now, and we in Viz have invariably been able to blow out the available memory (and sometimes disk space) of any machine they've given us on somebody's data set. So you are joining (or have been in) the fraternity of cheerful machine-busters. I just love to see the systems weenies squirm when they realize their monster killer horror machine has been brought to its knees by "those Viz guys". (:-) Hey, like it's my fault they can't buy us terabyte RAM!

Chris Pelkie
Vice President/Scientific Visualization Producer
Conceptual Reality Presentations, Inc.
30 West Meadow Drive
Ithaca, NY 14850
[EMAIL PROTECTED]
