> Hello,
>
> I have recently started using the OpenDX data model for our software
> development. One thing I am not clear on is how I can use this model
> for extremely huge data sets (> 100 GB). Is it necessary to load
> series members at the beginning of calculations?
>
> For one CFD data set of 10 GB (one file), how can I use this model?
>
> I would like some information about OpenDX used in a co-processing
> environment, if it is available.
>
> Thanks
>
> Chaman Singh
I'll start this off and I'm sure others will add their thoughts. First, welcome to the list!

The DX data model is quite distinct from the current implementation of the openDX visualization program that makes images from data. This may simply be a fine point of language, but it will help you understand us if I make it clear. The data model is a conceptual organization scheme for describing a mapping of sampled data onto a spatial coordinate system. As such, there is no inherent limit on the size of object you can describe with it. In practice, of course, your machine's MAXINT and MAXFLOAT might place limits on your object description.

The software you have downloaded places very real-world limits on the data you can process, by virtue of the pesky fact that you have to run it on real hardware. The single biggest limitation is RAM, since openDX's memory management is entirely RAM based (discounting swap as a mere artifact of machine implementation, not really "extra" RAM). This differs from a limited number of other programs which can buffer large objects on disk and load them apparently "seamlessly" as you pan around, say, an image. The only example that comes to mind is (or was) ER-Mapper, which could only deal with 2D images anyway. I don't know of any examples that can buffer 3D complex objects from disk to the frame buffer. Anyone else?

But you are on the right track. If you have a very large data set which is in fact made up of a lot of smaller data samples, say a series of measurements which are collected together into what you call (and we call) a "series" object, you can in fact load one or two or ten or whatever your system can physically hold at a time, make images from them, save the images to disk, then load the next one or two or ten objects, and so on. I doubt very much that you have 10 GB of RAM, so you probably will not be loading one 10 GB object, though. Even if you could load it, you'd need another couple of gigabytes of RAM to do any useful manipulations on it, since openDX's operational model is to duplicate arrays that are going to be modified, while pointing to arrays that are not modified but need to be shared (or reused, if you will).

Since you cannot operate at all with openDX on an object you can't load into RAM, you can't use its structural functions to decompose, reduce, sample, slice, slab, etc. the large object into smaller objects. You will have to do this externally to openDX in a pre-processing operation you create. In some cases the solution is to store the data object in a database that permits you to subselect a portion, or a sampled, reduced-resolution version, of the large object. In other cases you simply hack up some code of your own to do it.

Once you do get an object loaded into openDX, you can further subsample it (depending on its topology, different operations are permitted or excluded), then either immediately operate on the new object if you still have enough RAM left, or decompose and Export the new object, close that 'net', and open the new object from the disk file using a different net that doesn't incur the RAM overhead of first loading the parent (very large) object.

Be aware also that the openDX script language may permit you to perform an operation like "open each series member, find and store its max and min values, compare them to the previous max and min, keep the highest and lowest, close this object, repeat".
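To make the "hack up some code of your own" idea concrete, here is a minimal sketch of such an external pre-processor in Python with NumPy. Everything specific in it is hypothetical: the file names, the grid dimensions, and the assumption that the big file is (or can be dumped as) a raw float32 scalar field on a regular grid. It memory-maps the big file so only the selected portion is ever read, then writes a much smaller file. The same trick covers the "next adjacent, overlapping subset" case I mention further down: shift the slab origin and keep a few planes of overlap.

```python
# subsample_slab.py -- sketch of an external pre-processor (hypothetical names
# and sizes).  Assumes a raw little-endian float32 scalar field stored in
# C order on a regular NX x NY x NZ grid; adjust to your actual layout.
import numpy as np

NX, NY, NZ = 512, 512, 2048          # full grid dimensions (hypothetical)
IN_FILE = "big_field.raw"            # the huge raw file (hypothetical)
OUT_FILE = "slab_sub4.raw"           # the small file openDX will import

# Memory-map the file: nothing is read from disk until we slice it.
full = np.memmap(IN_FILE, dtype=np.float32, mode="r", shape=(NX, NY, NZ))

# Pick a slab in Z and reduce resolution 4:1 in every direction.
# For the "pan right" case, move z0/z1 over by (slab width - overlap).
z0, z1, stride = 0, 512, 4
slab = np.ascontiguousarray(full[::stride, ::stride, z0:z1:stride])

slab.tofile(OUT_FILE)
print("wrote", OUT_FILE, "shape", slab.shape, "bytes", slab.nbytes)
```

From there, a short header describing the reduced grid to openDX's General Array Importer (or a hand-written native header) should let you Import just this manageable piece.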
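The bookkeeping for that max/min pass is simple enough that you can also do it outside DX entirely, as another little pre-pass. Here is a rough Python sketch under the (hypothetical) assumption of one raw float32 file per series member, read in chunks so no member ever has to fit in memory at once; the file naming pattern is made up.

```python
# series_minmax.py -- walk every series member, track the global min/max
# (hypothetical layout: one raw float32 file per time step).
import glob
import numpy as np

CHUNK = 4_000_000                      # floats per read, about 16 MB at a time
global_min, global_max = np.inf, -np.inf

for member in sorted(glob.glob("member_*.raw")):
    with open(member, "rb") as f:
        while True:
            chunk = np.fromfile(f, dtype=np.float32, count=CHUNK)
            if chunk.size == 0:
                break
            global_min = min(global_min, float(chunk.min()))
            global_max = max(global_max, float(chunk.max()))

print("series min/max:", global_min, global_max)
# Feed these two numbers into the net as fixed bounds (e.g. for AutoColor or
# your colormap) so every member of the image series is scaled consistently.
```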
Such a pass can walk through an enormous object (series) and yield a useful result, since the series-wide max and min can then control the scale or colormap bounds for each member when you generate the series of images one member at a time. (You could do this with the visual interface as well, but it would take a lot longer to run.)

One more point that analysts new to visualization (I'm not saying you are) tend to forget. How many nodes or vertices are in your 10 GB or 100 GB object? How many pixels are there on the display screen or printed page that you are going to generate an image on? Not a very close match, is it? So why go to the trouble of generating 100-1000 times as many "pixels" as you can actually see? In other words, you need all the detail in your mesh for the sake of accuracy in calculating your model, but you need vastly fewer representational points for your eye to comprehend the overall pattern. By the same token, if you zoom in on an area of interest, you inherently discard from the view buffer all the nodes you can't see, so why load them into memory only to leave them out of the image? You may say, well, I plan to pan to the right in the next frame to see more of them. True, but that's where you may need a pre-processor that can pick out the next adjacent, overlapping subset of your large mesh and load only that portion.

Visualization has always stressed any hardware we've had. I've been at a supercomputer center for over 10 years now, and we in Viz have invariably been able to blow out the available memory (and sometimes disk space) of any machine they've given us on somebody's data set. So you are joining (or have been in) the fraternity of cheerful machine-busters. I just love to see the systems weenies squirm when they realize their monster killer horror machine has been brought to its knees by "those Viz guys". (:-) Hey, like it's my fault they can't buy us terabyte RAM!

Chris Pelkie
Vice President/Scientific Visualization Producer
Conceptual Reality Presentations, Inc.
30 West Meadow Drive
Ithaca, NY 14850
[EMAIL PROTECTED]
