Edward et al.,
Perhaps I should have prefaced my request with an explanation of what
I am up to. I am working on the REAP project (which I imagine you
are already familiar with; REAP's PIs include the lead PIs who
initiated the Kepler project, e.g., Ludäscher, Jones, Altintas, etc.).
Part of the REAP project is to provide data integration with OPeNDAP
data sources (www.opendap.org). From my (so far limited) view of PTII
and Kepler, it seems that many of the OPeNDAP data sources include data
sets whose structure comes from scientific domains that are new to PTII
and Kepler, and in many cases includes 4D data. Global atmospheric models
and satellite orbits are two examples of such data.
The OPeNDAP data model and the PTII data model have many
similarities, but they diverge in that:
- The OPeNDAP data model is optimized for multidimensional arrays.
- The OPeNDAP data model supports a Grid data type that encapsulates
map vector semantics for multidimensional gridded data.
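For readers on the PTII side who have not met the Grid type, here is a minimal Java sketch of the idea, assuming a Grid can be modeled as a dense array stored flat in row-major order plus one coordinate ("map") vector per dimension. `GridSketch`, `nearestIndex`, and the example coordinates are hypothetical; this is not the OPeNDAP reference implementation or any PTII class.

```java
/** Hypothetical sketch: a dense N-D array plus one map (coordinate) vector per dimension. */
final class GridSketch {
    final int[] shape;      // length of each dimension, e.g. {lat, lon}
    final double[] data;    // values stored flat in row-major order
    final double[][] maps;  // maps[d][i] = coordinate of index i along dimension d

    GridSketch(int[] shape, double[] data, double[][] maps) {
        if (maps.length != shape.length) {
            throw new IllegalArgumentException("need one map vector per dimension");
        }
        int n = 1;
        for (int d = 0; d < shape.length; d++) {
            if (maps[d].length != shape[d]) {
                throw new IllegalArgumentException("map " + d + " has the wrong length");
            }
            n *= shape[d];
        }
        if (data.length != n) {
            throw new IllegalArgumentException("data length does not match shape");
        }
        this.shape = shape.clone();
        this.data = data;
        this.maps = maps;
    }

    /** Row-major offset of an index tuple. */
    int offset(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("index rank does not match grid rank");
        }
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            off = off * shape[d] + index[d];
        }
        return off;
    }

    /** Value at an index tuple. */
    double get(int... index) {
        return data[offset(index)];
    }

    /** Index along dimension d whose map coordinate is closest to coord. */
    int nearestIndex(int d, double coord) {
        int best = 0;
        for (int i = 1; i < maps[d].length; i++) {
            if (Math.abs(maps[d][i] - coord) < Math.abs(maps[d][best] - coord)) {
                best = i;
            }
        }
        return best;
    }
}
```

With a 2 x 3 lat/lon grid whose maps are `{-10, 10}` and `{0, 120, 240}`, `g.get(g.nearestIndex(0, 9.0), g.nearestIndex(1, 130.0))` picks the cell at latitude 10, longitude 120. The map vectors are what let a client ask for data by coordinate rather than by raw index.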
Both of these data types represent high-value content that the REAP
project would like to make available in PTII/Kepler.
I hope a little background helps to further our discussion.
On Aug 6, 2007, at 5:39 AM, Edward A. Lee wrote:
Nathan,
Perhaps it would help to explain the reason for the current
design:
The existence of MatrixToken types has two motivations.
First, matrices have a natural set of operations that require
fairly sophisticated libraries to support (multiplication,
inverse, eigenvalue computation, etc.). As far as I know,
there are no such natural operations for higher dimensional
matrices. Second, algorithms using matrices need to be
efficient. Data should be represented using native Java
data types, not wrapped in Tokens.
That kind of optimization is very useful, and is exactly what is done
for all arrays in the reference implementations of the OPeNDAP data
model.
In my experience, how to represent higher dimensional
data really depends on the application. E.g., images
are 2-D, and can be represented in matrix types.
Video is 3-D, and is naturally represented as a sequence
of matrix tokens.
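Edward's video example can be sketched in plain Java, assuming a 3-D volume stored flat as [frame][row][col]: an iterator hands out one 2-D frame at a time, so each emitted value is an ordinary `double[][]`. As far as I know, PTII's `DoubleMatrixToken` can be constructed from a `double[][]`, but the `FrameSequence` class itself is a hypothetical illustration, not PTII code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: emit a 3-D volume [frame][row][col] as a sequence of 2-D frames. */
final class FrameSequence implements Iterator<double[][]> {
    private final double[] volume; // flat, row-major
    private final int frames, rows, cols;
    private int next = 0;          // index of the next frame to emit

    FrameSequence(double[] volume, int frames, int rows, int cols) {
        if (volume.length != frames * rows * cols) {
            throw new IllegalArgumentException("volume size does not match dimensions");
        }
        this.volume = volume;
        this.frames = frames;
        this.rows = rows;
        this.cols = cols;
    }

    @Override
    public boolean hasNext() {
        return next < frames;
    }

    @Override
    public double[][] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Copy one contiguous rows x cols slab out of the flat buffer.
        double[][] frame = new double[rows][cols];
        int base = next * rows * cols;
        for (int r = 0; r < rows; r++) {
            System.arraycopy(volume, base + r * cols, frame[r], 0, cols);
        }
        next++;
        return frame;
    }
}
```

Each frame only exists while it is being processed, which is the memory advantage of treating the third dimension as a stream rather than as one large token.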
By mapping the OPeNDAP data model into PTII/Kepler we will be making
available many multidimensional data sets, including 4D atmospheric
models. Many of these data sets are quite large, so storage/processing
optimizations like the one used in the MatrixToken data
type are generally favored.
Part of this process is to determine how to most effectively
represent this information in the target (PTII/Kepler) data model.
This should be driven in part by the tools available in the
application for subsetting and slicing the data.
At this point, based on your comments and those posted by C. Brooks,
my instinct is to develop a prototype that uses MatrixTokens for
all 2D arrays and nested ArrayTokens elsewhere, as required. You make
an excellent point about the optimization of using native Java
types for arrays, and promoting individual array values to
ArrayToken objects will likely create serious memory usage and
performance issues in the long run.
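The prototype rule (matrices for 2D data, nested arrays elsewhere) might look like the following minimal sketch, assuming flat row-major input. `NestedMapping` is hypothetical; in PTII terms the `double[][]` leaves would presumably become MatrixTokens and the `Object[]` levels nested ArrayTokens, but no PTII classes are used here.

```java
import java.util.Arrays;

/** Hypothetical sketch of the prototype mapping: 2-D -> matrix, otherwise nested arrays. */
final class NestedMapping {
    /**
     * Converts a flat row-major array with the given shape into
     * double[] (rank 1), double[][] (rank 2, the MatrixToken case),
     * or Object[] whose elements are the rank-(n-1) results (rank > 2).
     */
    static Object toNested(double[] flat, int[] shape) {
        if (shape.length < 1) {
            throw new IllegalArgumentException("shape must have at least one dimension");
        }
        int n = 1;
        for (int s : shape) n *= s;
        if (n != flat.length) {
            throw new IllegalArgumentException("data length does not match shape");
        }
        return build(flat, 0, shape, 0);
    }

    private static Object build(double[] flat, int offset, int[] shape, int dim) {
        int rank = shape.length - dim;
        if (rank == 1) {
            return Arrays.copyOfRange(flat, offset, offset + shape[dim]);
        }
        if (rank == 2) {
            // Innermost two dimensions stay in native Java arrays.
            int rows = shape[dim], cols = shape[dim + 1];
            double[][] m = new double[rows][cols];
            for (int r = 0; r < rows; r++) {
                System.arraycopy(flat, offset + r * cols, m[r], 0, cols);
            }
            return m;
        }
        // rank > 2: one nested element per index along the outermost remaining dimension.
        int stride = 1;
        for (int d = dim + 1; d < shape.length; d++) stride *= shape[d];
        Object[] out = new Object[shape[dim]];
        for (int i = 0; i < shape[dim]; i++) {
            out[i] = build(flat, offset + i * stride, shape, dim + 1);
        }
        return out;
    }
}
```

Only the dimensions above the innermost two pay the per-element wrapping cost; the innermost 2D slabs stay in native Java arrays, which is the trade-off described above.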
I can see a couple of alternate mechanisms for addressing this:
- Extend the PTII/Kepler data model with a more generalized array
type that uses an efficient storage mechanism and that provides a
reasonable interface for subsetting/slicing/dicing the array. I am
not suggesting a set of linear-algebra-style functions, just subsetting
methods. (I think one goal here would be that if a subsetting
operation produces a 2D array result, it should get mapped to a
MatrixToken type.)
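A minimal sketch of what such a generalized array type could look like, assuming flat row-major storage and DAP-style hyperslabs (inclusive start, stride, and stop per dimension). `NdArray`, `subset`, and `asMatrixOrNull` are hypothetical names; the last method illustrates the "2D result maps to a MatrixToken" rule by returning a `double[][]` only when exactly two dimensions are longer than 1.

```java
/** Hypothetical sketch of a generalized array type: flat storage plus hyperslab subsetting. */
final class NdArray {
    final int[] shape;
    final double[] data; // row-major

    NdArray(int[] shape, double[] data) {
        int n = 1;
        for (int s : shape) n *= s;
        if (n != data.length) throw new IllegalArgumentException("shape/data mismatch");
        this.shape = shape.clone();
        this.data = data;
    }

    /** Subset with inclusive start..stop and a positive stride in every dimension. */
    NdArray subset(int[] start, int[] stride, int[] stop) {
        int rank = shape.length;
        if (start.length != rank || stride.length != rank || stop.length != rank) {
            throw new IllegalArgumentException("need one range per dimension");
        }
        int[] outShape = new int[rank];
        int n = 1;
        for (int d = 0; d < rank; d++) {
            if (start[d] < 0 || stop[d] >= shape[d] || start[d] > stop[d] || stride[d] < 1) {
                throw new IllegalArgumentException("bad range in dimension " + d);
            }
            outShape[d] = (stop[d] - start[d]) / stride[d] + 1;
            n *= outShape[d];
        }
        double[] out = new double[n];
        int[] idx = new int[rank]; // current index in the output array
        for (int k = 0; k < n; k++) {
            // Map the output index to a row-major offset in the source.
            int off = 0;
            for (int d = 0; d < rank; d++) {
                off = off * shape[d] + start[d] + idx[d] * stride[d];
            }
            out[k] = data[off];
            // Advance idx like an odometer, last dimension fastest.
            for (int d = rank - 1; d >= 0; d--) {
                if (++idx[d] < outShape[d]) break;
                idx[d] = 0;
            }
        }
        return new NdArray(outShape, out);
    }

    /** If exactly two dimensions have length > 1, return them as a matrix; otherwise null. */
    double[][] asMatrixOrNull() {
        int r = -1, c = -1;
        for (int d = 0; d < shape.length; d++) {
            if (shape[d] > 1) {
                if (r < 0) r = d;
                else if (c < 0) c = d;
                else return null;
            }
        }
        if (c < 0) return null;
        // Singleton dimensions do not affect row-major order, so data is rows x cols.
        double[][] m = new double[shape[r]][shape[c]];
        for (int i = 0; i < data.length; i++) {
            m[i / shape[c]][i % shape[c]] = data[i];
        }
        return m;
    }
}
```

For a [time][level][lat][lon] array of shape 2 x 3 x 4 x 5, selecting one time step, levels 0 and 2, one latitude, and all longitudes leaves a 1 x 2 x 1 x 5 result, which `asMatrixOrNull` returns as a 2 x 5 matrix.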
- OPeNDAP servers can perform server-side subsetting. It may be most
expedient to force the subsetting to happen on the origin server and
only import 1D or 2D subsets into the Ptolemy
environment.
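For the server-side route, the request itself carries the subset. A minimal sketch, assuming the DAP2 hyperslab constraint syntax `var[start:stride:stop]` appended to the dataset URL after a `?`; the class `DapConstraint`, the variable name `temp`, and the URL are hypothetical examples.

```java
/** Hypothetical helper: build a DAP2-style hyperslab constraint for one variable. */
final class DapConstraint {
    /** Produces e.g. "temp[0:1:0][0:1:0][10:1:19][20:1:39]" (stop is inclusive). */
    static String hyperslab(String variable, int[] start, int[] stride, int[] stop) {
        if (start.length != stride.length || start.length != stop.length) {
            throw new IllegalArgumentException("need one range per dimension");
        }
        StringBuilder sb = new StringBuilder(variable);
        for (int d = 0; d < start.length; d++) {
            sb.append('[').append(start[d]).append(':').append(stride[d])
              .append(':').append(stop[d]).append(']');
        }
        return sb.toString();
    }

    /** Appends the constraint expression to the dataset URL as its query part. */
    static String requestUrl(String datasetUrl, String constraint) {
        return datasetUrl + "?" + constraint;
    }
}
```

Pinning the first two dimensions (e.g., time and level) to a single index leaves a 2D lat/lon slab, which fits the MatrixToken mapping. Some HTTP client libraries may require the square brackets to be percent-encoded.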
You have readily available the following mechanisms
for adding dimensions:
- sequences (streams)
- arrays (and arrays of arrays)
- records
Which to choose will depend on the modeling problem, I think.
Note that a while ago, we did some major research into
representations of multidimensional data via generalized streams
(multidimensional streams). See:
http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/
I followed this link to a page containing the abstract for the paper
you mentioned. Unfortunately, the link on that page to the full paper
returns a 404 Not Found:
http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/MurthyLee_MultimensionalSDF.pdf
Is there another location that you know of? I would be interested in
reading the document.
Thank you all for your thoughtful responses,
Nathan
This mechanism was implemented in Ptolemy Classic, but never ported
to Ptolemy II. Probably some good research to be done here still...
Edward
At 07:57 PM 8/3/2007, Nathan Potter wrote:
Greetings,
I am looking at how I might represent an N