Re: Data Type Mapping

2007-08-06 Thread Nathan Potter



Edward et al.,


Perhaps I should have prefaced my request with an explanation of what
I am up to. I am working on the REAP project, which I imagine you
are already familiar with. (REAP's PIs include the lead PIs who
initiated the Kepler project, e.g., Ludäscher, Jones, Altintas, etc.)


Part of the REAP project is to provide data integration with OPeNDAP
data sources (www.opendap.org). From my (so far limited) view of PTII
and Kepler, it seems that many of the OPeNDAP data sources include data
sets whose structure comes from scientific domains that are new to PTII
and Kepler, and in many cases these data sets include 4D data. Global
atmospheric models and satellite orbits are two examples of such data.


The OPeNDAP data model and the PTII data model have many  
similarities, but they diverge in that:


- The OPeNDAP data model is optimized for multidimensional arrays.
- The OPeNDAP data model supports a Grid data type that encapsulates  
map vector semantics for multidimensional gridded data.


Both of these data types represent high value content that the REAP  
project would like to make available in PTII/Kepler.


I hope a little background helps to further our discussion.


On Aug 6, 2007, at 5:39 AM, Edward A. Lee wrote:




Nathan,

Perhaps it would help to explain the reason for the current
design:

The existence of MatrixToken types has two motivations.
First, matrices have a natural set of operations that require
fairly sophisticated libraries to support (multiplication,
inverse, eigenvalue computation, etc.). As far as I know,
there are no such natural operations for higher dimensional
matrices.  Second, algorithms using matrices need to be
efficient. Data should be represented using native Java
data types, not wrapped in Tokens.




That kind of optimization is very useful, and is exactly what is done  
for all arrays in the reference implementations of the OPeNDAP data  
model.





In my experience, how to represent higher dimensional
data really depends on the application.  E.g., images
are 2-D, and can be represented in matrix types.
Video is 3-D, and is naturally represented as a sequence
of matrix tokens.




By mapping the OPeNDAP data model into PTII/Kepler we will be making  
available many multidimensional data sets, including 4D atmospheric  
models. Many of these data sets are quite large, so storage/ 
processing optimizations like the one used in the MatrixToken data  
type are generally favored.


Part of this process is to determine how to most effectively
represent this information in the target (PTII/Kepler) data model.
This should be driven in part by the tools available in the
application for subsetting and slicing the data.


At this point, based on your comments and those posted by C. Brooks,
my instinct is to develop a prototype using MatrixTokens for
all 2D arrays and nested ArrayTokens elsewhere, as required. You make
an excellent point about the optimization of using the native Java
types for arrays, and promoting individual array values to
ArrayToken objects will probably create serious memory usage and
performance issues in the long run.
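
To make the storage concern concrete, here is a minimal sketch of an N-dimensional array that keeps every value in one flat native double[] and uses stride arithmetic for indexing. It assumes row-major ordering and double-valued data. NdArray and its methods are hypothetical names of my own, not existing PTII, Kepler, or OPeNDAP classes, and the 2D slice is returned as a plain double[][] rather than as any particular token type.

```java
// Hypothetical sketch: NdArray is an illustrative name, not a PTII, Kepler,
// or OPeNDAP class.
final class NdArray {
    private final double[] data;    // every value kept as a native double, row-major
    private final int[] shape;      // extent of each dimension
    private final int[] strides;    // flat-index step for each dimension

    NdArray(double[] data, int... shape) {
        int size = 1;
        int[] strides = new int[shape.length];
        // Row-major layout: the last dimension varies fastest.
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = size;
            size *= shape[d];
        }
        if (size != data.length) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        this.data = data;
        this.shape = shape.clone();
        this.strides = strides;
    }

    /** Returns the value at the given index (one entry per dimension). */
    double get(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("wrong number of indices");
        }
        int offset = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            offset += index[d] * strides[d];
        }
        return data[offset];
    }

    /**
     * Copies out a 2D slice. Dimensions rowDim and colDim vary over their
     * full extent; every other dimension is held at the value given in fixed.
     * The entries of fixed at rowDim and colDim are ignored.
     */
    double[][] slice2D(int rowDim, int colDim, int... fixed) {
        int[] index = fixed.clone();
        double[][] out = new double[shape[rowDim]][shape[colDim]];
        for (int r = 0; r < shape[rowDim]; r++) {
            index[rowDim] = r;
            for (int c = 0; c < shape[colDim]; c++) {
                index[colDim] = c;
                out[r][c] = get(index);
            }
        }
        return out;
    }
}
```

For a 2 x 3 x 4 array, slice2D(1, 2, 1, 0, 0) would yield the 3 x 4 plane at index 1 of the first dimension, which is the kind of 2D result that could then be handed to a matrix type.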


I can see a couple of alternate mechanisms for addressing this:

- Extend the PTII/Kepler data model with a more generalized array
type that uses an efficient storage mechanism and that provides a
reasonable interface for sub-setting/slicing/dicing the array. I am
not suggesting a set of linear-algebra-style functions, just sub-setting
methods. (I think one goal here would be that if a sub-setting
activity produces a 2D array result then it should get mapped to a
MatrixToken type.)


- OPeNDAP servers can perform server-side sub-setting. It may be that the
most expedient thing would be to force the sub-setting to happen on
the origin server and only import 1D or 2D subsets into the Ptolemy
environment.
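
As a rough illustration of the server-side option, the sketch below builds a DAP2-style constraint expression, in which each dimension of a variable gets a [start:stride:stop] term with an inclusive stop, and appends it to a data request URL. The class name DapHyperslab, the variable name, and the example dataset URL are all hypothetical. The code only formats strings; it does not contact a server or use any OPeNDAP client library, and it leaves the brackets unencoded.

```java
// Hypothetical helper for illustration only; not part of any OPeNDAP library.
final class DapHyperslab {

    /**
     * Appends one "[start:stride:stop]" term per dimension to the variable
     * name. Each row of ranges is {start, stride, stop}, with stop inclusive.
     */
    static String constraint(String variable, int[][] ranges) {
        StringBuilder sb = new StringBuilder(variable);
        for (int[] r : ranges) {
            if (r.length != 3 || r[0] < 0 || r[1] < 1 || r[2] < r[0]) {
                throw new IllegalArgumentException("each range must be {start, stride, stop}");
            }
            sb.append('[').append(r[0])
              .append(':').append(r[1])
              .append(':').append(r[2]).append(']');
        }
        return sb.toString();
    }

    /** Forms a binary data request: dataset URL, ".dods" suffix, "?", constraint. */
    static String dataUrl(String datasetUrl, String constraint) {
        return datasetUrl + ".dods?" + constraint;
    }
}
```

Pinning the first two dimensions of a 4D variable to a single index each, for example constraint("temp", new int[][] {{0, 1, 0}, {5, 1, 5}, {0, 1, 9}, {0, 2, 19}}), gives "temp[0:1:0][5:1:5][0:1:9][0:2:19]", so only a 2D subset would cross the network.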







You have readily available the following mechanisms
for adding dimensions:
 - sequences (streams)
 - arrays (and arrays of arrays)
 - records
Which to choose will depend on the modeling problem, I think.

Note that a while ago, we did some major research into
representations of multidimensional data via generalized streams
(multidimensional streams).  See:

http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/



I followed this link to a page containing the abstract for the paper  
you mentioned. Unfortunately the link on that page to the full paper:


http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/MurthyLee_MultimensionalSDF.pdf


returns a 404 Not Found. Is there another location that you know of?
I would be interested in reading the document.



Thank you all for your thoughtful responses,



Nathan







This mechanism was implemented in Ptolemy Classic, but never ported
to Ptolemy II.  Probably some good research to be done here still...

Edward


At 07:57 PM 8/3/2007, Nathan Potter wrote:




Data Type Mapping

2007-08-03 Thread Nathan Potter


Greetings,

I am looking at how I might represent an N-dimensional array in the
Ptolemy data model.


There is an obvious mapping for 1D (ArrayToken) and 2D (MatrixToken)
arrays. But when I look at how I might map higher dimensions, I
get stopped by my lack of knowledge regarding the way that users
expect to see data represented in Ptolemy/Kepler.


I imagine I could make ArrayTokens whose members are ArrayTokens  
whose members are ArrayTokens whose...


OR

I could use nested RecordTokens in much the same way.

Which is preferable?

There is a RecordDisassembler actor that could probably pick apart
the latter, but the former seems to better preserve the semantic
relationships of the dimensions.


Ultimately I suppose the question is: Are there any actors in the
library that are designed to deal with either construct? Or are
multidimensional arrays a relatively foreign type of data
organization in Ptolemy/Kepler?



Thanks,


Nathan



= = =
Nathan Potter                          ndp at opendap.org
OPeNDAP, Inc.                          541.752.1852




Posted to the ptolemy-hackers mailing list.  Please send administrative
mail for this list to: [EMAIL PROTECTED]