Re: [Jprogramming] Sparse arrays with 13 dimensions

Bo Jacoby Sun, 03 Jul 2011 10:00:52 -0700

For that kind of data I use Ordinal Fractions!

You may not know what they are, so let me explain.



An ordinal fraction is something like 'the third fourth', because 'the third' 
is an ordinal number and 'a fourth' is a fraction. 


The third fourth is coded as 21. Why? Because the first half is 1 and the 
second half is 2, and the third fourth is the first half of the second half: 
2 followed by 1. Ordinal fractions may be padded with zeroes on the right 
(just like decimal fractions, where 0.1=0.10), so 1=10 and 2=20. 


The first fourth is 11, the second fourth is 12, the third fourth is 21 and the 
fourth fourth is 22. The odd fourths are identified by the ordinal fraction 01 
and the even fourths by 02. The whole is identified by the ordinal fraction 
00.  

Summarizing:
00 whole
01 odd fourths
02 even fourths
10 first half
11 first fourth
12 second fourth
20 second half
21 third fourth
22 fourth fourth

The data is represented by a text file like this summary, where each record 
consists of a line number, a blank, a data element, and a new line character. 
Each line number is an ordinal fraction. 
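
As a rough illustration of that record format, here is a minimal Python sketch that reads such a text into a dictionary keyed by line number. The function name parse_records is made up for this example, and the rule that a line number consists of digits only is my assumption.

```python
def parse_records(text):
    """Map each ordinal-fraction line number to its data element."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # A record is: line number, one blank, data element.
        key, _, value = line.partition(" ")
        if not key.isdigit():
            # Assumption: a line number consists of digits only.
            raise ValueError(f"not an ordinal fraction: {key!r}")
        records[key] = value
    return records


sample = "00 whole\n10 first half\n21 third fourth\n"
records = parse_records(sample)
print(records["21"])  # third fourth
```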


The ordinal fraction relationships between the line numbers must reflect the 
logical relationships between the data elements. 


The five ordinal fraction relationships are:


10 EQUALS 10. 10=10.
10 INCLUDES 11. 10>11.
10 is INCLUDED in 00. 10<00.
10 INTERSECTS 01. 10<>01.
10 is PARALLEL to 20. 10><20.

The data relationships corresponding to these ordinal fraction relationships 
are:

The first half is equal to the first half.
The first half includes the first fourth.
The first half is included in the whole.
The first half intersects the odd fourths.
The first half is parallel to the second half.
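
One way to read the five relationships is as a digit-by-digit comparison in which zero matches anything. The Python sketch below follows that reading. The function name relation and the exact rules are my interpretation of the five examples above, not a formal definition.

```python
def relation(a, b):
    """Classify how ordinal fraction a relates to ordinal fraction b."""
    # Pad with zeroes on the right, since 1 = 10.
    width = max(len(a), len(b))
    a, b = a.ljust(width, "0"), b.ljust(width, "0")
    pairs = list(zip(a, b))
    if a == b:
        return "EQUALS"
    # Two different nonzero digits in the same position: nothing in common.
    if any(x != "0" and y != "0" and x != y for x, y in pairs):
        return "PARALLEL"
    # Every nonzero digit of a is repeated in b: a is the wider one.
    if all(x == "0" or x == y for x, y in pairs):
        return "INCLUDES"
    # Every nonzero digit of b is repeated in a: b is the wider one.
    if all(y == "0" or x == y for x, y in pairs):
        return "INCLUDED"
    return "INTERSECTS"


for other in ["10", "11", "00", "01", "20"]:
    print("10", relation("10", other), other)
```

Under this reading 11 and 12 also come out as PARALLEL, because they differ in a nonzero digit.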

An ordinal fraction is like an infinite dimensional array in which the array 
indexes are single-digit numbers, and where only a finite number of indexes are 
nonzero. Note the following differences.

1. An array has a finite dimension. An ordinal fraction has infinite dimension. 
2. An array may include a finite number of subarrays. An ordinal fraction 
includes an infinite number of ordinal fractions.
3. An array may contain elements. No ordinal fraction may contain elements.
4. An array has a name. An ordinal fraction has no name, but it may contain 
data for explanation.
5. Array indexes are nonnegative numbers 0,1,...,N. Ordinal fraction indexes 
are single-digit numbers 1,2,...,9. 
6. In an ordinal fraction, the digit zero is a wildcard character.
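
The wildcard rule is what makes selection and subtotaling simple. The following Python sketch shows the idea on the four fourths. The names matches and select and the numeric values are invented for illustration only.

```python
def matches(query, key):
    """True if key lies inside query, treating each 0 in query as a wildcard."""
    width = max(len(query), len(key))
    query, key = query.ljust(width, "0"), key.ljust(width, "0")
    return all(q == "0" or q == k for q, k in zip(query, key))


def select(records, query):
    """Return the records whose line number is covered by query."""
    return {k: v for k, v in records.items() if matches(query, k)}


# Made-up values for the four fourths.
records = {"11": 5, "12": 7, "21": 1, "22": 3}

print(select(records, "10"))                # {'11': 5, '12': 7}
print(sum(select(records, "02").values()))  # 10, the total of the even fourths
```

A query with zeroes in some positions totals over those positions, which is the kind of subtotaling asked for in the original question.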


In your data model there will be about 13 dimensions. So at least 13 digits are 
necessary to identify a piece of information.

Cardinalities greater than 9 are handled by using more than one digit for 
indexing. 81 items may be indexed by 11 through 99 where digit zero is not 
used. 729 items may be indexed by 111 through 999 where digit zero is not used. 
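
A possible way to produce such zero-free codes is to write the item number in base 9 and add 1 to every digit. The Python sketch below does that. The names encode_index and decode_index are hypothetical, and counting items from 0 is my choice.

```python
def encode_index(i, width):
    """Encode item number i (counting from 0) as `width` digits from 1..9."""
    if not 0 <= i < 9 ** width:
        raise ValueError(f"{i} does not fit in {width} digits")
    digits = []
    for _ in range(width):
        i, d = divmod(i, 9)        # base-9 digit, least significant first
        digits.append(str(d + 1))  # shift 0..8 to 1..9 so zero is never used
    return "".join(reversed(digits))


def decode_index(code):
    """Inverse of encode_index."""
    i = 0
    for ch in code:
        i = i * 9 + (int(ch) - 1)
    return i


print(encode_index(0, 2), encode_index(80, 2))  # 11 99
print(encode_index(399, 3))                     # 594
```

With this scheme a dimension of about 400 countries needs three digits, since 81 < 400 <= 729.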

If this is sufficient, fine! If you have questions, ask them. Have fun!
- Bo


>________________________________
>From: david alis <[email protected]>
>To: Programming forum <[email protected]>
>Sent: 15:34 Sunday, 3 July 2011 
>Subject: [Jprogramming] Sparse arrays with 13 dimensions
>
>Does anyone have advice about how to tackle a problem where sparse arrays
>would be a good implementation in principle, but not in practice?
>
>This particular problem comes from a group of colleagues that compiles
>statistics.
>In the proposed data model there will be about 13 dimensions.
>
>The cardinalities of three of these dimensions are around 400.
>These dimensions represent countries - either individually or grouped.
>The remaining dimensions have cardinalities of between 3 and 30.
>
>The data is very sparse - probably only 3 dimensions will be dense.
>None of the high cardinality dimensions are dense.
>
>Time period (i.e. year and month) is an additional dimension but does not
>present an issue because data for each period can quite naturally
>be held in its own file.
>
>
>The types of operations are simple -
>(i) storage and retrieval of selections for display in Excel etc
>(ii) totaling and subtotaling up most of the dimensions (e.g. aggregating
>countries).
>
>A J-sparse array implementation would have 10 sparse axes.
>This means that for every observation there would be 10 extra numbers (i.e.
>integers).
>i.e. for each 8 bytes of useful data there needs to be 80 bytes of support
>(J64).
>
>The problem comes from the fact that for each period there may be
>between 10 and 50 million observations.
>
>Assuming that each element in the index array for a sparse noun
>uses 8 bytes then this implies a memory requirement of 800 - 4000 Mb for
>each period.
>
>If it's really true that an element for each index in each sparse dimension
>needs 8 bytes then the sparse implementation is quite inefficient.
>
>A way around this could be to combine several dimensions
>using #. and #: (something old-time APL programmers did
>using code and decode).
>
>Using this trick the number of sparse dimensions could be reduced to 3 or 4.
>While this would reduce space requirements it introduces lots of complexity.
>
>As things stand, sparse arrays are not supported by mapped nouns.
>
>Given that the source is now available, how practical would it be to
>implement mapped noun support for sparse arrays? And if it was, are we
>talking days or months?
>
>Regards
>David
>----------------------------------------------------------------------
>For information about J forums see http://www.jsoftware.com/forums.htm
>
>
>
