Re: [Jprogramming] Mapped Files and Sparse Arrays?

Dan Bron Tue, 04 Jan 2011 14:49:50 -0800

Jack Ort asked:
>  If I can go with the 64-bit version, what kind of
>  size limits would I have to worry about?


In theory, you can have arrays up to 2^64 bytes in size.  In practice, the
number will be much lower (of course).  But basically you can have as much
"memory" as you've got disk.  Of course "memory" that's really on disk will
be significantly slower than memory that's really in RAM (which is slower
than memory that's really in the processor's cache which is slower than
memory that's really in the processor's registers....).

Sparse arrays will allow you to hold much larger (isomorphic) arrays in
memory.  "Isomorphic" because from a user perspective, the arrays act the
same and identical operations produce (isomorphically) identical results,
but under the covers the data structures are different, and space is not
allocated for the "zeros" of the array (the "zeros" don't actually have to
be 0; you can nominate the value that represents "not there").
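
To make that concrete, here is a minimal sketch using the  $.  primitive.
The names d and s and the data are made up for illustration; the point is
only that the sparse form stores the non-"zero" elements and their
positions, while still matching the dense array it came from.

```j
NB. Illustrative data only: a mostly-zero vector.
d =: 0 0 7 0 0 0 3 0
s =: $. d          NB. monadic $. converts a dense array to sparse form

d -: s             NB. 1: the sparse and dense arrays match
3 $. s             NB. the sparse element (the "zero"); 0 by default
4 $. s             NB. index matrix: positions of the stored elements
5 $. s             NB. value cells: the stored elements, here 7 3
+/ s               NB. ordinary verbs are applied just as they would be to d
```

Only two elements are stored for the eight-element vector, which is where
the space saving comes from when the array is large and mostly "zero".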

But I'll warn you that sparse arrays aren't as well exercised as normal
arrays, so you may see strange behavior or outright bugs when applying
complex (or just not-simple) operations.  In a way, this is good news: it
means that very few Jers have found the need for sparse arrays in the first
place (or they'd be better exercised), and that for most applications the
data fits into memory, or is easily partitioned into manipulable sizes.

But if you do end up needing sparse arrays, I'd like to ask a favor of you.
Please report any bugs or strange behavior you observe, so that it can be
addressed or corrected in future versions of J, and everyone will benefit.
In return, I'll help you work around obstacles, if you like.  And I suspect
I speak for the whole community on that.

>  If I use mapped files, what does
>  that do to size limits?

Nothing.  Both J and the OS treat memory mapped files the same as memory,
which is the point, to an extent.  The only reasons to use memory mapped
nouns instead of regular (RAM-only) nouns are persistence (to disk) or
sharing between processes, or both.  Basically, you get the benefits of a
file with the ease of use (semantics) of a variable.  Memory mapping doesn't
let you use larger nouns; if a file won't fit into memory, you can't map it
(or all of it, anyway).

>  I see postings on the forum as recently as 2008
>  saying that mapped files cannot support sparse arrays

Still true as of J602a:

           9!:14''
        j602/2008-03-03/16:45
           
           load'jmf'
           createjmf_jmf_ 10000;~fn=.jpath'~temp\sparsetest.jmf'
           map_jmf_ 'S';fn


           S=: 2000 ?@$ 2     NB.  Works OK

           S=:$. 2000 ?@$ 2
        |domain error
        |   S    =:$.2000?@$2
           
(Note the domain error is on the assignment - you can tell by the extra
spacing in the error message.)

I can't say for certain whether this is still true in the beta, because
apparently the JMF scripts have been renamed, moved, or removed:

           9!:14''
        j701/beta/2009-12-06/14:40

           load 'jmf'
        |file name error: script
        |       0!:0 y[4!:55<'y'
           
but according to Roger in [1], not much has changed in the language in J7,
and he doesn't mention sparse arrays at all, so I wouldn't expect that you
can memory-map them now.
 
>  numerically-encoded multi-dimensional OLAP-like "cube".

Cool.  One neat feature of J, if you're not familiar with it, is  s:  .
That primitive will allow you to encode strings as numbers (and decode them
back).  That way, you can include "strings" in a homogeneous numeric array,
and so avoid boxing & unboxing.  For example, let's say we had some sales
records:


           ]T=:|: ('Paid' ; <"0] 100.00 71.30 451.60 12.32),~  _5 ]\ ;: 'Cust adam bob charlie dave Loc NY NY CA TX '
        +-------+---+-----+
        |Cust   |Loc|Paid |
        +-------+---+-----+
        |adam   |NY |100  |
        +-------+---+-----+
        |bob    |NY |71.3 |
        +-------+---+-----+
        |charlie|CA |451.6|
        +-------+---+-----+
        |dave   |TX |12.32|
        +-------+---+-----+

Using s: , we could convert this to a homogeneous numeric array, sans boxes:

           s2i =: 6 s: s:

           ] M =: ( s2i@:(}:"1)  ,. >@:({:"1) ) }.T
        1 2   100
        3 2  71.3
        4 5 451.6
        6 7 12.32

Here, the first two columns correspond to the strings in the original sales
records.  Now, we can do our normal processing on the numeric array; let's
say we wanted all customers who paid $100.00 or more:
           
           bs =: M #~ 100 <: {:"1 M  NB.  Big spenders

and now we use J's inversion magic to turn the numbers back to strings:

           s2i^:_1 {."1 bs
        +----+-------+
        |adam|charlie|
        +----+-------+
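
Since the question was about OLAP-like cubes, here is a hedged sketch of a
simple roll-up on the same numeric array: total paid per location, using
the key adverb  /.  .  The matrix M is retyped literally from the display
above so the snippet stands alone; the names loc, paid and totals are mine,
introduced only for illustration.

```j
NB. M retyped from the display above: customer code, location code, amount paid.
M =: 4 3 $ 1 2 100  3 2 71.3  4 5 451.6  6 7 12.32

loc  =: 1 {"1 M            NB. location codes (second column)
paid =: {:"1 M             NB. amounts paid (last column)

NB. x u/. y applies u to the items of y grouped by equal keys in x;
NB. groups come out in order of first appearance, which matches ~. loc
totals =: (~. loc) ,. loc +//. paid
totals
```

Each row of totals pairs a location code with the sum paid there.  In a
session where the codes came from  s:  , the first column can be turned
back into strings with the same inverse used above.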
                   
Finally, in addition to J's inherent array-processing capabilities, you
might want to look at existing user tools.  In particular, check out JDB,
which is a database built on top of J:

        http://www.jsoftware.com/jwiki/JDB

I don't think it does cubes, but it has a rich query language and already
supports mapped files, so you can probably use it as the foundation of your
application and build the cubes & analysis on top.

-Dan

[1]  J7 language changes / new features:
http://www.jsoftware.com/pipermail/beta/2010-September/004596.html 


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
