I was somewhat surprised to find out that symbols only save space on large
strings in an individually boxed table. In an inverted table, the space
used is the same for large and small symbols.

str1=: 100 2 $ (<10$'ab')
sym1=: 100 2 $ <"0 s:(<10$'ab')
str2=: 100 2 $ (<100$'ab')
sym2=: 100 2 $ <"0 s:(<100$'ab')
7!:5 ;: 'str1 sym1 str2 sym2'

27648 27648 53248 27648

In a boxed table, small strings (str1) and small symbols (sym1) use the same
space as large symbols (sym2); only the large strings (str2) cost more.

istr1=:<@(>"1)@|: str1
isym1=:<@(>"1)@|: sym1
istr2=:<@(>"1)@|: str2
isym2=:<@(>"1)@|: sym2
7!:5 ;: 'istr1 isym1 istr2 isym2'

4224 2176 32896 2176

In an inverted table, istr1 is larger than isym1, and the difference is much
clearer for the large strings (istr2 and isym2).

So is there any advantage to using symbols over small strings in an
individually boxed table?

It seems as though a small string and a small symbol take up the same space:

s1=:s:'abc'
s2=:'abc'
7!:5 ;: 's1 s2'
128 128

Is this because a small string takes up roughly the same size as a number,
which can be quite large or have extended precision?


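#include <stdio.h>
#include <string.h>

int main(void) {
  char foo[3];
  /* copy exactly 3 bytes; strcpy would also write a NUL past the end of foo */
  memcpy(foo, "abc", sizeof foo);
  int ival = 8;
  long lval = 8;
  /* sizeof yields size_t, so print it with %zu */
  printf("char: %zu, int: %zu, long: %zu\n", sizeof(foo) * sizeof(char),
         sizeof(ival), sizeof(lval));
  return 0;
}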

char: 3, int: 4, long: 4

(64 bit os)

I'm asking because I'm working with a large file that I'd like to keep in
memory as a boxed table or an inverted table. Somehow R is able to hold the
entire structure in memory at about 1/3 of the size it takes in J. R uses a
symbol table for strings.
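
To make the symbol-table idea concrete, here is a minimal sketch in C of
string interning. It is not R's or J's actual implementation; the names
SymTab, intern and symname are made up for illustration, and the linear
search is only there to keep the sketch short. The point is that each
distinct string is stored once and every table cell holds just a small
integer id.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal symbol table: one owned copy per distinct string. */
typedef struct {
    char **names;   /* names[id] is the string for that id */
    size_t count;   /* number of distinct strings stored */
    size_t cap;     /* allocated slots in names */
} SymTab;

/* Return the id of s, adding it to the table on first sight.
   Returns -1 if memory allocation fails. A real table would hash
   instead of searching linearly. */
int intern(SymTab *t, const char *s) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], s) == 0)
            return (int)i;              /* already known: reuse the id */
    if (t->count == t->cap) {           /* grow the slot array */
        size_t ncap = t->cap ? t->cap * 2 : 8;
        char **p = realloc(t->names, ncap * sizeof *p);
        if (!p) return -1;
        t->names = p;
        t->cap = ncap;
    }
    size_t len = strlen(s) + 1;         /* include the terminating NUL */
    char *copy = malloc(len);
    if (!copy) return -1;
    memcpy(copy, s, len);
    t->names[t->count] = copy;
    return (int)t->count++;
}

/* Look up the string for an id; NULL if the id is out of range. */
const char *symname(const SymTab *t, int id) {
    return (id >= 0 && (size_t)id < t->count) ? t->names[id] : NULL;
}
```

With a table like this, a column of 100 cells that all hold the same
100-character string needs 100 ids plus a single copy of the string, rather
than 100 copies.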

It seems like an inverted table of symbols is the way to go in J.