Starting from your setup I get a different impression:
+/ 7!:5 ;:'db rno rname popul sno sname name ename tno tname tsize
tcat'
6272
NB. 600 bytes/tuple
7!:5 <'db'
1664
ix =. (< 3 0 2 3 1 2 3 0 0 0 0)
7!:2 'db=. 1 ix} db'
14976
7!:2 'db=. 0 ix} db'
4864
NB. This doesn't look inplace.
Agreed that you can query by extracting the indexes and operating on
them; but wouldn't it be better to store the indexes in a name to avoid
the extraction?
Henry Rich
On 4/6/2021 9:06 PM, Igor Zhuravlov wrote:
On Tue, Apr 6, 2021, 01:28 PM Henry Rich <[email protected]> wrote:
'May be'. Have you actually implemented it?
Yes, I was playing with this idea some time ago. With this approach, an example database
"sed" from
Jd [1] may be implemented as following (best viewing with monospace font,
beware line wraps):
NB. r
rno=. 0 4 5 6 NB. field #0
rname=. <;._1 ' Naples Rome' NB. field #1
popul=. 0 1000 2000 NB. field #2
NB. s
sno=. 0 31 33 34 35 NB. field #3
sname=. <;._1 ' Auto Truck' NB. field #4
name=. <;._1 ' Sales Clerical Engineering Marketing' NB. field #5
NB. e
ename=. <;._1 ' Smith Jones Robinson Jasper Steinberg Rafferty' NB. field #6
NB. t
tno=. 0 11 12 13 14 NB. field #7
tname=. <;._1 ' Desk Cabinet Chair' NB. field #8
tsize=. <;._1 ' small medium' NB. field #9
tcat=. ' ABR' NB. field #a
shape=. (# rno) , (# rname) , (# popul) , (# sno) , (# sname) , (# name) , (#
ename) , (# tno) , (# tname) , (# tsize) , (# tcat)
db=. 1 $. shape ; (i. # shape) ; 0
NB. tbl: r s
e t
NB. col: 0 1 2 3 4 5 6 7 8 9 a 0 1 2 3 4 5
6 7 8 9 a
db=. 1 (< 1 1 1 1 1 1 6 0 0 0 0)} db NB. (4 Naples 1000) (31 Auto Sales
) (Rafferty ) ( )
db=. 1 (< 1 1 1 4 1 4 0 0 0 0 0)} db NB. (4 Naples 1000) (35 Auto Marketing
) ( ) ( )
db=. 1 (< 2 2 2 2 2 3 2 1 1 1 1)} db NB. (5 Rome 2000) (33 Truck
Engineering) (Jones ) (11 Desk small A)
db=. 1 (< 2 2 2 2 2 3 2 4 2 2 3)} db NB. (5 Rome 2000) (33 Truck
Engineering) (Jones ) (14 Cabinet medium R)
db=. 1 (< 2 2 2 2 2 3 4 2 2 2 2)} db NB. (5 Rome 2000) (33 Truck
Engineering) (Jasper ) (12 Cabinet medium B)
db=. 1 (< 2 2 2 2 2 3 5 0 0 0 0)} db NB. (5 Rome 2000) (33 Truck
Engineering) (Steinberg) ( )
db=. 1 (< 3 1 2 1 2 2 0 0 0 0 0)} db NB. (6 Naples 2000) (31 Truck Clerical
) ( ) ( )
db=. 1 (< 3 1 2 3 1 2 1 3 3 1 1)} db NB. (6 Naples 2000) (34 Auto Clerical
) (Smith ) (13 Chair small A)
db=. 1 (< 3 1 2 3 1 2 3 0 0 0 0)} db NB. (6 Naples 2000) (34 Auto Clerical
) (Robinson ) ( )
Note: leading axis element (IO=0) is reserved for "no relation" case.
I find it very hard to believe that a J sparse array would be the best
choice for any database. It seems to me to:
* take lots of memory per value
It's true. The sketch above consumes 11 integers to store one end-to-end tuple.
By the way, for sparse array A with element e:
e=. ISO { A
cases where sizeof(ISO)>sizeof(e) are general. Opposite cases are:
- integer or float sparse vectors (sizeof(ISO)=8, sizeof(e)=8)
- complex sparse vectors and matrices (sizeof(ISO)<=2*8, sizeof(e)=2*8)
- other cases with non-scalar numeric non-boolean values (sizeof(e)>8)
Memory saving appears due to absence of necessity to store and maintain:
- "foreign key" relation columns in each relating table,
- intermediate many-to-many relation tables.
Data fields in Jd sed database occupies 880 bytes, 388 of which (44%) are used
to store foreign key relation fields.
Due to small data set, indices really consumes 24 bits which fits in single
integer. But this storage scheme will
require additional pre- and post-processing.
* have limited or inefficient queries
Queries may be performed by two approaches:
1) operating on index array (4 $. db): operating with columns as integer sets;
2) operating on subarrays (SUB_ISO { db): operating with sparse boolean
matrices.
Both may benefit from rich J facilities.
Indeed, all kinds of queries are possible.
* be very slow for updates, as the entire table must be copied
To establish or relax a relation, a single bit must be set or reset at certain
index (though, it's not really true
in common cases). This can be executed in-place.
References:
[1] https://www.jsoftware.com/jd_tuts.html#sed
--
This email has been checked for viruses by AVG.
https://www.avg.com
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm