Starting from your setup I get a different impression:

    +/ 7!:5 ;:'db rno rname popul sno sname name ename tno tname tsize tcat'
6272

NB. 600 bytes/tuple

   7!:5 <'db'
1664
   ix =.  (< 3 0 2 3 1 2 3 0 0 0 0)
   7!:2 'db=. 1 ix} db'
14976
   7!:2 'db=. 0 ix} db'
4864

NB. This doesn't look inplace.

Agreed that you can query by extracting the indexes and operating on them; but wouldn't it be better to store the indexes in a name to avoid the extraction?

Henry Rich




On 4/6/2021 9:06 PM, Igor Zhuravlov wrote:
On Tue, Apr 6, 2021, 01:28 PM Henry Rich <[email protected]> wrote:
'May be'.  Have you actually implemented it?
Yes, I was playing with this idea some time ago. With this approach, an example database 
"sed" from
Jd [1] may be implemented as following (best viewing with monospace font, 
beware line wraps):

NB. r
rno=. 0 4 5 6                  NB. field #0
rname=. <;._1 '  Naples Rome'  NB. field #1
popul=. 0 1000 2000            NB. field #2

NB. s
sno=. 0 31 33 34 35                                    NB. field #3
sname=. <;._1 '  Auto Truck'                           NB. field #4
name=. <;._1 '  Sales Clerical Engineering Marketing'  NB. field #5

NB. e
ename=. <;._1 '  Smith Jones Robinson Jasper Steinberg Rafferty'  NB. field #6

NB. t
tno=. 0 11 12 13 14                   NB. field #7
tname=. <;._1 '  Desk Cabinet Chair'  NB. field #8
tsize=. <;._1 '  small medium'        NB. field #9
tcat=. ' ABR'                         NB. field #a

shape=. (# rno) , (# rname) , (# popul) , (# sno) , (# sname) , (# name) , (# 
ename) , (# tno) , (# tname) , (# tsize) , (# tcat)
db=. 1 $. shape ; (i. # shape) ; 0

NB. tbl:                                   r               s                    
  e           t
NB. col:  0 1 2 3 4 5 6 7 8 9 a            0 1      2      3  4     5           
  6           7  8       9      a
db=. 1 (< 1 1 1 1 1 1 6 0 0 0 0)} db  NB. (4 Naples 1000) (31 Auto  Sales      
) (Rafferty ) (                   )
db=. 1 (< 1 1 1 4 1 4 0 0 0 0 0)} db  NB. (4 Naples 1000) (35 Auto  Marketing  
) (         ) (                   )
db=. 1 (< 2 2 2 2 2 3 2 1 1 1 1)} db  NB. (5 Rome   2000) (33 Truck 
Engineering) (Jones    ) (11 Desk    small  A)
db=. 1 (< 2 2 2 2 2 3 2 4 2 2 3)} db  NB. (5 Rome   2000) (33 Truck 
Engineering) (Jones    ) (14 Cabinet medium R)
db=. 1 (< 2 2 2 2 2 3 4 2 2 2 2)} db  NB. (5 Rome   2000) (33 Truck 
Engineering) (Jasper   ) (12 Cabinet medium B)
db=. 1 (< 2 2 2 2 2 3 5 0 0 0 0)} db  NB. (5 Rome   2000) (33 Truck 
Engineering) (Steinberg) (                   )
db=. 1 (< 3 1 2 1 2 2 0 0 0 0 0)} db  NB. (6 Naples 2000) (31 Truck Clerical   
) (         ) (                   )
db=. 1 (< 3 1 2 3 1 2 1 3 3 1 1)} db  NB. (6 Naples 2000) (34 Auto  Clerical   
) (Smith    ) (13 Chair   small  A)
db=. 1 (< 3 1 2 3 1 2 3 0 0 0 0)} db  NB. (6 Naples 2000) (34 Auto  Clerical   
) (Robinson ) (                   )

Note: leading axis element (IO=0) is reserved for "no relation" case.

I find it very hard to believe that a J sparse array would be the best
choice for any database.  It seems to me to:

* take lots of memory per value
It's true. The sketch above consumes 11 integers to store one end-to-end tuple.

By the way, for sparse array A with element e:
   e=. ISO { A
cases where sizeof(ISO)>sizeof(e) are general. Opposite cases are:
- integer or float sparse vectors (sizeof(ISO)=8, sizeof(e)=8)
- complex sparse vectors and matrices (sizeof(ISO)<=2*8, sizeof(e)=2*8)
- other cases with non-scalar numeric non-boolean values (sizeof(e)>8)

Memory saving appears due to absence of necessity to store and maintain:
- "foreign key" relation columns in each relating table,
- intermediate many-to-many relation tables.

Data fields in Jd sed database occupies 880 bytes, 388 of which (44%) are used 
to store foreign key relation fields.

Due to small data set, indices really consumes 24 bits which fits in single 
integer. But this storage scheme will
require additional pre- and post-processing.

* have limited or inefficient queries
Queries may be performed by two approaches:
1) operating on index array (4 $. db): operating with columns as integer sets;
2) operating on subarrays (SUB_ISO { db): operating with sparse boolean 
matrices.

Both may benefit from rich J facilities.

Indeed, all kinds of queries are possible.

* be very slow for updates, as the entire table must be copied
To establish or relax a relation, a single bit must be set or reset at certain 
index (though, it's not really true
in common cases). This can be executed in-place.

References:
[1] https://www.jsoftware.com/jd_tuts.html#sed



--
This email has been checked for viruses by AVG.
https://www.avg.com

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to