Re: [Jprogramming] A new bind conjunction WAS: Dictionaries

'Pascal Jasmin' via Programming Wed, 09 Feb 2022 10:55:57 -0800

on the bind conjunction proposal, one huge benefit of it would be the apply 
verb.

'[ + +/@]'& apply 1 2 3

7 8 9  NB. 1 2 3 + 6 because [ monadically is same as ]

if we've used apply, we've all tried this:

2 '[ + +/@]'& apply 1 2 3

31 32 33

which executes '[ + +/@]'&apply^:2 (1 2 3), i.e. the bound monad applied twice

so: 

b=: & @: (,&boxopen)
applybind =: 1 : ' (''('' , m ,'')&>/'') b apply'

2 '[ + +/@]' applybind 1 2 3

8  NB. dyadic apply application. instead of ^:x

The/an internal apply verb could be optimized upon receiving its x parameter 
binding as well.  b: is a potential available builtin instead of &::

On Monday, February 7, 2022, 11:29:34 p.m. EST, 'Pascal Jasmin' via Programming 
<programm...@jsoftware.com> wrote: 

The best idea I had in previous post

 b =: & @: (,&boxopen)

 3 b (,&<) 2

┌─┬───┐
│3│┌─┐│
│ ││2││
│ │└─┘│
└─┴───┘

(,&<) b 3 (b 2) (1 b) 4

┌───────┬─┐
│┌─┬─┬─┐│3│
││1│4│2││ │
│└─┴─┴─┘│ │
└───────┴─┘

First example binds 3 to x, 2nd binds 3 to y. A bound function will then assume 
that the other parameter is a boxed list, and when called dyadically will build 
up that list from unboxed dyadic parameters.  When called monadically, a list 
of length 1 will be the other (x or y) parameter.

This allows verbs with more than 2 parameters, where 1 is a compound boxed 
parameter.  Once the unboxed parameter is bound/curried, there is a choice of 2 
curryings of the boxed compound parameter for the next applications of b.  
First or last.  In 2nd example above, 2 is bound to last slot, then 1 bound to 
first slot (b 1 instead of 1 b would bind 1 to next to last slot... ie last of 
remaining slots).  Last expression can always be called dyadically or 
monadically

  5 (,&<) b 3 (b 2) ( 1 b) 4

┌─────────┬─┐
│┌─┬─┬─┬─┐│3│
││1│5│4│2││ │
│└─┴─┴─┴─┘│ │
└─────────┴─┘

bf =: b f.  NB. would provide optimization potential by "expanding constants."

For optimization, either f. or new f: would "propagate constants".  so:

  (3 + ])&2 f.

5"_

   ([ + 3 + ])&2 f.
5 +~ [

by scanning for f@] (when &constant/noun is provided) in a tacit expression, 
constants can get propagated, and so deep optimizations are possible:

(U V F@])&N -> (F N) V~ U
(G@] V F@])&N -> (G N) V (F N) -> (N V N)"_
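
As a sanity check on the ([ + 3 + ])&2 rewrite above, the two forms can be compared in a current session.  This only shows that the hand-propagated verb agrees with the bound one today; the automatic rewrite by f. or f: is the proposal, not existing behaviour, and the names g and h are just for illustration.

```j
   g =: ([ + 3 + ])&2     NB. bound verb: the whole train runs on every call
   h =: 5 +~ [            NB. hand-propagated form, with 3 + 2 folded to 5
   g 10
15
   h 10
15
   (g -: h) 10 20 30      NB. same results on a list
1
```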

for the compound parameter (0 {:: ]) or (_1 {:: ]) could also receive constant 
propagation optimizations once those positions are set.

f. or f: (this proposal is incompatible with previous version so new f: would 
avoid changing old uses of f.) could also tunnel into explicit code such that 
lines that are name =. f y become name =. constant when 4 : 'y =. f y 
...'&constant f: is provided.

Getting back to dictionaries,

I've been proposing a "thin class" that holds MetaInfo about data without the 
data (encapsulating data makes it a heavy class).  The disadvantage of this 
approach is that a lot more parameters need to be passed to inverted table 
functions.  While get__myItblMeta does not need meta info passed along to it, a 
generic get verb does + the data + field(s) to get from + a possible modifier 
for which column subset to retrieve.  And then if there is a column subset 
modifier to a filter function (subset of records selected) then new meta data 
must also be returned as the original column index map is no longer valid.

So, the new bind definition makes all of this manageable when the metadata is 
"known"/bound first (with optimization potential), then fieldtolookup and data 
become the dyadic parameters.  set has extra parameters for key and newdata.

Another idea, I can make a functional simple kv similar to JP's class that is 
metadata free/implied (normal programming style).  The specs are:

keys are unique symbols.  (symbols are more flexible than J variable names.  
Can include spaces and other chars.  Shoehorning any data into a symbol is 
possible and guaranteed if shoehorning to a string representation is possible.  
All J data has string representations.  A further restriction that keys not 
have any leading or trailing blanks is worthwhile just to avoid access errors.  
Specialized functionality that does allow leading+trailing blanks in fields can 
be copy/edited into new functions that removes code that enforces these 
constraints.

access to keys interface supports/assumes boxed and string descriptions of 
symbol key (parameters).  keys are stored in (num,1) shape

set function is a combination of JP's set or delete functionality with k/q's 
upsert, with "merge" oriented optimizations for bulk oriented upserts.  To get 
JP's upsert or delete functionality, whenever value is null, delete is 
performed.  This matches get: keyhasnullvalue get kvdata returns the same 
result whether the key is not found or its associated value is null, so 
(key;a:) set kvdata gives the same results whether the key is deleted or 
associated with null.
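
A minimal sketch of that null-means-delete rule, under assumptions that are not part of the spec above: the kv value is held as (boxed string keys) ; <(boxed values) rather than symbols in an inverted table, and the names kvget, kvdel and kvset are hypothetical.

```j
NB. Hypothetical layout: y is (boxed keys) ; <(boxed values)
kvget =: 4 : 0
'k v' =. y
i =. k i. boxopen x
if. i < #k do. i { v else. a: end.   NB. missing key reads as null
)

kvdel =: 4 : 0
'k v' =. y
keep =. -. k e. boxopen x
(keep # k) ; < keep # v
)

kvset =: 4 : 0
'key val' =. x
if. 0 = # val do.                    NB. null (empty) value means delete
  (<key) kvdel y return.
end.
'k v' =. y
i =. k i. <key
if. i < #k do. k ; < (<val) i} v else. (k , <key) ; < v , <val end.
)

   d =: ('a';'b') ; < 1;2
   > 'b' kvget d
2
   # > 'z' kvget d              NB. key not found: null
0
   d2 =: ('b';a:) kvset d       NB. null value deletes the key
   # > {. d2
1
   > 'c' kvget ('c';7) kvset d  NB. new key is appended
7
```

Every call returns a new kv value, so the caller decides whether to overwrite d.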

add functionality first tries to append the value unboxed.  If that errors, it 
tries to append the boxed value.  If that errors, it boxes each existing item 
of values and appends the boxopen'd value.  set functionality copies the boxing 
level of the existing value.  Once values have been boxed as a result of adding 
a single boxed value, they won't ever get unboxed, even in the rare case where 
overwriting a boxed value with an unboxed one would yield a homogeneously 
typed value list.

The core get functionality is unoptimized key lookup.  A utility function is 
provided for the user to create an optimized keys&i. function to use once they 
are done modifying the dictionary, when there are enough accesses or the 
dictionary is long enough for the optimization to be useful.  Optimized get will still 
work with both y as kv or just a value list, though perhaps separate versions 
of each need to be provided as utilities when we assume that detecting kv vs 
just-values structure has a relatively high timing penalty. utilities are also 
provided to transform the kv data from symbol-value inverted table to either 
boxedstring keys or padded string keys and value inverted tables such that 
either lookup by value or partial string based queries can be used for access.  
Courtesy utility functions are also provided to access/query kv data in such 
string-value inverted table form.
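
For readers who have not used the keys&i. idiom: binding the key list to i. gives J the chance to prepare its lookup structure once, when the verb is defined, instead of on every call.  A small illustration with boxed strings (the names keys and lookup are only for this example):

```j
   keys =: 'apple';'pear';'fig'
   lookup =: keys&i.          NB. define once, after the dictionary stops changing
   lookup 'fig';'kiwi'        NB. 3 = #keys signals "not found"
2 3
```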

Some other justifications for this spec:

symbol keys provide optimization even without keys&i. step afaik.  Symbol 
compatible keys go well beyond compatibility with J locale names.  Compound 
keys may allow space or other char separated keys as symbols.

Merge/bulk oriented optimizations are worthwhile.  Can prescan the list of 
updates into unique constraints and value boxing constraints.  Where value 
constraints with nulls are still compatible if deletes (mixed with upserts) are 
moved to the top as separate action items.  The unique filter for key uses i: 
in order to just keep the last modification action in the bulk list.
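
A small illustration of the last-wins filter, using boxed strings instead of symbols to keep it short (the names keys, vals and last are just for this example):

```j
   keys =: 'a';'b';'a';'c';'b'    NB. bulk update list with repeated keys
   vals =: 1 2 3 4 5
   keys i: keys                   NB. index of the last occurrence of each key
2 4 2 3 4
   last =: ~. keys i: keys        NB. one surviving action per key
   last { vals
3 5 4
```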

This kv data spec is enough to specify an inverted table metadata for fields.  
Variant value field means a dictionary/kv-data implementation to represent 
unique/sorted/type/boxed provides an easy access pattern to all of the fields.  
If there are not that many fields, then space inefficiency doesn't matter.  If 
you never have to access fields by one of their attributes (say all fields that 
are sorted) then inverted table access efficiencies are not needed.  This 
lua-styled approach works well for small data with key driven access.  Provides 
easy descriptive access:

'sorted' kv 'myfield' kv kvdata  NB. retrieves the sorted property of myfield 
key in kvdata.

'sorted' kv 'myfield' kv 'fields' kv metadata  NB. fields is a collection of 
dictionaries with each item keyed as a fieldname with dictionary of related 
attributes. Collections allow for like-typed data to be grouped together, and 
then "walked through" for processing.  In metadata, easier to separate from 
simple properties.

Another metadata-less dictionary structure that could be generally useful, and 
specifically useful for 1:1 mapping to class definitions is something I call 
kve: a 3 column table with key as symbols, values as strings, and "encoding" as 
symbols.  Where the encoding data associated with each key and value informs 
how to turn the value string encoding into "native data or functions".  I'm not 
sure there is general use for this, but if class descriptions need to be done, 
then this kve scheme seems necessary.

On Sunday, February 6, 2022, 04:34:28 p.m. EST, 'Pascal Jasmin' via Programming 
<programm...@jsoftware.com> wrote: 

You covered some of the issues with a data encapsulated class approach like 
yours.

The big issue for me is that your set verb returns 0 0 $0, but even if it 
returned the object reference, J is poor at compound expressions that operate 
on an object.  Need to pass strings to what effectively becomes a dsl

new j903 modifier trains get useful, but still messy

d=: dict 'abc';1 2 3

loc_z_=: (,&'_'@[ ,&'_'@, ":@>@])"1 0 boxopen
in_z_ =: ([. loc ].)~

d ('gf' in ]: + 'gf' (in d)) 'a'  NB. parameterizing dictionary as an adverb 
for lhs of fork, and hard coding on rhs

2

but if set returned an object, having a verb that operated on that object would 
require explicit code  (__y will work) to be simple.

Then there is the issue of a set operation where you don't want the "forced 
side effect" of permanently altering the object, and instead want a copy to 
use temporarily.  There is also the case of a filter/query operation that 
returns multiple "records".

Instead of a data encapsulated class, functions that operate on inverted tables 
would allow returning a new/subset of the "data".  This adds extra work to 
save, but the extra work to copy a class in order to modify only the copy, but 
predeciding that if you want to do this, you would never want to overwrite the 
original dictionary, which seems like being above the paygrade of a function 
operating on inverted tables. Also remember to destroy the copy in your code 
when it is supposed to be discarded (actually a hard problem that would need 
its own dsl to solve all "responsibility combinations").  And then J has 
unfriendly access problems on operating with an object parameter to a function 
if not an explicit function.

J's strengths come from its functional approach.  Returning a new copy of data 
is functional.  It is very easy in J, especially in console, to modify the 
previous line of code such that it assigns a new result value to existing or 
new variable names.  Double checking that the function works properly before 
overwriting "production" or lesser data is a prudent approach I'd recommend 
100% of the time.  J's impure functional approach is also the perfect 
functional approach.  Pure (never side effect) functions inside, but the last 
caller/user (outside) decides on what side effects to make.

An inverted table argument makes it easy to write functions that operate on 
that y argument inverted table.  An encapsulated class makes that difficult to 
extend.  I still think "keyed table" (multi column dictionary including 
potential multicolumn keys uniquely identifying a record) is still the right 
approach to a generalized dictionary, and most (90%+) column use cases would be 
uniformly typed.  A defining property of dictionaries is access by full key 
match which necessarily brings symbols as an optimization feature of fields, 
but even if dictionary/keyed table, general query access is a nice to have, 
that you have with inverted tables, and an ability to convert to/from symbols 
when "necessary".

A class based approach to keyed tables is possible and easiest to create.

I've mentioned a general datastructure framework.  Which is metadata about the 
data in one box, data in the other.  Metadata is a "property dictionary" where 
values are data or functions.  A string encoding is possible especially if 
there is a "class type" field that directs the encoding/decoding, but encoding 
values as boxed items to distinguish among different types/classes of values 
and functions is also an option.  There is an easyish 1:1 mapping between a 
metadata structure about data, and a class definition that references DATA 
variable, or better yet, use data that is expected to conform to metadata 
understanding of the data as its y function parameter.  This necessarily makes 
this approach exactly as easy as the first.  Write a class, and use it either 
as class or as metadata described structure (data), to be chosen by the user.

A third option, especially if it applies just to keyed tables, is having a 
dsl/description of the inverted table structure as an adverb parameter.  An 
adverb allows for optimization in the returned verb/modifier. To optimize get 
(your valuable feature of your dict class), you only need to know the table 
constraints/definitions.  set using a datastructure definition can generate a 
(pre)validation of input, along with informative descriptions for why elements 
fail if they do.  A multi column dictionary description dsl would look like:

key: ... value: ...  NB. where ... is a list of fields with attributes 
(reserved words not allowed as field names) as follows:

colname: u(nique): s(orted): type: or b(oxed): (optional if first item 
determines type.  But benefits optimization if provided in dictionary 
description)

single line definition potential is a huge convenience for both copy/edit 
coding, and console simplicity.

So a generic get (by whole field match) is an adverb that first uses 'keyed 
table def' get, but then by a column list (indexes or colnames) that permits an 
indexing optimization step on that index (m&i. where m is the column 
parameter), when a single column is passed, then all keys in y are used to 
retrieve records (one for each key passed), and when multiple columns are part 
of final adverb parameter, then y is expected as a boxed values for each 
column, and all records with a key match retrieved.  It is possible to choose 
(with additional (named) adverb) that if only one record is in dataset, then 
just raw values instead of full dictionary structure are returned.

A metadata encoded datastructure seems superior to the adverb dsl processor in 
that an adverb dsl processor could with a preceding adverb interpret any 
meta+data parameter with just the metadata portion that allows it to operate on 
any other similar structured/metadata'd data.

The end goal of an approach, IMO, should be to create improvements to J in 
terms of generic inverted table functions, with some specific improvements 
already identified in this thread:

'column list' { meta-described-dictionary NB. use FIELDS metadata keyword that 
contains symbol data, to retrieve column indexes (or other potential use of 
FIELDS duck named variable specific to datastructure) referenced in string.

&:: =: bind =: (& @: ;) new modifier train such that dyadic m&:: f and f &::n 
are (m&f)(@:;) or (f&n)(@:;).

J's bound =: (f&n) or (m&f) already has a special dyadic interpretation, 
bound^:x y.  The above enhancement would allow an interpretation of bound(@:;) 
which allows writing f for 3 arguments, i.e. a compound x or y of 2 boxed 
arguments, while letting the user provide the compound part as unboxed dyadic 
arguments.  &:: compounded allows even more arguments.  If x takes 3 (boxed) 
arguments, then arg0&::f&::y applied dyadically has x as arg1 and y as arg2.  
If applied monadically, then the 3rd x argument (arg2) to f would be missing, 
and f c/would deal.  Compounding &:: calls would increase arity of functions 
from 3 to higher than 3 parameters.
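
For reference, the existing dyadic behaviour of a bound verb is easy to check in a session; it is this repeat-count reading of x that the proposed &:: would replace with argument building:

```j
   2 (3&+) 10        NB. dyad of a bound verb: x is a repeat count
16
   (3&+)^:2 (10)     NB. the same thing spelled out
16
   2 (+&3) 10        NB. likewise for f&n
16
```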

This feature would also allow optimizing inside f.  If f is explicit, then any 
line of the form varname =: f x (if m&f is bound) or f y (if f&n is bound) can 
be precomputed, the ideal structure being x =. f x or y =. f y internally, as 
proof that the original x can be discarded.  If f is implicit, then any u@] or 
u@[ can be optimized away to a constant based on m&::f or f&::n, and if N V N 
occurs as a result of that optimization, then that too can be optimized into a 
constant.

What the above allows beyond syntax sugar for more than 2 parameter verbs, is 
not having to resort to self-written-code optimizations inside adverbs.  verbs 
can self optimize based on bound parameters (for example, when (m i. ]) has the 
same optimization as m&i.).

> Lua table references

I've been thinking of k/q as the guiding model.  Lua's variant (boxed) key and 
variant (boxed) values tables have the simplicity of storing every potential 
scenario, but as a dictionary implementation, would provide a strong incentive 
to avoid the dictionaries for performance reason.  If you wanted to use a 
dictionary as a key, in J, you could use a linear representation of that 
dictionary in order to keep all keys as strings.

But, repeating sorry, a boxed/variant column type can coexist along side 
uniform typed columns.

Metadata (not at all Lua interpretation) would instead specify types and 
attributes of inverted table columns in the case of keyed tables.  But also 
(kinda like Lua) include optimized/specified functions related to data.

In general, I'd also say that access_keys_ being limited to valid spaceless J 
naming conventions is not a huge sacrifice for accessnames.  Extending to 
spaceless unicode strings is not an ease of use problem if the user wants 
unicode keys, though it would interfere with that 1:1 J locale/classname 
mapping of datastructure metadata.

On Sunday, February 6, 2022, 09:52:04 a.m. EST, Jan-Pieter Jacobs 
<janpieter.jac...@gmail.com> wrote: 

Hi Pascal,
I responded inline below:

> A workaround is to optimize SET, ADD, UPDATE, DEL for bulk operations
> (multiple items processed at once  (] F..) super useful), and after bulk
> operations, "redefine"  (just repeat execution of same definition) GET such
> that any m&i. updates.  Also update FILTER functions (GET multiple) if they
> gain from static binding optimization.
>

This is, if I get it correctly, exactly what my dict implementation (
https://github.com/jpjacobs/types_dict) does: it allows
setting/updating/removing multiple keys and the lookup verbs used are
updated only if there is a change in keys

>
> An approach that just presumes key uniqueness instead of enforcing it, is
> for GET to be based on i: instead of i. and then any ADD with a duplicate
> key effectively will return the last updated/added values.
>

This would gather a lot of garbage and would lose the advantage of
in-place updating.

>
> Back to generic datastructure, everything a class can do is possible
> within a datastructure.  All administrative "properties" (names) and their
> associated values including functions can be encoded in a dictionary,
> including a string representation dsl for representing "name values" with
> ease as to function/data.  What specializes a datastructure over a "mere"
> class is the concept of existential data held by the datastructure that a J
> user would want complete access to that data.  In a class based
> implementation, a universal name data =: holds the core data that the J
> programmer would want access to.  Usually, it is compound greater than
> atomic data that can be represented as inverted tables of "linked data".
> And part of the data specifying dsl's purpose is to include descriptions
> that permit any possible optimizations that include what k/q's attributes
> do (sorted, unique), but with extensible dsl, any other
> implications/constraints on the data can use/select a specific
> implementation of universally named "accessors"/functions
>

> So a datastructure contains 2 boxes:  1st holds the name of the
> datastructure class (for lookup value of any metadata of that classname),
> and all administrative properties, and specialized functions for
> GET/ADD/DELETE and other functions expected to have meaning relative to its
> "existential" data, and the 2nd box holds the (likely compound and so extra
> boxed) "data"
>
> An advantage of a compound datastructure over a class is the user gets to
> decide whether to overwrite the "permanent" data while still having access
> to SET/DEL/ADD functionality of their own copy they may want for their
> application/data needs. It is also possible for generic GET/ADD/DELETE to
> query the datastructure as to how it can best accomplish its integral
> functionality, should there not be a specialized version defined in the
> datastructure, and GET as an adverb that takes either '',
> datastructure_name, or a specific instance of datastructure can optimize
> itself as a first step, or one that can be bound to an optimized named
> function, or if '' is the adverb parameter to GET, then the generic verb
> "inspect y for datastructure properties" before selecting implementation is
> returned.
>

I think these ideas are pretty much what Lua implements with its tables
(dictionaries that can contain anything as keys and values, joined by their
metatables, i.e. tables that can contain functions to override e.g.
indexing operations). These tables do everything: from working as locales
(function environments), over separating modules (our addons) to
implementing OOP (making liberal use of the __call metamethod, specifying
what happens if you call a table as if you were calling a function, and
__index, specifying what happens if you try to get a non-existent key in a
table).

In my view, the problem with a locale-based dict implementation like mine
is currently that you cannot nest dicts without losing generality.
As numbered locales are referred to by boxed numbers, you could make a
special case for these in your implementation, but would evidently lose
the possibility to store boxed numbers. Even when adding checks to whether
a boxed number is a locale, one cannot be sure the user intended to refer
to a locale or actually wanted to store a boxed number.

One could think of using the locales themselves as dicts, but there you'd
have the problem that:
- only valid names can be keys
- referring to values is only possible with dict__key, which precludes
doing so tacitly.

For such implementation to work, one could (note, I have no clue about the
implementation itself :p):
- make a datatype only for referring to locales
- implement indexing into that type with {:: following more or less the
same idea as indexing with {::
- providing a verb to amend along the same lines
- have a conjunction DoneIn that allows something like verb DoneIn mylocale
(could be called 'of' as well)
- allowing any value as "name" in locales.

Like that, implementing a dict that allows storing arbitrary keys and
values, nesting dicts and even self-reference, reference loops etc, using
locales would become possible.

In the end, I guess this would end up at about the same functionality as
Lua does for tables… so I don't know what's more effort: implementing
everything in J/C, or binding Lua. There's been a time I would have loved
to have Lua instead of J's explicit language, but I guess that would end up
as a different language :).

Jan-Pieter

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
