JON is a text data transfer format similar to JSON.  Encoding is simply:

JON =: lr_z_ =: 3 : '5!:5 <''y'''

The rest of the code:

cocurrent 'jon'
safe1 =: '''_0123456789+*-<>|;,#ijL{()a[]'
safe2 =: 'u:';'x:';'}.';'}:';'" ';'":';'! ';'$ ';'= ';'^ ';'^.'
issafe1d =: safe1 e.~ {. &>
issafe2d =: safe2 e.~ 2&{. each
isSafe =: */@:(issafe2d +. issafe1d)@:;:

doSafe =: 0 0&$`".@.isSafe

doSafed =: 3 : 0
o =. doSafe y
if. o -: i. 0 0 do. w=. ;: y
o =. 'terms not safe1' ; (] #~ -.@:issafe1d) w 
o , 'terms not safe2' ; (] #~ -.@:issafe2d) w return. end.
o
)


The verbs ending in d are not necessary; they are there for debugging when an 
expression fails to decode with JOFF:

JOFF =: doSafe_z_ =: doSafe_jon_


The advantage of this data format over JSON is that it can be much more 
concise and, for J, orders of magnitude faster to encode and decode.  Also, 
J's library for JSON is limited in what it can validly parse, whereas JON can 
encode/decode any valid J noun no matter how deeply nested.  For short nouns, 
it can be more space efficient than 3!:1 encoding.  JON also encodes binary 
data (0 1 255 { a.) alongside text data.
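
A rough way to check the size and round-trip claims is sketched below.  It 
repeats the lr definition so that it runs on its own, decodes with plain ". 
rather than doSafe, and uses y0 purely as an illustrative name.  No sizes are 
quoted because the 3!:1 byte count depends on the J build.

```j
lr =: 3 : '5!:5 <''y'''       NB. same definition as JON above
y0 =: i. 10                    NB. illustrative short noun
(# lr y0) , # 3!:1 y0          NB. characters of JON text, bytes of 3!:1 form
y0 -: ". lr y0                 NB. 1 when the text decodes back to the same noun
```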

   lr (i.10);( 5 * i.10);(2| i.10);(10 2 $ 3 4);'               '
(i.10);(5*i.10);0 1 0 1 0 1 0 1 0 1;(10 2$3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4);15$' '

   doSafe  lr (i.10);( 5 * i.10);(2| i.10);(2 2 $ 3 4);'               ' 
┌───────────────────┬───────────────────────────┬───────────────────┬───┬───────────────┐
│0 1 2 3 4 5 6 7 8 9│0 5 10 15 20 25 30 35 40 45│0 1 0 1 0 1 0 1 0 1│3 4│               │
│                   │                           │                   │3 4│               │
└───────────────────┴───────────────────────────┴───────────────────┴───┴───────────────┘
   doSafe  '(i.10);( 5 * i.10);(2| i.10)'
┌───────────────────┬───────────────────────────┬───────────────────┐
│0 1 2 3 4 5 6 7 8 9│0 5 10 15 20 25 30 35 40 45│0 1 0 1 0 1 0 1 0 1│
└───────────────────┴───────────────────────────┴───────────────────┘

Although it's intended to reverse the output of lr, doSafe can process a 
subset of J, so instead of using lr it's possible to craft expressions that 
are shorter than what lr produces.  doSafe is also useful as a server/sandbox 
execution environment, and for sending/receiving records and lists as single 
line statements.
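
As a small illustration of a hand-crafted expression being shorter than the 
lr output, the sketch below compares the two for the 10 2 $ 3 4 table used 
above.  The names viaLr and byHand are illustrative only, lr is repeated so 
the lines run on their own, and plain ". stands in for doSafe.

```j
lr     =: 3 : '5!:5 <''y'''    NB. same definition as JON above
viaLr  =: lr 10 2 $ 3 4        NB. lr spells out all 20 atoms
byHand =: '10 2$3 4'           NB. hand-written text for the same noun
(# viaLr) , # byHand           NB. lengths of the two encodings
(". viaLr) -: ". byHand        NB. 1 when both decode to the same noun
```

Both strings use only numbers and $ , so they should also pass the doSafe 
whitelists.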

   doSafe  lr ; , 2{"1 ] 1 (5!:7) < 'doSafed_jon_'
o=.doSafe yif.o=i.0 0do.w=.;:yo=.'terms not safe1';(]#~-.@:issafe1d)wo,'terms not safe2';(]#~-.@:issafe2d)wreturn.end.o

doSafe works by first putting the expression through ;: to split it into 
words.  Every word must then match one of two whitelists.  The first is a 
list of safe 1 character prefixes, which includes ' and the digits, plus a 
few J verb families; the i in this list whitelists both i. and i: .  The 
second is a separate list of 2 character prefixes (a 1 character word is 
padded with a space before the test), which makes it possible to whitelist 
'" ' and '":' without ". , and '= ' without =. or =: .  This means that 
quoted text is only ever handled as data and is never executed.
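
The two tests can be tried by hand on a single expression.  The sketch below 
copies the two whitelists so that it runs outside the jon locale; w, ok1 and 
ok2 are illustrative names that are not part of the code above.

```j
NB. whitelists copied from the jon locale
safe1 =: '''_0123456789+*-<>|;,#ijL{()a[]'
safe2 =: 'u:';'x:';'}.';'}:';'" ';'":';'! ';'$ ';'= ';'^ ';'^.'

w   =: ;: '(+/ % #) 1 2 3'      NB. 7 boxed words: ( + / % # ) and 1 2 3
ok1 =: ({.&> w) e. safe1        NB. first character in safe1?  1 1 0 0 1 1 1
ok2 =: (2&{. each w) e. safe2   NB. 2 character prefix in safe2?  all 0 here
w #~ -. ok1 +. ok2              NB. words failing both tests: / and %

(2&{. each ;: '3 = 3') e. safe2    NB. 0 1 0 : a bare = is padded to '= '
(2&{. each ;: 'x =. 3') e. safe2   NB. 0 0 0 : =. does not match '= '
```

An expression is accepted only when every word passes at least one of the 
two tests.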

The whitelist could be expanded.  'NB.' could be allowed by creating a safe3 
list.  Many of the omitted conjunctions and verbs do not need to be omitted.  
Verbs defined inside the jon locale could also be whitelisted.  For instance, 
the verbs v0, v1 ... v9 could all act as "local variables" set by calling 
a0 ... a9.
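
One possible shape for that extension is sketched below.  Everything in it 
beyond the repeated definitions is hypothetical: the helpers a0 and v0, the 
storage name V0, and a safe3 list that here matches whole words rather than 
prefixes.  The definitions from the post are repeated so the sketch runs on 
its own, and it has not been reviewed for safety.

```j
cocurrent 'jon'
NB. repeated from the post
safe1 =: '''_0123456789+*-<>|;,#ijL{()a[]'
safe2 =: 'u:';'x:';'}.';'}:';'" ';'":';'! ';'$ ';'= ';'^ ';'^.'
issafe1d =: safe1 e.~ {. &>
issafe2d =: safe2 e.~ 2&{. each
NB. hypothetical additions
a0 =: 3 : 'V0 =: y'            NB. store y in the jon locale and return it
v0 =: 3 : 'V0'                 NB. recall the stored value (argument ignored)
safe3 =: ;: 'a0 v0'            NB. whole words that are allowed
issafe3d =: safe3 e.~ ]
isSafe =: */@:(issafe3d +. issafe2d +. issafe1d)@:;:
doSafe =: 0 0&$`".@.isSafe
cocurrent 'base'

doSafe_jon_ 'a0 5'             NB. stores 5 and returns 5
doSafe_jon_ '10 + v0 0'        NB. 15
```

The store and the recall are sent as separate sentences because, within one 
sentence, a parenthesized v0 call on the left can be executed before an a0 
call further to the right.
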

With the current implementation, this is not allowed (neither '/' nor '%' is 
whitelisted), so doSafe returns an empty result:

   doSafe  '(+/ % #) 1 2 3 '

but this is allowed:
   doSafe  '(+: + #) 1 2 3 '
5 7 9

Multiline processing is also possible:

   doSafe lr '3213', LF , '222'
3213
222

   doSafe  ;. _2 doSafe lr '3213', LF , '10+20' ,LF   NB. if you expected multiline expressions
3213 30

   doSafe  ;. _2 doSafe lr '3213', LF , '+\20' ,LF,'23423',LF
 3213

    0  NB. not allowed in one of the lines

23423


My big question, though: have I overlooked any potentially unsafe code that 
could be run with doSafe?