>>I believe to overcome this would
>>be possible if J could JIT-emit the machine code for the inside of the
>>loop so that the compound expression be calculated at the same
>>cost as a primitive expression.

>From the memory footprint, it does appear as though the code is copied to new 
>memory space along with stack/data, with every recursion.  Even $: does it. I 
>guess there is a possibility that code could be redefined during execution, 
>and in addition to direct recursion, a cycle of function calls is possible.
Some checks that *might* be worthwhile (to see whether multiple copies of 
code are unnecessary):
- If a verb has no =: statements inside it, then it cannot redefine other 
  functions.
- I'd be comfortable with the assumption/feature that any useful global 
  assignment (=:) inside a function that results in a noun the first time it 
  is executed will continue to produce nouns on subsequent calls (and 
  therefore does not result in verb reassignment).
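
To make the distinction in the two checks above concrete, here is a minimal 
sketch.  All names in it (counter, bump, helper, redefine) are hypothetical 
and chosen only for illustration; it says nothing about how the interpreter 
actually copies code.

```j
NB. Hypothetical names, for illustration only.
counter =: 0
bump =: 3 : 0
counter =: counter + y   NB. global assignment whose result is always a noun
)

helper =: +:
redefine =: 3 : 0
helper =: -:             NB. global assignment that replaces a verb mid-run
helper y
)
```

The name class reported by 4!:0 <'counter' stays 0 (noun) however often bump 
runs, whereas redefine changes what helper does for every later caller.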

To Henry,

I was too specific in describing the pattern.  The general usefulness of the 
pattern is mostly due to its repetitiveness.
Exit conditions are customarily put at the top, followed by any validation, 
then the work code (which is usually a function of both head and rest).  J is 
actually very good syntactically for writing Lisp-like code because of its 
shorthands. 
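
As a small sketch of that shape (exit condition, then head/rest split, then 
the work), assuming a toy task of summing a list: xxs is the head/rest 
helper from the script below, while sumrec is a hypothetical name used only 
here.

```j
NB. xxs is copied from the script further down; sumrec is a hypothetical name.
xxs =: ({.);(<@:}.)      NB. split a list into head ; rest

sumrec =: 3 : 0
if. 0=#y do. 0 return. end.   NB. exit condition first
'a rest' =. xxs y             NB. head and rest
a + sumrec rest               NB. work: a function of both head and rest
)
```

With these definitions, sumrec 1 2 3 gives 6.  The classifiers in the script 
follow the same outline, only with boxed tokens instead of numbers.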

One of the first projects I tried in J was to make an alternate parser such 
that the result type of a sentence and its words could be determined without 
executing the sentence, so that syntax colouring would be possible and would 
greatly facilitate human parsing of the language.  I'm sure many of you could 
write prettier code, but the problem was fairly challenging, and I was unsure 
how corrections might need to be woven into the solution later, so this style 
of coding was the only method approachable to me. 

Running tester TEST will show sample output.  The legend (definitions) for 
what each boxed number means is about a page down.  (There's probably some 
dead code in there, and I haven't looked at it in a while.  Sorry it's not 
pretty.)

---Paste rest of file to .ijs


Note 'Goals'
*Determine syntax without executing.
*Never return error, or fail to evaluate sentence.
*Support optional meaningful whitespace, and other extendability.
*Extensibility includes special cases for key builtins such as : `
)

Note 'Purpose'
*Syntax colouring
*Point out left dyad args, verb phrases (bonded conj and adv), Global assigns.
*Formatting can include verb ranks
*Illustrate the extreme difficulty of reading J with unfamiliar definitions... 
without such a syntax helper.
)
Note 'Design'
sentences parsed into tree structure of groups (nodes) of words (leafs)
each word is classified into part of speech
groups (parentheses, assignments, bonded verb phrases) 
have additional tail element to classify whole group.
The whole sentence is also a group.
Chained Lisp-style parsers evaluate sentence in multiple passes so that 
features are injectable.
)
Note 'Todo'
id dyads
pretty print parsing with abbreviations and without boxes
id control words 
id special words
id trains
)

P=: Pspaced
Pspaced=: 3 : 0
t=.  Spacer y NB. can use t=. ;: y if spaces are ignored.
cDyads Nestclassify2 Tclassify2 t
)
Pwords=: cDyads @: Nestclassify2 @: Tclassify2 @: ;:
    NB. alt parser.
Palt1=: 3 : 0
t=. whiteformatrule1 Spacer y NB. can use t=. ;: y if spaces are ignored.
cDyads Nestclassify2 Tclassify2 t
)

Spacer=: 3 : 0 NB. standardises any initial spaces between words to 1.
    NB. for each token, if not at pos+len of last token, insert space.
y=. (dltbs_gramatize_ y)
b=: ;: y
}: b placeSpace y

)
NB. see whitespace.ijs
whiteformatrule1=: 3 : 0
NB. if token is a name, then ensure space follows and precedes.
a=. >: I. > isname2 each y
if. (#y) <: {: a do. a=. }: a end. NB. no space after last
for_i. (|. a) do. 
    if. -. (, each '[:';' ';'=.';'=:';')') e.~ i{y do. y=. i insertat (<' ')  ; y end.
end.

a=.  <: I. > isname2 each y
if. 0 > {. a do. a=. }. a end. NB. no space b4 first
for_i. |. a do. 
    if. -. ( , each '[:';' ';'=.';'=:';'(') e.~ i{y do. y=. (>: i) insertat (<' ') ; y end.
end.
y 
)

placeSpace=: 4 : 0
found =. {.y i. >{. x
if. 0 =  found do. r=. ({. x) else. r=. (<' '), ({. x) end.
if. 0=#x do. return. end.
r , ((}.x) placeSpace ( (found) +#>{.x) }.y) 
)


NB. Definitions
u_gramatize_ =: 1024 NB. undefined name, and unresolved groups
n_gramatize_ =: 1 NB. noun
a_gramatize_ =: 2 NB. adverb
c_gramatize_ =: 4 NB. conjunc
v_gramatize_ =: 8 NB. verb
s_gramatize_ =: 16 NB. space
e_gramatize_ =: 32 NB. assign (=. =: and lh noun to assign)
g_gramatize_ =: 2^15 NB. gerund
ctrl_gramatize_ =: 2^16 NB. control word

p_gramatize_ =: 64 NB. explicit parens
NB. l_gramatize_ =: 512 NB. left bond (conj/adv or dyad arg)

vp_gramatize_ =: 128 NB. verb phrase (resolved conj/adv)
d_gramatize_ =: 512 NB. dyad execution
t_gramatize_ =: 2^11 NB. train group
tm_gramatize_ =: 2^12 NB. train (member) verb -- any verb part of a train
tp_gramatize_ =: 2^13 NB. train pivot verb (middle of fork verb, left hook)--
NB. or'd with previous
cap_gramatize_ =: 2^14 NB. groups verbs to its right as an 'nontrained' verb
NB. phrase
ge_gramatize_ =: 256 NB. global assign (=: and lh noun to assign and group it
NB. applies to)
glbl_gramatize_ =:ge_gramatize_ + e_gramatize_
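
NB. Illustrative sketch, not part of the original script: the codes above are
NB. powers of two, so one word can carry several classes at once.  A class is
NB. tested with bitwise AND (17 b.) and added with bitwise OR (23 b.).
NB. hasclass_example is a hypothetical helper name, assumed here only to show
NB. the idea; x is a class code, y is a classification.
hasclass_example =: 4 : '0 < x (17 b.) y'
NB. e.g.  e_gramatize_ hasclass_example glbl_gramatize_   is 1
NB.       v_gramatize_ hasclass_example glbl_gramatize_   is 0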


upfltr=: 4 : 0
NB. update filtered y. x updateV`fltrV,
   '`up f' =. x
   xx=. f y
   (up xx#y) (I. xx)}y
)

Modupfltr=: 4 : 0
NB. update filtered y. x updateV`fltrV.
NB. mod verb filters the filter (drops or adds indexes
   '`mod up f' =. x
    NB.smoutput 'muf';(mod I. f y);I. f y
   (up ((mod I. f y){ y)) (mod I. f y)}y
    
)

NB. used for running count of open parentheses matching including determining
NB. invalid sequence.
listflips=: 4 : 0 NB. x has running total. when set to 0 return val.
    NB. if run out, return -1 for error.
    NB. y is 2 lists.  Typically sorted positions of bracket tokens. find
    NB. original matching close.
     NB. if original x is 1, finds first 1 in diagonal a >"0 1 b.
 'a b' =. y
 if. 0=#a do. a=. _ end.
 if. 0=#b do. _1 return. end. NB. out of cookies
 if. ({.a) < ({. b) do. 
    x listflips (}.a);b [ x=. >: x 
 else. 
    x=. <: x
    if. 0=x do. {.b return. end.
    NB.if. 1=#b do. -1 return. end. NB. out of cookies
    x listflips (a);}.b    
 end.
)

bwand     =: 17 b.  :: 0:  NB. bitwise and ret 0 if error (string/boxed etc)
workcount=: 3 : 0
    NB. determine working words of boxed sentence y
    NB. (exclude spaces  =. lh noun to =.)
    NB. assumed precluded are ()... generalize further to exclude.

y=. toplvl2 y
c =. (# y)- ((+/"1 ( s_gramatize_)= (s_gramatize_ & bwand) > y )) 
    NB. Just used by Gclassify. can kill tail.
c , (((+/"1) 0<(e_gramatize_ & bwand) > y ))
    NB. output list so client can double tail if assign noun not flagged yet. :(
)
tokensIn=:   I.each@:<"1@:([ =/ ])
toplvl=:  ]`({:@:>)@.(1&<@L."0)@:}:
toplvl2=:  ]`({:@:>)@.(1&<@L."0) NB. without cuttingoff tail.

Tclassify =: 3 : 0
 NB. Classifies tokens with blank trailer (trailer serves to classify group)
 NB. filled in for parens)
 if. 0=#y do. return. end.
 'a rest '=.xxs y
 a=. > a 
 NB. todo: preparse control words and make subsentences for content in between.
 if. '('=a do. NB.scan for matching )
    t=.1 listflips (,&.>'()') tokensIn rest
    if. _1=t do. t=.#rest end.
    r=.  Tclassify t{. rest
    if. 0=#r do. r=.< p_gramatize_ NB. ()
    else. r=. (< p_gramatize_) (_1)} r end.
    rest=. (t+1)}. rest
 elseif. ')'-:a do. NB. error, just put space
    r=.  s_gramatize_
 elseif. ' '-:a do. NB. space
    r=.  s_gramatize_
 elseif. '=.'-:a do. NB. assign
    r=.  e_gramatize_
 elseif. '=:'-:a do. NB. global assign
     r=. glbl_gramatize_ NB. includes e_gramatize
 elseif. do.
    t=. class_gramatize_ a
    if. 9>t do.
        r =. ((i. 1:) t = 1 4 8 2) NB. returns 0 1 2 or 3. noun, adv, conj, verb
        r =. r{ n_gramatize_ ,a_gramatize_ ,c_gramatize_ ,v_gramatize_ 
    else. NB. name
        r =. 4!:0 < a 
        r =. r{ n_gramatize_ ,a_gramatize_, c_gramatize_ ,v_gramatize_ , u_gramatize_ , u_gramatize_
    end.
 end.
  r ; ]t=. Tclassify rest 
)

Tclassify3=: 3 : 0
if. 0=#y do. return. end.
'a rest '=.xxs y
a=.> a
b=. > {.nextnonspace rest
if. ((>{: a) bwand (n_gramatize_ + u_gramatize_)) *. ((e_gramatize_-: b)+.(glbl_gramatize_-:b))  do. 
    NB. assignment: update noun phrase to left.
    if. L. a do. a =. b UpdateGroupVal a  
        else. a=. a  23 b. b end.
    NB. updates assigned nouns. presumes valid.
end.    
         
if. 0<L.a do. NB. do full group
    a=.  Aclassify@:  Gclassify@: }: Tclassify3 a
end.
a ; Tclassify3 rest 
)
 Tclassify2=: Aclassify@:Gclassify@:Tclassify3@:}:@:Tclassify

    NB. or's x with existing classification of group
UpdateGroupVal=: 4 : '(x 23 b. each {: y) (_1) } y'

preUpdateGroupVal=: 4 : 0
if. (1=#y) *. 1<L.y do. x UpdateGroupVal > y
elseif. 0=L.y do. x 23 b. {: y (_1) } y
elseif. do.x UpdateGroupVal y end.
)
Nestclassify =: 3 : 0
 if. 0=#y do. return. end.
 'a rest '=.xxs y
 a=.> a
 if. 0<L.a do.
    NB. while. 1< L.a do. a=. Nestclassify a end.
    
    a=.  Bigclassify Nestclassify a
    
    NB.smoutput a
     
 end.
a ; Nestclassify rest  
)
Bigclassify =: }:@:Advclassify@:Cclassify
Nestclassify2 =: Advclassify@:Cclassify@:}:@:Nestclassify
    NB. Makes assignment groups
Aclassify=: 3 : 0
    c=. I. (> toplvl y) = glbl_gramatize_ bwand > toplvl y
    c Aclassify2 y
)
Aclassify2=: 4 : 0
    if. 0=#x do. y return. end.
    v=. # >1 { nextnonspace |. ({: x) {. y
    y=. y, {:y
    y=. y GroupUp~ (v -~ {: x), _1 
    y=. (}: x) Aclassify2 y
)
    NB. scan backwards through group(s)
cDyads=: 3 : 0
    NB. If group is Noun, all nouns inside are left dyad args
    NB. If group is Verb, all bonded verb phrases are not nouns
    NB. If group is Noun, all bonded verb phrases to left of nouns are not nouns
    NB. No useful coding for 'not noun' exists, and so not coded.
gval =. {: y
if. 0< +./ > L. each y do.
    y=. (cDyads each)`([: (0&<) [: > L. each) upfltr y NB. recurse through tree.
end.
if. n_gramatize_= n_gramatize_ bwand >gval do.
    NB. strip last noun, update all others as left dyad args    
    y1=. nextnonspace |. toplvl y NB. presumes any noun sentence ends in a noun
    last=. (1{y1) , {:y
    y1=. 2}. y1
    NB.smoutput 'nouns';(n_gramatize_& bwand@:>) toplvl y
    y=. }:`((d_gramatize_ &preUpdateGroupVal each))`(n_gramatize_& bwand@:>@: toplvl ) Modupfltr  y
else. y return. end.
)

GroupUp =: 4 : 0
    NB. Turns leaves into node
    NB. x = left right positions. y is data. 
    NB. negative right preferred.
    'l r' =. x
    if. 1=#x do. r=.0 end.
    if. r>0 do. r=.r-#y end.
    t=.< r}. l}. y
    (l {. y),t,(r){.y
)
    NB. drops words from sentence
trimassigns=: }:@: ((e_gramatize_+s_gramatize_) & trimx)
trimassigns2=: }:@: ((e_gramatize_) & trimx)
trimx=: 4 : 0
    if. 0=#y do. return. end.
    'a rest '=.xxs y
    a=.> a
    if. a bwand (x) do. x trimx rest
    else. a ; x trimx rest end.
)

    NB. If len 1, ret class of item
    NB. If len 2+, check for lead or trail conjunction or lead adv. ret adv if
    NB. conj
    NB. Else if last item Noun, Noun else Verb
Gclassify=: 3 : 0
    NB. For group (paren), id result. Tail contains classification.
    t=. > {:  y
    if. 0=#t do. t=.0 end.
    y1=.y
    y=.  trimassigns toplvl  y
    num=. {. workcount y
    if. 1=num do.
        r=. >{. nextnonspace y
    elseif. 0= num do.    NB. error but return space
        r=.  s_gramatize_
    elseif. do. NB. if begins or ends with conj, then adv.
        rest=. >{.nextnonspace  y
        if.    rest bwand (c_gramatize_+a_gramatize_) do. r=.a_gramatize_
        NB. if lead edge is adverb or conj then sentence is adverb or syntax error.
        else. rest=. >{.nextnonspace |. y NB. last nonspace
            if. rest= c_gramatize_ do. r=.a_gramatize_
            NB. If one of 2 edge tokens is conj, then phrase is adverb.
            else. NB. will be noun or verb depending on last term
                if. n_gramatize_ bwand rest do. r=.n_gramatize_
                else. r=.v_gramatize_ end. NB. not conj or noun
            end.
        end.
    end.
    (< r 23 b. t ) _1 } y1 
)
    NB. return head (x) and rest (xs). If #y=1, rest is empty.
xxs =: ({.);(<@:}.)

nextnonspace=: 3 : 0
 NB. returns 3 boxes. 1st non space ; head up to and including that non space ; tail
 if. 0=#y  do. '' return. end. 
 t=. {. I. > s_gramatize_ i. each toplvl2 y
 if. (0=#t)  do. '' return. end.
 (>t{y) ; ((t+1){.y); (t+1)}. y
)

    NB. Find all conj and pair them with bondings
    NB. Then in leftover top level bond up adverbs.
    NB. Bonding is first noun/verb on left, and rightmost adverb between 2
    NB. consecutive [n or v]
Cclassify=: 3 : 0
    NB. ID conj/adv and bond their arguments into groups
c=. I. > -. c_gramatize_ i. > toplvl y 
if. 0=#c do. y return. end.
NB. remove leading conj. sentence is adverb.
if. 0={.c do. ({.y) , Cclassify }.y return. end.
v=. (v_gramatize_+n_gramatize_+vp_gramatize_) & bwand
a=. |.({.c){. toplvl y
l=.{.c
NB. smoutput ({.a) , nextnonspace a
while.0=v >{.a=. nextnonspace a do.
    n=.#>1{a
    a=. 2 }. a
    if. 0=#>{.a do. break. end.
    l=.l-n
smoutput ({.a) , nextnonspace a
end.
if. 0=>#each(2}.a) do. l=. 0 else. l=.l-#>1{a end. NB. else. l=.l-#>1{a
        NB. left bonding position established.  Now find right bond.
if. 1=#c do.
 goto_find1.
else.
 NB. sentence till next conjunction.  Find 2nd V+N
 smoutput ] a=. ({.c)}.(1 {c){. toplvl y
 NB.smoutput (0&<) v (> a)
 while. (2> num=. +/ (0&<) v >  a) do. NB. 0 or 1 nouns/verbs. 
  if. 0=num do. goto_find0. end. NB. trailing conj.- separate out. makes adverb.
  c=.}.c
  if. 1=#c do. 
    if. 0=+/ (0&<) v > ({.c)}.toplvl y do. goto_find0.
    else. goto_find1. end. end.
  a=. ({.c)}.(1 {c){. toplvl y NB.sentence till next conjunction.  Find 2nd V+N
 end.
 NB. 2 verbs/nouns
 goto_find1.
end.

return.
NB. J needs gosub, or ability to insert labels within other control structures.
label_find0.
    r=.({.c)
    y=. r insertat (< vp_gramatize_);y
    y =. (l, r+1) GroupUp y
return.
label_find1. NB. first v+n then adv string
    a=. ({.c)+ (0&< i. 1:) v ({.c)}. toplvl y NB. find next v or n
    r=. a + skipadverbs (a+1) }. toplvl y
    NB.smoutput ](<'next r'), (<r) ,(r }. toplvl y)
    r =. r + # > (1&{) :: (<0"0) nextnonspace (r }. toplvl y)
    y=. r insertat (< vp_gramatize_);y
    y =. (l, r+1) GroupUp y
    Cclassify y
)
NB. todo: bug on P '[EMAIL PROTECTED] 2'

insertat =: 4 : 0
    NB. x is pos y is item;list
'item list' =.xxs y
item =. >item
(x {. list), item , x}. list
)

leadadvstring =: (0&= i. 1:)@:((a_gramatize_+s_gramatize_)&bwand)@:>
    NB.return offset (to skip) to first non adverb. Ignore spaces.
    NB. version below is less buggy. above returns position of space (2) if in
    NB. '/ ' while below would return 1.
skipadverbs=: 3 : 0
NB. gets offset to next non adverb that also is not a space.
t =. nextnonspace y
if. 2 bwand >{. t do. 1+(({.t) i.~ >{. }. t ) + skipadverbs 2}. t 
else. 0 end.
)

    NB. Called post Cclassify/Aclassify.  Right to left, bond adv phrases.
Advclassify=: 3 : 0

v=. (v_gramatize_+n_gramatize_+vp_gramatize_) & bwand
c=. I. > -. a_gramatize_ i. > toplvl y 
if. 0=>#each c do. y return. end.
num=. {:c
num=. 1-~num- a=. leadadvstring |. trimassigns2 (num {.  toplvl y )
flag=. 0
if. e_gramatize_ bwand > num{ toplvl y    do. (flag=.1)  [ ('num' ahl >:) end.
if. num <0 do. flag =. -. num =. 0 end.
if. -. flag do. flag=.vp_gramatize_ 
    y=. ({:c+1) insertat (< flag);y
    y =. (num, 2+{:c) GroupUp y
     Advclassify y
else. y end.
)



NB. determines if word is a name (text comment returns true too) instead of
NB. primitive or literal number
isname2=: (1:=#@;:) *. isname_gramatize_
cocurrent 'gramatize'

    
NB. from strings.ijs
dltbs=: LF&$: : (4 : 0)
txt=. ({.x), y
a=. txt ~: ' '
b=. (a # txt) e. x
c=. b +. }. b, 1
d=. ~: /\ a #^:_1 c ~: }: 0, c
}. (a >: d) # txt
)

NB. from jtrace.ijs
(x)=: 2^i.#x=. ;:'noun verb adv conj lpar rpar asgn name mark'
isname=: ({: e. '.:'"_) < {. e. (a.{~,(i.26)+/65 97)"_
                      NB. 1 iff a string y from the result of ;: is a name

class=: 3 : 0         NB. the class of the word represented by string y
 if. y-:mark do. mark return. end.
 if. isname y do. name return. end.
 if. 10>i=. (;:'=: =. ( ) m n u v x y')i.<y do. 
  i{asgn,asgn,lpar,rpar,6#name return. 
 end.
 (4!:0 <'x' [ ". 'x=. ',y){noun,adv,conj,verb
)

    NB. NS created to allow for alternate implementations such as rendering to
    NB. canvas.
    NB. routines to format source code as rtf.
coclass 'rtfg'
coinsert 'gramatize'

    NB. walk group depth first.
Walker=: 4 : 0
    NB. x: remaining words, y: remaining structure.
    NB. when {.y group, apply group formatting, insert formatted subtokens in
    NB. place.

)
out=: , v

cocurrent 'base'
NB. Utilities/Tester content
SIDE=:4 : 'x `:6 &. >y'"0
ah =: 2 : ' (m)=: v@:". m'
ahl =: 2 : ' (m)=. v@:". m'

OR =: 4 : 0
for_i. i.(#x) do.
 if. 0 -. (((i_index{x)`: 6) > i_index{y) do.  1 return. end.
end.
0
)
AND =: 4 : 0
for_i. i.(#x) do.
 if. 0 = (((i_index{x)`: 6) > i_index{y) do.  0 return. end.
end.
1
)
    NB. strips spaces, separates groups with spaces.
Pr =: 3 : 0
 (": ;._2 'n a c v =. () b =: d u '''' ') {~ 1 2 4 8 32 64 128 288 512 1024 i. 1 2 4 8 32 64 128 288 512 1024 bwand
)

NB. RUN --tester TEST-- to display test cases. P 'sentence' for individual parse
tester =: ,.@:<@: (P"1 , <@:])  @:  ": ;._2 
TEST=: 0 : 0
+:(@: (asd=.+@:+//)) dd=. +/
(+/ % #) 4 5 6
'asdf' ah >: +/\ (fd=. 4 3 2 - 1)
fds AND@:+: 5 ; (gfd=.3)
& +:@*:@:+: fd=./\
4&[EMAIL PROTECTED]:@:+: (fd=./\) @:
as=. 5+(dd=: 2*3)- ass=. 3+4
)
NB. Todo: bug with '+:(@: (asd=.+@:+//)) dd=. +/'

NB. J bug: adding below to TEST will confuse J even if it parses with P
NB. +/ ( ) + () ))-) (4(((fds=.)
NB. minor spec bug... extra ) classified as undefined instead of space.





