Bryan,

Perhaps we should move this off list soon, unless anyone else is following.
On Sun, Oct 2, 2011 at 6:16 AM, Bryan Jurish <[email protected]> wrote:
>
> defense is Thursday ;-)

Good luck.

>> In particular what I'm doing which is new is trying to find a
>> principled way to syntactically combine these word-level
>> representations, as vectors, without abstracting them into categories
>> first.
>
> Sounds very cool. I think I heard Katrin Erk talk about something
> similar at ACL 2010.

That's interesting. I had a long correspondence with Katrin about the
idea in 2007. I didn't know she had taken it anywhere.

>> In detail it means I'm trying to build context vectors for
>> unobserved pairs of words, by substituting contexts for observed
>> pairs between their respective elements.
>
> Sounds a bit like a brand of smoothing; although I'm not really sure
> what "substituting contexts for observed pairs between their respective
> elements" means here; unless maybe it's a sort of transitivity, e.g. you
> have {ab,cd,be,de} and to compare "a" and "c" you compose the adjacency
> relation to get {a(b)e,c(d)e} in order to get a shared attribute ("*e")
> for "a" and "c"... I guess that would explain your interest in the
> cross-product too, but I'm just guessing...

It is just an iteration of the common-context assumption (which I
thought was Harris's, but I'm happy to give Firth and Leibniz credit).

So, say, in the usual formulation you might allow common contexts to
license a new combination (taking an actual example of ESL student
innovation from Peter Howarth, 1998, 'Phraseology and Second Language
Proficiency', Applied Linguistics 19/1):

    *"Those learners usually _pay_ more _efforts_ in adopting a new
    language..."

    *pay effort

    PAY  attention/a call
    MAKE a call/an effort

Think of this new production as being licensed diagrammatically:

    pay
        \
          a call
        /
    make
        \
          an effort

where, in terms of categories, the student is putting "pay" and "make"
in the same category because of their common context "a call".
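Just as an outside illustration (this is not code from the thread, and
the data structures and names are invented for the example), the
licensing step above can be sketched in a few lines of Python: a pair
(head, context) is licensed whenever some other head shares an observed
context with the head and has itself been observed with the new context.

```python
# Hedged sketch of the common-context licensing step, using the
# Howarth "pay/make" example. All names and counts are illustrative.
from collections import defaultdict

observed = [("pay", "attention"), ("pay", "a call"),
            ("make", "a call"), ("make", "an effort")]

# after-context sets: head word -> set of observed right-hand contexts
after = defaultdict(set)
for head, ctx in observed:
    after[head].add(ctx)

def licensed(head, ctx):
    """(head, ctx) is licensed if observed directly, or if some other
    head shares a context with `head` and was observed with `ctx`."""
    if ctx in after[head]:
        return True  # directly observed
    for other, ctxs in after.items():
        if other != head and after[head] & ctxs and ctx in ctxs:
            return True
    return False

print(licensed("pay", "an effort"))  # "pay"/"make" share "a call" -> True
print(licensed("pay", "a song"))     # no licensing path -> False
```

Forming an explicit category would amount to merging "pay" and "make"
once and for all; the check above instead consults the shared contexts
each time, which is closer to the vector formulation in the thread.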
And "a call" and "an effort" in the same category because of their
common context "make".

I just do the same, but I carry it on without actually explicitly
forming a category. So, taking the broader context:

        pay
       /    \
      1      a call
            /
        make
      2     \
       \    /
        an effort

1 and 2 are added to new prior- and after-context vectors for the
analogized new production "pay an effort", and you can just go on
combining contexts in this way, analogizing new context vectors for
each combination.

Exactly how much weight you can put on analogized context vectors like
this I don't know. I don't know, for instance, whether it is possible
to code the syntax of an entire sentence as such an analogized context
vector. An earlier implementation did quite well.

>> If your CCS's are a sparse implementation with arrays that might be a
>> win for me. Would they be a lot smaller than Perl hashes, do you
>> think?
>
> Definitely, since you eliminate the overhead needed for gazillions of
> perl scalars. And faster, too; at least in every case I've tested.

That sounds good. I'll have to look into it.

-Rob

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
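(A sketch of the analogizing step discussed in the thread, appended for
the archive. This is not the author's implementation; the sparse
representation, the toy counts, and the pooling rule are all invented
assumptions. Context vectors are kept as sparse dicts of counts, in the
spirit of the CCS discussion, and the vector for the unobserved pair
"pay an effort" is assembled from words that share a context with each
element, as in the diagram with arcs 1 and 2.)

```python
# Hedged sketch: an analogized context vector for an unobserved pair.
# Toy sparse vectors: word -> Counter of (direction, context) counts.
from collections import Counter

ctx = {
    "pay":       Counter({("after", "a call"): 3, ("after", "attention"): 5}),
    "make":      Counter({("after", "a call"): 4, ("after", "an effort"): 2}),
    "a call":    Counter({("prior", "pay"): 3, ("prior", "make"): 4}),
    "an effort": Counter({("prior", "make"): 2}),
}

def analogized(left, right):
    """Pool the context vectors of words that share a context with
    `left` or with `right`; no explicit category is ever formed.
    The pooling rule (plain summation) is an illustrative choice."""
    out = Counter()
    for w, v in ctx.items():
        # bridge on the left element: w shares a context with `left`
        # (e.g. "make" shares "a call" with "pay" and contributes
        # ("after", "an effort") -- arc 2 in the diagram)
        if w != left and set(v) & set(ctx[left]):
            out.update(v)
        # bridge on the right element: w shares a context with `right`
        # (e.g. "a call" shares ("prior", "make") with "an effort" and
        # contributes ("prior", "pay") -- arc 1 in the diagram)
        if w != right and set(v) & set(ctx[right]):
            out.update(v)
    return out

vec = analogized("pay", "an effort")
```

The resulting vector has "pay" as an analogized prior context and
"an effort" as an analogized after context for the new production, and
could itself be fed back in to analogize further combinations.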
