Bryan,

Perhaps we should move this off list soon unless anyone else is following.

On Sun, Oct 2, 2011 at 6:16 AM, Bryan Jurish
<[email protected]> wrote:
>
> defense is Thursday ;-)

Good luck.

>> In particular what I'm doing which is new is trying to find a
>> principled way to syntactically combine these word level
>> representations, as vectors, without abstracting them into categories
>> first.
>
> Sounds very cool.  I think I heard Katrin Erk talk about something
> similar at ACL 2010

That's interesting. I had a long correspondence with Katrin about the
idea in 2007. I didn't know she had taken it anywhere.

>> In detail it means I'm trying to build context vectors for
>> unobserved pairs of words, by substituting contexts for observed
>> pairs between their respective elements.
>
> Sounds a bit like a brand of smoothing; although I'm not really sure
> what "substituting contexts for observed pairs between their respective
> elements" means here; unless maybe it's a sort of transitivity, e.g. you
> have {ab,cd,be,de} and to compare "a" and "c" you compose the adjacency
> relation to get {a(b)e,c(d)e} in order to get a shared attribute ("*e")
> for "a" and "c"... I guess that would explain your interest in the
> cross-product too, but I'm just guessing...

It is just an iteration of the common-context assumption (which I
thought was Harris's, but I'm happy to give Firth and Leibniz
credit).

So, say, in the usual formulation you might allow common contexts to
license a new combination (taking an actual example of ESL student
innovation from Peter Howarth (1998), 'Phraseology and Second Language
Proficiency', Applied Linguistics 19/1):

*"Those learners usually _pay_ more _efforts_ in adopting a new language..."

*pay effort
PAY attention/a call
MAKE a call/an effort

Think of this new production as being licensed diagrammatically:

pay
     \
      a call
     /
   make
     \
      an effort

In terms of categories, the student is putting "pay" and "make" in the
same category because of their common context "a call", and putting "a
call" and "an effort" in the same category because of their common
context "make".
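In case pseudo-code is clearer, here's a tiny Python sketch of that
licensing step (the toy data and all names are mine, purely for
illustration):

```python
# Hypothetical illustration of common-context licensing.
observed = {("pay", "attention"), ("pay", "a call"),
            ("make", "a call"), ("make", "an effort")}

def licensed(head, dep, pairs):
    """A new pair (head, dep) is licensed if some other head takes dep,
    and that other head shares an observed context with head."""
    for other_head, other_dep in pairs:
        if other_dep == dep and other_head != head:
            shared = ({d for h, d in pairs if h == head}
                      & {d for h, d in pairs if h == other_head})
            if shared:
                return True
    return False

print(licensed("pay", "an effort", observed))  # True, via make/"a call"
```

So *"pay an effort" gets licensed exactly the way the diagram shows:
"pay" and "make" are bridged by "a call", and "make" takes "an effort".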

I just do the same, but I carry it on without actually explicitly
forming a category. So taking the broader context:

     pay
    /   \
   1     a call
        /
      make        2
         \       /
         an effort

1 and 2 are added to new prior and after context vectors for the
analogized new production "pay an effort", and you can just go on
combining contexts in this way, analogizing new context vectors for
each combination.
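As a sketch of that vector-building step (again in toy Python, with my
own made-up corpus; it simplifies by borrowing the left contexts of the
head and the right contexts of the dependent from all their observed
uses):

```python
from collections import Counter

# Toy (left, head, dependent, right) observations; data is illustrative.
sentences = [
    ("learners", "pay", "a call", "today"),
    ("students", "make", "a call", "soon"),
    ("we", "make", "an effort", "here"),
]

def analogize(head, dep, obs):
    """Build prior/after context vectors for an unobserved pair
    (head, dep) by borrowing the left context of head and the right
    context of dep from their observed uses (contexts 1 and 2 in the
    diagram above)."""
    prior, after = Counter(), Counter()
    for left, h, d, right in obs:
        if h == head:
            prior[left] += 1
        if d == dep:
            after[right] += 1
    return prior, after

prior, after = analogize("pay", "an effort", sentences)
print(dict(prior))  # {'learners': 1}
print(dict(after))  # {'here': 1}
```

The resulting Counters are just sparse context vectors for the
analogized production, and nothing stops you feeding them back in as
input to the same step.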

Exactly how much weight you can put on analogized context vectors like
this, I don't know. I don't know, for instance, whether it is possible
to encode the syntax of an entire sentence as such an analogized
context vector, though an earlier implementation did quite well.

>> If your CCS's are a sparse implementation with arrays that might be a
>> win for me. Would they be a lot smaller than Perl hashes, do you
>> think?
>
> Definitely, since you eliminate the overhead needed for gazillions of
> perl scalars.  And faster, too; at least in every case I've tested.

That sounds good. I'll have to look into it.
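For what it's worth, the same argument holds in Python: a hash keyed by
index pays per-entry overhead for every key and value, where a pair of
typed arrays (indices + values, roughly the compressed-sparse idea)
pays only the raw element width. A rough back-of-the-envelope check
(numbers illustrative, nothing PDL-specific):

```python
import sys
from array import array

# Sparse counts as a hash vs. as parallel typed arrays (index + value).
nnz = 100_000
as_hash = {i * 7: 1.0 for i in range(nnz)}
as_arrays = (array("q", (i * 7 for i in range(nnz))),  # indices
             array("d", (1.0 for _ in range(nnz))))    # values

hash_bytes = sys.getsizeof(as_hash) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in as_hash.items())
array_bytes = sum(a.buffer_info()[1] * a.itemsize for a in as_arrays)
print(f"hash: {hash_bytes} bytes, typed arrays: {array_bytes} bytes")
```

The typed arrays come out several times smaller, and Perl scalars carry
even more per-value overhead than Python's, so the win should be at
least as big there.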

-Rob

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
