Linas,

In reference to your "language learning diary", and specifically to equation (1): I have an intuition that it may be better to look at an asymmetric analogue of equation (1) ...
Looking at the case of "Northern Ireland", it seems to me that what we want to compute is

-- how much surprise value does "Northern Ireland" give relative to "* Ireland"? [i.e. how much is Northern "attracted to" Ireland]

-- how much surprise value does "Northern Ireland" give relative to "Northern *"? [i.e. how much is Ireland "attracted to" Northern]

rather than a symmetrized version.

Considering first the case

Attraction(Northern Ireland | * Ireland)

what we want here, qualitatively, is something like: "Out of all cases where Ireland occurs, what percentage contain Northern to the left, versus what percentage would one expect to contain Northern to the left based on independence?"

Percentage that contain Northern to the left = P(Northern Ireland) / P(* Ireland)

Percentage one would expect to contain Northern to the left based on independence = P(Northern *)

So a first-pass measure of attraction might be

[P(Northern Ireland) / P(* Ireland)] - P(Northern *)

and, by the mirror-image argument for the other direction,

Attraction(Northern Ireland | Northern *) = [P(Northern Ireland) / P(Northern *)] - P(* Ireland)

This is different from

P(Northern Ireland) / (P(* Ireland) P(Northern *))

which is what you seem to be calculating [in eq. (1)].

In this case, in a typical large corpus, "Northern Ireland" will have more attraction value relative to "Ireland" than to "Northern" (i.e. Northern occurs before Ireland quite surprisingly often, whereas Ireland occurs after Northern only somewhat surprisingly often).

Think about "big cat" as another example. If bigness is a common attribute of cats, but there are lots of big things besides cats, then the attraction of big to cat will be larger than the attraction of cat to big...

My guess is that in natural language, most of the time, the MODIFIER has more attraction to the MODIFYEE than vice versa. I.e. most nouns have a certain set of key properties, but most properties apply to a lot of different nouns. Similarly, most verbs have a certain, not that huge, set of key properties, but most properties apply to a lot of different verbs...
If this guess is correct, then calculating asymmetric attraction as I have suggested, and then finding a maximum-total-attraction spanning digraph of a sentence, will lead to a digraph that approximates a "link parse with directed edges" of the sentence...

As you recall, the lack of a "head" (a direction) for a link is a big difference between link grammar and word grammar, and I suspect that having heads for links is valuable... I also think that having directed links is going to be valuable for inferring abstract semantic relations from this statistical data, because directedness gives you head-dependent relationships, which tend to translate into semantic predicate-argument relationships, etc.

-- ben

--
Ben Goertzel, PhD
http://goertzel.org

You received this message because you are subscribed to the Google Groups "opencog" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CACYTDBfiVzXSwZATp1NMcaaQ_b9QT5nFZmh_1TcZ2sfBm5c4nQ%40mail.gmail.com.
