OK, there's clearly a lot of work happening in linguistics these days that
I have fallen behind on reading.

The nature of the conversations here has been frustrating, because so far,
it sounds like an attempt to evade the "central limit theorem" --
https://en.wikipedia.org/wiki/Central_limit_theorem

There are two related ideas I'm trying to get across. One is that if you
make enough observations of a phenomenon, eventually the central limit
theorem kicks in and smooths over random variations. Specifically, I
claim that, despite MST being imperfect, a large number of observations
should smooth over the imperfections. I believe this to be true (but I
could be wrong).
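To make that concrete, here is a toy simulation -- the numbers are invented
purely for illustration and have nothing to do with actual MST data. Each
"observation" is a noisy measurement around a true value of 0.6, and
averaging many of them washes the noise out, exactly as the theorem says:

```python
import random

random.seed(42)

# A hypothetical noisy measurement: true mean 0.6, corrupted by large
# random noise.  (All numbers here are made up for illustration.)
def noisy_observation():
    return 0.6 + random.uniform(-0.4, 0.4)

# Average n independent observations.  By the central limit theorem,
# the spread of this average shrinks like 1/sqrt(n).
def mean_of(n):
    return sum(noisy_observation() for _ in range(n)) / n

for n in (10, 100, 10000):
    print(n, round(mean_of(n), 3))
```

With n=10 the averages bounce around; with n=10000 they sit very close to
the true value, even though each single observation is terrible.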

The other idea is that the golden test corpus must avoid accidentally
testing disjuncts far away from the central limit -- to avoid, as it were,
making statements analogous to "Well, I flipped the coin three times, and I
did not get 50-50 odds, therefore the theory doesn't work." You have to
flip the coin at least N times, for some large N. Here, for MST, we don't
know how big N has to be, and we don't have a good plan for determining N.
It's worse, because everything is Zipfian, a.k.a. 1/f noise. It is possible
that BERT or other approaches allow smaller values of N to work, but this
is also not clear.
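The Zipfian problem also shows up in a toy simulation -- the vocabulary and
corpus sizes below are arbitrary choices for illustration, not measurements.
Even after drawing a million tokens, most word types still fall short of a
modest observation threshold, because the Zipfian tail is so long:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical Zipfian vocabulary: the word of rank r occurs with
# probability proportional to 1/r.  Sizes are arbitrary.
V = 100_000
weights = [1.0 / r for r in range(1, V + 1)]

def observation_counts(tokens):
    # Draw `tokens` word-instances and count observations per word type.
    draws = random.choices(range(V), weights=weights, k=tokens)
    return Counter(draws)

counts = observation_counts(1_000_000)
for n_min in (3, 10, 20):
    well_observed = sum(1 for c in counts.values() if c >= n_min)
    print(f"word types seen at least {n_min} times: {well_observed} of {V}")
```

So "flip the coin N times" has to hold per word type, and the tail of the
distribution never gets there without an enormous corpus.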

It's also not clear that BERT would converge to a different limit than MST --
the central-limit theorem says there is only one limit, not two. But
perhaps I'm misapplying it, or perhaps I'm neglecting some important effect.
Without measurements, it's hard to guess what that effect is (if it even
exists).

Anyway, I have a backlog of half-a-dozen important unread papers, so I'll
try to get around to that "real soon now".

--linas



On Mon, Apr 1, 2019 at 12:15 AM Ben Goertzel <[email protected]> wrote:

> "Replacing MST by DNN/BERT" is a strange way to put it...
>
> DNN/BERT builds a pretty complex and comprehensive language model,
> much beyond what is done by calculation of MI values and similar
>
> The extraction of a parse dag satisfying syntactic constraints (no
> links cross, covering all words in the sentence, connected graph) is a
> conceptually simple step, and nobody is spending much time on this
> step indeed...
>
> The question of how to assign a quantitative weight to the relation
> btw two word-instances in a sentence, taking into account the specific
> context in that sentence, but also the history of co-utilization of
> those words (or other similar words), is less conceptually simple and
> this is one place I think DNN language models can help
>
> Using MST or similar parsing based on numbers exported from DNN
> language models is one way of extracting symbolic-ish structured
> knowledge from these big messy subsymbolic probabilistic language
> models...
>
> The DNNs in use now like BERT do not really satisfy me on a
> theoretical or conceptual level, but they have been tuned to work
> pretty nicely and they have been implemented pretty efficiently on
> multi-GPU hardware -- so, given this and given the quality of the
> recent practical results obtained with them -- I consider it well
> worth exploring how to use them as tools in our pursuits for grammar
> and semantics learning
>
> -- Ben
>
> On Mon, Apr 1, 2019 at 2:07 PM Linas Vepstas <[email protected]>
> wrote:
> >
> >
> >
> > On Sun, Mar 31, 2019 at 10:51 PM Anton Kolonin @ Gmail <
> [email protected]> wrote:
> >>
> >> Hi Linas, I like this thread more and more :-)
> >
> > I don't. I use a lot of CAPITALIZED WORDS below.  There is a deep and
> dark fundamental misunderstanding, and I am sometimes at wits' end trying to
> figure out why, and how to explain things in an understandable fashion.
> >>
> >> >But somehow, I suspect... Isn't this why OpenCog has "unified rule
> engine" (URE) instead of link grammar at its core,
> >>
> >> Linas, the "extraction of phrasemes" goal approaching has been
> discussed exactly in terms of MST->GL->URL on the last fall in Hong Kong
> discussion:
> https://docs.google.com/document/d/13YyqtGud0GAbVaFcc94kAd2LhGf7jTr5XDYgiuC294c/edit
> >>
> >> That is:
> >>
> >> 1) Do MST-parsing to get word links proto-disjuncts
> >>
> >> 2) Do Grammar Learning to cluster and conclude word categories and
> rules with disjuncts
> >>
> >> 3) Do URE-kind-of-thing to build the rules into "phrasemes" or
> "sections" or "patterns".
> >
> > Yes.
> >>
> >> However, your current discourse and our current results just show that
> "no one is able to do reasonable MST-parsing", so the above is just a
> waste of time, correct?
> >
> > No. Very much no.  I'm saying the opposite of that. You can replace MST
> by almost *ANYTHING* else, and the quality of your results WILL NOT CHANGE!
> >
> > If the quality of your results depends on the quality of MST, you are
> DOING SOMETHING WRONG!
> >
> > I'm utterly flabbergasted. I don't know how many more times I can say
> this: stop wasting time on this unimportant step!
> >>
> >> At the time we speak, Ben, Alexely, Sergey and Asuares are trying to
> use DNN/BERT magic to do the trick 1.
> >
> > I want to call this "a complete waste of time". It will almost surely
> not improve the quality of the results!  I don't understand why four smart
> people think that replacing MST by BERT will make any difference at all!
> It should not matter!  Nothing depends on this step! Anything at all,
> anything with a probability better than random chance, is sufficient!  Why
> isn't this obvious?
> >
> > If Ben is reading this: I recall talking to Ben about this in an
> ice-cream shop in Berlin, for an AGI conference, and he seemed to
> understand back then.  I have no idea why he changed his mind.  I really do
> not understand why everyone spends so much time obsessing about MST. Is
> this a "color of the bike shed" problem?
> https://en.wikipedia.org/wiki/Law_of_triviality
> >
> > MST-vs.-BERT==color-of-bike-shed
> >
> > Just use MST. It's simple. It works. It gives good results.  Stop trying
> to improve it.  The interesting problems are elsewhere!  Just use MST, and
> move on to the good stuff!
> >>
> >> To my mind, that may become possible only if the DNN/BERT magic does the
> trick with steps 2 and 3 done under the hood. If this is done, in
> such case, we don't need to do 2 and 3 after we have the DNN/BERT-based
> model, because we can simply "milk-out" the grammar rules out of DNN/BERT
> mycelium for that. And we don't need the ULL either, by the way, because we
> just need DNN/BERT and rows of different sorts of milk machines around it.
> >
> > So why are you bothering to work on ULL?
> >>
> >> So, instead of solving the problem of constructing the pipeline for
> learning grammar from raw text we need to solve the problem of milking the
> grammar out of DNN/BERT model trained on these texts, right?
> >
> > Because I don't think that you know how to milk lexical functions out of
> DNN/BERT -- We've wasted more than a year talking about MST.  Instead of
> endlessly talking about MST, you could have  JUST USED IT, WITHOUT ANY
> MODIFICATIONS, gotten good results, and spent the year working on something
> interesting!
> >
> > Again: replacing MST by DNN/BERT or anything else will NOT IMPROVE
> the accuracy!  You'll have exactly the same accuracy as before, and if your
> accuracy improves, it is because you are doing something wrong!
> >
> >> However, either way, we need to understand the algorithmic machinery of
> how links assemble into disjuncts, and disjuncts assemble into sections,
> through the universe-scale combinatorial explosion.
> >
> > No. That is the OPPOSITE of what ACTUALLY HAPPENS!!!!
> >>
> >> And I agree that clustering and categorizing word and links (and then
> disjuncts and sections, right) is part of the process - explicitly in ULL
> pipeline or implicitly deep in DNN/BERT darkness.
> >
> > It is NOT DEEP AND DARK.  I wrote not one but TWO PAPERS on this,
> CASTING LIGHT ON THAT DARKNESS
> >
> > I'm frustrated to the 43rd degree on why I cannot seem to have a
> reasonable conversation with any other human being about any of this.
> >
> > -- Linas
> >
> >> Cheers,
> >>
> >> -Anton
> >>
> >>
> >> 01.04.2019 9:17, Linas Vepstas:
> >>
> >>
> >>
> >> On Thu, Mar 28, 2019 at 10:22 AM Ivan V. <[email protected]> wrote:
> >>>
> >>> Linas Vepstas wrote:
> >>>
> >>> >... knowledge extraction can be done generically, and not just on
> language.
> >>>
> >>> If link grammar would be Turing complete, this might be possible right
> away.
> >>
> >>
> >> In my experience, thinking about Turing completeness is unproductive
> and a distraction.
> >>
> >>> But somehow, I suspect... Isn't this why OpenCog has "unified rule
> engine" (URE) instead of link grammar at its core,
> >>
> >>
> >> No. It has the rule-engine because back then, I did not understand
> sheaves.  I'm starting to think that the rule engine is a strategic
> mistake. The original idea is that rule-application is the main conceptual
> abstraction of term-rewriting.  One rewrites, or proves theorems by
> applying sequences of rules.  It turns out that discovering the right
> sequence is hard. Finding correct long sequences is hard - a combinatorial
> explosion.
> >>
> >> The openpsi system addresses some of these issues. Unfortunately, its
> current implementation is a tangle of rule-selection mechanisms, and
> theories of human psychology. It's probably better than the URE, but is
> currently not as powerful.
> >>
> >> I'm trying to position a theory of sheaves as a replacement for the
> URE, and as the natural generalization of openpsi, but so far I've
> sabotaged my own efforts.
> >>
> >>>
> >>> and with URE things get much more complicated. I'm sorry, but that is
> still a Gordian knot to me, considering all of my modest knowledge.
> >>
> >>
> >> We all have modest knowledge. That is the nature of the human condition.
> >>
> >>>
> >>> On the other hand, if someone really smart would provide automatic
> grammar extraction by means of unrestricted grammar, I believe that would
> be it.
> >>
> >>
> >> Yes, that is the goal of the language-learning project.  However, as
> noted in my last email (on the link-grammar list) it is not enough to just
> learn a semi-Thue system, declare victory, and go home.  The example I gave
> there:
> >>
> >>   "I think that you should give that car a second look"
> >>   "you should really give that song a second listen"
> >>   "maybe you should give Sue a second chance".
> >>
> >> Learning to parse these "set phrases" or phrasemes is equivalent to
> learning a semi-Thue system; however, that's not enough. One must also
> realize that all three are forms of advice-giving, having "conserved" or
> "fixed" regions "x YOU SHOULD y GIVE z SECOND w", where z is highly
> variable, with millions of variations, and w has only a few dozen allowed
> variations. Note that the words "fixed", "conserved", and "variable" are
> used in genetics, proteomics, and antibody structure. It's the same idea.
> >>
> >> The goal of learning lexical functions (LF's) is to learn that all
> three are advice-giving forms, and also to learn what is, and what can be
> plugged in for x,y,z,w.   So, although a super-whiz-bang grammar learner
> capable of learning context-sensitive languages should be able to learn "x
> YOU SHOULD y GIVE z SECOND w", it still will not know the *meaning* of this
> phrase.  To know the *meaning*, you have to know the acceptable ranges (as
> fuzzy-sets) of x,y,z,w.
> >>
> >> To conclude, thinking about Turing-completeness is a waste of time,
> because Turing completeness only tells you that "x YOU SHOULD y GIVE z
> SECOND w" is recursively enumerable; it does not tell you what it actually
> means.
> >>
> >> Put another way:  having a universal Turing machine is not the same as
> knowing how some particular program works. Automagically learning a
> context-sensitive grammar is not enough to know what that grammar is
> "saying/doing".
> >>
> >> -- Linas
> >>
> >>>
> >>>
> >>> Thank you,
> >>> Ivan V.
> >>>
> >>>
> >>> čet, 28. ožu 2019. u 07:58 Anton Kolonin @ Gmail <[email protected]>
> napisao je:
> >>>>
> >>>> Ben, Linas,
> >>>>
> >>>> >But we know that MST parsing is shit.  Stop wasting time on MST or
> trying to "improve" it.
> >>>>
> >>>> I think that sounds like kind of support for the concept of "dumb
> explosive parsing" being advocated for 1+ year ago:
> >>>>
> >>>>
> https://docs.google.com/document/d/14MpKLH5_5eVI39PRZuWLZHa1aUS73pJZNZzgigCWwWg/edit#heading=h.aqo9bumb3doy
> >>>>
> >>>> I also agree with Linas's other reasoning in this thread. I would
> consider giving it a try starting next month if we don't have a
> breakthrough with DNN-MI-milking-based MST-parsing by that time.
> >>>>
> >>>> > can be done generically, and not just on language
> >>>>
> >>>> I think everyone in bio-informatics dreams of extracting secrets of
> "dark side of the genome" with something like that ;-)
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -Anton
> >>>>
> >>>>
> >>>> 28.03.2019 1:24, Linas Vepstas пишет:
> >>>>
> >>>> Hi Anton,
> >>>>
> >>>> I've cc'ed the link-grammar mailing list, because I describe below
> some concepts for word-sense disambiguation. I'm also cc'ing the opencog
> mailing list and ivan vodisek, because after studying hilbert systems, I
> think he's ready to think about how knowledge extraction can be done
> generically, and not just on language.
> >>>>
> >>>> -- Linas
> >>>>
> >>>> On Mon, Mar 25, 2019 at 1:39 AM Anton Kolonin @ Gmail <
> [email protected]> wrote:
> >>>>>
> >>>>> Hi Linas,
> >>>>>
> >>>>> >I'd call it "interesting", but maybe not "golden"
> >>>>>
> >>>>> These are randomly selected sentences from "Gutenberg Children"
> corpus:
> >>>>>
> >>>>>
> http://langlearn.singularitynet.io/data/cleaned/English/Gutenberg-Children-Books/lower_LGEng_token/
> >>>>>
> >>>>> "Gutenberg Children silver standard" is LG-English parses:
> >>>>>
> >>>>>
> http://langlearn.singularitynet.io/data/parses/English/Gutenberg-Children-Books/test/GCB-LG-English-clean.ull
> >>>>>
> >>>>> "Gutenberg Children gold standard" is a subset of "silver standard"
> with semi-random selection of sentences skipping direct speech and doing
> manual verification of the links.
> >>>>>
> >>>>> So as long as we are training on "Gutenberg Children" corpus, having
> the test on the same "Gutenberg Children" seems reasonable, right?
> >>>>
> >>>>
> >>>> Yes. You still need to verify that each word in the "golden" corpus
> occurs at least N=10 or 20 times in the training corpus. The dependency of
> accuracy on N is not generally known, but it is very clear that if a word
> occurs only N=3 times in the training corpus, then whatever is learned
> about it will be very low quality.
> >>>>
> >>>>>
> >>>>> But thanks, we may have to put more effort into removal of ancient
> constructions and words, even if these are present in the corpus.
> >>>>
> >>>> If you consistently train on 19th century literature, and then
> evaluate 19th-century literature comprehension, that's fine.  Just don't
> expect it to work for 21st century blog posts.
> >>>>
> >>>> The strongest effect will be the N=number of observations effect.
> >>>>
> >>>>>
> >>>>> >Anyway -- you only indicate pair-wise word-links. Is the omission
> of disjuncts intentional?
> >>>>>
> >>>>> If you have all links in the sentence, you can construct all of the
> disjuncts with no ambiguity, correct?
> >>>>
> >>>> No, but only because you did not indicate the link-type.  The whole
> point of a clustering step is to obtain a link-type; if you discard it, you
> will never get better-than-MST results. The link-type is critical for
> obtaining the word-classes.  The whole point of learning is to learn the
> word-classes; you've learned very little, if you know only word-pairs.
> >>>>
> >>>> Consider this example:
> >>>>
> >>>> I saw wood
> >>>> I saw some wood
> >>>>
> >>>> A solution that would be "almost perfect" (or "golden") would be this:
> >>>>
> >>>> saw: {performer-of-actions}- & {sculptable-mass}+;
> >>>> saw: {observer}-  & {viewable-thing}+;
> >>>>
> >>>> These disambiguate the two different senses of the word "saw".  It's
> impossible to have word-sense disambiguation without actually having these
> disjuncts.  The word-pairs alone are not sufficient to report the link-type
> connecting the words.  Clustering gives the other dictionary entries:
> >>>>
> >>>> I: {performer-of-actions}+ or {observer}+;
> >>>> wood: {sculptable-mass}- or ({quantity-determiner}- &
> {viewable-thing}-);
> >>>> some: {quantity-determiner}+;
> >>>>
> >>>> Thus, the pronoun "I" also belongs to two different word-sense
> categories: performers and observers.  Compare to:
> >>>>
> >>>> "The chainsaw saws wood"  -- a "chainsaw" can be  a "performer of
> actions" but cannot be an "observer".
> >>>> "The dog saw some wood" -- dogs can be observers. They can perform
> some actions; like run, jump, but they cannot saw, hammer, cut, stab.
> >>>>
> >>>> The link-type is absolutely crucial to understanding a word.  The
> language-learning project is all about learning the link-types. Without
> correct link-type assignments, you cannot have correct parses.
> >>>>
> >>>> ... which is 100% of the problem with MST.  The problem with MST is
> not so much that "it's not accurate" -- sure, it is not terribly accurate. But
> even if MST or some MST-replacement was 100% accurate, it would still be
> "wrong" because it fails to indicate the link-type.  If you want to
> understand a sentence, you MUST know the link-types!
> >>>>
> >>>> Otherwise, you just have "green ideas sleep furiously", which parses,
> but only because the link types have been erased, or made stupid.  Here's a
> stupid grammar:
> >>>>
> >>>> ideas:  {adjective}- & {verb}+;
> >>>> green: {adjective}+;
> >>>>
> >>>> which allows "green ideas" to parse.  But of course, this is wrong;
> it should have been:
> >>>>
> >>>> ideas: {noospheric-modifier}- & {concept-manipulating-verb}+;
> >>>> green: {physical-object-modifier}+;
> >>>>
> >>>> and now it is clear that "green ideas" cannot parse, because the
> link-types clash.
> >>>>
> >>>> * If you cluster down to 5 or 6 clusters (adjective, verb, noun ...)
> you will get very low quality grammars.
> >>>>
> >>>> * If you cluster to 200 or 300 clusters, you get sort-of-OK grammars.
> This is what deep-learning/neural-nets do: this is why the deep-learning
> systems seem to give nice results: 200 or 300 features is enough to start
> having adequate functional distinctions (e.g. the famous "king -
> male+female=queen" example, or "paris-france+germany=berlin" example)
> >>>>
> >>>> * If you cluster to 3K to 8K clusters, you start having a quite
> decent model of language
> >>>>
> >>>> * Note that wordnet has 117K "synsets".
> >>>>
> >>>> Note that in the above example:
> >>>> wood: {sculptable-mass}- or ({quantity-determiner}- &
> {viewable-thing}-);
> >>>>
> >>>> the things in the curly-braces are effectively "synsets".
> >>>>
> >>>> The next set of goal-posts is to have disjuncts, of maybe low-medium
> quality, and use these to extract ontologies.  e.g.
> >>>> {sculptable-mass} is-a {mass} is-a {physical-thing} is-a {thing}
> >>>>
> >>>> You can try to do this by clustering but there are probably better
> ways of discovering ontology.
> >>>>
> >>>>
> >>>>>
> >>>>> >Also -- no hint of any word-classes or part-of-speech tagging? This
> is surely important to evaluate as well, or is this to be done in some
> other way?  i.e. to evaluate if "Pivi" was correctly clustered with other
> given names?  Or that lama/llama was clustered with other four-legged
> animals?
> >>>>>
> >>>>> We don't have that in MST-Parsing, right? We need this corpus to
> assess the quality of the MST-Parsing so we don't need part-of-speech
> information for that.
> >>>>
> >>>> But we know that MST parsing is shit.  Stop wasting time on MST or
> trying to "improve" it. We already know that it is close to a high-entropy
> path to structure; trying to squeeze a few more percent of entropy is not
> worth the effort, not at this time.  Focus on finding a high-entropy
> structure extraction algorithm, don't waste time on MST.
> >>>>
> >>>> You should be focusing on extracting disjuncts, word-classes,
> word-senses, and trying to improve the quality of those.  If you obtain a
> high-entropy path to these structures, the quality of your parses will
> automatically improve.  Focus on the entropy numbers. Try to maximize that.
> >>>>
> >>>>> The clustering is able to do that anyway - see the graphs in the end
> of the last year report:
> >>>>>
> >>>>>
> https://docs.google.com/document/d/1gxl-hIqPQCYPb9NNkyA3sBYUyfwvJFvT1hZ5ZpXsaPc/edit#heading=h.twoiv52o0tou
> >>>>>
> >>>>> >Also -- I can't tell -- is it free of loops, or are loops allowed?
> Allowing loops tends to provide stronger, more accurate parses.  Loops act
> as constraints.
> >>>>>
> >>>>> The loops and crossing links are not allowed in the MST-Parser now.
> If we allow them in the test corpus, how could it make assessment of
> MST-Parses better?
> >>>>>
> >>>>> Note that we ARE working with MST-Parses now, according to Ben's
> directions.
> >>>>
> >>>>
> >>>> Not to say bad things about Ben, but I'm certain he has not actually
> thought about this problem very much. He is very very busy doing other
> things; he is not thinking about this stuff.  I have repeatedly tried to
> explain the issues to him, and it's quite clear that he is far away from
> understanding them, from working at the level that I would like to have you
> and your team work at.
> >>>>
> >>>> I'm trying to have you make small, quantified baby-steps, to verify
> the accuracy of your methods and data.  What I'm seeing is that you are
> attempting to make giant-steps, without verification, and then getting
> low-quality results, without understanding the root causes for them.  You
> can't dig yourself out of a ditch, and digging harder and more furiously
> won't raise the accuracy of the parse results.
> >>>>
> >>>> --linas
> >>>>
> >>>>> We have your MST-Parser-less idea on the map but we are NOT trying
> it now:
> >>>>>
> >>>>> https://github.com/singnet/language-learning/issues/170
> >>>>>
> >>>>> We may try it after we explore the account for costs
> >>>>>
> >>>>> https://github.com/singnet/language-learning/issues/183
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> -Anton
> >>>>>
> >>>>> 24.03.2019 9:24, Linas Vepstas пишет:
> >>>>>
> >>>>> Also, BTW, link-grammar cannot parse "I just stood there, my hand on
> the knob, trembling like a leaf." correctly. It is one of a class of
> sentences it does not know about.  Which is maybe OK, because ideally, the
> learned grammar will be able to do this. But today, LG cannot.
> >>>>>
> >>>>> --linas
> >>>>>
> >>>>> On Sat, Mar 23, 2019 at 9:12 PM Linas Vepstas <
> [email protected]> wrote:
> >>>>>>
> >>>>>> Anton,
> >>>>>>
> >>>>>> It's certainly an unusual corpus, and it might give you rather low
> scores. I'd call it "interesting", but maybe not "golden". Although I
> suppose it depends on your training corpus.  Here are some problems that
> pop out:
> >>>>>>
> >>>>>> First sentence --
> >>>>>> "the old beast was whinnying on his shoulder" -- the word
> "whinnying" is a fairly rare English verb -- you could read half-a-million
> wikipedia articles, and not see it once. You could read lots of
> 19th-century or early-20th century cowboy/adventure novels, (like what
> you'd find on Project Gutenberg) and maybe see it some fair amount. Even
> then -- to "whinny on a shoulder" seems bizarre... I guess he's hugging the
> horse? How often does that happen, in any cowboy novel? "to whinny on
> something" is an extremely rare construction.  It will work only if you've
> correctly categorized "whinny" as a verb that can take a preposition.  Are
> your clustering algos that good, yet, to correctly cluster rare words into
> appropriate verb categories?
> >>>>>>
> >>>>>> Second sentence .. "Jims" is a very uncommon name. Frankly, I've
> never heard of it as a name before.  Your training data is going to be
> extremely slim on this. And lack of training data means poor statistics,
> which means low scores.  Unless -- again, your clustering code is good
> enough to place "Jims" in a "proper name" cluster...
> >>>>>>
> >>>>>> "the lama snuffed blandly" -- "snuffed" is a very uncommon, almost
> archaic verb. These days, everyone spells llama with two ll's not one.
> Unless you're talking about Buddhist monks, it's a typo.
> >>>>>>
> >>>>>> "you understand?"  is .. awkward. Common in speech, uncommon in
> writing. Unlikely that you'll have enough training data for this.
> >>>>>>
> >>>>>> "Willard" is an uncommon name. Does your training corpus have a
> sufficient number of mentions of Willard? Do you have clustering working
> well enough to stick "Willard" into a cluster with other names?
> >>>>>>
> >>>>>> "it is so with Sammy Jay" is clearly archaic English.
> >>>>>>
> >>>>>> "he hasn't any relations here" is clearly archaic, an
> olde-fashioned construction.
> >>>>>>
> >>>>>> "Pivi said not one word" - again, a clearly old-fashioned
> construction. Does the training set contain enough examples of "Pivi" to
> recognize it as a name? Are names clustering correctly?
> >>>>>>
> >>>>>> Any sentence with an inversion is going to sound old-fashioned. All
> of the sentences in that corpus sound old-fashioned. Which maybe is OK if
> you are training on 19th-century Gutenberg texts... but it's certainly not
> modern English.  Even when I was a child, and I read those old
> crumbly-yellow paper adventure books, part of the fun was that no one
> actually talked that way -- not at school, not at home, not on TV. It was
> clearly from a different time and place -- an adventure.
> >>>>>>
> >>>>>> Anyway -- you only indicate pair-wise word-links. Is the omission
> of disjuncts intentional? Also -- no hint of any word-classes or
> part-of-speech tagging? This is surely important to evaluate as well, or is
> this to be done in some other way?  i.e. to evaluate if "Pivi" was
> correctly clustered with other given names?  Or that lama/llama was
> clustered with other four-legged animals?
> >>>>>>
> >>>>>> Also -- I can't tell -- is it free of loops, or are loops allowed?
> Allowing loops tends to provide stronger, more accurate parses.  Loops act
> as constraints.
> >>>>>>
> >>>>>> -- Linas
> >>>>>>
> >>>>>> On Thu, Mar 21, 2019 at 11:09 PM Anton Kolonin @ Gmail <
> [email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi Linas, Andes, and whoever understands both LG and English well
> enough.
> >>>>>>>
> >>>>>>> Attached are first 100 sentences for GC "gold standard" - manually
> checked based on LG parses.
> >>>>>>>
> >>>>>>> We are expecting more to come in the next two weeks.
> >>>>>>>
> >>>>>>> To enable that, please have a cursory review of the corpus and let
> us know if there are corrections still needed, so your corrections can be
> used as a reference to fix the rest and keep going.
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>>
> >>>>>>> -Anton
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> You received this message because you are subscribed to the Google
> Groups "lang-learn" group.
> >>>>>>> To unsubscribe from this group and stop receiving emails from it,
> send an email to [email protected].
> >>>>>>> To post to this group, send email to [email protected].
> >>>>>>> To view this discussion on the web visit
> https://groups.google.com/d/msgid/lang-learn/bde76364-a578-4ab8-8ac5-2f49f794072b%40gmail.com
> .
> >>>>>>> For more options, visit https://groups.google.com/d/optout.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> cassette tapes - analog TV - film cameras - you
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> cassette tapes - analog TV - film cameras - you
> >>>>>
> >>>>> --
> >>>>> -Anton Kolonin
> >>>>> skype: akolonin
> >>>>> cell: +79139250058
> >>>>> [email protected]
> >>>>> https://aigents.com
> >>>>> https://www.youtube.com/aigents
> >>>>> https://www.facebook.com/aigents
> >>>>> https://medium.com/@aigents
> >>>>> https://steemit.com/@aigents
> >>>>> https://golos.blog/@aigents
> >>>>> https://vk.com/aigents
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> cassette tapes - analog TV - film cameras - you
> >>>>
> >>>> --
> >>>> -Anton Kolonin
> >>>> skype: akolonin
> >>>> cell: +79139250058
> >>>> [email protected]
> >>>> https://aigents.com
> >>>> https://www.youtube.com/aigents
> >>>> https://www.facebook.com/aigents
> >>>> https://medium.com/@aigents
> >>>> https://steemit.com/@aigents
> >>>> https://golos.blog/@aigents
> >>>> https://vk.com/aigents
> >>
> >>
> >>
> >> --
> >> cassette tapes - analog TV - film cameras - you
> >>
> >> --
> >> -Anton Kolonin
> >> skype: akolonin
> >> cell: +79139250058
> >> [email protected]
> >> https://aigents.com
> >> https://www.youtube.com/aigents
> >> https://www.facebook.com/aigents
> >> https://medium.com/@aigents
> >> https://steemit.com/@aigents
> >> https://golos.blog/@aigents
> >> https://vk.com/aigents
> >
> >
> >
> > --
> > cassette tapes - analog TV - film cameras - you
>
>
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "Listen: This world is the lunatic's sphere,  /  Don't always agree
> it's real.  /  Even with my feet upon it / And the postman knowing my
> door / My address is somewhere else." -- Hafiz
>


-- 
cassette tapes - analog TV - film cameras - you

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA35rQWNZDg-LmgBVjcLX%3DF6nceWvDXFq%2B-mfc4rJiqqG3g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
