On Wed, Jan 16, 2019 at 10:04 AM Michael Duncan <[email protected]> wrote:
> Hi Linas, given your assertion that the atomspace is fundamentally
> better than the other graph databases out there,

I don't think I said that. I think I said that it has more advanced
features. However, the competition is moving faster, and is catching up. I
fear the atomspace will be a forgotten historical footnote in not too many
more years.

> wouldn't it be strategic from an ecosystem point of view to
> reconceptualize the functional distinction between the atomspace repo
> and the opencog repo as a graph database and (proto) AGI
> infrastructure, respectively?

I think that was always the case. In practice, I tried to make sure that
the atomspace was the part of opencog that was stable, well-thought-out,
finished, reliable, dependable, whereas the other repos were experimental
prototyping sandboxes.

Exactly where to draw that line is not always clear; for example, the
rule-engine is a part of the atomspace, even though it is less than
finished, and can be re-imagined in several alternative ways (for
example, openpsi is also a kind of rule engine, except it's totally
different). Should the rule-engine live in its own repo, instead of the
atomspace? Maybe. Should openpsi live in its own repo, instead of
opencog? Maybe. I don't think I would resist such a split-up; do it in a
conscientious, thoughtful fashion, and sure, why not?

-- Linas

> On Fri, Jan 11, 2019 at 9:31 PM Linas Vepstas <[email protected]>
> wrote:
>
>> Hi Anton!
>>
>> On Thu, Jan 10, 2019 at 11:45 PM Anton Kolonin @ Aigents <
>> [email protected]> wrote:
>>
>>> Hi Linas, while digging into MST-Parser code, we have found that some
>>> of the NLP Scheme code resides in singnet/opencog and some is in
>>> singnet/atomspace.
>>
>> My goal is that all developers agree that there is a development branch
>> and a stable branch, know which one is which, and work so that all
>> development goes into the development branch, and so that the stable
>> branch is stable according to industry-standard definitions of
>> stability.
>>
>> I am concerned that there continues to be confusion about this. I am
>> concerned that this will just lead to wasted time, and to bad design
>> and bad code that is buggy and inoperable.
>>
>> I spend vast amounts of my time being "the janitor" who cleans up
>> messes; this is a thankless job, I don't enjoy it, and I get concerned
>> whenever I read something that suggests I have a big cleanup job
>> waiting for me in the future.
>>
>>> I wonder if the idea of having Scheme code in the AtomSpace layer has
>>> some conceptual justification, or whether it is just a historical
>>> matter?
>>
>> There has always been Scheme code in the atomspace. The atomspace
>> provides all of the core infrastructure for the Scheme bindings.
>>
>>> For instance, we were unraveling the uses and implementation of
>>> add-symmetric-mi-compute.
>>
>> That function is provided as part of the "matrix" package. That package
>> provides a way of looking at subsets of the atomspace as if they were
>> (sparse) matrices. Please recall that a matrix (a 2-tensor) is an N x N
>> grid of values. There are many, many things one can do with a matrix.
>> Almost all of the code in the matrix directory is focused on treating
>> the matrix as a probability P(X,Y) of two random processes.
>>
>> Whenever one has a probability like that, one is typically interested
>> in the marginals (the P(X), which is P(X,Y) summed over all Y), the
>> conditional probability P(X|Y), the entropy H(X,Y) and the mutual
>> information MI(X,Y). Another very important quantity is the product
>> P(X,Y) P^T(Y,Z), where ^T denotes the matrix transpose.
>> This product can be used to build the cosine distance between X and Z;
>> it can also be used to build the symmetric-MI, which is like the cosine
>> distance, but has sums over logarithms in strategic places.
>>
>> You may wonder, "why not use an ordinary linear algebra package?" or
>> "why not use GNU R?" (or SPSS, or SciPy, or whatever). There are three
>> reasons for this:
>>
>> 1) The atomspace matrices are extremely sparse: for the NLP data, only
>> one in a million entries is non-zero.
>>
>> 2) The NxN matrix has N = 100K to 1 million for NLP data, which is more
>> RAM than computers can easily provide. The matrix package has to be
>> optimized for sparse data. Genomic data might have even larger N.
>>
>> 3) It would be marvelous if someone wrote an R wrapper for this stuff.
>> It's not hard. Someone needs to do this. I have been urging the agi-bio
>> guys to do this, because their genomic/proteomic data is also extremely
>> sparse, and because they like to use R for data analysis.
>>
>> The general justification is that every atom is like a tensor index,
>> and the value attached to that atom is the value of that tensor at that
>> index. Since a collection of atoms is conceptually the same thing as a
>> set of sparse tensors, let's acknowledge that fact, and provide an API
>> that allows ordinary users to access the tensor data as tensors. By
>> "ordinary users" I mean anyone who has ever done statistical analysis,
>> or, more generally, any user who uses SciPy or GNU R to mangle their
>> data.
>>
>> The atomspace is not for everyone: it's only for those people who have
>> very sparse data with a Zipfian distribution. But if that is what they
>> have, let them access it in a "normal" data-analytics kind of way, like
>> how you'd do data analytics in other packages.
>>
>>> If we keep extending the MST-Parser code
>>
>> The MST parser code is in a different directory; it is not a part of
>> the matrix code. It is much more experimental.
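(As a rough sketch of the kind of thing the matrix package computes --
this is Python with SciPy, purely for illustration; the real API is
Scheme, and none of the names below are the actual ones. It builds the
marginals P(X), P(Y) and the pairwise MI from a sparse joint count
matrix.)

```python
# Illustrative only: marginals and mutual information from a sparse
# count matrix N(x,y), in the spirit of the atomspace "matrix" package.
import numpy as np
from scipy.sparse import coo_matrix

# Toy word-pair count matrix N(x,y); real NLP matrices have N ~ 100K-1M
# rows/columns with ~one-in-a-million entries non-zero.
counts = coo_matrix(
    ([4.0, 2.0, 1.0, 3.0], ([0, 0, 1, 2], [0, 1, 1, 2])),
    shape=(3, 3),
)

total = counts.sum()                          # grand total N(*,*)
p_xy = (counts * (1.0 / total)).tocoo()       # joint probability P(x,y)
p_x = np.asarray(p_xy.sum(axis=1)).ravel()    # marginal P(x) = sum_y P(x,y)
p_y = np.asarray(p_xy.sum(axis=0)).ravel()    # marginal P(y) = sum_x P(x,y)

# MI(x,y) = log2 [ P(x,y) / (P(x) P(y)) ], computed only over the
# non-zero entries -- the whole point of staying sparse.
mi = {(int(i), int(j)): np.log2(v / (p_x[i] * p_y[j]))
      for i, j, v in zip(p_xy.row, p_xy.col, p_xy.data)}
```

The symmetric-MI mentioned above is built from the product P(X,Y)
P^T(Y,Z) in a similar per-nonzero-entry fashion; the sketch stops at
plain MI to keep the idea visible.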
>> The MST parser is meant to be a part of a generic parsing and
>> theorem-proving infrastructure. Among both the academics and the
>> readers of this mailing list, there is some general understanding that
>> theorem proving, natural deduction, Hilbert-style deduction, sequent
>> calculus, parsing and constraint-solving are all kind-of-ish "the same
>> thing". The goal here is to try to actually make them be "the same
>> thing", by providing something that accomplishes all of the above with
>> the same code base.
>>
>> To be more explicit: we have the URE, which performs forward and
>> backward chaining. If you look at the chaining algorithm, you promptly
>> realize that it is a certain kind of parsing algorithm. This insight,
>> that parsing and theorem proving are "the same thing", is what prompted
>> the URE to be created. It has been used for the proving side, i.e. for
>> PLN, but it has not been used for parsing, yet. No one has ever
>> attempted to import link-grammar into the URE. (There was also the
>> intent that open-psi would run on top of, or with, the URE, but the
>> current URE does not support that mode of operation, and so open-psi
>> exists as a distinct, separate code base.)
>>
>> If the URE had been sufficiently powerful and robust, we would have
>> been able to import the full English link-grammar dictionary into the
>> URE, run it, and get ordinary LG parses coming out. This is not
>> possible with the current URE design.
>>
>> Given that the current URE is unable to support open-psi, and is unable
>> to support LG, it seemed like it was time to redesign it from the
>> ground up. Thus, the code in the "sheaf" directory is an attempt to
>> re-imagine how theorem-proving and parsing can be accomplished in a
>> fashion that is much faster, easier and more usable than the current
>> URE, with a simpler API and a stronger toolset. The paper on sheaves
>> was an attempt to explain how this could be done.
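(To make the "chaining is parsing" point concrete, here is a toy sketch
in Python -- not URE code, and the grammar, rule table and function names
are all invented for illustration. A forward chainer that rewrites two
adjacent premises into a conclusion behaves exactly like a chart parser
combining two adjacent constituents into a larger one.)

```python
# Toy forward chainer: each rule rewrites an adjacent pair of
# categories into a new category. Read as deduction, a rule combines
# two premises into a conclusion; read as parsing, it combines two
# constituents into a phrase. Same algorithm either way.
rules = {("Det", "Noun"): "NP", ("NP", "Verb"): "S"}

def forward_chain(cats):
    """Apply rules to adjacent items until one category (a 'proof'
    or a full parse) remains, or no rule applies."""
    agenda = [tuple(cats)]
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if len(seq) == 1:
            return seq[0]          # derived a single conclusion
        for i in range(len(seq) - 1):
            head = rules.get((seq[i], seq[i + 1]))
            if head:
                nxt = seq[:i] + (head,) + seq[i + 2:]
                if nxt not in seen:
                    seen.add(nxt)
                    agenda.append(nxt)
    return None                    # no derivation: "unparseable"
```

Running the chainer on ["Det", "Noun", "Verb"] derives "S", i.e. the
sentence parses; a sequence with no applicable rules derives nothing.
Backward chaining is the same search run from the goal category down.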
>>
>>> to account for word, link and disjunct frequency
>>
>> Please understand that disjuncts are a general concept. They occur not
>> only in natural language; they also occur in biology, and in theorem
>> proving.
>>
>>> and provide integration with DNNs
>>
>> The paper on skip-grams is an attempt to explain how theorem proving is
>> just like deep learning in neural nets. It attempts to explain how
>> these two different systems are really variations on the same theme.
>>
>> Ideally, the API provided by the code in the "sheaf" directory will be
>> able to provide a common API to deep-learning systems, as well as to
>> parsing systems, to PLN, and to open-psi, so that one could choose
>> between different algorithms and implementations to process your data.
>>
>> Currently, this dream is pre-pre-alpha, and it contains only a generic
>> MST parser.
>>
>>> and add incremental/iterative learning capabilities to it,
>>
>> The goal of tracking disjunct statistics is that this *is* the learning
>> system. Yes, there are other ways of learning. Again, the paper on
>> skip-grams attempts to explain all the different ways in which learning
>> can be accomplished.
>>
>>> should the changes be done both to singnet/opencog and
>>> singnet/atomspace, following the same pattern?
>>
>> See comments at top about stable and development branches.
>>
>>> Or should we better pull all the NLP code out of singnet/atomspace
>>> into singnet/opencog, or even place it in a separate project?
>>
>> The MST parsing code is intended to be a part of a generic learning
>> system that can be applied to NLP, or genetics, or to logical
>> induction, or to robotic motion control. It is not specific to natural
>> language.
>>
>> -- Linas
>>
>>> Ben, Man Hin, Amen - any insights on this?
>>>
>>> Thanks,
>>>
>>> --
>>> -Anton Kolonin
>>> skype: akolonin
>>> cell: +79139250058
>>
>> --
>> cassette tapes - analog TV - film cameras - you

--
cassette tapes - analog TV - film cameras - you
