On Wed, Jan 16, 2019 at 10:04 AM Michael Duncan <[email protected]> wrote:
> Hi Linas, given your assertion that the atomspace is fundamentally
> better than the other graph databases out there,

I don't think I said that. I think I said that it has more advanced
features. However, the competition is moving faster, and is catching up. I
fear the atomspace will be a forgotten historical footnote in not too many
more years.

> wouldn't it be strategic from an ecosystem point of view to
> reconceptualize the functional distinction between the atomspace repo
> and the opencog repo as a graph database and (proto) AGI
> infrastructure, respectively?

I think that was always the case. In practice, I tried to make sure that
the atomspace was the part of opencog that was stable, well-thought-out,
finished, reliable, dependable, whereas the other repos were experimental
prototyping sandboxes.

Exactly where to draw that line is not always clear; for example, the
rule-engine is a part of the atomspace, even though it is less than
finished, and can be re-imagined in several alternative ways (for
example, openpsi is also a kind of rule engine, except it's totally
different). Should the rule-engine live in its own repo, instead of the
atomspace? Maybe. Should openpsi live in its own repo, instead of
opencog? Maybe. I don't think I would resist such a split-up; do it in a
conscientious, thoughtful fashion, and sure, why not?

-- Linas

> On Fri, Jan 11, 2019 at 9:31 PM Linas Vepstas <[email protected]>
> wrote:
>
>> Hi Anton!
>>
>> On Thu, Jan 10, 2019 at 11:45 PM Anton Kolonin @ Aigents <
>> [email protected]> wrote:
>>
>>> Hi Linas, while digging into MST-Parser code, we have found that some
>>> of the NLP Scheme code resides in singnet/opencog and some is in
>>> singnet/atomspace.
>>
>> My goal is that all developers agree that there is a development branch
>> and a stable branch, know which one is which, and work so that all
>> development goes into the development branch, and so that the stable
>> branch is stable according to industry-standard definitions of
>> stability.
>>
>> I am concerned that there continues to be confusion about this. I am
>> concerned that this will just lead to wasted time, and to bad design
>> and bad code that is buggy and inoperable.
>>
>> I spend vast amounts of my time being "the janitor" who cleans up
>> messes; this is a thankless job, I don't enjoy it, and I get concerned
>> whenever I read something that suggests I have a big cleanup job
>> waiting for me in the future.
>>
>>> I wonder if the idea of having Scheme code in the AtomSpace layer has
>>> some conceptual justification, or whether it is just a historical
>>> matter?
>>
>> There has always been Scheme code in the atomspace. The atomspace
>> provides all of the core infrastructure for the Scheme bindings.
>>
>>> For instance, we were unraveling the uses and implementation of
>>> add-symmetric-mi-compute.
>>
>> That function is provided as part of the "matrix" package. That package
>> provides a way of looking at subsets of the atomspace as if they were
>> (sparse) matrices. Please recall that a matrix (a 2-tensor) is an N x N
>> grid of values. There are many, many things one can do with a matrix.
>> Almost all of the code in the matrix directory is focused on treating
>> the matrix as a probability P(X,Y) of two random processes.
>>
>> Whenever one has a probability like that, one is typically interested
>> in the marginals (the P(X), which is P(X,Y) summed over all Y), the
>> conditional probability P(X|Y), the entropy H(X,Y) and the mutual
>> information MI(X,Y). Another very important quantity is the product
>> P(X,Y) P^T(Y,Z), where ^T denotes the matrix transpose.
>> This product can be used to build the cosine distance between X and Z;
>> it can also be used to build the symmetric-MI, which is like the cosine
>> distance, but has sums over logarithms in strategic places.
>>
>> You may wonder, "why not use an ordinary linear algebra package?" or
>> "why not use GNU R?" (or SPSS, or SciPy, or whatever). There are three
>> reasons for this:
>>
>> 1) The atomspace matrices are extremely sparse: for the NLP data, only
>> one in a million entries is non-zero.
>>
>> 2) The NxN matrix has N = 100K to 1 million for NLP data, which is more
>> RAM than computers can easily provide. The matrix package has to be
>> optimized for sparse data. Genomic data might have even larger N.
>>
>> 3) It would be marvelous if someone wrote an R wrapper for this stuff.
>> It's not hard. Someone needs to do this. I have been urging the agi-bio
>> guys to do this, because their genomic/proteomic data is also extremely
>> sparse, and because they like to use R for data analysis.
>>
>> The general justification is that every atom is like a tensor index,
>> and the value attached to that atom is the value of that tensor at that
>> index. Since a collection of atoms is conceptually the same thing as a
>> set of sparse tensors, let's acknowledge that fact, and provide an API
>> that allows ordinary users to access the tensor data as tensors. By
>> "ordinary users" I mean anyone who has ever done statistical analysis,
>> or, more generally, any user who uses SciPy or GNU R to mangle their
>> data.
>>
>> The atomspace is not for everyone: it's only for those people who have
>> very sparse data with a Zipfian distribution. But if that is what they
>> have, let them access it in a "normal" data-analytics kind of way, like
>> how you'd do data analytics in other packages.
>>
>>> If we keep extending the MST-Parser code
>>
>> The MST parser code is in a different directory; it is not a part of
>> the matrix code. It is much more experimental.
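(As a rough sketch of the kind of thing the matrix package computes --
this is Python with SciPy, purely for illustration; the real API is
Scheme, and none of the names below are the actual ones. It builds the
marginals P(X), P(Y) and the pairwise MI from a sparse joint count
matrix.)

```python
# Illustrative only: marginals and mutual information from a sparse
# count matrix N(x,y), in the spirit of the atomspace "matrix" package.
import numpy as np
from scipy.sparse import coo_matrix

# Toy word-pair count matrix N(x,y); real NLP matrices have N ~ 100K-1M
# rows/columns with ~one-in-a-million entries non-zero.
counts = coo_matrix(
    ([4.0, 2.0, 1.0, 3.0], ([0, 0, 1, 2], [0, 1, 1, 2])),
    shape=(3, 3),
)

total = counts.sum()                          # grand total N(*,*)
p_xy = (counts * (1.0 / total)).tocoo()       # joint probability P(x,y)
p_x = np.asarray(p_xy.sum(axis=1)).ravel()    # marginal P(x) = sum_y P(x,y)
p_y = np.asarray(p_xy.sum(axis=0)).ravel()    # marginal P(y) = sum_x P(x,y)

# MI(x,y) = log2 [ P(x,y) / (P(x) P(y)) ], computed only over the
# non-zero entries -- the whole point of staying sparse.
mi = {(int(i), int(j)): np.log2(v / (p_x[i] * p_y[j]))
      for i, j, v in zip(p_xy.row, p_xy.col, p_xy.data)}
```

The symmetric-MI mentioned above is built from the product P(X,Y)
P^T(Y,Z) in a similar per-nonzero-entry fashion; the sketch stops at
plain MI to keep the idea visible.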
>> The MST parser is meant to be a part of a generic parsing and
>> theorem-proving infrastructure. Among both the academics and the
>> readers of this mailing list, there is some general understanding that
>> theorem proving, natural deduction, Hilbert-style deduction, sequent
>> calculus, parsing and constraint-solving are all kind-of-ish "the same
>> thing". The goal here is to try to actually make them be "the same
>> thing", by providing something that accomplishes all of the above with
>> the same code base.
>>
>> To be more explicit: we have the URE, which performs forward and
>> backward chaining. If you look at the chaining algorithm, you promptly
>> realize that it is a certain kind of parsing algorithm. This insight,
>> that parsing and theorem proving are "the same thing", is what prompted
>> the URE to be created. It has been used for the proving side, i.e. for
>> PLN, but it has not been used for parsing, yet. No one has ever
>> attempted to import link-grammar into the URE. (There was also the
>> intent that open-psi would run on top of, or with, the URE, but the
>> current URE does not support that mode of operation, and so open-psi
>> exists as a distinct, separate code base.)
>>
>> If the URE had been sufficiently powerful and robust, we would have
>> been able to import the full English link-grammar dictionary into the
>> URE, run it, and get ordinary LG parses coming out. This is not
>> possible with the current URE design.
>>
>> Given that the current URE is unable to support open-psi, and is unable
>> to support LG, it seemed like it was time to redesign it from the
>> ground up. Thus, the code in the "sheaf" directory is an attempt to
>> re-imagine how theorem-proving and parsing can be accomplished in a
>> fashion that is much faster, easier and more usable than the current
>> URE, with a simpler API and a stronger toolset. The paper on sheaves
>> was an attempt to explain how this could be done.
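(To make the "chaining is parsing" point concrete, here is a toy sketch
in Python -- not URE code, and the grammar, rule table and function names
are all invented for illustration. A forward chainer that rewrites two
adjacent premises into a conclusion behaves exactly like a chart parser
combining two adjacent constituents into a larger one.)

```python
# Toy forward chainer: each rule rewrites an adjacent pair of
# categories into a new category. Read as deduction, a rule combines
# two premises into a conclusion; read as parsing, it combines two
# constituents into a phrase. Same algorithm either way.
rules = {("Det", "Noun"): "NP", ("NP", "Verb"): "S"}

def forward_chain(cats):
    """Apply rules to adjacent items until one category (a 'proof'
    or a full parse) remains, or no rule applies."""
    agenda = [tuple(cats)]
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if len(seq) == 1:
            return seq[0]          # derived a single conclusion
        for i in range(len(seq) - 1):
            head = rules.get((seq[i], seq[i + 1]))
            if head:
                nxt = seq[:i] + (head,) + seq[i + 2:]
                if nxt not in seen:
                    seen.add(nxt)
                    agenda.append(nxt)
    return None                    # no derivation: "unparseable"
```

Running the chainer on ["Det", "Noun", "Verb"] derives "S", i.e. the
sentence parses; a sequence with no applicable rules derives nothing.
Backward chaining is the same search run from the goal category down.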
>>
>>> to account for word, link and disjunct frequency
>>
>> Please understand that disjuncts are a general concept. They occur not
>> only in natural language; they also occur in biology, and in theorem
>> proving.
>>
>>> and provide integration with DNNs
>>
>> The paper on skip-grams is an attempt to explain how theorem proving is
>> just like deep learning in neural nets. It attempts to explain how
>> these two different systems are really variations on the same theme.
>>
>> Ideally, the API provided by the code in the "sheaf" directory will be
>> able to provide a common API to deep-learning systems, as well as to
>> parsing systems, to PLN, and to open-psi, so that one could choose
>> between different algorithms and implementations to process your data.
>>
>> Currently, this dream is pre-pre-alpha, and it contains only a generic
>> MST parser.
>>
>>> and add incremental/iterative learning capabilities to it,
>>
>> The goal of tracking disjunct statistics is that this *is* the learning
>> system. Yes, there are other ways of learning. Again, the paper on
>> skip-grams attempts to explain all the different ways in which learning
>> can be accomplished.
>>
>>> should the changes be done both to singnet/opencog and
>>> singnet/atomspace, following the same pattern?
>>
>> See comments at top about stable and development branches.
>>
>>> Or should we better pull all the NLP code out of singnet/atomspace
>>> into singnet/opencog, or even place it in a separate project?
>>
>> The MST parsing code is intended to be a part of a generic learning
>> system that can be applied to NLP, or genetics, or to logical
>> induction, or to robotic motion control. It is not specific to natural
>> language.
>>
>> -- Linas
>>
>>> Ben, Man Hin, Amen - any insights on this?
>>>
>>> Thanks,
>>>
>>> --
>>> -Anton Kolonin
>>> skype: akolonin
>>> cell: +79139250058
>>
>> --
>> cassette tapes - analog TV - film cameras - you

--
cassette tapes - analog TV - film cameras - you
