Re: [opencog-dev] Indexing in the AtomSpace

Ben Goertzel Thu, 27 Aug 2020 09:14:39 -0700

I think perhaps what Xabush wants is to be able to run a query like

"Find me all Atoms whose name string contains the substring 'ABDPDQ'"

even if he doesn't know what types these Atoms may be?
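To make the request concrete, here is a minimal Python sketch of that kind of query. It is not AtomSpace code: the `Atom` tuple, the sample atoms and `find_by_substring` are all made up for illustration, and the linear scan is exactly the cost a real name index would be meant to avoid.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical stand-in for a Node: a type plus a name string."""
    type: str
    name: str


def find_by_substring(atoms, needle):
    """Return every atom whose name contains `needle`, whatever its type."""
    return [atom for atom in atoms if needle in atom.name]


# Invented sample data, for illustration only.
atoms = [
    Atom("GeneNode", "ABDPDQ1"),
    Atom("ProteinNode", "Uniprot: ABDPDQ"),
    Atom("ConceptNode", "unrelated"),
]
matches = find_by_substring(atoms, "ABDPDQ")
```

The caller never names a type; the types come back as part of the matches.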

ben

On Thu, Aug 27, 2020 at 9:09 AM Linas Vepstas <[email protected]>
wrote:

> This statement I find confusing: "I can’t write a pattern matching query
> to retrieve an atom using its id/name" There is one and only one such atom,
> ever, by definition... There is nothing to query; if you know the name, you
> know the atom.
>
> There was talk previously about "substring matching". For example, you
> have atoms named "Uniprot: 1234" and "Uniprot: 5678" and you want to find
> all atoms that start with the eight characters "Uniprot:". There are (at
> least) three solutions for this. One is to create a RegexNode, but this is
> ugly from a theoretical standpoint. A second is to create a UniprotNode
> type and use that; queries are then simple because you just ask for all
> UniprotNodes. A third (recommended) way is to write (MemberLink (Node
> "Uniprot: 1234") (Concept "the-set-of-all-uniprots"))
>
> This third way is recommended because, in a sense, the atomspace is
> nothing but one giant network of interconnected partial indexes. There is
> an index from (Node "Uniprot: 1234") to everything that makes use of it.
> It's called "the incoming set" and it is a real index: a C++ std::set, if I
> recall. The same holds for (Concept "the-set-of-all-uniprots"). What the
> pattern matcher "actually does" is to stitch together these partial indexes
> into a whole, and then prune away the irrelevant parts.
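The incoming-set idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real AtomSpace: atoms are plain tuples, and `ToyAtomSpace`, `add_link` and `members_of` are hypothetical names. It only shows why the MemberLink approach needs no scan over all atoms.

```python
from collections import defaultdict


class ToyAtomSpace:
    """Toy model, not the real AtomSpace: atoms are plain hashable tuples."""

    def __init__(self):
        # atom -> set of links that contain it (the "incoming set")
        self.incoming = defaultdict(set)

    def add_link(self, link_type, *outgoing):
        """Create a link and register it in each member's incoming set."""
        link = (link_type, outgoing)
        for atom in outgoing:
            self.incoming[atom].add(link)
        return link

    def members_of(self, concept):
        """Walk only the concept's incoming set; other atoms are never touched."""
        return {
            out[0]
            for link_type, out in self.incoming.get(concept, ())
            if link_type == "MemberLink" and len(out) == 2 and out[1] == concept
        }


space = ToyAtomSpace()
uniprots = ("Concept", "the-set-of-all-uniprots")
a = ("Node", "Uniprot: 1234")
b = ("Node", "Uniprot: 5678")
space.add_link("MemberLink", a, uniprots)
space.add_link("MemberLink", b, uniprots)
space.add_link("ListLink", a, b)  # unrelated link, ignored by the query
members = space.members_of(uniprots)
```

The query cost depends on the size of one incoming set, not on the size of the whole store.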
>
> -- Linas
>
> ... unless you mean "can I ask if (Node "Uniprot: 1234") exists, without
> accidentally creating it if it does not?" ... you can do this from the C++,
> Scheme and Python APIs, but you cannot do this in Atomese.
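The distinction between "look up" and "look up or create" can be illustrated with a small Python sketch. `ToyNodeTable`, `add_node` and `get_node` are hypothetical names chosen for this example; they are not the actual C++, Scheme or Python AtomSpace calls.

```python
class ToyNodeTable:
    """Illustrative node table keyed by (type, name); not the real API."""

    def __init__(self):
        self._nodes = {}

    def add_node(self, node_type, name):
        """Return the node, creating it first if it does not exist yet."""
        key = (node_type, name)
        return self._nodes.setdefault(key, key)

    def get_node(self, node_type, name):
        """Return the node if it exists, else None; never creates anything."""
        return self._nodes.get((node_type, name))

    def __len__(self):
        return len(self._nodes)


table = ToyNodeTable()
missing = table.get_node("Node", "Uniprot: 1234")   # pure lookup
size_after_lookup = len(table)
created = table.add_node("Node", "Uniprot: 1234")   # lookup-or-create
found = table.get_node("Node", "Uniprot: 1234")
```

Only `add_node` changes the table; a failed `get_node` leaves it untouched.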
>
>
>
>
> On Thu, Aug 27, 2020 at 4:07 AM Abdulrahman Semrie <[email protected]>
> wrote:
>
>>
>>
>> TL;DR: you can already do that.  It's already supported.
>>
>> It’s partially supported. As you’ve described, we can cache the result of
>> a pattern matching query, and that is already supported. However, since I
>> can’t write a pattern matching query to retrieve an atom using its id/name
>> from the atomspace, there is nothing to cache or index. If there were some
>> ExistsLink, inheriting from QueryLink, that retrieved an atom by its name
>> if it exists and returned a false truth value otherwise, then what you’ve
>> described could be done.
>>
>> —
>>
>> Regards,
>>
>> Abdulrahman Semrie
>>
>> On Thursday, Aug 27, 2020 at 2:46 AM, Linas Vepstas <
>> [email protected]> wrote:
>> TL;DR: you can already do that.  It's already supported.
>>
>> Please follow me on this train of thought.
>>
>> 1) What is an "index"? Well, it's a pre-defined cache of all atoms of some
>> shape or pattern.
>>
>> 2) How can one specify an index?  Well, if it's a pattern, then a pattern
>> query can be used.
>>
>> 3) Where should the index be stored, or kept? Well, it can be stored or
>> kept with the pattern that defines the shape of the index.
>>
>> Before I move on to the next thought, let me point out that 1-2-3 can be
>> directly solved today. Define a pattern, e.g. a query link. Run it. Store
>> the results on the query, as a value. You can "do this yourself" today;
>> it's easy, but it becomes even easier if you are willing to read the docs
>> for `cog-execute-cache!` (appended below).
>>
>> 4) How should the index be updated? Ah, well, that is actually the tricky
>> question, the hard question, the place where all of the interesting
>> technology debates and thinking are centered.  One strategy is to update
>> the index every single time an Atom is added to/removed from the atomspace.
>> But recomputing the index every time is wildly inefficient, burning through
>> vast quantities of CPU time. What else can one do? Well, maybe recompute on
>> demand. Or recompute every few minutes. Or maybe once a night (aka
>> "eventually consistent"). Maybe store a time-stamp on the index, to tell
>> you how old it is. Or maybe have an append-only log of atomspace changes...
>> I can propose many different kinds of solutions. They all have space and
>> time-overhead, and/or assorted usability issues. Which of these best suits
>> your needs, I have trouble guessing, so you would have to explain what the
>> problem is (if any).
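One of the update strategies in point 4, "recompute on demand", can be sketched as follows. This is a toy Python model, not AtomSpace code: `Store`, `LazyIndex` and the change counter are assumptions made for the example. The store bumps a version number on every change, and the index rebuilds only when it is read and finds itself out of date.

```python
class Store:
    """Toy atom store that counts its own modifications."""

    def __init__(self):
        self.atoms = []
        self.version = 0  # bumped on every change

    def add(self, atom):
        self.atoms.append(atom)
        self.version += 1


class LazyIndex:
    """Index over a Store, recomputed on demand rather than on every add."""

    def __init__(self, store, predicate):
        self.store = store
        self.predicate = predicate
        self._cache = []
        self._seen_version = -1  # forces a rebuild on first use
        self.recomputes = 0

    def get(self):
        if self._seen_version != self.store.version:
            self._cache = [a for a in self.store.atoms if self.predicate(a)]
            self._seen_version = self.store.version
            self.recomputes += 1
        return self._cache


store = Store()
index = LazyIndex(store, lambda name: name.startswith("Uniprot:"))
store.add("Uniprot: 1234")
store.add("Gene: BRCA1")
first = list(index.get())
index.get()                      # no change since last read: cache is reused
store.add("Uniprot: 5678")
second = list(index.get())       # store changed: index is rebuilt once
```

Two adds followed by two reads cost one rebuild, not two. The trade-off is that the first read after a change pays the full recompute.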
>>
>> --linas
>>
>> Here are the docs:
>>  cog-execute-cache! EXEC KEY [METADATA [FRESH]]
>>
>>    Execute or return cached execution results. This is a caching version
>>    of the `cog-execute!` call.
>>
>>    If the optional FRESH boolean flag is #f, then if there is a Value
>>    stored at KEY on EXEC, return that Value. The default value of FRESH
>>    is #f, so the default behavior is always to return the cached value.
>>    If the optional FRESH boolean flag is #t, or if there is no Value
>>    stored at KEY, then the `cog-execute!` function is called on EXEC,
>>    and the result is stored at KEY.
>>
>>    The METADATA Atom is optional.  If it is specified, then metadata
>>    about the execution is placed on EXEC at the key METADATA.
>>    Currently, this is just a timestamp of when this execution was
>>    performed. The format of the meta-data is subject to change; this
>>    is currently an experimental feature, driven by user requirements.
>>
>>    At this time, execution is synchronous. It may be worthwhile to have
>>    an asynchronous version of this call, where the execution is performed
>>    at some other time. This has not been done yet.
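The caching behaviour described in these docs can be paraphrased in Python. This is only a model of the documented semantics, not the implementation: `execute_cached` is a hypothetical function, the `values` dict stands in for the Values attached to EXEC, and the timestamp format is an assumption.

```python
import time


def execute_cached(run, values, key, metadata=None, fresh=False):
    """Model of the documented behaviour.

    run      -- zero-argument callable standing in for executing EXEC
    values   -- dict standing in for the Values stored on EXEC
    key      -- where the cached result lives
    metadata -- optional key under which a timestamp is recorded
    fresh    -- if True, always re-execute
    """
    if not fresh and key in values:
        return values[key]           # cached result, no execution
    result = run()
    values[key] = result
    if metadata is not None:
        values[metadata] = time.time()  # assumed format: seconds since epoch
    return result


calls = []

def expensive_query():
    calls.append(1)
    return ["Uniprot: 1234", "Uniprot: 5678"]

values = {}
r1 = execute_cached(expensive_query, values, "result", metadata="meta")
r2 = execute_cached(expensive_query, values, "result")              # cached
r3 = execute_cached(expensive_query, values, "result", fresh=True)  # re-run
```

With the default `fresh=False`, the query runs once and later calls return the stored value until the caller asks for a fresh run.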
>>
>> On Wed, Aug 26, 2020 at 7:41 AM Abdulrahman Semrie <[email protected]>
>> wrote:
>>
>>>
>>> In the current atomspace, atoms are indexed by their type, i.e. given a
>>> type we can retrieve all the atoms that have that type. But there is no
>>> other way of adding custom indices to the atomspace. For example, if we
>>> want to index nodes by their name, there is no way of doing this.
>>>
>>> As discussed in this issue
>>> <https://github.com/MOZI-AI/annotation-scheme/issues/192>, we plan to
>>> expand the annotation-service, which uses the AtomSpace to store genomics
>>> data, to support the annotation of more types in addition to genes.
>>> Currently, when a user submits a list of ids to the service, it is assumed
>>> that these ids/symbols represent `GeneNode`s. But in the case where the
>>> input can be a protein, a drug molecule, a pathway or a gene, there is no
>>> direct way of determining the type of the atom with the given name
>>> unless we iterate through all atoms searching for that particular id. This
>>> isn't a good approach from a performance standpoint. But if we had a
>>> custom index, e.g. `name_index`, on the ids/names of the atoms, it would be
>>> easier to search the atoms by name and identify the type that the atom
>>> belongs to.
>>>
>>> Hence, if there is a way to add custom indices to the atomspace, it will
>>> greatly simplify some searches. Or maybe there is a way to do what I
>>> described above without the need for an index. If so, please share it.
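As a rough picture of the `name_index` being asked for, here is a Python sketch kept outside the atomspace. The helper names and the sample ids are invented for illustration; only `name_index` and `GeneNode` come from the message above. The index maps a name to the set of types that use it, so the type of a submitted id is found with one dictionary lookup instead of a scan.

```python
from collections import defaultdict

# name -> set of atom types that have a node with that name
name_index = defaultdict(set)


def index_atom(atom_type, name):
    """Record that a node of `atom_type` exists under `name`."""
    name_index[name].add(atom_type)


def types_for(name):
    """Return the types seen for `name`; empty set if the name is unknown."""
    return set(name_index.get(name, ()))


# Invented sample data, for illustration only.
index_atom("GeneNode", "BRCA1")
index_atom("ProteinNode", "P38398")
index_atom("PathwayNode", "R-HSA-73894")

gene_types = types_for("BRCA1")
unknown_types = types_for("no-such-id")
```

A name can map to more than one type, which is why the values are sets. Keeping such an index up to date as atoms are added or removed is a separate problem.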
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "opencog" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/opencog/27892502-0dfb-4042-a805-30a1520f6250n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/opencog/27892502-0dfb-4042-a805-30a1520f6250n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> --
>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>         --Peter da Silva
>


-- 
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to live, mad
to talk, mad to be saved, desirous of everything at the same time, the ones
who never yawn or say a commonplace thing, but burn, burn, burn like
fabulous yellow roman candles exploding like spiders across the stars.” --
Jack Kerouac
