Re: [opencog-dev] Indexing in the AtomSpace

Abdulrahman Semrie Thu, 27 Aug 2020 10:56:39 -0700

> A second is to create a UniProtNode and use that; queries are then simple 
because you just ask for all UniprotNodes.


We are already using this approach. We have added new, data-source-specific 
types to the AtomSpace, and we use those types in pattern-matching queries.

> A third (recommended) way is to write  (MemberLink (Node "Uniprot: 1234") 
(Concept "the-set-of-all-uniprots")) 

Can you please explain why this approach is recommended over the second 
one? Doesn't it add many links that could be avoided by having a specific 
type?
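
The trade-off in question can be sketched with a toy model in plain Python. 
This is not the real AtomSpace API; `incoming`, `add_member_link` and 
`members_of` are hypothetical names made up for illustration. Following the 
explanation quoted further down, every atom has an incoming set that indexes 
the links using it, so with the MemberLink approach listing the members of a 
set only needs a walk over the incoming set of the Concept, at the cost of 
one extra link per member.

```python
from collections import defaultdict

# Toy model: atoms are plain tuples, and incoming[atom] is the set of
# links that contain that atom. This only imitates the idea of an
# incoming set; it is not how the AtomSpace is implemented.
incoming = defaultdict(set)


def add_member_link(element, set_atom):
    """Create a toy MemberLink and register it in both incoming sets."""
    link = ("MemberLink", element, set_atom)
    incoming[element].add(link)
    incoming[set_atom].add(link)
    return link


def members_of(set_atom):
    """List members by walking the incoming set of the set atom only."""
    return sorted(
        link[1]
        for link in incoming.get(set_atom, ())
        if link[0] == "MemberLink" and link[2] == set_atom
    )


uniprots = ("Concept", "the-set-of-all-uniprots")
add_member_link(("Node", "Uniprot: 1234"), uniprots)
add_member_link(("Node", "Uniprot: 5678"), uniprots)

found = members_of(uniprots)          # both Uniprot nodes
extra_links = len(incoming[uniprots])  # one MemberLink per member
```

In this toy model the extra links are exactly the index entries: the cost 
asked about above is one link per member.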

> ... unless you mean "can I ask if (Node "uniprot: 1234") exists, without 
accidentally creating it if it does not?" 

More like: "Can I ask whether any node named 'uniprot:1234' exists and, if 
so, get that node back?"

> you can do this from the C++, scheme and python API's, but you cannot do 
this in Atomese.

If I know both the type and the name, yes, I can do this from C++, Scheme 
and Python; I'm actually doing this in the C++ code for the RPC server. But 
in the case I'm describing, I know only the name, not the type, and to 
create a Handle to retrieve the atom I need both.
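
To make the request concrete, here is a minimal sketch in plain Python of a 
name-only lookup. It is a toy stand-in, not the real AtomSpace API: 
`ToyAtomSpace`, `add_node`, `get_node` and `find_by_name` are hypothetical 
names, and the node names are made-up examples. It assumes a secondary index 
from name to the set of types that have a node with that name, so a 
name-only lookup does not have to scan every atom.

```python
from collections import defaultdict


class ToyAtomSpace:
    """Toy stand-in for an atom store; NOT the real AtomSpace API."""

    def __init__(self):
        self._atoms = set()                  # (type, name) pairs
        self._name_index = defaultdict(set)  # name -> {type, ...}

    def add_node(self, atom_type, name):
        """Add a node and keep the hypothetical name index in sync."""
        self._atoms.add((atom_type, name))
        self._name_index[name].add(atom_type)
        return (atom_type, name)

    def get_node(self, atom_type, name):
        """Lookup when both type and name are known; never creates."""
        key = (atom_type, name)
        return key if key in self._atoms else None

    def find_by_name(self, name):
        """Name-only lookup: every node with this name, of any type."""
        return sorted((t, name) for t in self._name_index.get(name, ()))


space = ToyAtomSpace()
space.add_node("GeneNode", "BRCA1")
space.add_node("UniprotNode", "uniprot:1234")

matches = space.find_by_name("uniprot:1234")  # type recovered from name
missing = space.find_by_name("no-such-id")    # empty, nothing created
```

The returned pairs carry the type, which is the piece of information that 
is missing when only the id is submitted.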

On Thursday, August 27, 2020 at 8:33:29 PM UTC+3 linas wrote:

> I just provided three different solutions to that task... -- linas
>
> On Thu, Aug 27, 2020 at 11:14 AM Ben Goertzel <[email protected]> wrote:
>
>>
>> I think perhaps what Xabush wants is to be able to query
>>
>> "Find me all Atoms whose name string contains the substring "ABDPDQ"."
>>
>> even if he doesn't know what types these Atoms may be?
>>
>> ben
>>
>> On Thu, Aug 27, 2020 at 9:09 AM Linas Vepstas <[email protected]> 
>> wrote:
>>
>>> This statement I find confusing: "I can’t write a pattern matching query 
>>> to retrieve an atom using its id/name" There is one and only one such atom, 
>>> ever, by definition... There is nothing to query; if you know the name, you 
>>> know the atom. 
>>>
>>> There was talk previously about "substring matching", for example, you 
>>> have atoms  named "Uniprot: 1234" and "Uniprot: 5678" and you want to find 
>>> all atoms that start with the eight characters "Uniprot:". There are (at 
>>> least) three solutions for this. One is to create a RegexNode, but this is 
>>> ugly from a theoretical standpoint. A second is to create a UniProtNode and 
>>> use that; queries are then simple because you just ask for all 
>>> UniprotNodes.  A third (recommended) way is to write  (MemberLink (Node 
>>> "Uniprot: 1234") (Concept "the-set-of-all-uniprots")) 
>>>
>>> This third way is recommended because, in a sense, the atomspace is 
>>> nothing but one giant network of interconnected partial indexes. There is 
>>> an index from (Node "Uniprot: 1234") to everything that makes use of it: 
>>> it's called "the incoming set", and it is a real index (a C++ std::set, if 
>>> I recall). The same holds for (Concept "the-set-of-all-uniprots"). What the 
>>> pattern matcher "actually does" is stitch together these partial indexes 
>>> into a whole, and then prune away the irrelevant parts.
>>>
>>> -- Linas
>>>
>>> ... unless you mean "can I ask if (Node "uniprot: 1234") exists, without 
>>> accidentally creating it if it does not?" ... you can do this from the C++, 
>>> scheme and python API's, but you cannot do this in Atomese.
>>>
>>>
>>>
>>>
>>> On Thu, Aug 27, 2020 at 4:07 AM Abdulrahman Semrie <[email protected]> 
>>> wrote:
>>>
>>>>
>>>>
>>>> > TL;DR: you can already do that.  It's already supported.
>>>>
>>>> It’s partially supported. As you’ve described, we can cache the result 
>>>> of a pattern matching query, and that is already supported. However, 
>>>> since I can’t write a pattern matching query to retrieve an atom using 
>>>> its id/name from the atomspace, there is nothing to cache/index. If there 
>>>> were some ExistsLink, inheriting from QueryLink, that you could use to 
>>>> retrieve an atom by its name if it exists, or get a false truth value 
>>>> otherwise, then what you’ve described could be done. 
>>>>
>>>> —
>>>>
>>>> Regards,
>>>>
>>>> Abdulrahman Semrie
>>>>
>>>> On Thursday, Aug 27, 2020 at 2:46 AM, Linas Vepstas <
>>>> [email protected]> wrote:
>>>> TL;DR: you can already do that.  It's already supported.
>>>>
>>>> Please follow me on this train of thought.
>>>>
>>>> 1) What is an "index"? Well, it's a pre-defined cache of all atoms of 
>>>> some shape or pattern.
>>>>
>>>> 2) How can one specify an index?  Well, if it's a pattern, then a 
>>>> pattern query can be used.
>>>>
>>>> 3) Where should the index be stored, or kept? Well, it can be stored or 
>>>> kept with the pattern that defines the shape of the index.
>>>>
>>>> Before I move on to the next thought, let me point out that 1-2-3 can 
>>>> be directly solved today. Define a pattern, e.g. a query link. Run it. 
>>>> Store the results on the query, as a value. You can "do this yourself" 
>>>> today; it's easy, and it becomes even easier if you are willing to read 
>>>> the docs for `cog-execute-cache!` (appended below).
>>>>
>>>> 4) How should the index be updated? Ah, well, that is actually the 
>>>> tricky question, the hard question, the place where all of the interesting 
>>>> technology debates and thinking are centered.  One strategy is to update 
>>>> the index every single time an Atom is added to/removed from the 
>>>> atomspace. But recomputing the index every time is wildly inefficient, 
>>>> burning through vast quantities of CPU time. What else can one do? Well, 
>>>> maybe recompute on demand. Or recompute every few minutes. Or maybe once 
>>>> a night. (aka "eventually consistent")  Maybe store a time-stamp on the 
>>>> index, to tell you how old it is. Or maybe have an append-only log of 
>>>> atomspace changes... I can propose many different kinds of solutions. 
>>>> They all have space and time overhead, and/or assorted usability issues. 
>>>> Which of these best suits your needs, I have trouble guessing, so you 
>>>> would have to explain what the problem is (if any).
>>>>
>>>> --linas
>>>>
>>>> Here's the docs:
>>>>  cog-execute-cache! EXEC KEY [METADATA [FRESH]]
>>>>
>>>>    Execute or return cached execution results. This is a caching version
>>>>    of the `cog-execute!` call.
>>>>
>>>>    If the optional FRESH boolean flag is #f, then if there is a Value
>>>>    stored at KEY on EXEC, return that Value. The default value of FRESH
>>>>    is #f, so the default behavior is always to return the cached value.
>>>>    If the optional FRESH boolean flag is #t, or if there is no Value
>>>>    stored at KEY, then the `cog-execute!` function is called on EXEC,
>>>>    and the result is stored at KEY.
>>>>
>>>>    The METADATA Atom is optional.  If it is specified, then metadata
>>>>    about the execution is placed on EXEC at the key METADATA.
>>>>    Currently, this is just a timestamp of when this execution was
>>>>    performed. The format of the meta-data is subject to change; this
>>>>    is currently an experimental feature, driven by user requirements.
>>>>
>>>>    At this time, execution is synchronous. It may be worthwhile to have
>>>>    an asynchronous version of this call, where the execution is
>>>>    performed at some other time. This has not been done yet.
>>>>
>>>> On Wed, Aug 26, 2020 at 7:41 AM Abdulrahman Semrie <[email protected]> 
>>>> wrote:
>>>>
>>>>>
>>>>> In the current atomspace, atoms are indexed by their type, i.e., given a 
>>>>> type we can retrieve all the atoms that have that type. But there is no 
>>>>> other way of adding custom indices in the atomspace. For example, if we 
>>>>> want to index nodes by their name, there is no way of doing this. 
>>>>>
>>>>> As discussed in this issue 
>>>>> <https://github.com/MOZI-AI/annotation-scheme/issues/192>, we plan to 
>>>>> expand the annotation-service, which uses the AtomSpace to store genomics 
>>>>> data, to support the annotation of more types in addition to genes. 
>>>>> Currently, when a user submits a list of ids to the service, it is 
>>>>> assumed that these ids/symbols represent `GeneNode`s. But when the 
>>>>> input can be a protein, a drug molecule, a pathway or a gene, there is 
>>>>> no direct way of finding the type of the atom with a given name, short 
>>>>> of iterating through all atoms searching for that particular id. That 
>>>>> isn't a good approach from a performance standpoint. If we had a custom 
>>>>> index, e.g. `name_index`, on the ids/names of the atoms, it would be 
>>>>> easier to search for atoms by name and identify the type each atom 
>>>>> belongs to. 
>>>>>
>>>>> Hence, if there is a way to add custom indices to the atomspace, it 
>>>>> will greatly simplify some searches. Or maybe there is a way to do what I 
>>>>> described above without the need for an index. If so, please share it.
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "opencog" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/opencog/27892502-0dfb-4042-a805-30a1520f6250n%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/opencog/27892502-0dfb-4042-a805-30a1520f6250n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>
>>>>
>>>> -- 
>>>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>>>         --Peter da Silva
>>>>
>>>
>>>
>>
>>
>> -- 
>> Ben Goertzel, PhD
>> http://goertzel.org
>>
>> “The only people for me are the mad ones, the ones who are mad to live, 
>> mad to talk, mad to be saved, desirous of everything at the same time, the 
>> ones who never yawn or say a commonplace thing, but burn, burn, burn like 
>> fabulous yellow roman candles exploding like spiders across the stars.” -- 
>> Jack Kerouac
>>
>
>
