Re: Atomspace RAM & CPU usage ... was Re: [opencog-dev] Indexing in the AtomSpace

Abdulrahman Semrie Tue, 15 Sep 2020 09:00:25 -0700

>  But I don't understand what you are saying. What is the problem that is 
being solved? How would this be better than what there is now?


In this particular case, I'm pointing out that the current AtomSpace doesn't 
use the approach you suggested to me for indexing atoms by their type, i.e. 
MemberLinks instead of a hash table. So I am wondering why the TypeIndex is 
not implemented using MemberLinks, as you suggested above and as in the 
example in the document you linked?
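
To make the comparison concrete, here is a minimal sketch of the two ways of indexing by type. Everything in it (Atom, Handle, HashTypeIndex, LinkTypeIndex, make_node) is a hypothetical, simplified stand-in and not the real AtomSpace classes; it only assumes, as stated in the quoted text below, that the current TypeIndex is a std::unordered_multimap and that incoming/outgoing sets are a std::set and a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the AtomSpace API.
struct Atom;
using Handle = std::shared_ptr<Atom>;

struct Atom {
    std::string type;              // e.g. "ProteinNode", "MemberLink"
    std::string name;              // left empty for links
    std::vector<Handle> outgoing;  // atoms that this link points at
    std::set<Atom*> incoming;      // links that point at this atom
};

inline Handle make_node(const std::string& type, const std::string& name) {
    auto h = std::make_shared<Atom>();
    h->type = type;
    h->name = name;
    return h;
}

// Approach A: a side table keyed by type, similar in spirit to a TypeIndex.
struct HashTypeIndex {
    std::unordered_multimap<std::string, Handle> table;

    void insert(const Handle& h) { table.emplace(h->type, h); }

    std::vector<Handle> get(const std::string& type) const {
        std::vector<Handle> out;
        auto range = table.equal_range(type);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }
};

// Approach B: one membership link per atom; the "index" is simply the
// incoming set of the type vertex (which becomes a supernode).
struct LinkTypeIndex {
    std::vector<Handle> links;  // owns the links so raw pointers stay valid

    Handle add_member(const Handle& member, const Handle& type_vertex) {
        auto link = std::make_shared<Atom>();
        link->type = "MemberLink";
        link->outgoing = {member, type_vertex};
        member->incoming.insert(link.get());
        type_vertex->incoming.insert(link.get());
        links.push_back(link);
        return link;
    }

    static std::vector<Handle> members(const Handle& type_vertex) {
        std::vector<Handle> out;
        for (Atom* link : type_vertex->incoming) {
            if (link->type == "MemberLink" && link->outgoing.size() == 2 &&
                link->outgoing[1] == type_vertex)
                out.push_back(link->outgoing[0]);
        }
        return out;
    }
};
```

In this sketch, approach B allocates one extra link object per indexed atom and adds an entry to two incoming sets, whereas approach A adds one multimap entry per atom. Both answer the same query ("give me every atom of this type"); the difference is where the bookkeeping lives.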

Anyway, as it pertains to the annotation work, I'm using a simple hack to 
get the type using only the name. It works, and the performance isn't bad 
because we only have about two dozen types. You can see the code here: 
https://github.com/Habush/atomspace-rpc/blob/7b01ef18f70473457a97159c0b120743428272cb/src/manager/AtomSpaceManager.cpp#L103
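
A minimal sketch of that kind of lookup follows. It is not the code at the link above: NodeTable and find_type_by_name are hypothetical names, and the sketch assumes only that nodes can be looked up by a (type, name) pair and that the list of candidate types is short.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node table keyed by (type, name).
using NodeTable = std::set<std::pair<std::string, std::string>>;

// Try each known type in turn and return the first one that has a node
// with the given name. Returns an empty string if no type matches.
// The cost is one table lookup per candidate type, which stays cheap
// when there are only a couple of dozen types.
std::string find_type_by_name(const NodeTable& table,
                              const std::vector<std::string>& types,
                              const std::string& name) {
    for (const std::string& type : types) {
        if (table.find(std::make_pair(type, name)) != table.end())
            return type;
    }
    return std::string();
}
```

If the same name exists under more than one type, this sketch returns whichever type comes first in the list, so the order of the candidate types matters.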

On Wednesday, September 9, 2020 at 1:24:09 AM UTC+3 linas wrote:

> On Tue, Sep 8, 2020 at 3:32 PM Abdulrahman Semrie <[email protected]> 
> wrote:
>
>> Great write up Linas!
>>
>> I'm fuzzy on the formula for the size of vertex and edge tables on page 
>> 6. It'd be great if you added an explanation to make it more clear.
>>
>
> Which part is unclear?  
>
>>
>> With regards to indexing, the benefit of using graph dbs for partial 
>> indices is clear. But I have one question  with regards to the current 
>> Atomspace design. In your example, you represent the departments as 
>> "privileged vertices" and connect them to their respective employee 
>> vertices. In the current AtomSpace, there is a TypeIndex which is 
>> represented using a hash table (std::unordered_multimap to be exact). Why 
>> not represent the types using vertices and connect every other atom to the 
>> type vertex it belongs to? Like you suggested above, something like 
>> (MemberLink (Concept "Uniprot:12233") (Concept "ProteinNode")). This will 
>> lead to some type vertices being "Supernodes" in that a single vertex will 
>> be connected to many vertices, perhaps millions of vertices. This will 
>> result in a performance issue with naive graph db representations because 
>> the outgoing set of the type vertices will be very large. TitanDB solves 
>> this with the concept of a unidirectional edge, where only the 
>> destination vertex is aware of its connection to the supernode. But looking 
>> at the hypergraph tables in the document, this problem is already solved. 
>> So why not use this approach for the TypeIndex?
>>
>
> Minor correction: it would be (TypeNode 'Protein) -- the TypeNode performs 
> a check to avoid mis-spellings, and slots in naturally into things that 
> expect types.
>
> But I don't understand what you are saying. What is the problem that is 
> being solved? How would this be better than what there is now?
>  
>
>>
>> Re: using MemberLinks as a way of indexing by name
>>
>> Where do you think all of that RAM usage is going? Where do you think 
>> indexes are kept? The MemberLink maintains indexes in the 
>> incoming/outgoing sets, those are just c++ std::set and std::vector, 
>> respectively. If you create some other index, you are just moving around 
>> where the RAM is being used.  You're talking about shifting around the 
>> internal representation; you are not proposing anything that will actually 
>> decrease RAM usage. 
>>
>> Correct me if I'm wrong, but won't using std::unordered_multimap<string, 
>> Handle> have less RAM usage than creating new ConceptNodes and 
>> MemberLinks for indexing?
>>
>
> What problem does this solve?
>
>> Also, how about adding an API in the AtomSpace so that we can use external 
>> index stores like ElasticSearch/Apache Solr? This is especially useful if 
>> we want to do full-text search on atom names.  I can help with this 
>> integration if you think this idea is worthwhile.
>>
>
> What problem does this solve? My knee-jerk reaction is that it is a 
> terrible idea -- you were complaining about RAM usage, but now you want to 
> introduce large, complex, RAM-intensive applications?  Yuck!
>
> Previous emails offered 3-4 different simple, small light-weight solutions 
> that would be faster, easier, less RAM-intensive than Solr/Elastic.  But 
> the key problem is that you still haven't explained what the actual problem 
> is -- so it is effectively impossible to discuss solutions, if the problem 
> remains hidden.
>
> I have the general impression that you are suffering from a case of "the 
> grass is greener on the other side" -- you are looking at these corporate 
> mega-solutions and are imagining how wonderful it might be to work for some 
> large corporation, solving some kind of mega-problem where you have 300 
> people attending division-wide meetings to organize and plan. 
>
> I'm not sure what to say to that -- there might be more money, but it is 
> definitely NOT more pleasant or fun. Thrilling, at first -- sure. Like any 
> good drug. I could say something like "try it- get a job at mega-corp" but 
> you will then disappear for a decade, before you realize what a terrible 
> mistake you've made, and will be quite unable to find the exit door.(*) If 
> you are young .. I dunno. What's 10 years, when you're young? You will get 
> a hands-on education in corporate structure and business administration, 
> which is worth-while if your end-goals are to run departments or manage 
> large projects.  
>
> Just be aware that managing large projects requires exceptional political 
> skills. Politics is a snake-pit -- things are always reaching up from 
> unexpected locations and biting you in unexpected ways. Random people will 
> stab you in the back, and you will be nicely set up in situations where you 
> will be knocked down, forced to fail in the most ugly, public fashion 
> possible.  The competition for executive jobs is fierce and brutal. Winning 
> is only 2/3rds luck and 1/3rd skill.
>
> -- Linas
>
> (*) Those Hollywood movies about unhappy Americans and unhappy housewives? 
> They may be fictional, but they have a basis in reality.
>
> -- 
> Patrick: Are they laughing at us?
> Sponge Bob: No, Patrick, they are laughing next to us.
>  
>
>

