OK,

I was hoping to have a technical conversation about chunks; my fault
because perhaps I was not clear about that.

When one attempts to store data in a computer, it is usually best if
related things are nearby. Literally -- nearby on disk, in RAM, in cache.
When one attempts to store data in a distributed database, it is best if
related things are on the same network node, and travel together in the
same packets.  That way, you can issue fewer requests, initiate fewer
network connections, and make fewer queries.

The actual technical problem that we have right now (a real,
it-is-affecting-the-current-codebase problem, not a hypothetical
pie-in-the-sky sci-fi problem) is that we do not have any effective
strategy for identifying chunks in the atomspace.

We do have hashes on atoms -- these are currently 64-bit hashes, so not
cryptographic, but "good enough".  However, these hashes destroy locality:
two closely related atoms will have completely different hashes.  So hashes
cannot be used to determine locality.
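
To make that concrete, here is a toy sketch in Python, using a truncated
blake2b as a stand-in for the actual 64-bit atom hash (it is not the real
one; any decent non-cryptographic 64-bit hash behaves the same way for
this demonstration):

```python
import hashlib

def atom_hash(atom: str) -> int:
    """64-bit hash of an atom's string form. A stand-in for the real
    AtomSpace hash; the point is only that uniform hashes scatter
    related inputs."""
    return int.from_bytes(
        hashlib.blake2b(atom.encode(), digest_size=8).digest(), "big")

# Two closely related atoms...
a = atom_hash('(Concept "cat")')
b = atom_hash('(Concept "cats")')
# ...get unrelated hashes: nothing about a's value predicts b's.
print(f"{a:016x}")
print(f"{b:016x}")
```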

What can? I don't know. That's why I'm asking.

For this to work, the chunking has to be immutable. The goal is that user A
on one side of the planet can agree with user B on the name of a chunk
without having to talk to each other first (because forcing users to talk
is not scalable: talk scales like N^2, but we want storage to scale like
log N or better).
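
For what it's worth, the naming half of this (not the membership half) is
what content-addressing solves, Merkle-tree style, as IPFS does: name the
chunk by a hash of a canonical serialization of its members. A minimal
sketch, with a made-up canonical form:

```python
import hashlib

def chunk_name(atoms: list[str]) -> str:
    """Deterministic, coordination-free chunk name: the hash of the
    sorted members. Two users who hold the same member set compute the
    same name without ever talking to each other."""
    canonical = "\n".join(sorted(atoms))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Insertion order is irrelevant; only the member set matters.
print(chunk_name(['(Concept "cat")', '(Concept "dog")']))
```

The hard part remains deciding which atoms belong in the set in the
first place.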

What I am fishing for is either some example pseudocode, or the name of
an algorithm (or a Wikipedia article describing one) that can compute
chunks.  Ideally, an algorithm that runs in fewer than a few thousand CPU
cycles, because beyond that, performance becomes a problem.  Failing that,
some brainstorming about how to find a reasonable algorithm.
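
In the brainstorming spirit, one cheap deterministic candidate (my
speculation, not a known-good answer): have each atom join the chunk
labeled by the minimum hash over its immediate neighborhood, in the
flavor of rendezvous hashing and minwise sketches. Atoms that share
neighbors tend to share the minimum, so related atoms tend to co-locate,
and the rule costs O(degree) hash evaluations, comfortably under a few
thousand cycles for typical fan-outs. A toy sketch:

```python
import hashlib

def h(atom: str) -> int:
    """64-bit hash (truncated blake2b; any uniform 64-bit hash works)."""
    return int.from_bytes(
        hashlib.blake2b(atom.encode(), digest_size=8).digest(), "big")

def chunk_label(atom: str, neighbors: list[str]) -> int:
    """Label an atom's chunk by the minimum hash in its 1-neighborhood.
    Deterministic and coordination-free: any node holding the same
    neighborhood computes the same label."""
    return min(h(x) for x in [atom] + neighbors)

# Two atoms hanging off a common hub get the same label whenever the hub
# carries the minimum hash of both neighborhoods.
```

The obvious caveat: labels shift when the graph mutates, so this only
meshes with the immutability requirement if chunking is computed over
immutable snapshots.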

-- Linas


On Thu, Jul 16, 2020 at 3:51 PM Matt Chapman <[email protected]> wrote:

> All general intelligences we know of today are time-bound. Human brains
> can't solve problems that take more than ~100 years of processing, or more
> than ~100 hours of conscious attention. We could also use time bounds to
> limit chunks.
>
> If I understand correctly, the "chunk" is the set of atoms in some
> subgraphs. So then the question is, how much of our locally cached
> atomspace do we publish to the distributed atomspace, given that newly
> created atoms locally may have low probability of surviving attention
> allocation in the distributed atomspace.
>
> If you are the mind agent deciding this question, then the answer is: you
> push as many as you can in a fixed time, doing a breadth-first search that
> orders the outgoing links of each successfully pushed atom by local
> attention value. Each "message" in the (immutable) "distributed update
> event stream" should have a header stating whether it is the beginning of a
> new chunk, or the continuation of a chunk. Then each consumer of that
> update stream can likewise use a time limit to determine how many of the
> atoms in a chunk it can consume. If a given local pattern matching process
> fails to find a complete pattern match, but finds a partial match in atoms
> updated by one or more incompletely received chunks, then it can allocate
> additional time to retrieve the rest of the atoms in those chunks and
> continue its search.
>
> I think this even generalizes to Execution Nodes -- if an executed atomese
> program fails to return a result of the expected form, and the program
> contains atoms from incomplete chunks, then retrieve the rest of those
> chunks. Alternatively, a specialized mind agent that is expected to handle
> executions might implement a rule to always allocate as much time as
> necessary to retrieve all execution nodes from all published chunks.
>
> Or I might be totally missing the point, but these are my thoughts after
> today's presentations. I've definitely made some assumptions about the
> architecture of distributed atomspaces (i.e., the use of immutable event
> streams) that may not be consistent with better established ideas, since
> I've not been paying much attention to OpenCog development lately.
>
> If you've read this far, I'll throw out 2 more brief thoughts:
> 1. Rust is awesome.
> 2. Semantic versioning is essential. Hyperon can be released as 1.0.0 and
> Tacyon can be 2.0.0 if it does not maintain backward compatibility.
>
> I can try to defend these claims when I have more time later, if they are
> controversial.
>
> Be well,
>
> Matt
>
> --
> Please interpret brevity as me valuing your time, and not as any negative
> intention.
>
>
> On Thu, Jul 16, 2020 at 11:53 AM Linas Vepstas <[email protected]>
> wrote:
>
>> Hi Cassio,
>>
>> Just got done listening to your CogCon presentation ... so .. chunks!  I
>> think you hit the nail on the head with the concept of "chunks". It is
>> maybe the #1 most important (hardest) part of the design.  Let me explain
>> why...
>>
>> I (recently) made two extremely naive attempts at implementing a
>> distributed atomspace -- atomspace-ipfs and atomspace-dht (see the github
>> repos) -- and since I was naive, these efforts ... well, they work, but
>> they don't scale.
>>
>> The core problem that wrecked both of those was the problem of "chunks" --
>> of knowing when two atoms are "near" each other.  Of knowing when a glob of
>> atoms should travel together, should be lumped together.  Without knowing
>> what belongs to a glob, a chunk, it was hard/impossible to have a good,
>> scalable backend.
>>
>> ... and if you do know how to identify the boundaries of a glob, then
>> almost all of the other aspects of building a distributed atomspace become
>> "easy".
>>
>> Clarifying this makes all the other problems go away (or turn into
>> problems that any programmer can implement, without major stumbling blocks)
>> ... so any thoughts or work to clarify the idea of "chunks" would be ...
>> well, for me, extremely important.
>>
>> -- linas
>>
>> --
>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>         --Peter da Silva
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "opencog" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/opencog/CAHrUA37bz5CN%2BvDGXsy%3DFCcNjp_TY4ivNuDWN51LgdPCdawnSg%40mail.gmail.com
>> <https://groups.google.com/d/msgid/opencog/CAHrUA37bz5CN%2BvDGXsy%3DFCcNjp_TY4ivNuDWN51LgdPCdawnSg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>


-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

