Linas,

So it doesn't directly address the issue, but this paper looks vaguely relevant:
https://www.sandia.gov/~srajama/publications/twolevel_ipdps18.pdf

They are looking at graph chunking for a different purpose (graph
algorithms on high-bandwidth-memory machines), and they experiment with
vertex chunking, edge chunking, random chunking and partitioning-based
chunking... An important point they note is that for real-world graphs,
vertex and edge chunking (which are very simple) work about as well as
partitioning-based chunking, whereas this is not so true for random
graphs. The point being that in real-world graphs, much-interlinked
nodes tend to be highly related in terms of usage...

In the Gemini distributed graph processing system,

https://www.usenix.org/sites/default/files/conference/protected-files/osdi16_slides_zhu.pdf
https://github.com/thu-pacman/GeminiGraph

they divide graphs into chunks based on "# vertices + # edges", so a mix
of vertex chunking and edge chunking... They report this works quite
well, and emphasize that this is because of the nice locality properties
(nearby nodes tend to be used together) of the graphs they're dealing
with.

-- Ben

On Thu, Jul 23, 2020 at 3:54 PM Linas Vepstas <[email protected]> wrote:
>
> On Thu, Jul 23, 2020 at 11:34 AM Ben Goertzel <[email protected]> wrote:
>>
>> > What I am fishing for, is either some example pseudocode, or the name of
>> > some algorithm or some wikipedia article that describes that algorithm,
>> > which can compute chunks. Ideally, an algo that can run in fewer than a
>> > few thousand CPU cycles, because, after that, performance becomes a
>> > problem. If none of this, then some brainstorming for how to find a
>> > reasonable algorithm.
>>
>> Linas, just to be sure we're in sync -- how large of a chunk are you
>> thinking this algorithm would typically find?
>
> Arbitrary. If you look at what happened with opencog-ipfs or opencog-dht,
> there are several key operations. One is, of course, "who's got this atom?"
> but that's easy: each atom has a 64-bit hash (or 80-bit on OpenDHT by
> default, but that's settable). Next, "what's the incoming set of this
> atom?" Whoops, can't compute the hash of that, because we don't know
> what it is. So you can ask, and get back a list of N other atoms (or
> hashes) that are in the incoming set. Where are they? Well, each
> different atom gets a totally different hash, so they spread all over
> the planet (because that's how Kademlia works), when in fact, what we
> really wanted to say was "gee golly, the incoming set of an atom is
> 'close to' the atom itself, get me the ball of close-by stuffs". But I
> can't figure out how to "say that".
>
> Anyway, that is what I am trying to define as a chunk: an atom and
> everything "nearby", with a variable conception of "nearby".
>
> atomspace-ipfs had multiple major stumbling blocks. One is that IPFS
> documents are immutable, so for each new atomspace, you have to publish
> a brand-new document -- which has a completely different hash, so
> whoops, how do you find out the hash of that? Well, IPFS has a DNS-like
> naming system, but it was horridly slow, totally unusable (multi-second
> lookups with 60-second timeouts). The second problem is that it's
> "centralized" -- you have to jam the *entire* atomspace into the
> document. So it's klunky. Won't scale for large atomspaces. Some notion
> of chunks alleviates that. But maybe something less klunky than IPFS
> would be better.
>
> So that suggests a lower-level building block -- e.g. OpenDHT, and that
> is how atomspace-dht was born. But that now seems to be maybe "too low".
> It suffered from the chunking problem.
>
> Here's one, somewhere in the middle: "earthstar" --
> https://github.com/cinnamon-bun/earthstar is a decentralized document
> store. Cross out "document" and write "atomspace" instead. Or rather,
> cross out "atomspace" and write "chunk" instead. Or something like that.
> Quite unclear.
>
> The reason atomspace-cog got created is that it seems best to have
> "seeders", same idea as in BitTorrent, so that at least one server is
> the source of truth for a given atomspace, even if all the other servers
> are down/offline. The current ipfs and dht backends do not use seeders,
> but I've got extremely vague plans to change that.
>
> --linas
>
> --
> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
> --Peter da Silva
>
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/CAHrUA34VPBdCw-hoXH-Y8RWa-eQRAjvUGQ%2BbRNSdU8wu95dbgA%40mail.gmail.com.

--
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to live,
mad to talk, mad to be saved, desirous of everything at the same time,
the ones who never yawn or say a commonplace thing, but burn, burn, burn
like fabulous yellow roman candles exploding like spiders across the
stars.” -- Jack Kerouac
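[P.S. For concreteness, the Gemini-style "# vertices + # edges" chunking
mentioned above can be sketched in a few lines of Python. This is a hedged
illustration in the spirit of their scheme, not Gemini's actual code: it
assumes vertices are numbered 0..n-1 and splits them into contiguous
ranges so each range carries a roughly equal share of the combined
vertex+edge weight. It's a single O(n) pass, which is in the ballpark of
the "fewer than a few thousand CPU cycles" budget Linas asked about.]

```python
# Sketch of hybrid vertex+edge chunking, loosely in the spirit of Gemini
# (not their actual implementation). Vertices are assumed to be numbered
# 0..n-1; degree[v] is the number of edges owned by vertex v.

def hybrid_chunks(degree, num_chunks):
    """Split vertices 0..n-1 into contiguous half-open ranges so each
    range gets roughly (total_vertices + total_edges) / num_chunks of
    the combined '# vertices + # edges' weight."""
    n = len(degree)
    total = n + sum(degree)          # combined weight of the whole graph
    target = total / num_chunks      # ideal weight per chunk
    chunks, start, acc = [], 0, 0
    for v in range(n):
        acc += 1 + degree[v]         # this vertex contributes 1 + its degree
        # Close the chunk once the target is met, but leave the last
        # chunk open so it absorbs any remainder.
        if acc >= target and len(chunks) < num_chunks - 1:
            chunks.append((start, v + 1))
            start, acc = v + 1, 0
    chunks.append((start, n))
    return chunks

# Example: a high-degree "hub" vertex 0 forces the first chunk to be
# small, while the low-degree tail gets packed into the second chunk.
print(hybrid_chunks([9, 1, 1, 1, 1, 1, 1, 1], 2))  # → [(0, 2), (2, 8)]
```

Pure vertex chunking is the special case degree[v] == 0 for all v; pure
edge chunking is the limit where the degrees dominate the +1 terms.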
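[P.P.S. On the "ball of close-by stuffs" problem: one possible sketch --
with invented names, not atomspace-dht's actual scheme -- is to derive the
DHT key from a *chunk* identifier rather than from each atom individually.
Kademlia places a value on the nodes closest to the hash of its key, so
per-atom hashing necessarily scatters an atom and its incoming set, while
a shared chunk key keeps the whole ball in one place and makes the
incoming-set query a single get.]

```python
# Hedged sketch (hypothetical names, not the actual atomspace-dht code):
# contrast per-atom keying, which scatters related atoms across the DHT,
# with per-chunk keying, which stores an atom plus its incoming set under
# one Kademlia key.
import hashlib

def atom_hash(atom: str) -> int:
    """Stand-in for the AtomSpace's 64-bit atom hash."""
    return int.from_bytes(hashlib.sha256(atom.encode()).digest()[:8], "big")

def dht_key(chunk_id: str) -> int:
    """One key per chunk: every member of the chunk hashes to one place."""
    return int.from_bytes(hashlib.sha256(chunk_id.encode()).digest()[:8], "big")

# An atom and two atoms in its incoming set:
atoms = ['(Concept "cat")',
         '(Inheritance (Concept "cat") (Concept "animal"))',
         '(Evaluation (Predicate "eats") (Concept "cat"))']

# Per-atom keying: three unrelated keys, spread "all over the planet".
scattered = {a: atom_hash(a) for a in atoms}

# Per-chunk keying: the atom and its incoming set form one chunk, stored
# under a single DHT key; fetching the incoming set is one get(ball_key)
# instead of N independent lookups fanning out across the network.
chunk = {"id": 'chunk:(Concept "cat")', "members": atoms}
ball_key = dht_key(chunk["id"])
print(len(set(scattered.values())), "scattered keys vs 1 ball key")
```

The open question in the thread -- how to *compute* the chunk boundaries
cheaply -- is exactly what the vertex/edge-chunking sketch above is for;
this snippet only shows why a shared key restores locality once the
chunks exist.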
