This is related to the question of updating indexes. I'm working on a P2P capability that will make a JCR repository behave essentially like a distributed blockchain database (i.e., a "ledger"), where every node has a full copy of the DB/repository. One capability this requires, which I've already completed, is a Merkle-tree-like mechanism that lets me tell whether the full content under any given subgraph is identical to the content on some separate "peer" (network node), simply by comparing a SHA-256 hash at both nodes (each node being on a totally independent repository).
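As a rough illustration of the Merkle-tree idea (not the actual meta64 code), the sketch below hashes a subgraph bottom-up. Each node's hash covers its own name and properties plus the hashes of its children, so two independent repositories only need to compare one root hash. The `Node` class, `merkle_hash`, and the length-prefixed encoding are all hypothetical; a real JCR version would walk `javax.jcr.Node` objects instead of this in-memory stand-in.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a JCR node (not the javax.jcr API)."""
    name: str
    properties: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def _update_field(h, value: str) -> None:
    # Length-prefix every field so that ("ab", "c") and ("a", "bc") hash differently.
    data = value.encode("utf-8")
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def merkle_hash(node: Node) -> str:
    """SHA-256 over the node's name, its properties (sorted by key), and its children's hashes."""
    h = hashlib.sha256()
    _update_field(h, node.name)
    _update_field(h, str(len(node.properties)))
    for key in sorted(node.properties):
        _update_field(h, key)
        _update_field(h, node.properties[key])
    _update_field(h, str(len(node.children)))
    # Children are hashed in their stored order, assuming JCR child order is significant.
    for child in node.children:
        _update_field(h, merkle_hash(child))
    return h.hexdigest()


# Two independently built copies of the same subgraph hash identically...
local = Node("docs", {"title": "Ledger"}, [Node("a", {"x": "1"})])
peer = Node("docs", {"title": "Ledger"}, [Node("a", {"x": "1"})])
print(merkle_hash(local) == merkle_hash(peer))  # True

# ...and a change anywhere below the root changes the root hash.
peer.children[0].properties["x"] = "2"
print(merkle_hash(local) == merkle_hash(peer))  # False
```

Because every parent hash folds in its children's hashes, a single changed property deep in the tree propagates all the way up to the root.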
The method for keeping the repositories (technically, a subgraph in each) identical will be to use the Merkle tree to perform a "sync" that does the "least effort" data transfers from peer to peer. I may end up using an open-source BitTorrent library to transmit the data between clients efficiently. So John, that kind of technique (the BitTorrent protocol) could theoretically help you distribute index files across nodes rather than regenerating them manually every time you spin one up. I admit I haven't even researched "Clusters" in Jackrabbit, and I don't know whether those are sharded/federated or keep a full copy on each node.

Interestingly, if you're a fan of blockchain, I will also be using a public-key system in this app to authenticate who added what content: each "edit" (node property modification) gets hashed, that hash is encrypted with the user's private key (i.e., digitally signed), and the signed hash is stored on the tree. So the entire app I am implementing will BE a true blockchain, implemented as a layer built on top of the JCR.

I think of what I'm doing as a "reference implementation" of what could eventually become a blockchain specification for the JCR: an extension to the JCR API that specifically adds a blockchain protocol/layer on top of JCR, which will hopefully become an Apache project of its own and a formal spec for how to use JCR to build blockchains. What I am doing is along the lines of Ethereum, in making blockchain a more generic, accessible, reusable technology, but as far as I know Ethereum is not built on JCR, and I believe in building on top of JCR. I believe anyone who understands Merkle trees AND the JCR, and who is also fully cognizant of blockchain, would come to the same conclusion. So I hope at least a couple of the guys who are well-connected at Adobe will pass this concept up the chain of command.
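One way a "least effort" sync could use the Merkle tree is sketched below, assuming each child is identified by a unique name. The walk compares subtree hashes and descends only where they differ, so identical branches are skipped after a single comparison. `Node`, `own_digest`, `merkle_hash`, and `changed_paths` are hypothetical names. For simplicity, the sketch holds both trees in memory and recomputes hashes on demand; between real peers, each step would instead be a request for the remote side's stored child hashes.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical in-memory node; children are keyed by (assumed unique) name."""
    properties: dict[str, str] = field(default_factory=dict)
    children: dict[str, Node] = field(default_factory=dict)

    def own_digest(self) -> str:
        # Hash of this node's own properties only, excluding its children.
        h = hashlib.sha256()
        for key in sorted(self.properties):
            h.update(repr((key, self.properties[key])).encode("utf-8"))
        return h.hexdigest()

    def merkle_hash(self) -> str:
        # Subtree hash: own digest plus (name, child hash) pairs in name order.
        # Recomputed on every call here; a real store would persist these hashes.
        h = hashlib.sha256(self.own_digest().encode("ascii"))
        for name in sorted(self.children):
            h.update(repr((name, self.children[name].merkle_hash())).encode("utf-8"))
        return h.hexdigest()


def changed_paths(local: Optional[Node], remote: Optional[Node], path: str = "/") -> list[str]:
    """Paths whose own content differs, pruning every subtree whose Merkle hashes match."""
    if local is not None and remote is not None and local.merkle_hash() == remote.merkle_hash():
        return []  # identical subtree: nothing to transfer
    if local is None or remote is None:
        return [path]  # subtree exists on only one side: transfer it whole
    diffs = []
    if local.own_digest() != remote.own_digest():
        diffs.append(path)
    for name in sorted(set(local.children) | set(remote.children)):
        child_path = path.rstrip("/") + "/" + name
        diffs.extend(changed_paths(local.children.get(name), remote.children.get(name), child_path))
    return diffs


local = Node({"title": "Ledger"}, {"a": Node({"x": "1"}), "b": Node({"y": "2"})})
remote = Node({"title": "Ledger"}, {"a": Node({"x": "1"}), "b": Node({"y": "3"}), "c": Node()})
print(changed_paths(local, remote))  # ['/b', '/c']
```

Here `/a` is never descended into because its hashes match; only `/b` (changed) and `/c` (missing locally) would need to be transferred, by BitTorrent or any other transport.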
In 10 years, nobody will want to use a content repository that doesn't have the level of "trust" that can only come from a blockchain. I think in 10 to 20 years even RDBs will have "blockchain-verifiable" transactions as built-in functions. But for now, a protocol layer on top of, and separate from, the JCR that specifically provides blockchain functionality seems like the next step for blockchain technology and also for JCR. Who knows, maybe the world is ready for Adobe to start a cryptocurrency of its own!? Perhaps that would be the financial incentive to get them interested in this? I have $10K for that ICO ready and waiting!!

I've probably violated the terms and conditions of this mailing list, and I apologize if so. I went slightly beyond a reply to John.

Best regards,
Clay Ferguson
https://github.com/Clay-Ferguson/meta64

On Sat, Jun 24, 2017 at 6:52 AM, John Chilton wrote:

> Thanks Galo, this is useful information.
>
> When you say “large” working sets, how large is large? Just looking for
> an order of magnitude (Gig, Tera, Peta...).
>
> Also, are you aware of any Mesos frameworks that offer capabilities
> similar to K8s stateful sets?
>
> Thanks again,
>
> -John
>
> > On Jun 23, 2017, at 6:37 PM, Galo Gimenez wrote:
> >
> > One issue you will find with Jackrabbit is indexing: local storage is
> > ephemeral, so new nodes need to re-index, and on large working sets this
> > can take hours.
> >
> > Kubernetes introduced stateful sets, which allow you to have very stable
> > naming and storage inside the cluster, and a consistent ordering when
> > nodes are started:
> > https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
> >
> > -Galo
> >
> >> On Jun 23, 2017, at 11:03 PM, John Chilton wrote:
> >>
> >> We are running in an orchestration environment: either
> >> Mesos/Chronos/Marathon or Kubernetes.
> >>
> >> Each Docker container needs to join the Jackrabbit cluster for the
> >> lifetime of that container and then leave the Jackrabbit cluster when
> >> its work is complete.
> >> When each container joins the Jackrabbit cluster, it is assigned a
> >> unique cluster node id (repository.xml). We also have no upper bound on
> >> the number of our containers that may join the cluster at any given time.
> >>
> >> Will this “dynamic” clustering work, or will we encounter issues? Is
> >> this ill-advised? Or are there things we need to do beyond uniquely
> >> identifying each cluster node?
> >> I am trying to get ahead of issues that may arise when exercising this.
> >> Any thoughts at all would be appreciated.
> >>
> >> Thanks,
> >>
> >> -John
