Hi,
I'm evaluating what would work best for a distributed in-memory and
on-disk data storage system / NoSQL data store.
That is, for instance:
- a single master receives writes and distributes them (the transaction
log) to a number of nodes synchronously and to all other nodes
asynchronously
- real ACID transactions are provided, maybe by blocking further
database changes until most nodes (with "most" still to be defined)
acknowledge that they wrote the value, since some nodes can simply
fail or shut down; old revisions can be read regardless of the lock
(see the sketch below)
- if the transaction fails on a node in the cluster, an event is sent
to a queue to roll back the most recent revision (maybe, if a
.commit-file exists, remove everything more recent than the latest
committed revision)
- we need to know on which node in the cluster a specific resource of a
database resides (indexes are always part of the resource) -- see the
consistent-hashing sketch further below
- maybe exactly-once semantics for sending events
- maybe multi-master replication between two master nodes in different
networks (but that's more of a nice-to-have for later)
...
But I'm sure you know a lot more about all the problems in distributed
systems ;-)
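
To make the quorum point a bit more concrete, here is roughly what I
have in mind for a quorum-acknowledged commit -- just a sketch, all
names (QuorumReplicator, Follower, LogEntry) are placeholders and not
actual Ignite or Sirix APIs:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical collaborators, just enough to show the idea.
interface LogEntry {}
interface Follower { void append(LogEntry entry); }

final class QuorumReplicator {
  private final List<Follower> followers;
  private final int quorum; // e.g. followers.size() / 2 + 1

  QuorumReplicator(List<Follower> followers, int quorum) {
    this.followers = followers;
    this.quorum = quorum;
  }

  void commit(LogEntry entry) throws InterruptedException {
    CountDownLatch acks = new CountDownLatch(quorum);
    for (Follower follower : followers) {
      // Every follower receives the log entry, but we only wait for a
      // quorum of acknowledgements; the rest catch up asynchronously.
      CompletableFuture.runAsync(() -> {
        follower.append(entry);
        acks.countDown();
      });
    }
    // Block the writing transaction until "most" nodes acknowledged;
    // readers can still see the previously committed revision meanwhile.
    if (!acks.await(5, TimeUnit.SECONDS)) {
      throw new IllegalStateException("quorum not reached, trigger rollback");
    }
  }
}

The interesting part is of course the failure path -- that's where the
rollback event on the queue from the third point would come in.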
As of now the storage system runs on a single node (the storage engine,
which works very similarly to how ZFS handles indirect blocks
internally, has been written from scratch). I want to distribute it in
the future to provide horizontal scalability like MongoDB (probably
without the eventual consistency), CockroachDB, Cassandra...
I know it's not simple and likely needs a few years, but I think it's
doable :)
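
To illustrate what I mean by the ZFS-like engine (very simplified, not
the actual Sirix classes): every commit writes changed pages to new
locations and adds a new revision root, so older revisions stay
readable without any locking. A minimal sketch, assuming a flat
logical-page -> physical-page mapping per revision:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CopyOnWriteStore {
  // Append-only physical page store; pages are never overwritten.
  private final Map<Long, byte[]> pages = new HashMap<>();
  // One "revision root" per revision: logical page key -> physical key.
  private final List<Map<Long, Long>> revisionRoots = new ArrayList<>();
  private long nextPhysicalKey = 0;

  CopyOnWriteStore() {
    revisionRoots.add(new HashMap<>()); // empty revision 0
  }

  // Commit a new revision that replaces one logical page; all other
  // logical pages are shared with the previous revision.
  long commit(long logicalPageKey, byte[] newContent) {
    Map<Long, Long> previous = revisionRoots.get(revisionRoots.size() - 1);
    Map<Long, Long> next = new HashMap<>(previous);
    long physicalKey = nextPhysicalKey++;
    pages.put(physicalKey, newContent);
    next.put(logicalPageKey, physicalKey);
    revisionRoots.add(next);
    return revisionRoots.size() - 1; // the new revision number
  }

  // Read any revision without blocking writers.
  byte[] read(int revision, long logicalPageKey) {
    Long physicalKey = revisionRoots.get(revision).get(logicalPageKey);
    return physicalKey == null ? null : pages.get(physicalKey);
  }
}

In the real engine the revision roots are of course reached through
indirect pages instead of a full map copy, and the sliding snapshot
algorithm avoids copying whole pages, but the append-only,
copy-on-write principle is the same.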
First, I guess, comes replicating a resource in a database to a bunch
of nodes within a transaction; then I'd look into partitioning and then
into how to ship queries to specific nodes...
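
For the "which node owns which resource" part I'd probably start with
something like consistent hashing. A minimal sketch -- node names, the
virtual-node count and the CRC32 hash are just placeholders, nothing
Ignite-specific:

import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class ResourceRing {
  private final TreeMap<Long, String> ring = new TreeMap<>();

  // Each node is placed on the ring several times (virtual nodes) to
  // smooth out the distribution of resources.
  void addNode(String node, int virtualNodes) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.put(hash(node + "#" + i), node);
    }
  }

  // The owner of a resource is the first node clockwise from its hash.
  String nodeFor(String resourceName) {
    long h = hash(resourceName);
    SortedMap<Long, String> tail = ring.tailMap(h);
    return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
  }

  private static long hash(String key) {
    CRC32 crc = new CRC32();
    crc.update(key.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }
}

Used like this (resource names are hypothetical):

ResourceRing ring = new ResourceRing();
ring.addNode("node-1", 100);
ring.addNode("node-2", 100);
String owner = ring.nodeFor("mydb/resource-1");

Adding or removing a node then only remaps a fraction of the
resources, which should keep rebalancing of partitions cheap.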
kind regards
Johannes
On 15.03.19 11:51, Ilya Kasnacheev wrote:
Hello!
Unfortunately, after re-reading your message several times, I still do
not understand:
- What you actually did.
- Whether you have any questions for the community.
- Whether you have any specific use cases to share.
Regards,
--
Ilya Kasnacheev
Fri, 15 Mar 2019 at 11:19, Johannes Lichtenberger
<[email protected]>:
Hi,
as we are working with Ignite in the company I work for, basically for
in-memory grids and horizontal scaling in the cloud, I guess Ignite is
also a perfect fit for adding replication/partitioning to a temporal
NoSQL storage system capable of efficiently storing revisions of both
XML and JSON documents (it could also store any other kind of data) in
a binary format (https://sirix.io or https://github.com/sirixdb/sirix
-- at least for the storage part itself). It started as a university
project, but now I'm really eager to put forth the idea of keeping the
history of your data as efficiently as possible (minimal storage and
query overhead, within the same asymptotic space and time complexity
as other database systems, which usually do not keep the history --
for instance through a novel sliding snapshot algorithm and
copy-on-write semantics at the per-page / per-record level, heavily
inspired by ZFS).
Maybe Apache Spark is better suited for the query plan rewriting (the
AST of the query) and for distribution, but for distributing
transaction logs and executing transactions I think Ignite is the way
to go.
What do you think?
I just have to finish the JSONiq query language implementation, but
after releasing 1.0 in the summer and stabilizing the core as well as
keeping the APIs stable (and defining a spec for the binary
representation, reserving some space in page headers for future
encryption at rest, for instance), I'm eager to work on clustering for
Sirix :-)
Oh, and if you're interested, go ahead: clone it, download the ZIP or
the Docker image, whatever, and let me know what you think :)
kind regards
Johannes