Hi,

I'm currently evaluating what is best to use for a distributed in-memory and on-disk data storage system / NoSQL data store.

The requirements are, for instance:

- a single master receives writes and distributes them (the transaction log) synchronously to a number of nodes and asynchronously to all others

- providing real ACID transactions (maybe by locking database changes until most nodes, where "most" still has to be defined, respond that they wrote the value, since some nodes can simply fail or shut down). Old revisions can be read regardless of the lock.

- if the transaction fails on a node in the cluster, send an event to a queue to roll back the most recent revision (maybe, if a .commit file exists, by removing revisions back to the latest committed one).

- need to know on which node in the cluster a specific resource of a database resides (indexes are always part of the resource).

- maybe sending events with exactly-once semantics

- maybe multi-master replication between two master nodes in different networks (but that would be a nice-to-have at some later point).

...

But I'm sure you know a lot more about all the problems in distributed systems ;-)
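
To make the "wait until most nodes respond" idea from the list above a bit more concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the `Replica` interface and the `QuorumCommit` class are hypothetical (they are neither Sirix nor Ignite APIs), and "most" is simply taken to mean a strict majority. The master sends the log entry to all replicas in parallel and treats the write as replicated once a quorum has acknowledged it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class QuorumCommit {

    /** Hypothetical replica abstraction; not an existing Sirix or Ignite interface. */
    public interface Replica {
        /** Returns true once the log entry has been durably written on that node. */
        boolean appendLog(long revision, byte[] logEntry) throws Exception;
    }

    /** Assumption: "most" nodes means a strict majority of the cluster. */
    public static int quorumSize(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /**
     * Sends the log entry to all replicas in parallel and returns true as soon as
     * a quorum has acknowledged it. Returns false if the quorum is not reached
     * within the timeout; the caller would then trigger a rollback.
     */
    public static boolean replicate(List<Replica> replicas, long revision,
                                    byte[] logEntry, long timeoutMillis) {
        CountDownLatch acks = new CountDownLatch(quorumSize(replicas.size()));
        for (Replica replica : replicas) {
            CompletableFuture.runAsync(() -> {
                try {
                    if (replica.appendLog(revision, logEntry)) {
                        acks.countDown();
                    }
                } catch (Exception e) {
                    // A failed or unreachable node simply does not acknowledge.
                }
            });
        }
        try {
            return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch the slower replicas still receive the entry after the quorum is reached, which roughly matches the "synchronously to some, asynchronously to all others" idea. Rollback, the .commit file handling and exactly-once event delivery are deliberately left out.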

As of now the storage system simply runs on a single node (the storage engine, which works very similarly to how ZFS works internally with indirect blocks, has been written from scratch). I want to distribute it in the future to provide horizontal scalability like MongoDB (probably without the eventual consistency), CockroachDB, Cassandra...

I know it's not simple and likely needs a few years, but I think it's doable :)

First, I guess, comes replicating a resource in a database to a bunch of nodes within a transaction, then looking into partitioning, and then how to ship queries to specific nodes...
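
For the partitioning step, one possible way to find out on which node a resource resides is a consistent-hash ring. The following is only a rough sketch under my own assumptions: the `ResourceLocator` class, the node names and the "database/resource" key format are made up for illustration, and CRC32 is used just because it ships with the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical helper that maps a database resource to the node owning it. */
public class ResourceLocator {

    // Ring position -> node name; each node appears several times (virtual nodes).
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ResourceLocator(List<String> nodes, int virtualNodesPerNode) {
        for (String node : nodes) {
            for (int i = 0; i < virtualNodesPerNode; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    /** Returns the node responsible for the given resource of the given database. */
    public String nodeFor(String database, String resource) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the cluster");
        }
        // First ring position at or after the key's hash, wrapping around at the end.
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(database + "/" + resource));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Since indexes are always part of the resource, keying on the resource keeps them on the same node. Queries could then be shipped to whatever `nodeFor(...)` returns, and adding a node would only move the resources that fall onto its ring positions.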

kind regards

Johannes


On 15.03.19 11:51, Ilya Kasnacheev wrote:
Hello!

Unfortunately, after re-reading your message several times, I still do not understand:

- What you actually did.
- Whether you have any questions for the community.
- Whether you have any specific use cases to share.

Regards,
--
Ilya Kasnacheev


Fri, 15 Mar 2019 at 11:19, Johannes Lichtenberger <[email protected] <mailto:[email protected]>>:

    Hi,

    as we are working with Ignite in the company I work for, basically
    for in-memory Grids and horizontal scaling in the cloud I guess
    Ignite is also a perfect fit for adding replication/partitioning to
    a temporal NoSQL storage system capable of storing revisions of both
    XML- and JSON-documents (could also store any other kind of data) in
    a binary format efficiently (https://sirix.io or
    https://github.com/sirixdb/sirix -- at least for the storage part
    itself). It started as a university project, but now I'm really
    eager to put forth the idea of keeping the history of your data as
    efficiently as possible (minimal storage- and query-overhead within
    the same asymptotic space and time complexity as other database
    systems, which usually do not keep the history -- for instance
    through a novel sliding snapshot algorithm and copy-on-write
    semantics at the per page / per record level, heavily inspired by
    ZFS).

    Maybe for the query plan rewriting (the AST of the query) and
    distribution Apache Spark is better suited, but for distributing
    transaction logs and executing transactions I think Ignite is the
    way to go.

    What do you think?

    I just have to finish the JSONiq query language implementation, but
    after releasing 1.0 in summer and stabilizing the core as well as
    keeping the APIs stable (and defining a spec for the binary
    representation, saving some space in page-headers for future
    encryption at rest for instance) I'm eager to work on clustering
    for Sirix :-)

    Oh and if you're interested, go ahead, clone it, download the Zip,
    the Docker image, whatever and let me know what you think :)

    kind regards

    Johannes
