Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-29 Thread Courtney Robinson
Hey Denis,

It's been a while. Hope you've been keeping well!
A meeting the second week of Jan would be great. The agenda looks good too.

We've been watching the Calcite SQL engine wiki and task list, but I got
the impression it wouldn't land that soon. It may be worth getting involved
there to keep some of the Calcite APIs exposed for us to be able to tailor
it, because my ideal design would have Ignite in front of anything we end
up doing.

We'll sort out the call details closer to the time. Thanks Denis, speak to
you and Val in two weeks.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)


https://hypi.io


On Wed, Dec 30, 2020 at 12:02 AM Denis Magda  wrote:


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-29 Thread Denis Magda
Hi Courtney,

Glad to hear from you! It's been a while since we met last time. It's truly
disappointing seeing you struggle with Ignite that much. Thanks for being
open and kicking the discussion off.

How about three of us (you, Val, and I) meet the second week of January and
talk out the issues? Then you can share a talk summary here with a broader
community, if you wish. Between us, I've been personally championing the
column-type-change feature for a while, and with this IEP of Ignite 3.0 it
should be doable.

In the meantime, some items for our agenda and pointers:

   - What does your upgrade procedure look like? Ignite doesn't have a
   rolling-upgrade feature; thus, to upgrade from version A to B in a
   *consistent way*, the cluster has to be stopped.
   - re: the frustration about spending too much time on the administration
   and infrastructure-related activities. Are you considering any managed
   service options? As a developer, I understand this frustration. I was just
   lucky to stay away from a need to administer Postgres, Oracle, MySQL, or
   Ignite just because we either had an IT administrator or made use of a
   managed-service option. Not selling anything here, just know that all those
   administration routines are unavoidable. The thing with Ignite is that
   its distributed nature makes things more complicated.
   - Ignite ML and SQL: in fact, you can already call Ignite ML models from
   a SQL query. I need to search for pointers. @Alexey Zinoviev
   , you can probably share some examples quicker.
   - Calcite and Ignite: in Q2 we're planning to release the
   Calcite-powered SQL engine in Ignite. That would be a dramatic improvement
   in our SQL capabilities. It should be possible to enable push-downs much
   more easily.


-
Denis


On Mon, Dec 28, 2020 at 11:50 PM Courtney Robinson <
courtney.robin...@hypi.io> wrote:


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-28 Thread Courtney Robinson
Hi Val,
Thanks. You're not missing anything and we have been using Ignite
persistence for years.
Among the reasons we want to switch to in-memory only is how easily we seem
to get corrupt nodes, as I mentioned in the first email. We haven't had a
situation where upgrading corrupts more than one of our three replicas, but
we genuinely fear upgrading sometimes as a result, which is why we put so
much effort into our backup/restore solution.

One of the arguments for disabling persistence is that other DBs seem to
have had longer to solve the issue. We've operated Postgres for longer than
Ignite persistence and have never had it corrupt its data on disk
(admittedly it's not distributed, so that could play a role), but we've also
been running Cassandra for longer without ever having any data corruption.
All three run in Kubernetes. With Ignite we've tried many things, and a few
settings between Ignite's rebalanceDelay (I forget which other ones) and
k8s readiness/liveness probes seem to have landed in a sweet spot that's
reduced how often it happens. But if we have any issues with the k8s
control plane, scheduling delays or network issues, the chances of
corruption skyrocket, and it then requires manual intervention. I believe
the most frequent issue is that a node starts after the topology has
diverged, so the cluster doesn't think it's a part of it; the most
effective remedy we've found is deleting the data on that node and having
it join as a new node so that rebalancing kicks in and pushes data back to
it. I think we have a few tickets somewhere with details.
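For context, the rebalance-delay knob mentioned above lives on the cache
configuration; a sketch of the Spring XML (the cache name and the 60-second
value are made up for the example, not our actual settings):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="exampleCache"/>
    <!-- Wait 60s after a topology change before rebalancing starts,
         giving a restarting pod a chance to rejoin first. -->
    <property name="rebalanceDelay" value="60000"/>
</bean>
```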

This is the primary motivator for wanting to replace Ignite persistence.

Here's our use case:
We provide a low-code app development platform. As a developer you define
tables in a visual interface or via GraphQL. We map these onto Ignite
tables. It is a multi-tenant system complete with
authentication/authorisation, serverless functions, an API Gateway and
more... all the bells and whistles for building modern applications in
record time. We make it easy and convenient to get the scale and
performance you'd expect from a large company, with as little as an
intern's effort.

The thing about convenience is that others (the industry, in this case) set
expectations, and the convenient things you provide must include a baseline
that matches whoever set those expectations.
It is a sizeable task to implement/enforce referential integrity and delete
cascade on the application side, and I don't know of any distributed
database which has this built in, but the lack of transactions in SQL means
we can't really implement it without pretty much re-implementing what the
DB should be doing to keep track and roll back on failure.

In an earlier version of the platform we implemented this using the key
value APIs, but we were not using SQL at all. We moved to SQL about 1.5
years ago because we needed greater and greater query capability, and we
were implementing a custom query layer on the KV APIs that was starting to
look remarkably close to what SQL could do.

In our use case, there are only 4 tables known at development time. Every
app in the platform defines its tables dynamically and this has an impact
on what Ignite capabilities are realistic to use. Including our own UI that
lets users create other apps, everything is user defined.

One issue that our customers constantly complain about is their inability
to change a column's type.
We tried generating Java classes, compiling them and including Ignite
annotations, but found that the JVM ends up doing a lot more GC, leading to
crazy long GC pauses. Generating classes meant we could re-map the same
field to a different field in the generated class and silently migrate the
data for known types (like int to float), or use a user-provided function
to do the conversion. It proved impractical, as we'd sometimes get JVMs
that were unusable for long stretches, and if this happened on the 3 nodes
that hold all 3 replicas of your data, your entire app's response time goes
from 50 milliseconds to several seconds or however long the pause lasts
(10s of seconds sometimes).

This got worse, as the JVM pauses would lead to Ignite map failures (I
forget the exact error, but it was something about failing to execute a map
step during a SQL query) on unrelated queries, causing a cascade of
failures across tenants.
We're investigating a new approach: storing mappings from user fields to
different table names, so we'd get back the ability to change the type of a
field and migrate old data. We're not sure of the overhead this will
introduce yet; it's still being investigated. Your thoughts here are
appreciated.
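To make the mapping idea above concrete, here is a hypothetical sketch in
plain Java (none of these names exist in our codebase or in Ignite): a
logical user field points at a versioned physical column, so a "type
change" allocates a fresh column and re-points the logical name while old
values are migrated with a conversion function.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical field-to-column mapping: the logical field name is stable,
// the physical column it resolves to is versioned.
class FieldMapping {
    private final Map<String, String> physical = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    // Resolve a logical field to its current physical column.
    String columnFor(String field) {
        return physical.computeIfAbsent(field, f -> f + "_v0");
    }

    // "Type change": allocate the next physical column for this field;
    // the caller is responsible for migrating old data into it.
    String changeType(String field) {
        int v = versions.merge(field, 1, Integer::sum);
        String col = field + "_v" + v;
        physical.put(field, col);
        return col;
    }
}
```

A query layer would then rewrite `field` to `columnFor(field)` at plan time,
reading old columns through the conversion function until migration completes.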

Next is extending the available SQL dialect: we include indexing
capabilities for our customers. Before migrating to the SQL APIs, we
integrated Apache Lucene (we have a filter language that mapped to Lucene
queries). Indices were replicated but not partitioned, and we used affinity
compute to send queries to the right 

Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-28 Thread Valentin Kulichenko
Hi Courtney,

Thanks for your feedback!

To cut the story short, Ignite implements a page memory architecture. All
data is split into fixed-size pages, and any page can reside either both in
memory and on disk, or on disk only. Since the in-memory layer and the
persistence layer are natively integrated - i.e. there is a unified
mechanism for page lookup - the Ignite storage engine always knows where a
particular page is and can transparently load it into memory if needed.
From this perspective, Ignite is actually very similar to how other
databases work, although it has a much bigger focus on scalability and
in-memory processing, richer compute capabilities, etc.
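In toy form (an illustration of the idea only, not Ignite's actual
implementation), the unified lookup works roughly like this:

```java
import java.util.HashMap;
import java.util.Map;

// Toy page store: every page has a known location, so a read transparently
// pulls an evicted page back from "disk" into memory.
class PageLookup {
    private final Map<Integer, String> memory = new HashMap<>(); // hot pages
    private final Map<Integer, String> disk = new HashMap<>();   // all pages

    void writePage(int id, String data) {
        memory.put(id, data);
        disk.put(id, data); // checkpointing simplified to an immediate write
    }

    void evict(int id) {
        memory.remove(id);  // the page now exists on disk only
    }

    String readPage(int id) {
        String page = memory.get(id);
        if (page == null && (page = disk.get(id)) != null)
            memory.put(id, page); // transparently load the page into memory
        return page;
    }

    boolean inMemory(int id) {
        return memory.containsKey(id);
    }
}
```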

With an external database it's much more complicated - generally speaking,
there is no way of knowing which data needs to be loaded from the database
into memory for the purpose of an arbitrary query. This is a fundamental
limitation and quite frankly, I don't think there is any other
technology that can do that. Not in a consistent and performant way at
least. As a matter of fact, this limitation is exactly what used to be the
primary driver behind the Native Persistence development in the first place.
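To illustrate with a toy model (plain Java, not Ignite's code): per-key
read-through from an external store is easy, but a SQL-style scan has no
way to know which external rows it is missing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A cache holding a subset of an external store: point lookups can read
// through on a miss, but a scan only sees rows already cached - hence the
// need to preload everything before SELECT can be trusted.
class ReadThrough {
    final Map<Integer, Integer> external = new HashMap<>(); // external DB
    final Map<Integer, Integer> cache = new HashMap<>();    // cached subset

    Integer get(int key) {
        Integer v = cache.get(key);
        if (v == null && (v = external.get(key)) != null)
            cache.put(key, v); // read-through works fine per key
        return v;
    }

    List<Integer> selectGreaterThan(int bound) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : cache.values()) // the scan never consults 'external'
            if (v > bound)
                out.add(v);
        Collections.sort(out);
        return out;
    }
}
```

The scan is the crux: nothing tells it which rows of the external store
would match an arbitrary predicate.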

Is there any particular reason why you can't use Native Persistence instead
of an external database? It sounds like that's what you need for your use
case unless I'm missing something. Can you tell us a little bit more about
your use case and requirements? I'm sure we can come up with a solution
that would satisfy those requirements without a need for re-implementing
the whole thing.
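For reference, enabling Native Persistence is just a flag on the default
data region - a minimal Spring XML sketch, with region sizing and WAL
tuning omitted:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <!-- Pages overflow to disk and survive restarts. -->
                    <property name="persistenceEnabled" value="true"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```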

Thanks,
Val

On Mon, Dec 28, 2020 at 12:58 PM Courtney Robinson <
courtney.robin...@hypi.io> wrote:


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-28 Thread Courtney Robinson
I know this was over the holiday so bumping. Can anyone provide any
pointers on where to start looking or anything else mentioned in the
previous email?
Thanks

On Sat, Dec 26, 2020 at 8:39 PM Courtney Robinson 
wrote:



Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-26 Thread Courtney Robinson
We've been using Ignite in production for almost 3 years and we love the
platform but there are some increasingly frustrating points we run into.
Before the holidays a few of our engineers started looking around and have
now presented a serious case for migrating from Ignite. We would end up
using at least 3 technologies they've identified to bridge the gap left by
Ignite features but have presented good cases for why managing these
individually would be a more flexible solution we could grow with.

I am not keen on the idea as it presents a major refactor that would likely
take 6 months to get to production but I understand and agree with the
points they've made. I'm trying to find a middle ground as this seems like
the nuclear option to me.

Off the top of my head, some things they've raised are:

   1. Lack of tooling
      1. Inability to change a column type + no support in any schema
      migration tools that we've found (even after deleting a column, we
      can't reuse the name)
      2. We had to build our own backup solution, and even now that backup
      has landed in 2.9 we can't use it directly, because we have
      implemented relatively granular backup to be able to push to
      S3-compatible APIs (Ceph in our case) and restore partially, down to
      per-hour granularity. Whilst we've done it, it took some serious
      engineering effort and time. We considered open sourcing it, but it
      was done in a way that's tightly coupled to our internal stack and
      APIs.
   2. Inconsistency between various Ignite APIs
      1. Transactions on KV, none on SQL (or now in beta, but work seems to
      have paused?)
      2. SQL limitations - SELECT queries never read through data from the
      external database
      3. Even if we implemented a CacheStore, we have to load all data into
      Ignite to use SELECT
      4. No referential integrity enforcement
   3. It is incredibly easy to corrupt the data in Ignite persistence.
   We've gotten better due to operational experience, but upgrades (in k8s)
   still on occasion leave one or two nodes corrupt when their pod was
   stopped

I'll stop there but the point is, after 3yrs in production the team feels
like they're always running up against a wall and that Ignite has created
that wall.

My goal in writing this is to find out more about why the limitation around
CacheStore exists: how does Ignite persistence achieve partially caching
data in memory and pulling from disk when the data is not in memory, and
why can't the same apply to a CacheStore as well?

What would it take to make it so that Ignite's SQL operations could be
pushed down to a CacheStore implementation?
Ignite's a relatively large code base, so hints about which
classes/interfaces to investigate if we're looking to replace Ignite
persistence would be incredibly useful. My idea at the moment is to have
Ignite as the in-memory SQL layer with a SQL MPP providing persistence.

To me right now the path forward is for us to put the work into removing
these Ignite limitations if possible. We have a mixture of on-premise
clients for our product as well as a multi-tenant SaaS version - some of
these on-prem clients depend on Ignite's in-memory capabilities and so we
can't easily take this away.

FYI, it doesn't have to be CacheStore - I realise this just inherits from
the JCache interface. More generally: can something like CacheStore be
implemented to replace the integration that Ignite persistence provides?

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)


https://hypi.io