Re: Best practices on how to approach data centre aware affinity

2021-08-05 Thread Courtney Robinson
Hi Alex,
Thanks for the reply. I'm glad I asked before the team went any further.
So we can achieve this with the built-in affinity function and the backup
filter. The real complexity is going to be in migrating our existing caches.

So to clarify, the steps involved here are:

   1. Because Ignite registers all env. vars as node attributes, we can set
   e.g. NODE_DC= as an environment variable in each k8s
   cluster.
   2. Then set the backup filter's constructor-arg.value to NODE_DC.
   This will tell Ignite that two backups cannot be placed on any two nodes
   with the same NODE_DC value - correct?
   3. When we call CREATE TABLE, we must set template=myTemplateName.
   4. Before creating any tables, myTemplateName must be created and must
   include the backup filter with NODE_DC.

Have I got that right?
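
If I've understood correctly, the template for step 4 would look roughly like
this in the node XML config (a sketch only; myTemplateName is a placeholder
and the exact wiring still needs verifying):

```xml
<!-- A cache configuration whose name ends with "*" is registered as a
     template that CREATE TABLE ... WITH "template=myTemplateName" can use. -->
<property name="cacheConfiguration">
  <list>
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
      <property name="name" value="myTemplateName*"/>
      <property name="cacheMode" value="PARTITIONED"/>
      <property name="backups" value="2"/>
      <property name="affinity">
        <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
          <property name="affinityBackupFilter">
            <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
              <!-- Node attribute to compare; env vars appear as node attributes. -->
              <constructor-arg value="NODE_DC"/>
            </bean>
          </property>
        </bean>
      </property>
    </bean>
  </list>
</property>
```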

If so, it seems simple enough. Now the real challenge is where you said
the cache has to be re-created.

I can't see how we can do this without major downtime. We have functionality
in place that allows customers to effectively do a "copy from table A to B
and then delete A", but it will be impossible to get all of them to do this
any time soon.

Has anyone else had to do something similar? How is the community generally
doing migrations like this?

Side note: the only thing that comes to mind is that we will need to build
a virtual catalog that we maintain, so that there isn't a one-to-one mapping
between customer tables and the actual Ignite table name.
So if a table is currently called A and we add a virtual catalog, then we
keep a mapping that says that when the user wants to query "A" it should
really go to table "A_v2" or something. This comes with its own challenges
and a massive testing overhead.
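
In its simplest form I imagine the mapping layer would be little more than
this (a hypothetical sketch, names made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the virtual catalog: maps a customer-facing table
// name to the physical Ignite table currently backing it, so a table can be
// migrated (e.g. "A" -> "A_v2") without customers changing their queries.
public class VirtualCatalog {
    private final Map<String, String> aliases = new ConcurrentHashMap<>();

    // Resolve a customer-facing name to the physical table name;
    // unmapped names resolve to themselves.
    public String resolve(String customerName) {
        return aliases.getOrDefault(customerName, customerName);
    }

    // After migrating data, point the customer-facing name at the new table.
    public void repoint(String customerName, String physicalName) {
        aliases.put(customerName, physicalName);
    }
}
```

The real work is everywhere this mapping has to be consulted and kept in
step with the migration itself, hence the testing overhead.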

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)


https://hypi.io


On Thu, Aug 5, 2021 at 11:43 AM Alex Plehanov wrote:

> Hello,
>
> You can create your own cache templates with the affinity function you
> require (currently you use a predefined "partitioned" template, which only
> sets cache mode to "PARTITIONED"). See [1] for more information about cache
> templates.
>
> > Is this the right approach
> > How do we handle existing data, changing the affinity function will
> > cause Ignite to not be able to find existing data right?
> You can't change cache configuration after cache creation. In your example
> these changes will be just ignored. The only way to change the cache
> configuration is to create a new cache and migrate the data.
>
> > How would you recommend implementing the affinity function to be aware
> > of the data centre?
> It's better to use the standard affinity function with a backup filter for
> such cases. There is one shipped with Ignite (see [2]).
>
> [1]:
> https://ignite.apache.org/docs/latest/configuring-caches/configuration-overview#cache-templates
> [2]:
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.html
>

Re: Best practices on how to approach data centre aware affinity

2021-08-05 Thread Alex Plehanov
Hello,

You can create your own cache templates with the affinity function you
require (currently you use a predefined "partitioned" template, which only
sets cache mode to "PARTITIONED"). See [1] for more information about cache
templates.

> Is this the right approach
> How do we handle existing data, changing the affinity function will cause
> Ignite to not be able to find existing data right?
You can't change cache configuration after cache creation. In your example
these changes will be just ignored. The only way to change the cache
configuration is to create a new cache and migrate the data.
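
For SQL-defined tables the migration itself can at least be expressed in SQL;
a rough sketch with placeholder table and template names (for large tables
you would likely stream the data rather than run a single INSERT ... SELECT):

```sql
-- Create the new table from a template that already has the desired
-- affinity settings, copy the rows, then drop the old table.
CREATE TABLE Person_v2 (
  id int,
  city_id int,
  name varchar,
  company_id varchar,
  PRIMARY KEY (id, city_id)
) WITH "template=dcAwareTemplate";

INSERT INTO Person_v2 (id, city_id, name, company_id)
SELECT id, city_id, name, company_id FROM Person;

DROP TABLE Person;
```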

> How would you recommend implementing the affinity function to be aware of
> the data centre?
It's better to use the standard affinity function with a backup filter for
such cases. There is one shipped with Ignite (see [2]).

[1]:
https://ignite.apache.org/docs/latest/configuring-caches/configuration-overview#cache-templates
[2]:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.html
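
Programmatically, a template with such a backup filter could be registered
before any tables are created; a configuration sketch (untested, and the
template name is a placeholder):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class DcAwareTemplate {
    // Register a cache template that a subsequent
    // CREATE TABLE ... WITH "template=dcAware" can refer to.
    public static void register(Ignite ignite) {
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();

        // Place backups only on nodes whose NODE_DC attribute differs
        // from the nodes already holding copies of the partition.
        aff.setAffinityBackupFilter(
            new ClusterNodeAttributeAffinityBackupFilter("NODE_DC"));

        CacheConfiguration<?, ?> tpl = new CacheConfiguration<>("dcAware*")
            .setCacheMode(CacheMode.PARTITIONED)
            .setBackups(2)
            .setAffinity(aff);

        ignite.addCacheConfiguration(tpl);
    }
}
```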



Best practices on how to approach data centre aware affinity

2021-08-05 Thread Courtney Robinson
Hi all,
Our growth with Ignite continues and as we enter the next phase, we need to
support multi-cluster deployments for our platform.
We deploy Ignite and the rest of our stack in Kubernetes and we're in the
early stages of designing what a multi-region deployment should look like.
We are 90% SQL based when using Ignite; the other 10% includes Ignite
messaging, queues, and compute.

In our case we have thousands of tables:

CREATE TABLE IF NOT EXISTS Person (
  id int,
  city_id int,
  name varchar,
  company_id varchar,
  PRIMARY KEY (id, city_id)) WITH "template=...";

In our case, most tables use a template that looks like this:

partitioned,backups=2,data_region=hypi,cache_group=hypi,write_synchronization_mode=primary_sync,affinity_key=instance_id,atomicity=ATOMIC,cache_name=Person,key_type=PersonKey,value_type=PersonValue

I'm aware of affinity co-location (
https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation)
and in the past, when we used the key-value APIs more than SQL, we also used
a custom affinity function to control placement.

What I don't know is how best to do this with SQL-defined caches.
We will have at least 3 Kubernetes clusters, each in a different data
centre, let's say EU_WEST, EU_EAST, CAN0.

Previously we provided environment variables that our custom affinity
function would use and we're thinking of providing the data centre name
this way.
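
Concretely, that would probably be something like this in each cluster's
manifest (a sketch with made-up names; this relies on Ignite exposing
environment variables as node attributes):

```yaml
# Fragment of the Ignite pod spec in the EU_WEST cluster; each data
# centre's deployment sets a different NODE_DC value.
containers:
  - name: ignite
    image: apacheignite/ignite
    env:
      - name: NODE_DC
        value: "EU_WEST"
```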

We have 2 backups in all cases plus the primary, and so we want the primary
in one DC and each backup to be in a different DC.

There is no syntax in the SQL template that we could find that enables
specifying a custom affinity function.
The instance_id column we currently use has no common prefix or anything to
associate with a DC.

We're thinking of getting the cache for each table and then setting the
affinity function to replace the default RendezvousAffinityFunction the way
we did before we switched to SQL.
Something like this:

repo.ctx.ignite.cache("Person").getConfiguration(org.apache.ignite.configuration.CacheConfiguration)
.setAffinity(new org.apache.ignite.cache.affinity.AffinityFunction() {
...
})


There are a few things unclear about this:

   1. Is this the right approach?
   2. How do we handle existing data? Changing the affinity function will
   cause Ignite to not be able to find existing data, right?
   3. How would you recommend implementing the affinity function to be
   aware of the data centre?
   4. Are there any other caveats we need to be thinking about?

There is a lot of existing data, and we want to avoid a full copy/move to
new tables if possible; that will prove very difficult in production.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)


https://hypi.io


Issue while loading CSV data to ignite table

2021-08-05 Thread Karthik Nandagiri
Trying to load CSV data containing string and numeric fields into Ignite,
but the data loading is failing. Need help understanding and resolving the
issue.

I am evaluating Ignite and trying to load CSV data into Apache Ignite. I
have created a table in Ignite:

jdbc:ignite:thin://127.0.0.1/> create table if not exists SAMPLE_DATA_PK(
  SID varchar(30),
  id_status varchar(50),
  active varchar,
  count_opening int,
  count_updated int,
  ID_caller varchar(50),
  opened_time varchar(50),
  created_at varchar(50),
  type_contact varchar,
  location varchar,
  support_incharge varchar,
  pk varchar(10) primary key);

Now I am trying to load data into this table with the command:

copy from '/home/kkn/data/sample_data_pk.csv' into
SAMPLE_DATA_PK(SID,ID_status,active,count_opening,count_updated,ID_caller,opened_time,created_at,type_contact,location,support_incharge,pk)
format csv;

But the data loading fails with this error:

Error: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException:
Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer]
(state=5,code=1)
java.sql.SQLException: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException:
Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer]
  at org.apache.ignite.internal.jdbc.thin.JdbcThinConnection.sendRequest(JdbcThinConnection.java:1009)
  at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.sendFile(JdbcThinStatement.java:336)
  at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute0(JdbcThinStatement.java:243)
  at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute(JdbcThinStatement.java:560)
  at sqlline.Commands.executeSingleQuery(Commands.java:1054)
  at sqlline.Commands.execute(Commands.java:1003)
  at sqlline.Commands.sql(Commands.java:967)
  at sqlline.SqlLine.dispatch(SqlLine.java:734)
  at sqlline.SqlLine.begin(SqlLine.java:541)
  at sqlline.SqlLine.start(SqlLine.java:267)
  at sqlline.SqlLine.main(SqlLine.java:206)

Below is the sample data I am trying to load:

SID,ID_status,active,count_opening,count_updated,ID_caller,opened_time,created_at,type_contact,location,support_incharge,pk
INC045,New,true,1000,0,Caller2403,29-02-2016 01:16,29-02-2016 01:23,Phone,Location143,,1
INC045,Resolved,true,0,3,Caller2403,29-02-2016 01:16,29-02-2016 01:23,Phone,Location143,,2
INC045,Closed,false,0,1,Caller2403,29-02-2016 01:16,29-02-2016 01:23,Phone,Location143,,3
INC047,Active,true,0,1,Caller2403,29-02-2016 04:40,29-02-2016 04:57,Phone,Location165,,4
INC047,Active,true,0,2,Caller2403,29-02-2016 04:40,29-02-2016 04:57,Phone,Location165,,5
INC047,Active,true,0,489,Caller2403,29-02-2016 04:40,29-02-2016 04:57,Phone,Location165,,6
INC047,Active,true,0,5,Caller2403,29-02-2016 04:40,29-02-2016 04:57,Phone,Location165,,7
INC047,AwaitingUserInfo,true,0,6,Caller2403,29-02-2016 04:40,29-02-2016 04:57,Phone,Location165,,8
INC047,Closed,false,0,8,Caller2403,29-02-2016 04:40,29-02-2016 04:57,Phone,Location165,,9
INC057,New,true,0,0,Caller4416,29-02-2016 06:10,,Phone,Location204,,10

Need help figuring out what the issue is and how to resolve it.
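
One thing I plan to check is whether any row (including the header line, if
COPY does not skip it) has a non-integer value in count_opening; a rough
sketch of that check (a made-up helper; the column index comes from the
table definition above):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: report CSV lines whose COUNT_OPENING field
// (column index 3 in the table above) does not parse as an int.
public class CsvIntCheck {
    public static List<String> badRows(Iterable<String> lines, int col) {
        List<String> bad = new ArrayList<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            // limit -1 keeps trailing empty fields
            String[] f = line.split(",", -1);
            if (f.length <= col) {
                bad.add(lineNo + ": too few fields: " + line);
                continue;
            }
            try {
                Integer.parseInt(f[col].trim());
            } catch (NumberFormatException e) {
                bad.add(lineNo + ": not an int [" + f[col] + "]: " + line);
            }
        }
        return bad;
    }
}
```

Feeding it the lines of the CSV file would flag, for example, a header row
(since "count_opening" is not an integer) or any row where the field is
empty or wrapped across lines.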



Thank you

Regards

Karthik