Re: Test suite failures after upgrade to 2.13 (because IN no longer accepts a Java array)

2022-05-17 Thread Courtney Robinson
Hey,
I just checked and it doesn't seem as if any of the test cases use arrays
with more than 1 element in them.
It also gives the wrong results if I use an array with multiple values.


On Tue, 17 May 2022 at 09:56, Николай Ижиков  wrote:

> Hello, Courtney.
>
> I’m able to reproduce your issue [1]
>
> Can you please confirm: do you have a use case where memberIds contains
> more than one element?
> Does it return correct results?
>
> Before 2.11, a SQL query with an IN clause and an array argument executed and
> returned some results.
> But in my reproducer, the results are wrong when I pass an array with more
> than one element.
>
> Anyway, I will continue investigating the issue [1]
>
> // This works as expected.
> assertEquals(1, sql(
>     "SELECT IID FROM T1 WHERE ID IN (?) AND IID = ?",
>     new Object[] {Arrays.asList("1").toArray(), "1"}
> ).size());
>
> // And this works OK.
> assertEquals(2, sql(
>     "SELECT IID FROM T1 WHERE ID IN ('1', '4') AND IID = ?",
>     new Object[] {"1"}
> ).size());
>
> // Executed without exception but returns no results.
> assertEquals(2, sql(
>     "SELECT IID FROM T1 WHERE ID IN (?) AND IID = ?",
>     new Object[] {Arrays.asList("1", "4").toArray(), "1"}
> ).size());
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-16991
>
>
>
> On 17 May 2022, at 08:36, Courtney Robinson wrote:
>
> Hey Николай,
>
> Java code:
>
> private FieldsQueryCursor<List<?>> doQuery(boolean isMutation, String sql, Object... args) {
>   var timer = isMutation ? rawMutTimer : rawQryTimer;
>   return timer.record(() -> {
> try {
>   var query = new SqlFieldsQuery(sql)
> .setTimeout(5, SECONDS)
> //.setDistributedJoins(true)
> .setSchema(PUBLIC_SCHEMA_NAME);
>   if (args != null && args.length > 0) {
> query.setArgs(args);
>   }
>   var res = cache.query(query);
>   if (isMutation) {
> ctx.backup(sql, args);
>   }
>   return res;
> } catch (Exception e) {
>   this.rawDBErrCntr.count();
>   if (e.getCause() instanceof CacheStoppedException) {
> log.error("Ignite cache stopped unexpectedly. No further queries are 
> possible so must exit. Shutting down node");
> System.exit(-1);
>   }
>   if (e instanceof DBException) {
> throw e;
>   } else {
> throw new DBException("Unexpected error whilst executing database query", e);
>   }
> }
>   });
> }
>
> The call that used to work is:
>
> var results = repo.query(false, "SELECT HYPI_INSTANCEID, COUNT(HYPI_ID) FROM " +
>   TABLE_ACC + " WHERE HYPI_ID IN (?) AND HYPI_INSTANCEID=? GROUP BY HYPI_INSTANCEID", memberIds.toArray(), instanceId);
>
>
> We had to change this to:
>
> List args = new ArrayList<>();
> String qs = memberIds.stream()
>   .peek(args::add)
>   .map(v -> "?")
>   .collect(Collectors.joining(","));
> args.add(instanceId);
> var results = repo.query(
>   false,
>   "SELECT HYPI_INSTANCEID, COUNT(HYPI_ID) FROM " +
> TABLE_ACC + " WHERE HYPI_ID IN (" + qs + ") AND HYPI_INSTANCEID=? GROUP BY HYPI_INSTANCEID",
>   args.toArray()
> );
>
>
> memberIds is a List.
> repo.query is the public method that will eventually call doQuery after
> some internal stuff.
>
> The table here referred to as TABLE_ACC is
> CREATE TABLE PUBLIC.ACCOUNT (
>   VERIFIED BOOLEAN,
>   ENABLED BOOLEAN,
>   HYPI_INSTANCEID VARCHAR,
>   HYPI_ID VARCHAR,
>   USERNAME VARCHAR,
>   CONSTRAINT PK_PUBLIC_HYPI_01E8NPNFADNKECH7BR0K5FDE2C_ACCOUNT PRIMARY KEY (HYPI_INSTANCEID, HYPI_ID)
> );
>
> I removed most fields as they're not necessary to reproduce
>
>
> On Fri, 13 May 2022 at 15:24, Николай Ижиков  wrote:
>
>> Hello, Courtney.
>>
>> Can you please send the SQL table definition and an example of a query (Java
>> code and SQL) that worked on 2.8 and started failing on 2.13.
>>
>> > This seems like a regression, was this intentional?
>>
>> Looks like a bug to me.
>> Details of your schema and query can help in further investigation.
>>
>>
>> On 10 May 2022, at 10:14, Courtney Robinson wrote:
>>
>> Hi all,
>>
>> We're looking to do a major upgrade from 2.8.0 to 2.13.0
>> After the initial upgrade our test suite started failing (about 15% of tests now fail).

Re: Test suite failures after upgrade to 2.13 (because IN no longer accepts a Java array)

2022-05-16 Thread Courtney Robinson
Hey Николай,

Java code:

private FieldsQueryCursor<List<?>> doQuery(boolean isMutation, String sql, Object... args) {
  var timer = isMutation ? rawMutTimer : rawQryTimer;
  return timer.record(() -> {
try {
  var query = new SqlFieldsQuery(sql)
.setTimeout(5, SECONDS)
//.setDistributedJoins(true)
.setSchema(PUBLIC_SCHEMA_NAME);
  if (args != null && args.length > 0) {
query.setArgs(args);
  }
  var res = cache.query(query);
  if (isMutation) {
ctx.backup(sql, args);
  }
  return res;
} catch (Exception e) {
  this.rawDBErrCntr.count();
  if (e.getCause() instanceof CacheStoppedException) {
log.error("Ignite cache stopped unexpectedly. No further
queries are possible so must exit. Shutting down node");
System.exit(-1);
  }
  if (e instanceof DBException) {
throw e;
  } else {
throw new DBException("Unexpected error whilst executing database query", e);
  }
}
  });
}

The call that used to work is:

var results = repo.query(false, "SELECT HYPI_INSTANCEID, COUNT(HYPI_ID) FROM " +
  TABLE_ACC + " WHERE HYPI_ID IN (?) AND HYPI_INSTANCEID=? GROUP BY HYPI_INSTANCEID", memberIds.toArray(), instanceId);


We had to change this to:

List args = new ArrayList<>();
String qs = memberIds.stream()
  .peek(args::add)
  .map(v -> "?")
  .collect(Collectors.joining(","));
args.add(instanceId);
var results = repo.query(
  false,
  "SELECT HYPI_INSTANCEID, COUNT(HYPI_ID) FROM " +
    TABLE_ACC + " WHERE HYPI_ID IN (" + qs + ") AND HYPI_INSTANCEID=? GROUP BY HYPI_INSTANCEID",
  args.toArray()
);


memberIds is a List.
repo.query is the public method that will eventually call doQuery after
some internal stuff.

The table here referred to as TABLE_ACC is

CREATE TABLE PUBLIC.ACCOUNT (
  VERIFIED BOOLEAN,
  ENABLED BOOLEAN,
  HYPI_INSTANCEID VARCHAR,
  HYPI_ID VARCHAR,
  USERNAME VARCHAR,
  CONSTRAINT PK_PUBLIC_HYPI_01E8NPNFADNKECH7BR0K5FDE2C_ACCOUNT PRIMARY KEY (HYPI_INSTANCEID, HYPI_ID)
);

I removed most fields as they're not necessary to reproduce


On Fri, 13 May 2022 at 15:24, Николай Ижиков  wrote:

> Hello, Courtney.
>
> Can you please send the SQL table definition and an example of a query (Java
> code and SQL) that worked on 2.8 and started failing on 2.13.
>
> > This seems like a regression, was this intentional?
>
> Looks like a bug to me.
> Details of your schema and query can help in further investigation.
>
>
> On 10 May 2022, at 10:14, Courtney Robinson wrote:
>
> Hi all,
>
> We're looking to do a major upgrade from 2.8.0 to 2.13.0
> After the initial upgrade our test suite started failing (about 15% of
> tests now fail).
> No other change has been made other than the Ignite version number.
>
> org.apache.ignite.internal.processors.query.IgniteSQLException: General
>> error: "class org.apache.ignite.IgniteException: Failed to wrap
>> value[type=17, value=[Ljava.lang.Object;@667eb78]"; SQL statement:
>> SELECT HYPI_INSTANCEID, COUNT(HYPI_ID) FROM
>> hypi_01E8NPNFADNKECH7BR0K5FDE2C_Account WHERE HYPI_ID IN (?) AND
>> HYPI_INSTANCEID=? GROUP BY HYPI_INSTANCEID [5-197]
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:898)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:985)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:471)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:284)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2219)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:157)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:152)
>> at
>> org.apache.ignite.internal.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2344)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1201)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:463)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.iterator(IgniteH2Indexing.java:1846)
>> at
>>

Test suite failures after upgrade to 2.13 (because IN no longer accepts a Java array)

2022-05-10 Thread Courtney Robinson
Hi all,

We're looking to do a major upgrade from 2.8.0 to 2.13.0
After the initial upgrade our test suite started failing (about 15% of
tests now fail).
No other change has been made other than the Ignite version number.

org.apache.ignite.internal.processors.query.IgniteSQLException: General
> error: "class org.apache.ignite.IgniteException: Failed to wrap
> value[type=17, value=[Ljava.lang.Object;@667eb78]"; SQL statement:
> SELECT HYPI_INSTANCEID, COUNT(HYPI_ID) FROM
> hypi_01E8NPNFADNKECH7BR0K5FDE2C_Account WHERE HYPI_ID IN (?) AND
> HYPI_INSTANCEID=? GROUP BY HYPI_INSTANCEID [5-197]
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:898)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:985)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:471)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:284)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2219)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:157)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:152)
> at
> org.apache.ignite.internal.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2344)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1201)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:463)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.iterator(IgniteH2Indexing.java:1846)
> at
> org.apache.ignite.internal.processors.cache.QueryCursorImpl.iter(QueryCursorImpl.java:102)
> at
> org.apache.ignite.internal.processors.cache.query.RegisteredQueryCursor.iter(RegisteredQueryCursor.java:91)
> at
> org.apache.ignite.internal.processors.cache.QueryCursorImpl.getAll(QueryCursorImpl.java:124)
>

Investigating this, I found that IndexKeyFactory was added in a release after
2.8.0 and is the source of the exception:

> throw new IgniteException("Failed to wrap value[type=" + keyType + ", value=" + o + "]");
>

The key type 17 is ARRAY, defined in `org.h2.value.Value` (the ARRAY value on
line 137).
Looking further, I can see that IndexKeyFactory registers:

> IndexKeyFactory.register(IndexKeyTypes.DATE, DateIndexKey::new);
> IndexKeyFactory.register(IndexKeyTypes.TIME, TimeIndexKey::new);
> IndexKeyFactory.register(IndexKeyTypes.TIMESTAMP, TimestampIndexKey::new);


And these are the only additional key types registered anywhere in the
2.13.0 code base.

Looking further, I found that the problem appears wherever we use the `IN` clause.
In 2.8.0 we had a query like this:

> DELETE FROM permission_cause WHERE instanceId = ? AND policyId = ? AND
> rowId IN (?) AND accountId = ?

And we would pass in a Java array as the 3rd argument

> instanceId, policyId, toDelete.toArray(), accountId


This worked fine with toDelete.toArray().
Now we have to change it and expand IN (?) to IN (?,?,?), putting in as many ?
placeholders as there are entries in the array, and pass the values in
individually.
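
To spell that out, the workaround looks roughly like the sketch below (this is
our application-side code, not an Ignite API; the variable names are just the
ones from the DELETE example above, with the usual java.util and
java.util.stream.Collectors imports assumed):

// Build one '?' per element instead of binding the whole array to a single '?'.
List<Object> args = new ArrayList<>();
args.add(instanceId);
args.add(policyId);
args.addAll(toDelete);          // each element becomes its own positional argument
args.add(accountId);

String placeholders = toDelete.stream().map(v -> "?").collect(Collectors.joining(","));
String sql = "DELETE FROM permission_cause WHERE instanceId = ? AND policyId = ? " +
    "AND rowId IN (" + placeholders + ") AND accountId = ?";
// then executed with args.toArray(), e.g. new SqlFieldsQuery(sql).setArgs(args.toArray())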

This seems like a regression, was this intentional?

Best,
Courtney


Re: Ignite JOIN fails to return results using WHERE clause

2022-02-11 Thread Courtney Robinson
Hi Maksim,

Interesting, thanks for your reply.
Okay, I misunderstood (I also thought that, being on a single node, it didn't
matter).

Is it the case that as long as the affinity key is in the join predicate
that it would be a colocated JOIN (I'm concerned about the impact of
setDistributedJoins(true))?
Or is it the case that if you're joining on partitioned tables, you must do
so with ONLY the affinity key in the join predicate?

SELECT tbl.releaseId, tbl.name FROM T0 tbl

 INNER JOIN T1 col ON tbl.releaseId = 


In the previous tables, T1 does not have the releaseId as a column so does
that mean it is impossible to do a co-located JOIN with this setup?

If we modify T1 so that it also has releaseId and we make releaseId the
affinity key of T1 will both of these work?

SELECT tbl.releaseId, tbl.name FROM T0 tbl

 INNER JOIN T1 col ON tbl.releaseId = col.releaseId


AND

SELECT tbl.releaseId, tbl.name FROM T0 tbl

 INNER JOIN T1 col ON tbl.releaseId = col.releaseId AND col.tableId = tbl.id
> AND col.x = y


In other words, if both tables share the same affinity key is it still a
collocated join if there are other filters in the join predicate?

If the answer to this is yes, does it matter if the filters in the join
predicate are all =, i.e. does it have to be an equi-join? Or could the
predicate be

> ON tbl.releaseId = col.releaseId AND col.tableId > tbl.id AND col.x >= y
>

Thanks

On Fri, Feb 11, 2022 at 6:42 PM Maksim Timonin 
wrote:

> Hi Courtney,
>
> > I don't expect collocation issues to be in play here
>
> > Did you check this doc:
> https://ignite.apache.org/docs/latest/SQL/distributed-joins ?
>
> It says: "A distributed join is a SQL statement with a join clause that
> combines two or more partitioned tables. If the tables are joined on the
> partitioning column (affinity key), the join is called a colocated join.
> Otherwise, it is called a non-colocated join"
>
> You definitely have a collocation issue due to non-collocated join: T0
> partitioned by "releaseId", T1 by "t0Id", and you make a join by columns
> that aren't affinity columns (id = tableId).
>
> You should specify the flag "SqlFieldsQuery.setDistributedJoins(true)" to
> make your join return correct results.
>
> Maksim
>
>
> On Fri, Feb 11, 2022 at 8:09 PM Courtney Robinson <
> courtney.robin...@crlog.info> wrote:
>
>>
>> I have a query like this:
>>
>> SELECT
>>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>>> FROM T0 tbl
>>> INNER JOIN T1 col ON tbl.id = col.tableId
>>> WHERE tbl.releaseId = ? AND tbl.name = ?
>>> LIMIT 100
>>>
>>
>> This returns no results so after investigating, I ended up changing it to
>> the below
>>
>> SELECT
>>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>>> FROM (SELECT * FROM T0 t WHERE t.releaseId = ? AND t.name = ?) tbl
>>> INNER JOIN T1 col ON tbl.id = col.tableId
>>> LIMIT 100
>>>
>>
>>  This returns the results expected.
>> Can anyone offer any insight into what is going wrong here?
>>
>> The tables here look like this (I removed some columns from the tables
>> and the query to help make it easier on the eyes to parse):
>>
>> CREATE TABLE IF NOT EXISTS T0
>>> (
>>>   id        LONG,
>>>   releaseId VARCHAR,
>>>   name  VARCHAR,
>>>   PRIMARY KEY (releaseId, id)
>>> ) WITH "template=hypi_tpl,affinity_key=releaseId";
>>
>> CREATE INDEX IF NOT EXISTS VirtualTable_idx0 ON VirtualTable (releaseId,
>>> name);
>>>
>>> CREATE TABLE IF NOT EXISTS T1
>>> (
>>>   id  LONG,
>>>   t0Id LONG,
>>>   name    VARCHAR,
>>>   type    VARCHAR,
>>>   PRIMARY KEY (t0Id, id)
>>> ) WITH "template=hypi_tpl,affinity_key=t0Id";
>>>
>>
>> Note here it is a single node locally (so I don't expect collocation
>> issues to be in play here) - in development so not in a production cluster
>> yet.
>> Running Ignite 2.8.0
>>
>> This is not the first time we've had something like this but it's the
>> first time I've been able to reproduce it myself and consistently.
>>
>> Best,
>> Courtney
>>
>


Ignite JOIN fails to return results using WHERE clause

2022-02-11 Thread Courtney Robinson
I have a query like this:

SELECT
> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
> FROM T0 tbl
> INNER JOIN T1 col ON tbl.id = col.tableId
> WHERE tbl.releaseId = ? AND tbl.name = ?
> LIMIT 100
>

This returns no results so after investigating, I ended up changing it to
the below

SELECT
> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
> FROM (SELECT * FROM T0 t WHERE t.releaseId = ? AND t.name = ?) tbl
> INNER JOIN T1 col ON tbl.id = col.tableId
> LIMIT 100
>

 This returns the results expected.
Can anyone offer any insight into what is going wrong here?

The tables here look like this (I removed some columns from the tables and
the query to help make it easier on the eyes to parse):

CREATE TABLE IF NOT EXISTS T0
> (
>   id        LONG,
>   releaseId VARCHAR,
>   name  VARCHAR,
>   PRIMARY KEY (releaseId, id)
> ) WITH "template=hypi_tpl,affinity_key=releaseId";

CREATE INDEX IF NOT EXISTS VirtualTable_idx0 ON VirtualTable (releaseId,
> name);
>
> CREATE TABLE IF NOT EXISTS T1
> (
>   id  LONG,
>   t0Id LONG,
>   name    VARCHAR,
>   type    VARCHAR,
>   PRIMARY KEY (t0Id, id)
> ) WITH "template=hypi_tpl,affinity_key=t0Id";
>

Note here it is a single node locally (so I don't expect collocation issues
to be in play here) - in development so not in a production cluster yet.
Running Ignite 2.8.0

This is not the first time we've had something like this but it's the first
time I've been able to reproduce it myself and consistently.

Best,
Courtney


Debugging long persistence recovery on restart

2021-09-15 Thread Courtney Robinson
Hey all,
We're trying to debug an issue in production where Ignite 2.8.1 is taking 1
hour *per node* to start.
This cluster has 3 nodes and caches/tables have 2 backups i.e. each node
has a replica so it takes 3 hours to restart all nodes.
The nodes get stuck after outputting:

> 2021-09-15 10:21:16.889  INFO [ArcOS,,,] 8 --- [   main]
> o.a.i.i.p.cache.GridCacheProcessor  [285] :  Started cache in recovery
> mode [name=*cache1*, id=-1556141001, group=hypi, dataRegionName=hypi,
> mode=PARTITIONED, atomicity=ATOMIC, backups=2, mvcc=false]
>
then, afterwards, it logs a similar message about *cache2* and carries on as if
nothing happened.
The log is always in this order and it is always these two caches.
I believe this log happens after the cache is recovered so the problem is
with cache2.

There is only about 1GB in cache2, the cache that appears to have the problem.

How can we find out what's causing Ignite to take an hour per node on this
cache?

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Virtual Ignite meetup - Building Hypi's lowcode BaaS on Ignite

2021-08-31 Thread Courtney Robinson
Hi all,

If you've got time I'd like to invite you to attend our virtual meetup. 02
Sept, 5:00 pm BST

https://hopin.com/events/low-code-baas-platform-on-apache-ignite

In this talk we're going to take a deep dive into Hypi's journey designing
and scaling its low-code backend as a service platform with Apache Ignite
at its core.

The talk will look at the general architecture, playing nice with other
technologies and the future reactive-streams architecture Hypi's team started
work on, with a view to being in production in 2022.

We'll look at why we're transitioning to reactive streams and how Ignite is
being used to accelerate other technologies in the stack. Just like Ignite
3, the next-gen of Hypi itself is using Apache Calcite as a data
virtualisation layer. The talk will explore a little of how this is being
done and finally will discuss some challenges and pitfalls to avoid.
Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Re: Best practices on how to approach data centre aware affinity

2021-08-16 Thread Courtney Robinson
Hi Stephen,
We've been considering those points you've raised. The challenge with
having isolated clusters is how to deal with synchronisation issues. In one
cluster, Ignite will handle an offline node re-joining the cluster. If
there are multiple clusters we'd need to detect and replay changes from the
application side, effectively duplicating part of what Ignite's doing.

Did I miss anything and if not, how would you suggest handling this in the
case of multiple clusters - one in each data centre?

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Mon, Aug 16, 2021 at 10:19 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> A word of caution: you’re generally better off replicating your data across
> clusters than stretching a single cluster across data centres. If the
> latency is very low it should work, but it could degrade your throughput
> and you need to be careful about split-brain and other networking issues.
>
> Regards,
> Stephen
>
> On 5 Aug 2021, at 15:24, Courtney Robinson 
> wrote:
>
> Hi Alex,
> Thanks for the reply. I'm glad I asked before the team went any further.
> So we can achieve this with the built in affinity function and the backup
> filter. The real complexity is going to be in migrating our existing caches.
>
> So to clarify the steps involved here are
>
>1. because Ignite registers all env. vars as node attributes we can
>set e.g. NODE_DC= as an environment var in each k8s
>cluster
>2. Then set the backup filter's constructor-arg.value to be NODE_DC.
>This will tell Ignite that two backups cannot be placed on any two nodes
>with the same NODE_DC value - correct?
>3. When we call create table, we must set template=myTemplateName
>4. Before creating any tables, myTemplateName must be created and must
>include the backup filter with NODE_DC
>
> Have I got that right?
>
> If so, it seems simple enough. Now the real challenge is where you said
> the cache has to be re-created.
>
> I can't see how we do this without major down time, we have functionality
> in place that allows customers to effectively do a "copy from table A to B
> and then delete A" but it will be impossible to get all of them to do this
> any time soon.
>
> Has anyone else had to do something similar, how is the community
> generally doing migrations like this?
>
> Side note: The only thing that comes to mind is that we will need to build
> a virtual catalog that we maintain so that there isn't a one to one mapping
> between customer tables and the actual Ignite table name.
> So if a table is currently called A and we add a virtual catalog then we
> keep a mapping that says when the user wants to call "A" it should really
> go to table "A_v2" or something. This comes with its own challenge and a
> massive testing overhead.
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io/>
>
> <https://hypi.io/>
> https://hypi.io
>
>
> On Thu, Aug 5, 2021 at 11:43 AM Alex Plehanov 
> wrote:
>
>> Hello,
>>
>> You can create your own cache templates with the affinity function you
>> require (currently you use a predefined "partitioned" template, which only
>> sets cache mode to "PARTITIONED"). See [1] for more information about cache
>> templates.
>>
>> > Is this the right approach
>> > How do we handle existing data, changing the affinity function will
>> cause Ignite to not be able to find existing data right?
>> You can't change cache configuration after cache creation. In your
>> example these changes will be just ignored. The only way to change cache
>> configuration - is to create the new cache and migrate data.
>>
>> > How would you recommend implementing the affinity function to be aware
>> of the data centre?
>> It's better to use the standard affinity function with a backup filter
>> for such cases. There is one shipped with Ignite (see [2]).
>>
>> [1]:
>> https://ignite.apache.org/docs/latest/configuring-caches/configuration-overview#cache-templates
>> [2]:
>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.html
>>
>> On Thu, 5 Aug 2021 at 09:40, Courtney Robinson wrote:
>>
>>> Hi all,
>>> Our growth with Ignite continues and as we enter the next phase, we need
>>> to support multi-cluster deployments for our platform.
>>> We deploy Ignite and the rest of our stack in Kubernetes 

Re: Best practices on how to approach data centre aware affinity

2021-08-05 Thread Courtney Robinson
Hi Alex,
Thanks for the reply. I'm glad I asked before the team went any further.
So we can achieve this with the built in affinity function and the backup
filter. The real complexity is going to be in migrating our existing caches.

So to clarify the steps involved here are

   1. because Ignite registers all env. vars as node attributes we can set
   e.g. NODE_DC= as an environment var in each k8s
   cluster
   2. Then set the backup filter's constructor-arg.value to be NODE_DC.
   This will tell Ignite that two backups cannot be placed on any two nodes
   with the same NODE_DC value - correct?
   3. When we call create table, we must set template=myTemplateName
   4. Before creating any tables, myTemplateName must be created and must
   include the backup filter with NODE_DC

Have I got that right?
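
To make steps 2-4 concrete, this is the kind of template registration I have in
mind (a minimal Java sketch under the assumptions above; "myTemplateName" and
"NODE_DC" are just placeholders from the steps, and I haven't run this exact
code yet):

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class DcAwareTemplate {
  /** Registers a cache template whose backups avoid the primary's data centre. */
  public static void register(Ignite ignite) {
    RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
    // Backups may only land on nodes whose NODE_DC attribute differs from already-chosen nodes.
    aff.setAffinityBackupFilter(new ClusterNodeAttributeAffinityBackupFilter("NODE_DC"));

    CacheConfiguration<Object, Object> tpl = new CacheConfiguration<>("myTemplateName");
    tpl.setBackups(2);
    tpl.setAffinity(aff);

    // Tables created with WITH "template=myTemplateName,..." should then pick this up.
    ignite.addCacheConfiguration(tpl);
  }
}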

If so, it seems simple enough. Now the real challenge is where you said
the cache has to be re-created.

I can't see how we do this without major down time, we have functionality
in place that allows customers to effectively do a "copy from table A to B
and then delete A" but it will be impossible to get all of them to do this
any time soon.

Has anyone else had to do something similar, how is the community generally
doing migrations like this?

Side note: The only thing that comes to mind is that we will need to build
a virtual catalog that we maintain so that there isn't a one to one mapping
between customer tables and the actual Ignite table name.
So if a table is currently called A and we add a virtual catalog then we
keep a mapping that says when the user wants to call "A" it should really
go to table "A_v2" or something. This comes with its own challenge and a
massive testing overhead.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Thu, Aug 5, 2021 at 11:43 AM Alex Plehanov 
wrote:

> Hello,
>
> You can create your own cache templates with the affinity function you
> require (currently you use a predefined "partitioned" template, which only
> sets cache mode to "PARTITIONED"). See [1] for more information about cache
> templates.
>
> > Is this the right approach
> > How do we handle existing data, changing the affinity function will
> cause Ignite to not be able to find existing data right?
> You can't change cache configuration after cache creation. In your example
> these changes will be just ignored. The only way to change cache
> configuration - is to create the new cache and migrate data.
>
> > How would you recommend implementing the affinity function to be aware
> of the data centre?
> It's better to use the standard affinity function with a backup filter for
> such cases. There is one shipped with Ignite (see [2]).
>
> [1]:
> https://ignite.apache.org/docs/latest/configuring-caches/configuration-overview#cache-templates
> [2]:
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.html
>
> On Thu, 5 Aug 2021 at 09:40, Courtney Robinson wrote:
>
>> Hi all,
>> Our growth with Ignite continues and as we enter the next phase, we need
>> to support multi-cluster deployments for our platform.
>> We deploy Ignite and the rest of our stack in Kubernetes and we're in the
>> early stages of designing what a multi-region deployment should look like.
>> We are 90% SQL based when using Ignite, the other 10% includes Ignite
>> messaging, Queues and compute.
>>
>> In our case we have thousands of tables
>>
>> CREATE TABLE IF NOT EXISTS Person (
>>   id int,
>>   city_id int,
>>   name varchar,
>>   company_id varchar,
>>   PRIMARY KEY (id, city_id)) WITH "template=...";
>>
>> In our case, most tables use a template that looks like this:
>>
>>
>> partitioned,backups=2,data_region=hypi,cache_group=hypi,write_synchronization_mode=primary_sync,affinity_key=instance_id,atomicity=ATOMIC,cache_name=Person,key_type=PersonKey,value_type=PersonValue
>>
>> I'm aware of affinity co-location (
>> https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation)
>> and in the past when we used the key value APIs more than SQL we also used
>> custom affinity a function to control placement.
>>
>> What I don't know is how to best do this with SQL defined caches.
>> We will have at least 3 Kubernetes clusters, each in a different data
>> centre, let's say EU_WEST, EU_EAST, CAN0
>>
>> Previously we provided environment variables that our custom affinity
>> function would use and we're thinking of providing the data centre name
>> this way.
>>
>> We have 2 backups in all 

Best practices on how to approach data centre aware affinity

2021-08-05 Thread Courtney Robinson
Hi all,
Our growth with Ignite continues and as we enter the next phase, we need to
support multi-cluster deployments for our platform.
We deploy Ignite and the rest of our stack in Kubernetes and we're in the
early stages of designing what a multi-region deployment should look like.
We are 90% SQL based when using Ignite, the other 10% includes Ignite
messaging, Queues and compute.

In our case we have thousands of tables

CREATE TABLE IF NOT EXISTS Person (
  id int,
  city_id int,
  name varchar,
  company_id varchar,
  PRIMARY KEY (id, city_id)) WITH "template=...";

In our case, most tables use a template that looks like this:

partitioned,backups=2,data_region=hypi,cache_group=hypi,write_synchronization_mode=primary_sync,affinity_key=instance_id,atomicity=ATOMIC,cache_name=Person,key_type=PersonKey,value_type=PersonValue

I'm aware of affinity co-location (
https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation)
and in the past, when we used the key-value APIs more than SQL, we also used
a custom affinity function to control placement.

What I don't know is how to best do this with SQL defined caches.
We will have at least 3 Kubernetes clusters, each in a different data
centre, let's say EU_WEST, EU_EAST, CAN0

Previously we provided environment variables that our custom affinity
function would use and we're thinking of providing the data centre name
this way.

We have 2 backups in all cases + the primary and so we want the primary in
one DC and each backup to be in a different DC.

There is no syntax in the SQL template that we could find that enables
specifying a custom affinity function.
The instance_id column we currently use has no common prefix or anything to
associate it with a DC.

We're thinking of getting the cache for each table and then setting the
affinity function to replace the default RendezvousAffinityFunction the way
we did before we switched to SQL.
Something like this:

repo.ctx.ignite.cache("Person").getConfiguration(org.apache.ignite.configuration.CacheConfiguration)
.setAffinity(new org.apache.ignite.cache.affinity.AffinityFunction() {
...
})


There are a few things unclear about this:

   1. Is this the right approach?
   2. How do we handle existing data, changing the affinity function will
   cause Ignite to not be able to find existing data right?
   3. How would you recommend implementing the affinity function to be
   aware of the data centre?
   4. Are there any other caveats we need to be thinking about?

There is a lot of existing data and we want to avoid a full copy/move to new
tables if possible; that would prove very difficult in production.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Re: Reliably duplicate SQL cache

2021-02-18 Thread Courtney Robinson
Hi Illya,
Thanks for responding.
That makes sense - I figured something like that but didn't know exactly
what.
Is it possible to get the existing key_type and value_type for tables?
The reason is because we have tables in production and they were not
created with key_type and value_type. We actually thought this only applied
when you use Java classes with annotations.

In the SYS table somewhere perhaps?
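
For reference, this is roughly what I was planning to try with the same
repo.query helper as above; I haven't verified that the SYS.TABLES system view
actually exposes these columns, so treat the column names as an assumption on
my part:

// Assumption: SYS.TABLES lists each SQL table with KEY_TYPE_NAME and VALUE_TYPE_NAME columns.
var existing = repo.query(
    "SELECT TABLE_NAME, KEY_TYPE_NAME, VALUE_TYPE_NAME FROM SYS.TABLES WHERE TABLE_NAME = 'PAGE1'");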


On Mon, Feb 15, 2021 at 11:19 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> The two tables have different names for the indexed binary type by default.
>
> Try
> repo.query("create table page1(a varchar, b varchar, c varchar, PRIMARY
> KEY (a, b)) WITH \"cache_name=page1, key_type=PageKey, value_type=Page\"")
> repo.query("create table page2(a varchar, b varchar, c varchar, PRIMARY
> KEY (a, b)) WITH \"cache_name=page2, key_type=PageKey, value_type=Page\"")
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Sat, 13 Feb 2021 at 19:32, Courtney Robinson wrote:
>
>> Due to an issue I posted about in a previous thread
>> http://apache-ignite-users.70518.x6.nabble.com/Basic-SQL-pagination-returning-incorrect-results-td35443.html
>>
>> I've written a work around to use the streamer interface with a ScanQuery
>> to duplicate a cache.
>> Both are created from SQL using something like this:
>>
>> repo.query("create table page1(a varchar, b varchar, c varchar, PRIMARY KEY 
>> (a, b)) WITH \"cache_name=page1\"")
>> repo.query("create table page2(a varchar, b varchar, c varchar, PRIMARY KEY 
>> (a, b)) WITH \"cache_name=page2\"")
>>
>> The data is copied, printing the size shows 100 as expected in the test
>> but a SQL query on page2 table returns 0 rows.
>>
>> def copied = repo.query("SELECT * FROM page2 LIMIT 101")
>>
>> Gets nothing. The copy function used is below. I'm presuming I've missed
>> a step and the SQL index or something else is not being done. How should
>> this be written to duplicate all data from page1 into the page2 table/cache?
>>
>> public void copy(String fromTableName, String toTableName) {
>>   var ignite = ctx.ignite;
>>   try (
>>     IgniteCache from = ignite.cache(fromTableName);
>>     IgniteCache to = ignite.cache(toTableName)
>>   ) {
>>     if (from == null || to == null) {
>>       throw new IllegalArgumentException(format("Both from and to tables must exist. from: %s, to: %s", fromTableName, toTableName));
>>     }
>>     try (
>>       IgniteDataStreamer strmr = ignite.dataStreamer(toTableName/*from.getName()*/);
>>       var cursor = from.withKeepBinary().query(new ScanQuery<>())
>>     ) {
>>       strmr.allowOverwrite(true);
>>       strmr.keepBinary(true);
>>       //strmr.receiver(StreamVisitor.from((cache, e) -> to.put(e.getKey(), e.getValue())));
>>       for (Cache.Entry e : cursor) {
>>         strmr.addData(e.getKey(), e.getValue());
>>       }
>>       //strmr.flush();
>>     }
>>     log.info("Total in target cache {}", to.sizeLong(CachePeekMode.ALL));
>>   }
>> }
>>
>>
>>
>> Regards,
>> Courtney Robinson
>> Founder and CEO, Hypi
>> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>>
>> <https://hypi.io>
>> https://hypi.io
>>
>


Reliably duplicate SQL cache

2021-02-13 Thread Courtney Robinson
Due to an issue I posted about in a previous thread
http://apache-ignite-users.70518.x6.nabble.com/Basic-SQL-pagination-returning-incorrect-results-td35443.html

I've written a work around to use the streamer interface with a ScanQuery
to duplicate a cache.
Both are created from SQL using something like this:

repo.query("create table page1(a varchar, b varchar, c varchar,
PRIMARY KEY (a, b)) WITH \"cache_name=page1\"")
repo.query("create table page2(a varchar, b varchar, c varchar,
PRIMARY KEY (a, b)) WITH \"cache_name=page2\"")

The data is copied; printing the size shows 100 as expected in the test, but
a SQL query on the page2 table returns 0 rows.

def copied = repo.query("SELECT * FROM page2 LIMIT 101")

Gets nothing. The copy function used is below. I'm presuming I've missed a
step and the SQL index or something else is not being done. How should this
be written to duplicate all data from page1 into the page2 table/cache?

public void copy(String fromTableName, String toTableName) {
  var ignite = ctx.ignite;
  try (
    IgniteCache from = ignite.cache(fromTableName);
    IgniteCache to = ignite.cache(toTableName)
  ) {
    if (from == null || to == null) {
      throw new IllegalArgumentException(format("Both from and to tables must exist. from: %s, to: %s", fromTableName, toTableName));
    }
    try (
      IgniteDataStreamer strmr = ignite.dataStreamer(toTableName/*from.getName()*/);
      var cursor = from.withKeepBinary().query(new ScanQuery<>())
    ) {
      strmr.allowOverwrite(true);
      strmr.keepBinary(true);
      //strmr.receiver(StreamVisitor.from((cache, e) -> to.put(e.getKey(), e.getValue())));
      for (Cache.Entry e : cursor) {
        strmr.addData(e.getKey(), e.getValue());
      }
      //strmr.flush();
    }
    log.info("Total in target cache {}", to.sizeLong(CachePeekMode.ALL));
  }
}



Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Re: Cannot create SQL index with INLINE_SIZE if ` is used

2021-02-06 Thread Courtney Robinson
Test case to reproduce (added to JdbcThinMultiStatementSelfTest):


@Test
public void testInlineSizeOptionWithBacktick() throws Exception {
    execute(
        "CREATE TABLE public.backticks (pk INT, id INT, k VARCHAR, v VARCHAR, PRIMARY KEY (pk, id)); " +
        "CREATE INDEX backticks_id_k_v ON public.backticks (`id`, `k`, `v`) INLINE_SIZE 150; "
    );
}


On Sat, Feb 6, 2021 at 2:38 PM Courtney Robinson 
wrote:

> If you use
>>
>> CREATE INDEX IF NOT EXISTS myIdx ON myTbl(myCol) INLINE_SIZE 200;
>
> then this works as expected. If the column names are escaped then it fails
> with a syntax error
> so
>
>> CREATE INDEX IF NOT EXISTS myIdx ON myTbl(`myCol`) INLINE_SIZE 200;
>
> will fail with a syntax error.
> I've found that a workaround is to use " instead of `
>
>> CREATE INDEX IF NOT EXISTS myIdx ON myTbl("myCol") INLINE_SIZE 200;
>
>
> First found in 2.8.1 and verified to still fail on 2.9.1.
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>
> <https://hypi.io>
> https://hypi.io
>


Cannot create SQL index with INLINE_SIZE if ` is used

2021-02-06 Thread Courtney Robinson
If you use
>
> CREATE INDEX IF NOT EXISTS myIdx ON myTbl(myCol) INLINE_SIZE 200;

then this works as expected. If the column names are escaped then it fails
with a syntax error
so

> CREATE INDEX IF NOT EXISTS myIdx ON myTbl(`myCol`) INLINE_SIZE 200;

will fail with a syntax error.
I've found that a workaround is to use " instead of `

> CREATE INDEX IF NOT EXISTS myIdx ON myTbl("myCol") INLINE_SIZE 200;


First found in 2.8.1 and verified to still fail on 2.9.1.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-29 Thread Courtney Robinson
Hey Denis,

It's been a while. Hope you've been keeping well!
A meeting in the second week of Jan would be great. The agenda looks good too.

We've been watching the calcite SQL engine
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine>
wiki
and task list
<https://cwiki.apache.org/confluence/display/IGNITE/Apache+Calcite-powered+SQL+Engine+Roadmap>
but
I got the impression it wouldn't be that soon. It may be worth getting involved
involved there to keep some of the calcite APIs exposed for us to be able
to tailor it because my ideal design would have Ignite in front of anything
we end up doing.

We'll sort out the call details closer to the time. Thanks Denis, speak to
you and Val in two weeks.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Wed, Dec 30, 2020 at 12:02 AM Denis Magda  wrote:

> Hi Courtney,
>
> Glad to hear from you! It's been a while since we met last time. It's
> truly disappointing seeing you struggle with Ignite that much. Thanks for
> being open and kicking the discussion off.
>
> How about three of us (you, Val, and I) meet the second week of January
> and talk out the issues? Then you can share a talk summary here with a
> broader community, if you wish. Between us, I've been personally
> championing the column-type-change feature for a while, and with this IEP
> of Ignite 3.0
> <https://cwiki.apache.org/confluence/display/IGNITE/IEP-54%3A+Schema-first+Approach>
> it should be doable.
>
> In the meantime, some items for our agenda and pointers:
>
>- What does your upgrade procedure look like? Ignite doesn't have a
>rolling-upgrade feature; thus, to upgrade from version A to B in a
>*consistent way* one should stop the cluster.
>- re: the frustration about spending too much time on the
>administration and infrastructure-related activities. Are you considering
>any managed service options? As a developer, I understand this frustration.
>I was just lucky to stay away from a need to administer Postgres, Oracle,
>MySQL, or Ignite just because we either had an IT administrator or made use
>of a managed-service option. Not selling anything here, just know that all
>those administration routines are unavoidable. The thing with Ignite is
>that distributed nature makes things more complicated.
>- Ignite ML and SQL: in fact, you can already call Ignite ML models
>from a SQL query. I need to search for pointers. @Alexey Zinoviev
>, you can probably share some examples quicker.
>- Calcite and Ignite: in Q2 we're planning to release the
>Calcite-powered SQL engine in Ignite. That would be a dramatic improvement
>in our SQL capabilities. It should be possible to enable push-downs much
>more easily.
>
>
> -
> Denis
>
>
> On Mon, Dec 28, 2020 at 11:50 PM Courtney Robinson <
> courtney.robin...@hypi.io> wrote:
>
>> Hi Val,
>> Thanks. You're not missing anything and we have been using Ignite
>> persistence for years.
>> Among the reasons we want to switch to in-memory only is how easily we
>> seem to get corrupt nodes. I mentioned in the first email. We haven't had a
>> situation where upgrading corrupts more than 1 of our 3 replicas but we
>> genuinely fear upgrading sometimes as a result and is why we put so much
>> effort into our backup/restore solution.
>>
>> One of the arguments for disabling persistence is that other DBs seem to
>> have had longer to solve the issue. We've operated Postgres for longer than
>> Ignite persistence and have never had it corrupt its data on disk
>> (admittedly it's not distributed so that could play a role) but we've also
>> been running Cassandra for longer as well without ever having any data
>> corruption. All 3 in Kubernetes. With Ignite we've tried many things, and a
>> few settings between Ignite rebalanceDelay (I forget which other ones) and
>> k8s readiness/liveness probes seem to have landed in a sweet spot that's
>> reduced how often it happens, but if we have any issues with the k8s control
>> plane, scheduling delays or network issues then the chances of it skyrocket.
>> It then requires manual intervention. I believe the most frequent issue is
>> that a node will start and the topology has diverged so the cluster doesn't
>> think it's a part of it; the most effective thing we've found is deleting the
>> data on that node and having it join as a new node for re-balancing to kick
>> in and push data back to it. I think we have a few tickets somewhere with
>> details.
>>
>> This is the primary motivator for wanting to replace Ignite persistence.

Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-28 Thread Courtney Robinson
 nodes which would send back a list of
IDs we could get using the KV APIs. This worked well and provided superior
filtering capabilities; migrating to the SQL APIs meant we lost this, as
there is no way we found to extend SQL to provide a custom function that'd
be resolved to a custom affinity run and, importantly, use the results of
this function to reduce/filter the data coming back from SQL queries. Right
now we're hacking it with some SQL operators but the search quality doesn't
even compare to what we could do with the Lucene + KV APIs. This is
something we're actively investigating as the current approach is showing
its limits as we grow. We've also got on-premise customers set to make this
unusable in Q1 as their low-code app data volumes begin to exceed what the
query hacks can reasonably filter on within acceptable time.

Right now we're looking at two options: the first is using the old approach we
had of an affinity run over Lucene and using the results in an IN clause in a
subsequent query; the second is forking Ignite to extend the SQL dialect and
integrating the Lucene results in Ignite's map/filter phase. The MVP here is
doing this with all supported Lucene queries over a Lucene index of a 1TB table.

This last point is more of a preference than anything. We're working to
bring the ease and flexibility of low-code to machine learning. Ignite's ML
capabilities are pretty broad; my issue is that they're completely Java based.
It would be amazing if Ignite had taken the SQL approach or had SQL
integration mapped to the Java APIs (I'm thinking of Apache Madlib).

I guess one of the frustrations is we're spending a lot of time at the DB
level instead of at the application level adding useful features for
customers. As a small startup we don't really have the bandwidth so it's a
constant battle between product features and infrastructure solutions.

We are not expecting anything to happen immediately with our Ignite setup,
it'll stay for a while but how we use it may change drastically over time.
We can't easily abandon the in-memory capabilities either. One suggestion
being considered (to have in place by end of 2021) is using Apache Calcite
as the SQL layer with an Ignite driver (mapping to in-memory Ignite). In
this setup we'd use Calcite to push down writes to Ignite and Greenplum
(MPP Postgres). We'd also push down queries to Ignite in our driver and
where it didn't match, we'd push to Greenplum. This would give us the
ability to drop in the custom SQL functions to map to Lucene or indeed
Elasticsearch. This has the added benefit of getting things from Calcite
like SQL window functions. We'd effectively be using Ignite in-memory to
accelerate Greenplum and ES queries through a consistent SQL API whilst
getting the benefits of the well established and battle tested Postgres
tools and ecosystem...and SQL base ML. I'd appreciate your thoughts here on
this around the suggested Ignite use. Almost everything can be pushed down
to Ignite from Calcite and anything Ignite doesn't support Calcite can
emulate.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Tue, Dec 29, 2020 at 4:01 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi Courtney,
>
> Thanks for your feedback!
>
> To cut the story short, Ignite implements page memory architecture. All
> data is split into fixed-sized pages and any page can reside either both in
> memory and on disk or on disk only. Since the in-memory layer and
> persistence layer are natively integrated - i.e. there is a unified
> mechanism for page lookup - Ignite storage engine always knows where a
> particular page is and can transparently load it into memory if needed.
> From this perspective, Ignite is actually very similar to how other
> databases work, although it has a much bigger focus on scalability and
> in-memory processing, richer compute capabilities, etc.
>
> With an external database it's much more complicated - generally speaking,
> there is no way of knowing which data needs to be loaded from the database
> into memory for the purpose of an arbitrary query. This is a fundamental
> limitation and quite frankly, I don't think there is any other
> technology that can do that. Not in a consistent and performant way at
> least. As a matter of fact, this limitation is exactly what used to be the
> primary driver behind the Native Persistent development in the first place.
>
> Is there any particular reason why you can't use Native Persistence
> instead of an external database? It sounds like that's what you need for
> your use case unless I'm missing something. Can you tell us a little bit
> more about your use case and requirements? I'm sure we can come up with a
> solution that would satisfy those requirements without a need for
> re-implementing the whole thing.
>
> Thanks,
&g

Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-28 Thread Courtney Robinson
I know this was over the holiday so bumping. Can anyone provide any
pointers on where to start looking or anything else mentioned in the
previous email?
Thanks

On Sat, Dec 26, 2020 at 8:39 PM Courtney Robinson 
wrote:

> We've been using Ignite in production for almost 3 years and we love the
> platform but there are some increasingly frustrating points we run into.
> Before the holidays a few of our engineers started looking around and have
> now presented a serious case for migrating from Ignite. We would end up
> using at least 3 technologies they've identified to bridge the gap left by
> Ignite features but have presented good cases for why managing these
> individually would be a more flexible solution we could grow with.
>
> I am not keen on the idea as it presents a major refactor that would
> likely take 6 months to get to production but I understand and agree with
> the points they've made. I'm trying to find a middle ground as this seems
> like the nuclear option to me.
>
> Off the top of my head, some things they've raised are:
>
>    1. Lack of tooling
>       1. Inability to change a column type + no support in schema migration
>       tools that we've found (even deleting the column, we can't reuse the name)
>       2. We had to build our own backup solution and even now backup has
>       landed in 2.9 we can't use it directly because we have implemented
>       relatively granular backup to be able to push to S3 compatible APIs (Ceph
>       in our case) and restore partially to per hour granularity. Whilst we've
>       done it, it took some serious engineering effort and time. We considered
>       open sourcing it but it was done in a way that's tightly coupled to our
>       internal stack and APIs.
>    2. Inconsistency between various Ignite APIs.
>       1. Transactions on KV, none on SQL (or now in beta but work seems to
>       have paused?)
>       2. SQL limitations - SELECT queries never read through data from the
>       external database
>       3. Even if we implemented a CacheStore we have to load all data into
>       Ignite to use SELECT
>       4. No referential integrity enforcement
>    3. It is incredibly easy to corrupt the data in Ignite persistence.
>    We've gotten better due to operational experience but upgrades (in k8s)
>    still on occasion lead to one or two nodes being corrupt when their pod was
>    stopped.
>
> I'll stop there but the point is, after 3yrs in production the team feels
> like they're always running up against a wall and that Ignite has created
> that wall.
>
> My goal in writing this is to find out more about why the limitation
> around CacheStore exists, how Ignite persistence achieves partially caching
> data in memory and pulling from disk when the data is not in memory, and why
> that can't apply to a CacheStore as well.
>
> What would it take to make it so that Ignite's SQL operations could be
> pushed down to a CacheStore implementation?
> Ignite's a relatively large code base, so hints about which
> classes/interfaces to investigate if we're looking to replace Ignite
> persistence would be incredibly useful. My idea at the moment is to have
> Ignite as the in-memory SQL layer with a SQL MPP providing persistence.
>
> To me right now the path forward is for us to put the work into removing
> these Ignite limitations if possible. We have a mixture of on-premise
> clients for our product as well as a multi-tenant SaaS version - some of
> these on-prem clients depend on Ignite's in-memory capabilities and so we
> can't easily take this away.
>
> FYI it doesn't have to be CacheStore (I realise this just inherits the
> JCache interface); more generally, can something like CacheStore be
> implemented to replace the integration that Ignite persistence provides?
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> https://hypi.io
>


Ignite in-memory + other SQL store without fully loading all data into Ignite

2020-12-26 Thread Courtney Robinson
We've been using Ignite in production for almost 3 years and we love the
platform but there are some increasingly frustrating points we run into.
Before the holidays a few of our engineers started looking around and have
now presented a serious case for migrating from Ignite. We would end up
using at least 3 technologies they've identified to bridge the gap left by
Ignite features but have presented good cases for why managing these
individually would be a more flexible solution we could grow with.

I am not keen on the idea as it presents a major refactor that would likely
take 6 months to get to production but I understand and agree with the
points they've made. I'm trying to find a middle ground as this seems like
the nuclear option to me.

Off the top of my head, some things they've raised are:

   1. Lack of tooling
      1. Inability to change a column type + no support in schema migration
      tools that we've found (even deleting the column, we can't reuse the name)
      2. We had to build our own backup solution and even now backup has
      landed in 2.9 we can't use it directly because we have implemented
      relatively granular backup to be able to push to S3 compatible APIs (Ceph
      in our case) and restore partially to per hour granularity. Whilst we've
      done it, it took some serious engineering effort and time. We considered
      open sourcing it but it was done in a way that's tightly coupled to our
      internal stack and APIs.
   2. Inconsistency between various Ignite APIs.
      1. Transactions on KV, none on SQL (or now in beta but work seems to
      have paused?)
      2. SQL limitations - SELECT queries never read through data from the
      external database
      3. Even if we implemented a CacheStore we have to load all data into
      Ignite to use SELECT
      4. No referential integrity enforcement
   3. It is incredibly easy to corrupt the data in Ignite persistence.
   We've gotten better due to operational experience but upgrades (in k8s)
   still on occasion lead to one or two nodes being corrupt when their pod was
   stopped.

I'll stop there but the point is, after 3yrs in production the team feels
like they're always running up against a wall and that Ignite has created
that wall.

My goal in writing this is to find out more about why the limitation around
CacheStore exists, how Ignite persistence achieves partially caching data in
memory and pulling from disk when the data is not in memory, and why that
can't apply to a CacheStore as well.

What would it take to make it so that Ignite's SQL operations could be
pushed down to a CacheStore implementation?
Ignite's a relatively large code base, so hints about which
classes/interfaces to investigate if we're looking to replace Ignite
persistence would be incredibly useful. My idea at the moment is to have
Ignite as the in-memory SQL layer with a SQL MPP providing persistence.

To me right now the path forward is for us to put the work into removing
these Ignite limitations if possible. We have a mixture of on-premise
clients for our product as well as a multi-tenant SaaS version - some of
these on-prem clients depend on Ignite's in-memory capabilities and so we
can't easily take this away.

FYI it doesn't have to be CacheStore (I realise this just inherits the
JCache interface); more generally, can something like CacheStore be
implemented to replace the integration that Ignite persistence provides?

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Re: Why does CacheBasedDataSet destroy the cache it is given

2020-08-19 Thread Courtney Robinson
Hey,
Just seen this reply.
We have Ignite persistence enabled. The caches/tables are the primary
source of the data. That's the use case.
If we build an ML model from the data in a cache, Ignite's behaviour of
deleting the cache means we'll have lost that data.
We were just lucky this showed up in tests before it got anywhere near
production data.

In our case, we're pushing data into a cache continually and rebuilding the
model periodically.
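
Roughly, the pattern looks like this (the schedule and rebuildModel are
placeholders for our internal code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of our usage: the cache is written to continuously by the application,
// and a scheduled job periodically rebuilds the model from whatever is in the
// cache at that point. rebuildModel stands in for the code that builds a dataset
// over the cache and fits a new model; the cache itself must survive each run.
void schedulePeriodicRetraining(Runnable rebuildModel) {
  ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  scheduler.scheduleAtFixedRate(rebuildModel, 0, 1, TimeUnit.HOURS);
}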

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Mon, Aug 3, 2020 at 5:28 PM zaleslaw  wrote:

> Dear Courtney Robinson, let's discuss here the possible behaviour of this
> CacheBased Dataset closing.
>
> When we designed this feature we thought that all the training parts and
> intermediate data should be deleted from the caches, and the model should be
> serialized or exported somewhere.
>
> What is your use-case? Could you share some code or pseudo-code?
> How are you going to handle data after training?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Deploying Ignite in Docker

2020-07-06 Thread Courtney Robinson
Ignore the last email. Ivan, your reply unfortunately went to spam so I
didn't see it.
I'll try setting the localhost property as you've suggested.
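
For anyone finding this thread later, the change I'm going to try amounts to
the following (shown as Java config; the IP is just one of the private
addresses from my XML config and would differ per node):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Sketch: bind discovery/communication to the node's private address so peers
// aren't given an address they can't reach.
Ignite startBoundToPrivateIp() {
  IgniteConfiguration cfg = new IgniteConfiguration();
  cfg.setLocalHost("10.131.60.224"); // this node's private IP
  return Ignition.start(cfg);
}

In the Spring XML config this should just be a localHost property on
IgniteConfiguration.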

On Tue, Jun 30, 2020 at 6:25 AM Pavel Vinokurov 
wrote:

> Hi Courtney,
>
> Probably you need to specify a local ip address to bind to by setting
> IgniteConfiguration#setLocalhost().
>
> Thanks,
> Pavel
>
> пн, 29 июн. 2020 г. в 21:18, Courtney Robinson  >:
>
>> I've deployed Ignite bare-metal and in Kubernetes in test and production
>> but I'm now trying to deploy for a new project in a new cluster with Docker
>> and it is proving difficult. I can't figure out what port it is trying to
>> use or how to force it to use a specific one.
>>
>> Note that I've removed the "--net=host" option which the docs
>> <https://apacheignite.readme.io/docs/docker-deployment> mention since
>> that would expose the cluster on the node's public IPs.
>>
>> Note that if I use --net=host it works as expected.
>>
>> sudo docker run -d -it --name=ignite --restart=unless-stopped \
>> -e CONFIG_URI="/ignite/config.xml" \
>> -e OPTION_LIBS="ignite-rest-http,ignite-visor-console,ignite-web" \
>> -e JVM_OPTS="-Xms1g -Xmx10g -server -XX:+AggressiveOpts -XX:MaxPermSize=256m 
>> -Djava.net.preferIPv4Stack=true -DIGNITE_QUIET=false 
>> -Dcom.sun.management.jmxremote.port=49112" \
>> -e IGNITE_WORK_DIR=/persistence \
>> -v /mnt/vol0/ignite:/persistence \
>> -v /root/ignite/config.xml:/ignite/config.xml \
>> -p ${arr[$host]}:11211:11211 \
>> -p ${arr[$host]}:47100:47100 \
>> -p ${arr[$host]}:47500:47500 \
>> -p ${arr[$host]}:49112:49112 \
>> -p ${arr[$host]}:49100:49100 \
>> -p ${arr[$host]}:10800:10800 \
>> -p ${arr[$host]}:8080:8080 \
>> apacheignite/ignite:2.8.1
>>
>> This is in a loop so ${arr[$host]} will be replaced with one of the
>> private network's IPs.
>> You can see from the logs that the TCPDiscovery gets connections but
>> whatever happens next isn't successful so the nodes keep retrying and the
>> logs below just repeat forever:
>>
>> [12:46:42,934][INFO][tcp-disco-sock-reader-[20ec0423 
>> 172.17.0.1:35998]-#10][TcpDiscoverySpi]
>>> Finished serving remote node connection [rmtAddr=/172.17.0.1:35998,
>>> rmtPort=35998
>>> [12:46:45,637][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>>> discovery accepted incoming connection [rmtAddr=/172.17.0.1,
>>> rmtPort=36016]
>>> [12:46:45,637][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>>> discovery spawning a new thread for connection [rmtAddr=/172.17.0.1,
>>> rmtPort=36016]
>>> [12:46:45,637][INFO][tcp-disco-sock-reader-[]-#11][TcpDiscoverySpi]
>>> Started serving remote node connection [rmtAddr=/172.17.0.1:36016,
>>> rmtPort=36016]
>>> [12:46:45,639][INFO][tcp-disco-sock-reader-[20ec0423 
>>> 172.17.0.1:36016]-#11][TcpDiscoverySpi]
>>> Finished serving remote node connection [rmtAddr=/172.17.0.1:36016,
>>> rmtPort=36016
>>> [12:46:45,661][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>>> discovery accepted incoming connection [rmtAddr=/10.131.60.224,
>>> rmtPort=40325]
>>> [12:46:45,661][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>>> discovery spawning a new thread for connection [rmtAddr=/10.131.60.224,
>>> rmtPort=40325]
>>> [12:46:45,661][INFO][tcp-disco-sock-reader-[]-#12][TcpDiscoverySpi]
>>> Started serving remote node connection [rmtAddr=/10.131.60.224:40325,
>>> rmtPort=40325]
>>> [12:46:45,662][INFO][tcp-disco-sock-reader-[8433be36 
>>> 10.131.60.224:40325]-#12][TcpDiscoverySpi]
>>> Initialized connection with remote server node
>>> [nodeId=8433be36-2855-4ff3-a849-35c3ebb25545, rmtAddr=/
>>> 10.131.60.224:40325]
>>> [12:46:45,676][INFO][tcp-disco-sock-reader-[8433be36 
>>> 10.131.60.224:40325]-#12][TcpDiscoverySpi]
>>> Finished serving remote node connection [rmtAddr=/10.131.60.224:40325,
>>> rmtPort=40325
>>>
>>
>> The config.xml being used is below and the docker command is:
>>
>> What have I missed in this config? What port does it need that isn't
>> being set/opened?
>>
>> 
>> http://www.springframework.org/schema/beans;
>>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>>xsi:schemaLocation="
>> http://www.springframework.org/schema/beans
>> http://www.springframework.org/schema/beans/spring-beans.xsd;>
>>   &g

Re: Deploying Ignite in Docker

2020-07-06 Thread Courtney Robinson
Bumping this. Any suggestions on the below?
Not only is it undesirable to run the cluster on public IPs from a security
aspect but it incurs additional cost in the cloud env it is being deployed
in.

On Mon, Jun 29, 2020 at 7:18 PM Courtney Robinson 
wrote:

> I've deployed Ignite bare-metal and in Kubernetes in test and production
> but I'm now trying to deploy for a new project in a new cluster with Docker
> and it is proving difficult. I can't figure out what port it is trying to
> use or how to force it to use a specific one.
>
> Note that I've removed the "--net=host" option which the docs
> <https://apacheignite.readme.io/docs/docker-deployment> mention since
> that would expose the cluster on the node's public IPs.
>
> Note that if I use --net=host it works as expected.
>
> sudo docker run -d -it --name=ignite --restart=unless-stopped \
> -e CONFIG_URI="/ignite/config.xml" \
> -e OPTION_LIBS="ignite-rest-http,ignite-visor-console,ignite-web" \
> -e JVM_OPTS="-Xms1g -Xmx10g -server -XX:+AggressiveOpts -XX:MaxPermSize=256m 
> -Djava.net.preferIPv4Stack=true -DIGNITE_QUIET=false 
> -Dcom.sun.management.jmxremote.port=49112" \
> -e IGNITE_WORK_DIR=/persistence \
> -v /mnt/vol0/ignite:/persistence \
> -v /root/ignite/config.xml:/ignite/config.xml \
> -p ${arr[$host]}:11211:11211 \
> -p ${arr[$host]}:47100:47100 \
> -p ${arr[$host]}:47500:47500 \
> -p ${arr[$host]}:49112:49112 \
> -p ${arr[$host]}:49100:49100 \
> -p ${arr[$host]}:10800:10800 \
> -p ${arr[$host]}:8080:8080 \
> apacheignite/ignite:2.8.1
>
> This is in a loop so ${arr[$host]} will be replaced with one of the
> private network's IPs.
> You can see from the logs that the TCPDiscovery gets connections but
> whatever happens next isn't successful so the nodes keep retrying and the
> logs below just repeat forever:
>
> [12:46:42,934][INFO][tcp-disco-sock-reader-[20ec0423 
> 172.17.0.1:35998]-#10][TcpDiscoverySpi]
>> Finished serving remote node connection [rmtAddr=/172.17.0.1:35998,
>> rmtPort=35998
>> [12:46:45,637][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>> discovery accepted incoming connection [rmtAddr=/172.17.0.1,
>> rmtPort=36016]
>> [12:46:45,637][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>> discovery spawning a new thread for connection [rmtAddr=/172.17.0.1,
>> rmtPort=36016]
>> [12:46:45,637][INFO][tcp-disco-sock-reader-[]-#11][TcpDiscoverySpi]
>> Started serving remote node connection [rmtAddr=/172.17.0.1:36016,
>> rmtPort=36016]
>> [12:46:45,639][INFO][tcp-disco-sock-reader-[20ec0423 
>> 172.17.0.1:36016]-#11][TcpDiscoverySpi]
>> Finished serving remote node connection [rmtAddr=/172.17.0.1:36016,
>> rmtPort=36016
>> [12:46:45,661][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>> discovery accepted incoming connection [rmtAddr=/10.131.60.224,
>> rmtPort=40325]
>> [12:46:45,661][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
>> discovery spawning a new thread for connection [rmtAddr=/10.131.60.224,
>> rmtPort=40325]
>> [12:46:45,661][INFO][tcp-disco-sock-reader-[]-#12][TcpDiscoverySpi]
>> Started serving remote node connection [rmtAddr=/10.131.60.224:40325,
>> rmtPort=40325]
>> [12:46:45,662][INFO][tcp-disco-sock-reader-[8433be36 
>> 10.131.60.224:40325]-#12][TcpDiscoverySpi]
>> Initialized connection with remote server node
>> [nodeId=8433be36-2855-4ff3-a849-35c3ebb25545, rmtAddr=/
>> 10.131.60.224:40325]
>> [12:46:45,676][INFO][tcp-disco-sock-reader-[8433be36 
>> 10.131.60.224:40325]-#12][TcpDiscoverySpi]
>> Finished serving remote node connection [rmtAddr=/10.131.60.224:40325,
>> rmtPort=40325
>>
>
> The config.xml being used is below and the docker command is:
>
> What have I missed in this config? What port does it need that isn't being
> set/opened?
>
> 
> http://www.springframework.org/schema/beans;
>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>xsi:schemaLocation="
> http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans.xsd;>
>class="org.apache.ignite.configuration.IgniteConfiguration">
> 
> 
>   
> 
> 
>   
>   
>class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
> 
> 
>       
> 
> 10.131.53.147:47500
> 10.131.77.79:47500
> 10.131.60.224:47500
> 10.131.77.111:47500
> 10.131.77.93:47500
> 10.131.77.84:47500
>   
> 
>   
> 
>   
> 
> 
>class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
> 
>   
> 
>   
> 
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>
> <https://hypi.io>
> https://hypi.io
>


Re: SELECT values of each row within groups on a table with composite primary key

2020-07-06 Thread Courtney Robinson
Thanks for replying.
The stackoverflow question was answered.

SELECT a, b, c, cnt
FROM T1 INNER JOIN (
  SELECT c, COUNT(c) as cnt
  FROM T1
  GROUP BY c) counts
ON counts.c = c

The above produces the aggregate value as well as each row which contributed
to the aggregate.
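
To make that concrete with made-up data: if T1 contains
  (a1, b1, x), (a2, b2, x), (a3, b3, y)
then the join returns
  (a1, b1, x, 2), (a2, b2, x, 2), (a3, b3, y, 1)
i.e. every row, each carrying the size of its group.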

On Mon, Jul 6, 2020 at 9:42 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> I’m not sure I understand the question. Can you give an example of the
> source data and the results you’re expecting?
>
> On 4 Jul 2020, at 19:17, Courtney Robinson 
> wrote:
>
> I've posted this question on Stackoverflow here
> https://stackoverflow.com/questions/62732258/select-values-of-each-row-within-groups-on-a-table-with-composite-primary-key
>
>
> Copying for convenience:
>
> I'm using Ignite 2.8.1. I have a table T1(a,b,c) with both a and b as
> primary columns. I want to know the value of each b in each of the groups.
>
> Normally this would be fine since the primary key is functionally
> dependent on the grouped column c in this case, but Ignite returns an
> error saying b must be one of the GROUP BY columns...which wouldn't be what
> I want; in fact that'd be the same as not grouping.
>
> Using the available SELECT
> <https://apacheignite-sql.readme.io/docs/select> - can you suggest how to
> get Ignite to produce both a and b for each group, or even just b. It
> happily produces a as if it is the only column in the primary key.
> 
> Any thoughts?
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io/>
>
> <https://hypi.io/>
> https://hypi.io
>
>
>
>


Deploying Ignite in Docker

2020-06-29 Thread Courtney Robinson
I've deployed Ignite bare-metal and in Kubernetes in test and production
but I'm now trying to deploy for a new project in a new cluster with Docker
and it is proving difficult. I can't figure out what port it is trying to
use or how to force it to use a specific one.

Note that I've removed the "--net=host" option which the docs
<https://apacheignite.readme.io/docs/docker-deployment> mention since that
would expose the cluster on the node's public IPs.

Note that if I use --net=host it works as expected.

sudo docker run -d -it --name=ignite --restart=unless-stopped \
-e CONFIG_URI="/ignite/config.xml" \
-e OPTION_LIBS="ignite-rest-http,ignite-visor-console,ignite-web" \
-e JVM_OPTS="-Xms1g -Xmx10g -server -XX:+AggressiveOpts
-XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true
-DIGNITE_QUIET=false -Dcom.sun.management.jmxremote.port=49112" \
-e IGNITE_WORK_DIR=/persistence \
-v /mnt/vol0/ignite:/persistence \
-v /root/ignite/config.xml:/ignite/config.xml \
-p ${arr[$host]}:11211:11211 \
-p ${arr[$host]}:47100:47100 \
-p ${arr[$host]}:47500:47500 \
-p ${arr[$host]}:49112:49112 \
-p ${arr[$host]}:49100:49100 \
-p ${arr[$host]}:10800:10800 \
-p ${arr[$host]}:8080:8080 \
apacheignite/ignite:2.8.1

This is in a loop so ${arr[$host]} will be replaced with one of the private
network's IPs.
You can see from the logs that the TCPDiscovery gets connections but
whatever happens next isn't successful so the nodes keep retrying and the
logs below just repeat forever:

[12:46:42,934][INFO][tcp-disco-sock-reader-[20ec0423
172.17.0.1:35998]-#10][TcpDiscoverySpi]
> Finished serving remote node connection [rmtAddr=/172.17.0.1:35998,
> rmtPort=35998
> [12:46:45,637][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
> discovery accepted incoming connection [rmtAddr=/172.17.0.1,
> rmtPort=36016]
> [12:46:45,637][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
> discovery spawning a new thread for connection [rmtAddr=/172.17.0.1,
> rmtPort=36016]
> [12:46:45,637][INFO][tcp-disco-sock-reader-[]-#11][TcpDiscoverySpi]
> Started serving remote node connection [rmtAddr=/172.17.0.1:36016,
> rmtPort=36016]
> [12:46:45,639][INFO][tcp-disco-sock-reader-[20ec0423 
> 172.17.0.1:36016]-#11][TcpDiscoverySpi]
> Finished serving remote node connection [rmtAddr=/172.17.0.1:36016,
> rmtPort=36016
> [12:46:45,661][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
> discovery accepted incoming connection [rmtAddr=/10.131.60.224,
> rmtPort=40325]
> [12:46:45,661][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
> discovery spawning a new thread for connection [rmtAddr=/10.131.60.224,
> rmtPort=40325]
> [12:46:45,661][INFO][tcp-disco-sock-reader-[]-#12][TcpDiscoverySpi]
> Started serving remote node connection [rmtAddr=/10.131.60.224:40325,
> rmtPort=40325]
> [12:46:45,662][INFO][tcp-disco-sock-reader-[8433be36 
> 10.131.60.224:40325]-#12][TcpDiscoverySpi]
> Initialized connection with remote server node
> [nodeId=8433be36-2855-4ff3-a849-35c3ebb25545, rmtAddr=/10.131.60.224:40325
> ]
> [12:46:45,676][INFO][tcp-disco-sock-reader-[8433be36 
> 10.131.60.224:40325]-#12][TcpDiscoverySpi]
> Finished serving remote node connection [rmtAddr=/10.131.60.224:40325,
> rmtPort=40325
>

The config.xml being used is below (the docker command is shown above):

What have I missed in this config? What port does it need that isn't being
set/opened?


<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>10.131.53.147:47500</value>
                <value>10.131.77.79:47500</value>
                <value>10.131.60.224:47500</value>
                <value>10.131.77.111:47500</value>
                <value>10.131.77.93:47500</value>
                <value>10.131.77.84:47500</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
    <property name="communicationSpi">
      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <!-- property lost in the list archive formatting -->
      </bean>
    </property>
  </bean>
</beans>

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Re: stddev* and var* etc are not implemented but are documented

2020-06-22 Thread Courtney Robinson
Ahhh, should've done a search before posting.
Okay, so I infer there's no real appetite to implement these because of the
complexities alluded to in that issue.
Is there any way for guests to update or send change requests, given the docs
are on readme.io?

On Mon, Jun 22, 2020 at 10:51 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> This actually came up last year but it seems that we neither updated the
> documentation nor implemented the missing functions.
>
>
> https://lists.apache.org/thread.html/f717f17cf9852fe77274df8187c9b82148537dfc829772aba810e613%40%3Cuser.ignite.apache.org%3E
>
> https://issues.apache.org/jira/browse/IGNITE-3180
>
> Regards,
> Stephen
>
> On 22 Jun 2020, at 10:16, Courtney Robinson 
> wrote:
>
> Hi,
> The STDDEV* and VAR* SQL functions are documented on both Ignite and
> GridGain websites but are not implemented.
>
> https://www.gridgain.com/docs/latest/sql-reference/aggregate-functions#stddev_pop
> https://apacheignite-sql.readme.io/docs/aggregate-functions
>
> The enums were commented out 6 years ago in
>
> https://github.com/apache/ignite/commit/63ec158e8dcfa9294c0727b8b778c87477416e10
> So these do not work STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, BOOL_OR,
> BOOL_AND, SELECTIVITY, HISTOGRAM.
>
> What's the story here? Shouldn't they be removed from the docs until/if
> they are ever implemented?
>
> Using Ignite 2.8.
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io/>
>
> <https://hypi.io/>
> https://hypi.io
>
>
>
>


stddev* and var* etc are not implemented but are documented

2020-06-22 Thread Courtney Robinson
Hi,
The STDDEV* and VAR* SQL functions are documented on both Ignite and
GridGain websites but are not implemented.
https://www.gridgain.com/docs/latest/sql-reference/aggregate-functions#stddev_pop
https://apacheignite-sql.readme.io/docs/aggregate-functions

The enums were commented out 6 years ago in
https://github.com/apache/ignite/commit/63ec158e8dcfa9294c0727b8b778c87477416e10
so none of these work: STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, BOOL_OR,
BOOL_AND, SELECTIVITY, HISTOGRAM.

What's the story here? Shouldn't they be removed from the docs until/if
they are ever implemented?

Using Ignite 2.8.
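
In case it helps anyone else in the meantime, a workaround sketch using the
textbook identity VAR_POP(x) = AVG(x*x) - AVG(x)^2 and STDDEV_POP(x) = SQRT of
that (the cache, table and column names here are made up; x is a numeric
column):

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Computes population variance and stddev without the missing built-ins.
List<List<?>> populationStats(Ignite ignite) {
  return ignite.cache("metrics").query(new SqlFieldsQuery(
      "SELECT AVG(x * x) - AVG(x) * AVG(x) AS var_pop, " +
      "       SQRT(AVG(x * x) - AVG(x) * AVG(x)) AS stddev_pop " +
      "FROM metrics")).getAll();
}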

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


Why does CacheBasedDataSet destroy the cache it is given

2020-05-27 Thread Courtney Robinson
Hi all,

The current CacheBasedDataSet destroys the cache and all data along with
it...there is no option to turn this off either.

https://github.com/apache/ignite/blob/master/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/CacheBasedDataset.java#L189

/** {@inheritDoc} */
@Override public void close() {
datasetCache.destroy();
ComputeUtils.removeData(ignite, datasetId);
ComputeUtils.removeLearningEnv(ignite, datasetId);
}


Why does it do this?
It means that using SqlDatasetBuilder will result in the data being deleted
after training a model.
We had to work around this with

var datasetBuilder = new SqlDatasetBuilder(repo.getCtx().getIgnite(), cacheName, (k, v) -> {
  //...
});
var wrapper = new DatasetBuilder() {
  @Override
  public Dataset build(LearningEnvironmentBuilder envBuilder,
                       PartitionContextBuilder partCtxBuilder,
                       PartitionDataBuilder partDataBuilder,
                       LearningEnvironment localLearningEnv) {
    var cbd = datasetBuilder.build(envBuilder, partCtxBuilder, partDataBuilder, localLearningEnv);
    return new DatasetWrapper(cbd) {
      @Override public void close() {
        System.out.println("Dataset closed");
        // DO NOT call close. Cache based data set deletes the data in the cache like some mad man!
      }
    };
  }

  @Override
  public DatasetBuilder withUpstreamTransformer(UpstreamTransformerBuilder builder) {
    return datasetBuilder.withUpstreamTransformer(builder);
  }

  @Override
  public DatasetBuilder withFilter(IgniteBiPredicate filterToAdd) {
    return datasetBuilder.withFilter(filterToAdd);
  }
};

which works but seems very hacky.
Are we misusing the API somehow? The examples/docs don't mention or indicate
anything about this as far as I've found.

Regards,
Courtney Robinson
Founder and CEO, Hypi
https://hypi.io


Re: Unable to enable ML inference storage

2020-05-27 Thread Courtney Robinson
Hi Alex,
Thanks for replying. We're definitely not loading it twice.
The MLPluginProvider.onIgniteStart is being called before we can set the
node to "active".
To work around it we had to override that method and call it afterwards...

> public class HypiMLPluginProvider extends MLPluginProvider {
>
>   /** {@inheritDoc} */
>   @Override public void onIgniteStart() {
> SystemBootstrap.onBootstrapped(super::onIgniteStart);
>   }
>
> }
>
>
Is there a way to start a node as active without explicitly setting it to be?
We are currently doing

ignite.cluster().active(true)

after

ignite = IgnitionEx.start(cfg, springGridCtx)

Regards,
Courtney Robinson
Founder and CEO, Hypi
https://hypi.io


On Thu, May 21, 2020 at 9:55 PM akorensh  wrote:

> Hi,
>
>   Looks like you are loading the ML plugin twice (see below)
>   Can you try w/out the plugin config first.
>   Just copy the ignite-ml module to the libs folder or
>   follow these instructions:
> https://apacheignite.readme.io/docs/machine-learning#getting-started
>
>   If that works, put in your xml config, and try again.
>
>   see:
>
> https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/inference/ModelStorageExample.java
> (model storage)
>   and the corresponding:
>
> https://github.com/apache/ignite/blob/master/examples/config/example-ignite-ml.xml
> (this shows how to config ML plugin)
>
>
>   If it still doesn't work send the config and, if possible, a reproducer
> project.
> Thanks, Alex
>
>
>
> Multiple loadings of ML Plugin
>
>  2020-05-21 10:04:25.382  INFO 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor   : Configured plugins:
> [10:04:25]   ^-- ml-inference-plugin 1.0.0
> 2020-05-21 10:04:25.383  INFO 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor   :   ^-- ml-inference-plugin 1.0.0
> 2020-05-21 10:04:25.382  INFO [hypi,,,] 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor  [?] : Configured plugins:
> [10:04:25]   ^-- null
> 2020-05-21 10:04:25.383  INFO 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor   :   ^-- null
> 2020-05-21 10:04:25.383  INFO [hypi,,,] 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor  [?] :   ^-- ml-inference-plugin
> 1.0.0
> [10:04:25]
> 2020-05-21 10:04:25.383  INFO [hypi,,,] 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor  [?] :   ^-- null
> 2020-05-21 10:04:25.384  INFO 63933 --- [   main]
> o.a.i.i.p.plugin.IgnitePluginProcessor   :
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Re: Backups not being done for SQL caches

2020-05-01 Thread Courtney Robinson
Hi Alexandr,
Thanks, I didn't know the metrics and the topology info were different.
I found the issue - we were not adding the nodes to the baseline topology
due to a bug.
We're using v2.8.0 in the upgrade/migration; the previous version was 2.7.
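
For anyone else who hits this, the fix on our side boils down to extending the
baseline after new server nodes join, roughly along these lines (sketch):

import org.apache.ignite.Ignite;

// Once the new node has joined and the cluster is active, extend the baseline
// to the current topology so the node actually owns partitions/backups
// (this is the step our code was missing).
void addToBaseline(Ignite ignite) {
  if (ignite.cluster().active())
    ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
}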

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
https://hypi.io


On Thu, Apr 30, 2020 at 11:36 PM Alexandr Shapkin  wrote:

> Hi,
>
>
>
> I believe that you need to find the following message:
>
>
>
> 2020-04-30 16:57:44.8141|INFO|Test|Topology snapshot [ver=53,
> locNode=26887aac, servers=3, clients=0, state=ACTIVE, CPUs=16,
> offheap=8.3GB, heap=15.0GB]
>
> 2020-04-30 16:57:44.8141|INFO|Test|  ^-- Baseline [id=0, size=3, online=3,
> offline=0]
>
>
>
> Metrics don’t tell you l the actual topology and baseline snapshot.
>
> A node might be running, but not included into the baseline, that might be
> the reason in your case.
>
>
>
> Also, what Ignite version do you use?
>
>
>
> *From: *Courtney Robinson 
> *Sent: *Thursday, April 30, 2020 7:58 PM
> *To: *user@ignite.apache.org
> *Subject: *Re: Backups not being done for SQL caches
>
>
>
> Hi Illya,
>
> Yes we have persistence enabled in this cluster. This is also change from
> our current production deployment where we have our own CacheStore with
> read and write through enabled. In this test cluster Ignite's native
> persistence is being used without any external or custom CacheStore
> implementation.
>
>
>
> From the Ignite logs it says all 3 nodes are present:
>
>
>
> 2020-04-30 16:53:20.468  INFO 9 --- [orker-#23%hypi%]
> o.a.ignite.internal.IgniteKernal%hypi:
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=e0b6889f, name=hypi, uptime=19:15:06.473]
> ^-- H/N/C [hosts=3, nodes=3, CPUs=3]
> ^-- CPU [cur=-100%, avg=-100%, GC=0%]
> ^-- PageMemory [pages=975]
> ^-- Heap [used=781MB, free=92.37%, comm=4912MB]
> ^-- Off-heap [used=3MB, free=99.91%, comm=4296MB]
> ^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
> ^--   metastoreMemPlc region [used=0MB, free=99.95%, comm=0MB]
> ^--   TxLog region [used=0MB, free=100%, comm=100MB]
> ^--   hypi region [used=3MB, free=99.91%, comm=4096MB]
> ^-- Ignite persistence [used=3MB]
> ^--   sysMemPlc region [used=0MB]
> ^--   metastoreMemPlc region [used=0MB]
> ^--   TxLog region [used=0MB]
>     ^--   hypi region [used=3MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=6, qSize=0]
>
>
> Regards,
>
> Courtney Robinson
>
> Founder and CEO, Hypi
>
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>
> https://hypi.io
>
>
>
>
>
> On Thu, Apr 30, 2020 at 3:12 PM Ilya Kasnacheev 
> wrote:
>
> Hello!
>
>
>
> Do you have persistence? If so, are you sure that all 3 of your nodes are
> in baseline topology?
>
>
>
> Regards,
>
> --
>
> Ilya Kasnacheev
>
>
>
>
>
> чт, 30 апр. 2020 г. в 16:09, Courtney Robinson  >:
>
> We're continuing migration from using the Java API to purely SQL and have
> encountered a situation on our development cluster where even though ALL
> tables are created with backups=2, as in
>
> template=partitioned,backups=2,affinity_key=instanceId,atomicity=ATOMIC,cache_name=  name here>
>
> In the logs, with 3 nodes in this test environment we have:
>
>
>
> 2020-04-29 22:55:50.083 INFO 9
> *--- [orker-#40%hypi%] o.apache.ignite.internal.exchange.time : Started
> exchange init [topVer=AffinityTopologyVersion [topVer=27, minorTopVer=1],
> crd=true, evt=DISCOVERY_CUSTOM_EVT,
> evtNode=e0b6889f-219b-4686-ab52-725bfe7848b2,
> customEvt=DynamicCacheChangeBatch
> [id=a81a0e7c171-3f0fbbc0-b996-448c-98f7-119d7e485f04, reqs=ArrayList
> [DynamicCacheChangeRequest [cacheName=hypi_whatsapp_Item, hasCfg=true,
> nodeId=e0b6889f-219b-4686-ab52-725bfe7848b2, clientStartOnly=false,
> stop=false, destroy=false, disabledAfterStartfalse]],
> exchangeActions=ExchangeActions [startCaches=[hypi_whatsapp_Item],
> stopCaches=null, startGrps=[hypi_whatsapp_Item], stopGrps=[],
> resetParts=null, stateChangeRequest=null], startCaches=false],
> allowMerge=false, exchangeFreeSwitch=false]*2020-04-29 22:55:50.280 INFO 9
>
> *--- [orker-#40%hypi%] o.a.i.i.p.cache.GridCacheProcessor : Started cache
> [name=hypi_whatsapp_Item, id=1391701259, dataRegionName=hypi,
> mode=PARTITIONED, atomicity=ATOMIC, backups=2, mvcc=false]*2020-04-29 22:
> 55:50.289 INFO 9
> *--- [ sys-#648%hypi%] o.a.i.i.p.a.GridAffinityAssignmentCache : Loc

Re: Backups not being done for SQL caches

2020-04-30 Thread Courtney Robinson
Hi Illya,
Yes, we have persistence enabled in this cluster. This is also a change from
our current production deployment, where we have our own CacheStore with
read and write through enabled. In this test cluster Ignite's native
persistence is being used without any external or custom CacheStore
implementation.

From the Ignite logs it says all 3 nodes are present:

2020-04-30 16:53:20.468  INFO 9 --- [orker-#23%hypi%]
o.a.ignite.internal.IgniteKernal%hypi:
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=e0b6889f, name=hypi, uptime=19:15:06.473]
^-- H/N/C [hosts=3, nodes=3, CPUs=3]
^-- CPU [cur=-100%, avg=-100%, GC=0%]
^-- PageMemory [pages=975]
^-- Heap [used=781MB, free=92.37%, comm=4912MB]
^-- Off-heap [used=3MB, free=99.91%, comm=4296MB]
^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
^--   metastoreMemPlc region [used=0MB, free=99.95%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^--   hypi region [used=3MB, free=99.91%, comm=4096MB]
^-- Ignite persistence [used=3MB]
^--   sysMemPlc region [used=0MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^--   hypi region [used=3MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
https://hypi.io


On Thu, Apr 30, 2020 at 3:12 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> Do you have persistence? If so, are you sure that all 3 of your nodes are
> in baseline topology?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 30 апр. 2020 г. в 16:09, Courtney Robinson  >:
>
>> We're continuing migration from using the Java API to purely SQL and have
>> encountered a situation on our development cluster where even though ALL
>> tables are created with backups=2, as in
>>
>> template=partitioned,backups=2,affinity_key=instanceId,atomicity=ATOMIC,cache_name=>  name here>
>>
>> In the logs, with 3 nodes in this test environment we have:
>>
>> 2020-04-29 22:55:50.083 INFO 9 --- [orker-#40%hypi%]
>>> o.apache.ignite.internal.exchange.time : Started exchange init
>>> [topVer=AffinityTopologyVersion [topVer=27, minorTopVer=1], crd=true,
>>> evt=DISCOVERY_CUSTOM_EVT, evtNode=e0b6889f-219b-4686-ab52-725bfe7848b2,
>>> customEvt=DynamicCacheChangeBatch
>>> [id=a81a0e7c171-3f0fbbc0-b996-448c-98f7-119d7e485f04, reqs=ArrayList
>>> [DynamicCacheChangeRequest [cacheName=hypi_whatsapp_Item, hasCfg=true,
>>> nodeId=e0b6889f-219b-4686-ab52-725bfe7848b2, clientStartOnly=false,
>>> stop=false, destroy=false, disabledAfterStartfalse]],
>>> exchangeActions=ExchangeActions [startCaches=[hypi_whatsapp_Item],
>>> stopCaches=null, startGrps=[hypi_whatsapp_Item], stopGrps=[],
>>> resetParts=null, stateChangeRequest=null], startCaches=false],
>>> allowMerge=false, exchangeFreeSwitch=false]
>>> 2020-04-29 22:55:50.280 INFO 9 --- [orker-#40%hypi%]
>>> o.a.i.i.p.cache.GridCacheProcessor : Started cache
>>> [name=hypi_whatsapp_Item, id=1391701259, dataRegionName=hypi,
>>> mode=PARTITIONED, atomicity=ATOMIC, backups=2, mvcc=false]
>>> 2020-04-29 22:55:50.289 INFO 9 --- [ sys-#648%hypi%]
>>> o.a.i.i.p.a.GridAffinityAssignmentCache : Local node affinity assignment
>>> distribution is not ideal [cache=hypi_whatsapp_Item,
>>> expectedPrimary=1024.00, actualPrimary=0, expectedBackups=2048.00,
>>> actualBackups=0, warningThreshold=50.00%]
>>> 2020-04-29 22:55:50.293 INFO 9 --- [orker-#40%hypi%]
>>> .c.d.d.p.GridDhtPartitionsExchangeFuture : Finished waiting for partition
>>> release future [topVer=AffinityTopologyVersion [topVer=27, minorTopVer=1],
>>> waitTime=0ms, futInfo=NA, mode=DISTRIBUTED]
>>> 2020-04-29 22:55:50.330 INFO 9 --- [orker-#40%hypi%]
>>> .c.d.d.p.GridDhtPartitionsExchangeFuture : Finished waiting for partitions
>>> release latch: ServerLatch [permits=0, pendingAcks=HashSet [],
>>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>>> topVer=AffinityTopologyVersion [topVer=27, minorTopVer=1
>>
>>
>> You can see the line
>>
>> Local node affinity assignment distribution is not ideal
>>
>>
>> but it's clear that the backups = 2 setting is there. To verify, I stopped 2 of the
>> three nodes and sure enough I get the exception
>>
>> Failed to find data nodes for cache: InstanceMapping
>>
>>
>> Is there some additional configuration needed for partitioned SQL caches
>> to

Backups not being done for SQL caches

2020-04-30 Thread Courtney Robinson
stMappingHandlerAdapter.java:892)
>at 
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797)
>at 
> org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
>at 
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1039)
>at 
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
>at 
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
>at 
> org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:660)
>at 
> org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>at 
> io.hypi.arc.os.config.CorsConfiguration$1.doFilter(CorsConfiguration.java:60)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>at 
> org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:88)
>at 
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:118)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>at 
> org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.filterAndRecordMetrics(WebMvcMetricsFilter.java:114)
>at 
> org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:104)
>at 
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:118)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>at 
> org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
>at 
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:118)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
>at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
>at 
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:526)
>at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
>at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
>at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
>at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
>at 
> org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
>at 
> org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
>at 
> org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:860)
>at 
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1587)
>at 
> org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
>at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>at 
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>at java.base/java.lang.Thread.run(Thread.java:834)
>
>
Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
https://hypi.io


Re: SQL MERGE INTO with SELECT UNION

2020-04-24 Thread Courtney Robinson
Hi Illya,
I did end up just doing select then insert/update in the end to work around
this.
I didn't try selecting over a temp. table though.

There are no additional guarantees as far as I know either, but there used to
be a performance impact.
We're migrating functionality from using cache.put/putAll ourselves, and in
our benchmarks doing cache.get then cache.put was roughly 10 to 15% slower
than a design that ensured we could always do cache.put without the cache.get.

We assumed the same would be true with SQL, which is what motivated using
MERGE in the first place: we'd incur parsing once and make fewer network
round trips. That was the theory anyway; we never got as far as benchmarking
it with our workload because we never got the MERGE query working.
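
For completeness, the shape of the select-then-insert/update we settled on
instead of MERGE (simplified; most columns omitted and not atomic across the
two statements, which is acceptable for our write patterns):

import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Upsert sketch against the table from the original query.
void upsert(IgniteCache<?, ?> cache, String id, String instanceId, String impl) {
  List<List<?>> existing = cache.query(new SqlFieldsQuery(
      "SELECT hypi_id FROM hypi_store_App WHERE hypi_id = ? AND hypi_instanceId = ?")
      .setArgs(id, instanceId)).getAll();
  if (existing.isEmpty()) {
    cache.query(new SqlFieldsQuery(
        "INSERT INTO hypi_store_App (hypi_id, hypi_instanceId, hypi_created, hypi_impl) " +
        "VALUES (?, ?, CURRENT_TIMESTAMP(), ?)").setArgs(id, instanceId, impl));
  } else {
    cache.query(new SqlFieldsQuery(
        "UPDATE hypi_store_App SET hypi_updated = CURRENT_TIMESTAMP(), hypi_impl = ? " +
        "WHERE hypi_id = ? AND hypi_instanceId = ?").setArgs(impl, id, instanceId));
  }
}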

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
https://hypi.io


On Fri, Apr 24, 2020 at 12:56 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> I think you can union a select over a temporary table of one row.
> such as
>
> select * from table (id bigint = ?, ...)
>
>
> However, maybe you should just re-write your upsert with select and then
> insert/update.
> You're not gaining any more guarantees by using MERGE, as far as I know.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 23 апр. 2020 г. в 00:56, Courtney Robinson  >:
>
>> My aim is to perform an upsert.
>> Originally, my query was just doing a MERGE INTO with no UNION.
>> Unfortunately, if a row already exists, Ignite DOES NOT merge; it
>> replaces the row. So any columns from the old row that are not included in
>> the new MERGE will be set to NULL at the end of the operation.
>> Looking around I found
>> http://apache-ignite-users.70518.x6.nabble.com/INSERT-and-MERGE-statements-td28685.html
>>  which
>> suggests this is intended behaviour and not a bug.
>>
>> So I thought one way to do this with SQL is by doing a MERGE SELECT where
>> the first SELECT gets the existing row and any columns not being updated
>> are taken from the existing row. If no row matches the first select then
>> nothing will be inserted (that's why I need the union) so the second SELECT
>> is a list of literals of the columns currently being modified.
>>
>> In effect I'm doing an IF first SELECT take its data else use these
>> literals. Ignite also doesn't support the MERGE USING syntax in H2
>> http://www.h2database.com/html/commands.html#merge_using so I thought
>> this might work.
>>
>> Using the MERGE SELECT UNION I can't get Ignite to parse the second
>> select IFF the fields are placeholders i.e. ?
>>
>> In
>>
>>> MERGE INTO hypi_store_App(hypi_id,hypi_instanceId,hypi_created,
>>> hypi_updated,hypi_createdBy,hypi_instance,hypi_app,hypi_release,
>>> hypi_publisherRealm,hypi_publisherApp,hypi_publisherRelease,hypi_impl)(
>>> SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
>>> hypi_instanceId = ? AND hypi_id = ?,
>>> CURRENT_TIMESTAMP())),?,?,?,?,?,?,?,?,?
>>> FROM hypi_store_App r WHERE hypi_id = ? AND hypi_instanceId = ?
>>>
>>> UNION
>>> SELECT 'a','a','a','a','a','a','a','a','a','a','a','a'
>>> -- SELECT ?,?,?,?,?,?,?,?,?,?,?,?
>>> -- SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
>>> hypi_instanceId = ? AND hypi_id = ?, CURRENT_TIMESTAMP())),?,?,?,?,?,?,?,?,?
>>> );
>>
>>
>> The query is parsed successfully if I use literals as in SELECT 'a','a'
>> ,'a','a','a','a','a','a','a','a','a','a' but SELECT
>> ?,?,?,?,?,?,?,?,?,?,?,? will fail, same for the longer version above.
>>
>> The error is
>>  Failed to parse query. Unknown data type: "?, ?"
>> as in
>>
>> SQL Error [1001] [42000]: Failed to parse query. Unknown data type: "?,
>>> ?"; SQL statement:
>>> MERGE INTO
>>> hypi_store_App(hypi_id,hypi_instanceId,hypi_created,hypi_updated,hypi_createdBy,hypi_instance,hypi_app,hypi_release,hypi_publisherRealm,hypi_publisherApp,hypi_publisherRelease,hypi_impl)(
>>> SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
>>> hypi_instanceId = ? AND hypi_id = ?,
>>> CURRENT_TIMESTAMP())),?,?,?,?,?,?,?,?,? FROM hypi_store_App r WHERE hypi_id
>>> = ? AND hypi_instanceId = ?
>>> UNION
>>> -- SELECT 'a','a','a','a','a','a','a','a','a','a','a','a'
>>>  SELECT ?,?,?,?,?,?,?,?,?,?,?,?
>>> --SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
>>> hypi_instanceId = ? AND hypi_id = ?, CURRENT_TIMESTAMP())),?,?,?,?,?,?

SQL MERGE INTO with SELECT UNION

2020-04-22 Thread Courtney Robinson
CT ?,?,?,?,?,?,?,?,?,?,?,?
> --SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
> hypi_instanceId = ? AND hypi_id = ?, CURRENT_TIMESTAMP())),?,?,?,?,?,?,?,?,?
> ) [50004-197]
> at
> org.apache.ignite.internal.processors.query.h2.QueryParser.parseH2(QueryParser.java:582)
> at
> org.apache.ignite.internal.processors.query.h2.QueryParser.parse0(QueryParser.java:210)
> at
> org.apache.ignite.internal.processors.query.h2.QueryParser.parse(QueryParser.java:131)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1060)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor$3.applyx(GridQueryProcessor.java:2406)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor$3.applyx(GridQueryProcessor.java:2402)
> at
> org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2919)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.lambda$querySqlFields$1(GridQueryProcessor.java:2422)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuerySafe(GridQueryProcessor.java:2460)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:2396)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:2354)
> at
> org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.executeQuery(JdbcRequestHandler.java:615)
> at
> org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.doHandle(JdbcRequestHandler.java:310)
> at
> org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.handle(JdbcRequestHandler.java:247)
> at
> org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:195)
> at
> org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:49)
> at
> org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
> at
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
> at
> org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at
> org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.h2.jdbc.JdbcSQLException: Unknown data type: "?, ?"; SQL
> statement:
> MERGE INTO
> hypi_store_App(hypi_id,hypi_instanceId,hypi_created,hypi_updated,hypi_createdBy,hypi_instance,hypi_app,hypi_release,hypi_publisherRealm,hypi_publisherApp,hypi_publisherRelease,hypi_impl)(
> SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
> hypi_instanceId = ? AND hypi_id = ?,
> CURRENT_TIMESTAMP())),?,?,?,?,?,?,?,?,? FROM hypi_store_App r WHERE hypi_id
> = ? AND hypi_instanceId = ?
> UNION
> -- SELECT 'a','a','a','a','a','a','a','a','a','a','a','a'
>  SELECT ?,?,?,?,?,?,?,?,?,?,?,?
> --SELECT ?,?,(IFNULL(SELECT hypi_created FROM hypi_store_App WHERE
> hypi_instanceId = ? AND hypi_id = ?, CURRENT_TIMESTAMP())),?,?,?,?,?,?,?,?,?
> ) [50004-197]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
> at org.h2.message.DbException.get(DbException.java:179)
> at org.h2.message.DbException.get(DbException.java:155)
> at org.h2.value.Value.getHigherOrder(Value.java:370)
> at org.h2.command.dml.SelectUnion.prepare(SelectUnion.java:348)
> at org.h2.command.dml.Merge.prepare(Merge.java:283)
> at org.h2.command.Parser.prepareCommand(Parser.java:283)
> at org.h2.engine.Session.prepareLocal(Session.java:611)
> at org.h2.engine.Session.prepareCommand(Session.java:549)
> at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1247)
> at org.h2.jdbc.JdbcPreparedStatement.(JdbcPreparedStatement.java:76)
> at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:694)
> at
> org.apache.ignite.internal.processors.query.h2.ConnectionManager.prepareStatementNoCache(ConnectionManager.java:363)
> at
> org.apache.ignite.internal.processors.query.h2.QueryParser.parseH2(QueryParser.java:345)
> ... 24 common frames omitted
>

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
https://hypi.io


Re: Transaction already completed errors

2020-04-15 Thread Courtney Robinson
Thanks for letting me know.
It's worth adding this to the docs - they don't currently include any warning
or notice that TRANSACTIONAL_SNAPSHOT isn't ready for production:
https://apacheignite-sql.readme.io/docs/multiversion-concurrency-control
Is there a set of outstanding tickets I can keep track of? Depending on the
time needed, we could potentially contribute to getting this released.
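
Our interim plan is simply to recreate the affected tables with the same
template but a different atomicity, e.g. (table and column names illustrative
only; the WITH string is otherwise the one we already use):

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Sketch: same WITH template as before but atomicity=ATOMIC instead of
// TRANSACTIONAL_SNAPSHOT, avoiding MVCC until it's production ready.
void createTable(IgniteCache<?, ?> cache) {
  cache.query(new SqlFieldsQuery(
      "CREATE TABLE IF NOT EXISTS Example (instanceId VARCHAR, id VARCHAR, val VARCHAR, " +
      "PRIMARY KEY (instanceId, id)) WITH " +
      "\"template=partitioned,backups=2,data_region=hypi,affinity_key=instanceId,atomicity=ATOMIC\""));
}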

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Wed, Apr 15, 2020 at 3:26 PM Evgenii Zhuravlev 
wrote:

> Hi Courtney,
>
> MVCC is not production ready yet, so, I wouldn't recommend using
> TRANSACTIONAL_SNAPSHOT atomicity for now.
>
> Best Regards,
> Evgenii
>
> ср, 15 апр. 2020 г. в 06:02, Courtney Robinson  >:
>
>> We're upgrading to Ignite 2.8 and are starting to use SQL tables. In all
>> previous work we've used the key value APIs directly.
>>
>> After getting everything working, we're regularly seeing "transaction
>> already completed" errors when executing SELECT queries. A stack trace is
>> included at the end.
>> All tables are created with
>> "template=partitioned,backups=2,data_region=hypi,affinity_key=instanceId,atomicity=TRANSACTIONAL_SNAPSHOT"
>>
>> I found https://issues.apache.org/jira/browse/IGNITE-10763 which
>> suggested the problem was fixed in 2.8 and "is caused by leaked tx stored
>> in ThreadLocal".
>>
>> Has anyone else encountered this issue and is there a fix?
>> Just to be clear, we're definitely not performing any insert/update/merge
>> operations, only selects when this error occurs.
>>
>> From that issue I linked to, assuming the problem is still a leaked
>> ThreadLocal is there any workaround for this?
>> We have a managed thread pool (you can see Pool.java in the trace), I've
>> tried not to use it but still get the error because I guess it's now just
>> defaulting to Spring Boot's request thread pool.
>>
>>
>> 2020-04-13 19:56:31.548 INFO 9 --- [io-1-exec-2] io.hypi.arc.os.gql.
>>> HypiGraphQLException : GraphQL error, path: null, source: null, msg:
>>> null javax.cache.CacheException: Transaction is already completed. at
>>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(
>>> IgniteCacheProxyImpl.java:820) at org.apache.ignite.internal.processors.
>>> cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:753) at org.
>>> apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query
>>> (GatewayProtectedCacheProxy.java:424) at io.hypi.arc.os.ignite.
>>> IgniteRepo.findInstanceCtx(IgniteRepo.java:134) at io.hypi.arc.os.
>>> handlers.BaseHandler.evaluateQuery(BaseHandler.java:38) at io.hypi.arc.
>>> os.handlers.HttpHandler.lambda$runQuery$0(HttpHandler.java:145) at io.
>>> hypi.arc.base.Pool.apply(Pool.java:109) at io.hypi.arc.base.Pool.
>>> lambda$async$3(Pool.java:93) at com.google.common.util.concurrent.
>>> TrustedListenableFutureTask$TrustedFutureInterruptibleTask.
>>> runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.
>>> common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>>> at com.google.common.util.concurrent.TrustedListenableFutureTask.run(
>>> TrustedListenableFutureTask.java:78) at java.base/java.util.concurrent.
>>> Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.
>>> util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.
>>> util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(
>>> ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.
>>> ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
>>> java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by:
>>> org.apache.ignite.transactions.TransactionAlreadyCompletedException:
>>> Transaction is already completed. at org.apache.ignite.internal.util.
>>> IgniteUtils$18.apply(IgniteUtils.java:991) at org.apache.ignite.internal
>>> .util.IgniteUtils$18.apply(IgniteUtils.java:989) at org.apache.ignite.
>>> internal.util.IgniteUtils.convertException(IgniteUtils.java:1062) at org
>>> .apache.ignite.internal.processors.query.h2.IgniteH2Indexing.
>>> executeSelect(IgniteH2Indexing.java:1292) at org.apache.ignite.internal.
>>> processors.query.h2.IgniteH2Indexing.querySqlFields(IgniteH2Indexing.
>>> java:1117) at org.apache.ignite.internal.processors.query

Transaction already completed errors

2020-04-15 Thread Courtney Robinson
We're upgrading to Ignite 2.8 and are starting to use SQL tables. In all
previous work we've used the key value APIs directly.

After getting everything working, we're regularly seeing "transaction
already completed" errors when executing SELECT queries. A stack trace is
included at the end.
All tables are created with
"template=partitioned,backups=2,data_region=hypi,affinity_key=instanceId,atomicity=TRANSACTIONAL_SNAPSHOT"

I found https://issues.apache.org/jira/browse/IGNITE-10763 which suggested
the problem was fixed in 2.8 and "is caused by leaked tx stored in
ThreadLocal".

Has anyone else encountered this issue and is there a fix?
Just to be clear, we're definitely not performing any insert/update/merge
operations, only selects when this error occurs.

From that issue I linked to, assuming the problem is still a leaked
ThreadLocal, is there any workaround for this?
We have a managed thread pool (you can see Pool.java in the trace), I've
tried not to use it but still get the error because I guess it's now just
defaulting to Spring Boot's request thread pool.


2020-04-13 19:56:31.548 INFO 9 --- [io-1-exec-2] io.hypi.arc.os.gql.
> HypiGraphQLException : GraphQL error, path: null, source: null, msg: null
> javax.cache.CacheException: Transaction is already completed. at org.
> apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(
> IgniteCacheProxyImpl.java:820) at org.apache.ignite.internal.processors.
> cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:753) at org.
> apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(
> GatewayProtectedCacheProxy.java:424) at io.hypi.arc.os.ignite.IgniteRepo.
> findInstanceCtx(IgniteRepo.java:134) at io.hypi.arc.os.handlers.
> BaseHandler.evaluateQuery(BaseHandler.java:38) at io.hypi.arc.os.handlers.
> HttpHandler.lambda$runQuery$0(HttpHandler.java:145) at io.hypi.arc.base.
> Pool.apply(Pool.java:109) at io.hypi.arc.base.Pool.lambda$async$3(Pool.
> java:93) at com.google.common.util.concurrent.TrustedListenableFutureTask$
> TrustedFutureInterruptibleTask.runInterruptibly(
> TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent
> .InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.
> util.concurrent.TrustedListenableFutureTask.run(
> TrustedListenableFutureTask.java:78) at java.base/java.util.concurrent.
> Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.
> concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.
> concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(
> ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.
> ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
> :628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: org.
> apache.ignite.transactions.TransactionAlreadyCompletedException:
> Transaction is already completed. at org.apache.ignite.internal.util.
> IgniteUtils$18.apply(IgniteUtils.java:991) at org.apache.ignite.internal.
> util.IgniteUtils$18.apply(IgniteUtils.java:989) at org.apache.ignite.
> internal.util.IgniteUtils.convertException(IgniteUtils.java:1062) at org.
> apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSelect(
> IgniteH2Indexing.java:1292) at org.apache.ignite.internal.processors.query
> .h2.IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1117) at org.
> apache.ignite.internal.processors.query.GridQueryProcessor$3.applyx(
> GridQueryProcessor.java:2406) at org.apache.ignite.internal.processors.
> query.GridQueryProcessor$3.applyx(GridQueryProcessor.java:2402) at org.
> apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX
> .java:36) at org.apache.ignite.internal.processors.query.
> GridQueryProcessor.executeQuery(GridQueryProcessor.java:2919) at org.
> apache.ignite.internal.processors.query.GridQueryProcessor.
> lambda$querySqlFields$1(GridQueryProcessor.java:2422) at org.apache.ignite
> .internal.processors.query.GridQueryProcessor.executeQuerySafe(
> GridQueryProcessor.java:2460) at org.apache.ignite.internal.processors.
> query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:2396) at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.
> querySqlFields(GridQueryProcessor.java:2323) at org.apache.ignite.internal
> .processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:805
> ) ... 16 common frames omitted Caused by: org.apache.ignite.internal.
> transactions.IgniteTxAlreadyCompletedCheckedException: Transaction is
> already completed. at org.apache.ignite.internal.processors.cache.mvcc.
> MvccUtils.checkActive(MvccUtils.java:684) at org.apache.ignite.internal.
> processors.query.h2.IgniteH2Inde

Re: Fulltext matching

2018-09-10 Thread Courtney Robinson
Hi,
Thanks for the response.
I went ahead and implemented a custom indexing SPI. Works like a charm. As
long as Ignite doesn't drop support for the indexing SPI interface this is
exactly what we need.
I'm happy to create Jira issues and extract this into something more
generic for upstream if it'll be accepted.
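
In case it's a useful reference for anyone else, the skeleton is roughly the
following - the Lucene analyzer/writer/searcher wiring is ours and omitted
here, so the bodies are just placeholders:

import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import javax.cache.Cache;
import org.apache.ignite.spi.IgniteSpiAdapter;
import org.apache.ignite.spi.IgniteSpiException;
import org.apache.ignite.spi.indexing.IndexingQueryFilter;
import org.apache.ignite.spi.indexing.IndexingSpi;

// Skeleton of a custom IndexingSpi; the actual Lucene handling is omitted.
public class CustomLuceneIndexingSpi extends IgniteSpiAdapter implements IndexingSpi {
  @Override public void spiStart(String igniteInstanceName) throws IgniteSpiException {
    // open Lucene directories/writers here
  }

  @Override public void spiStop() throws IgniteSpiException {
    // flush and close Lucene resources here
  }

  @Override public void store(String cacheName, Object key, Object val, long expirationTime)
      throws IgniteSpiException {
    // convert the entry to a Lucene document and index it
  }

  @Override public void remove(String cacheName, Object key) throws IgniteSpiException {
    // delete the document for this key
  }

  @Override public Iterator<Cache.Entry<?, ?>> query(String cacheName, Collection<Object> params,
      IndexingQueryFilter filters) throws IgniteSpiException {
    // run the Lucene query (params can carry the query string and paging info)
    // and map hits back to cache entries
    return Collections.emptyIterator();
  }
}

It's wired in via IgniteConfiguration#setIndexingSpi(...) on node startup.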

Regards,
Courtney Robinson
CTO, Hypi
Tel: +4402032870961 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Thu, Sep 6, 2018 at 4:09 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> Unfortunately, fulltext doesn't seem to have much traction, so I recommend
> doing investigations on your side, possibly creating JIRA issues in the
> process.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 3 сент. 2018 г. в 22:34, Courtney Robinson  >:
>
>> Hi,
>>
>> We've got Ignite in production and decided to start using some fulltext
>> matching as well.
>> I've investigated and can't figure out why my queries are not matching.
>>
>> I construct a query entity e.g new QueryEntity(keyClass, valueClass) and
>> in debug I can see it generates a list of fields
>> e.g. a, b, c.a, c.b
>> I then expected to be able to match on those fields that are marked as
>> indexed. Everything is annotation driven. The appropriate fields have been
>> annotated and appear to be detected as such
>> when I inspect what gets put into the QueryEntityDescriptor. i.e. all
>> expected indices and indexed fields are present.
>>
>> In LuceneGridIndex I see that the lucene document generated as fields a,b
>> (c.a and c.b are not included). Now a couple of questions arise:
>>
>> 1. Is there a way to get Ignite to index the nested fields as well so
>> that c.a and c.b end up in the doc?
>>
>> 2. If you use a composite object as a key, its fields are extracted into
>> the top level, so if you have Key.a and Value.a you cannot index both,
>> since Key.a becomes a, which collides with Value.a. Can this be changed,
>> and are there any known reasons why it couldn't be? (I'm happy to send a
>> PR doing so, but I suspect the answer to this is linked to the answer to
>> the first question.)
>>
>> 3. The docs simply say you can use Lucene syntax; I presume that means the
>> syntax described at
>> https://lucene.apache.org/core/2_9_4/queryparsersyntax.html is all valid.
>> Checking the code, that appears to be the case, as it uses
>> a MultiFieldQueryParser in GridLuceneIndex. However, when I try to run a
>> query such as a: - none of the indexed documents match. In debug mode
>> I've enabled parser.setAllowLeadingWildcard(true); and if I do a simple
>> searcher.search * I get back the list of expected documents.
>>
>> What's even more odd is that I tried querying each of the 6 indexed
>> fields found in idxdFields in GridLuceneIndex and only 1 of them matches.
>> For the other fields, typing the value exactly doesn't match, and
>> wildcards or other free-text forms don't match either.
>>
>> 4. I couldn't see a way to provide a custom GridLuceneIndex; I found the
>> two places where it's constructed in the code base and it doesn't look
>> like I can inject instances. Is it OK to construct and use a custom
>> GridLuceneDirectory/IndexWriter/Searcher and so on in the same way
>> GridLuceneIndex does, so I can write a custom IndexingSpi to change how
>> indexing happens?
>> There are a number of things I'd like to customise and, from looking at
>> the current impl., these things aren't injectable; I guess it's not
>> considered a prime use case.
>>
>> Yeah, the analyzer and a number of other things would be handy to change.
>> Ideally I'd also want to customise how a field is indexed, e.g. to be able
>> to do term matches with Lucene queries.
>>
>> Looking at this impl as well, it passes Integer.MAX_VALUE and pulls back
>> all matches. That'll surely kill our nodes for some of the use cases we're
>> considering.
>> I'd also like to implement paging; the searcher API has a nice option to
>> pass through the last doc it can continue from, to potentially implement
>> something like deep paging.
>>
>> 5. If I were to do a custom IndexingSpi to make all of this happen, how
>> do I get additional parameters through so that I could have paging params
>> passed in?
>>
>> Ideally I could customise the indexing, searching and paging through
>> standard Ignite means, but I can't find any way of doing that in the
>> current code. Short of doing a custom IndexingSpi, I think I've gone as
>> far as I can with debugging and could do with a few pointers on how to go
>> about this.
>>
>> FYI, SQL isn't a great option for this part of the product. We're
>> generating and compiling Java classes at runtime, and generating SQL to do
>> the queries is an order of magnitude more work than indexing the
>> relatively few fields we need and then searching. Off the bat, the paging
>> would also be an issue, as there can be several million matches to a
>> query; we can't have Ignite pulling all of those into memory.
>>
>> Thanks in advance
>>
>> Courtney
>>
>


Fulltext matching

2018-09-03 Thread Courtney Robinson
Hi,

We've got Ignite in production and decided to start using some fulltext
matching as well.
I've investigated and can't figure out why my queries are not matching.

I construct a query entity, e.g. new QueryEntity(keyClass, valueClass), and in
debug I can see it generates a list of fields, e.g. a, b, c.a, c.b.
I then expected to be able to match on those fields that are marked as
indexed. Everything is annotation driven. The appropriate fields have been
annotated and appear to be detected as such when I inspect what gets put into
the QueryEntityDescriptor, i.e. all expected indices and indexed fields are
present.
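
To make that concrete, the setup is roughly the sketch below. The classes and
field names here are made up and heavily simplified (our real classes are
generated at runtime), so treat it as an illustration only:

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.query.TextQuery;
import org.apache.ignite.cache.query.annotations.QueryTextField;
import org.apache.ignite.configuration.CacheConfiguration;

public class FulltextSketch {
    static class Nested {
        @QueryTextField String a;
        @QueryTextField String b;
    }

    static class Value {
        @QueryTextField String a;   // ends up in the Lucene document
        @QueryTextField String b;   // ends up in the Lucene document
        Nested c;                   // produces fields c.a and c.b in the QueryEntity,
                                    // but they don't make it into the Lucene document
    }

    static IgniteCache<String, Value> cache(Ignite ignite) {
        CacheConfiguration<String, Value> ccfg = new CacheConfiguration<>("sketch");
        // annotation driven, i.e. new QueryEntity(keyClass, valueClass)
        ccfg.setQueryEntities(Collections.singleton(new QueryEntity(String.class, Value.class)));
        return ignite.getOrCreateCache(ccfg);
    }

    static void search(IgniteCache<String, Value> cache) {
        // the kind of match I expected to work against the indexed fields
        cache.query(new TextQuery<String, Value>(Value.class, "a:foo")).getAll();
    }
}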

In GridLuceneIndex I see that the Lucene document generated has fields a, b
(c.a and c.b are not included). Now a couple of questions arise:

1. Is there a way to get Ignite to index the nested fields as well so that
c.a and c.b end up in the doc?

2. If you use a composite object as a key, its fields are extracted into the
top level, so if you have Key.a and Value.a you cannot index both, since Key.a
becomes a, which collides with Value.a. Can this be changed, and are there any
known reasons why it couldn't be? (I'm happy to send a PR doing so, but I
suspect the answer to this is linked to the answer to the first question.)

3. The docs simply say you can use Lucene syntax; I presume that means the
syntax described at
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html is all valid.
Checking the code, that appears to be the case, as it uses
a MultiFieldQueryParser in GridLuceneIndex. However, when I try to run a
query such as a: - none of the indexed documents match. In debug mode I've
enabled parser.setAllowLeadingWildcard(true); and if I do a simple
searcher.search * I get back the list of expected documents.

What's even more odd is that I tried querying each of the 6 indexed fields
found in idxdFields in GridLuceneIndex and only 1 of them matches. For the
other fields, typing the value exactly doesn't match, and wildcards or other
free-text forms don't match either.
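
For reference, what I'm effectively doing in debug is along these lines. It's
a Lucene-only sketch with a made-up value, assuming an IndexSearcher opened
over the same directory GridLuceneIndex writes to:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

class ParserDebug {
    static TopDocs debugSearch(IndexSearcher searcher) throws Exception {
        String[] fields = {"a", "b"};  // the field names that ended up in the Lucene document
        MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());
        parser.setAllowLeadingWildcard(true);  // the flag I toggled while debugging
        Query q = parser.parse("a:foo*");      // placeholder value; my real values don't match
        return searcher.search(q, 10);
    }
}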

4. I couldn't see a way to provide a custom GridLuceneIndex; I found the two
places where it's constructed in the code base and it doesn't look like I can
inject instances. Is it OK to construct and use a custom
GridLuceneDirectory/IndexWriter/Searcher and so on in the same way
GridLuceneIndex does, so I can write a custom IndexingSpi to change how
indexing happens?
There are a number of things I'd like to customise and, from looking at the
current impl., these things aren't injectable; I guess it's not considered a
prime use case.

Yeah, the analyzer and a number of other things would be handy to change.
Ideally I'd also want to customise how a field is indexed, e.g. to be able to
do term matches with Lucene queries.

Looking at this impl as well, it passes Integer.MAX_VALUE and pulls back all
matches. That'll surely kill our nodes for some of the use cases we're
considering.
I'd also like to implement paging; the searcher API has a nice option to pass
through the last doc it can continue from, to potentially implement something
like deep paging.
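
The option I mean is Lucene's searchAfter. A minimal sketch of the paging I'd
like to be able to do (not something GridLuceneIndex does today, as far as I
can tell):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class PagingSketch {
    // Fetch one page of pageSize hits, continuing after the last ScoreDoc of the
    // previous page (pass null for the first page), instead of collecting
    // Integer.MAX_VALUE hits in one go.
    static TopDocs nextPage(IndexSearcher searcher, Query q, ScoreDoc after, int pageSize) throws Exception {
        return after == null ? searcher.search(q, pageSize) : searcher.searchAfter(after, q, pageSize);
    }
}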

5. If I were to do a custom IndexingSpi to make all of this happen, how do I
get additional parameters through so that I could have paging params passed
in?
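
What I'm imagining on the calling side is something like the snippet below,
assuming the args passed to a SpiQuery show up as the params collection handed
to IndexingSpi#query (the value and numbers are made up):

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SpiQuery;

class SpiQuerySketch {
    static void page(IgniteCache<String, Object> cache) {
        // query text plus offset and page size, to be unpacked inside the custom IndexingSpi
        cache.query(new SpiQuery<String, Object>().setArgs("a:foo*", 0, 50)).getAll();
    }
}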

Ideally I could customise the indexing, searching and paging through standard
Ignite means, but I can't find any way of doing that in the current code.
Short of doing a custom IndexingSpi, I think I've gone as far as I can with
debugging and could do with a few pointers on how to go about this.

FYI, SQL isn't a great option for this part of the product. We're generating
and compiling Java classes at runtime, and generating SQL to do the queries is
an order of magnitude more work than indexing the relatively few fields we
need and then searching. Off the bat, the paging would also be an issue, as
there can be several million matches to a query; we can't have Ignite pulling
all of those into memory.

Thanks in advance

Courtney