Re: Role of H2 database in Apache Ignite

2018-10-11 Thread eugene miretsky
Thanks!

So does it mean that CacheConfiguration.queryParallelism is really an H2
setting?


On Tue, Oct 9, 2018 at 4:27 PM Stanislav Lukyanov 
wrote:

> In short, Ignite replaces H2’s storage level with its own.
>
> For example, Ignite implements H2’s Index interface with its own off-heap
> data structures underneath.
>
> When Ignite executes an SQL query, it will ask H2 to process it, then H2
> will call back to Ignite’s implementations
>
> of H2’s interfaces (such as Index) to actually retrieve the data.
>
> I guess the on-heap processing is mostly H2, although there is a lot of
> work done by Ignite to make the distributed
>
> map-reduce work like creating temporary tables for intermediate results.
>
>
>
> Stan
>
>
>
> *From: *eugene miretsky 
> *Sent: *October 9, 2018, 21:52
> *To: *user@ignite.apache.org
> *Subject: *Re: Role of H2 database in Apache Ignite
>
>
>
> Hello,
>
>
>
> I have been struggling with this question myself for a while now too.
>
> I think the documents are very ambiguous on how exactly H2 is being used.
>
>
>
> The document that you linked says
>
> "Apache Ignite leverages from H2's SQL query parser and optimizer as well
> as the execution planner. Lastly, *H2 executes a query locally* on a
> particular node and passes a local result to a distributed Ignite SQL
> engine for further processing."
>
>
>
> And
>
> "However, *the data, as well as the indexes, are always stored in the
> Ignite that executes queries* in a distributed and fault-tolerant manner
> which is not supported by H2."
>
>
>
> To me, this leaves a lot of ambiguity on how H2 is leveraged on a single
> Ignite node.  (I get that the Reduce stage, as well as distributed
> transactions, are handled by Ignite, but what about the 'map' stage on a
> single node?)
>
>
>
> How is a query executed on a single node?
>
> Example query: Select count(customer_id) from user where (age > 20) group
> by customer_id
>
>
>
> What steps are taken?
>
>1. execution plan: H2 creates an execution plan
>2. data retrieval:  Since data is stored off-heap, it has to be
>brought into heap.  Does H2 have anything to do with this step, or is it
>only Ignite? When are indexes used for that?
>3. Query execution: Once the data is on heap, what executes the Query
>(the group_by, aggregations, filters that were not handled by indexes,
>etc.)? H2 or Ignite?
>
>
>
>
>
>
>
> On Fri, Sep 21, 2018 at 9:27 AM Mikhail 
> wrote:
>
> Hi,
>
> Could you please formulate your question? Because right now your message
> looks like a search query for Google.
> I think the following article has the answer to your question:
> https://apacheignite-sql.readme.io/docs/how-ignite-sql-works
>
> Thanks,
> Mike.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
>
>


Re: Query 3x slower with index

2018-10-11 Thread eugene miretsky
Thanks!

Could you please clarify "In case of a composite index, it will apply the
columns one by one"?

Ignite (or rather H2?) needs to load the data into heap in order to do the
groupBy & aggregations. We were hoping that only data that matches the
category filter would be loaded.
What does "one by one" mean (assuming an index (customer_id,
category_id)) when:

   1. The filter is on both customer and category. What data will be
   loaded into heap?
   2. The filter is only on category, and the customer is just used for
   groupBy. Will Ignite
      1. load one customer with all the rows, and apply the category
      filter in heap,
      2. load one customer, but load only the rows that pass the category
      filter in heap, or
      3. load all the events that pass the category filter, and then group
      them by customer?

From our benchmarking so far it seems like 1 is happening.
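The ordering rule quoted below can be tried directly against the GA_DATA schema used elsewhere in this archive. A sketch only (the index name is illustrative, and this assumes an Ignite SQL endpoint such as the thin JDBC driver):

```sql
-- Sketch: an index that puts the affinity key first and the secondary
-- filter column immediately after it, per the advice quoted below.
CREATE INDEX ga_customer_category ON GA_DATA (customer_id, category_id);

-- EXPLAIN shows whether the new index is picked up instead of the
-- bare AFFINITY_KEY index:
EXPLAIN SELECT customer_id
FROM GA_DATA
WHERE category_id IN (117930, 175930, 175940, 175945, 101450)
GROUP BY customer_id;
```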

On Thu, Oct 11, 2018 at 1:28 PM Stanislav Lukyanov 
wrote:

> Hi,
>
>
>
> It is a rather lengthy thread and I can’t dive into details right now,
>
> but AFAICS the issue now is making affinity key index to work with a
> secondary index.
>
> The important things to understand are:
>
>1. Ignite will only use one index per table
>2. In case of a composite index, it will apply the columns one by one
>3. The affinity key index should always go first as the first step is
>splitting the query by affinity key values
>
>
>
> So, to use index over the affinity key (customer_id) and a secondary index
> (category_id) one needs to create an index
>
> like (customer_id, category_id), in that order, with no columns in between.
>
> Note that index (customer_id, dt, category_id) can’t be used instead of it.
>
> On the other hand, (customer_id, category_id, dt) can - the last part of
> the index will be left unused.
>
>
>
> Thanks,
>
> Stan
>
>
>
> *From: *eugene miretsky 
> *Sent: *October 9, 2018, 19:40
> *To: *user@ignite.apache.org
> *Subject: *Re: Query 3x slower with index
>
>
>
> Hi Ilya,
>
>
>
> I have tried it, and got the same performance as when forcing the category
> index in my initial benchmark - the query is 3x slower and uses only one
> thread.
>
>
>
> From my experiments so far it seems like Ignite can either (a) use
> affinity key and run queries in parallel, (b) use index but run the query
> on only one thread.
>
>
>
> Has anybody been able to run OLAP-like queries while using an index?
>
>
>
> Cheers,
>
> Eugene
>
>
>
> On Mon, Sep 24, 2018 at 10:55 AM Ilya Kasnacheev <
> ilya.kasnach...@gmail.com> wrote:
>
> Hello!
>
>
>
> I guess that using AFFINITY_KEY as the index has something to do with the
> fact that GROUP BY really wants to work per-partition.
>
>
>
> I have the following query for you:
>
>
>
> 1: jdbc:ignite:thin://localhost> explain Select count(*) FROM( Select
> customer_id from (Select customer_id, product_views_app, product_clict_app
> from GA_DATA ga join table(category_id int = ( 117930, 175930,
> 175940,175945,101450)) cats on cats.category_id = ga.category_id) data
> group by customer_id having SUM(product_views_app) > 2 OR
> SUM(product_clict_app) > 1);
> PLAN  SELECT
> DATA__Z2.CUSTOMER_ID AS __C0_0,
> SUM(DATA__Z2.PRODUCT_VIEWS_APP) AS __C0_1,
> SUM(DATA__Z2.PRODUCT_CLICT_APP) AS __C0_2
> FROM (
> SELECT
> GA__Z0.CUSTOMER_ID,
> GA__Z0.PRODUCT_VIEWS_APP,
> GA__Z0.PRODUCT_CLICT_APP
> FROM TABLE(CATEGORY_ID INTEGER=(117930, 175930, 175940, 175945,
> 101450)) CATS__Z1
> INNER JOIN PUBLIC.GA_DATA GA__Z0
> ON 1=1
> WHERE CATS__Z1.CATEGORY_ID = GA__Z0.CATEGORY_ID
> ) DATA__Z2
> /* SELECT
> GA__Z0.CUSTOMER_ID,
> GA__Z0.PRODUCT_VIEWS_APP,
> GA__Z0.PRODUCT_CLICT_APP
> FROM TABLE(CATEGORY_ID INTEGER=(117930, 175930, 175940, 175945,
> 101450)) CATS__Z1
> /++ function ++/
> INNER JOIN PUBLIC.GA_DATA GA__Z0
> /++ PUBLIC.GA_CATEGORY_ID: CATEGORY_ID = CATS__Z1.CATEGORY_ID ++/
> ON 1=1
> WHERE CATS__Z1.CATEGORY_ID = GA__Z0.CATEGORY_ID
>  */
> GROUP BY DATA__Z2.CUSTOMER_ID
>
> PLAN  SELECT
> COUNT(*)
> FROM (
> SELECT
> __C0_0 AS CUSTOMER_ID
> FROM PUBLIC.__T0
> GROUP BY __C0_0
> HAVING (SUM(__C0_1) > 2)
> OR (SUM(__C0_2) > 1)
> ) _18__Z3
> /* SELECT
> __C0_0 AS CUSTOMER_ID
> FROM PUBLIC.__T0
> /++ PUBLIC."merge_scan" ++/
> GROUP BY __C0_0
> HAVING (SUM(__C0_1) > 2)
> OR (SUM(__C0_2) > 1)
>  */
>
>
>
> However

Re: Role of H2 database in Apache Ignite

2018-10-09 Thread eugene miretsky
Hello,

I have been struggling with this question myself for a while now too.
I think the documents are very ambiguous on how exactly H2 is being used.

The document that you linked says
"Apache Ignite leverages from H2's SQL query parser and optimizer as well
as the execution planner. Lastly, *H2 executes a query locally* on a
particular node and passes a local result to a distributed Ignite SQL
engine for further processing."

And
"However, *the data, as well as the indexes, are always stored in the
Ignite that executes queries* in a distributed and fault-tolerant manner
which is not supported by H2."

To me, this leaves a lot of ambiguity on how H2 is leveraged on a single
Ignite node.  (I get that the Reduce stage, as well as distributed
transactions, are handled by Ignite, but what about the 'map' stage on a
single node?)

How is a query executed on a single node?
Example query: Select count(customer_id) from user where (age > 20) group
by customer_id

What steps are taken?

   1. execution plan: H2 creates an execution plan
   2. data retrieval:  Since data is stored off-heap, it has to be brought
   into heap.  Does H2 have anything to do with this step, or is it only
   Ignite? When are indexes used for that?
   3. Query execution: Once the data is on heap, what executes the Query
   (the group_by, aggregations, filters that were not handled by indexes,
   etc.)? H2 or Ignite?
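The map/reduce split asked about in steps 1-3 can be observed directly: EXPLAIN on a two-phase query returns both the per-node (map) plan that H2 generates over Ignite's indexes and a reduce plan over an intermediate merge table. A sketch against the example query, assuming a `user` table with `customer_id` and `age` columns exists:

```sql
-- Sketch: inspect the two-phase plan for the example query.
-- The output contains an H2 map plan (executed on every data node
-- against Ignite-backed indexes) and a reduce plan that aggregates
-- the merged intermediate results (H2 names that table __T0 in the
-- plans quoted elsewhere in this archive).
EXPLAIN SELECT COUNT(customer_id)
FROM "user"
WHERE age > 20
GROUP BY customer_id;
```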




On Fri, Sep 21, 2018 at 9:27 AM Mikhail  wrote:

> Hi,
>
> Could you please formulate your question? Because right now your message
> looks like a search query for Google.
> I think the following article has the answer to your question:
> https://apacheignite-sql.readme.io/docs/how-ignite-sql-works
>
> Thanks,
> Mike.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Query 3x slower with index

2018-10-09 Thread eugene miretsky
Hi Ilya,

I have tried it, and got the same performance as when forcing the category
index in my initial benchmark - the query is 3x slower and uses only one
thread.

From my experiments so far it seems like Ignite can either (a) use
affinity key and run queries in parallel, (b) use index but run the query
on only one thread.

Has anybody been able to run OLAP-like queries while using an index?

Cheers,
Eugene

On Mon, Sep 24, 2018 at 10:55 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> I guess that using AFFINITY_KEY as the index has something to do with the
> fact that GROUP BY really wants to work per-partition.
>
> I have the following query for you:
>
> 1: jdbc:ignite:thin://localhost> explain Select count(*) FROM( Select
> customer_id from (Select customer_id, product_views_app, product_clict_app
> from GA_DATA ga join table(category_id int = ( 117930, 175930,
> 175940,175945,101450)) cats on cats.category_id = ga.category_id) data
> group by customer_id having SUM(product_views_app) > 2 OR
> SUM(product_clict_app) > 1);
> PLAN  SELECT
> DATA__Z2.CUSTOMER_ID AS __C0_0,
> SUM(DATA__Z2.PRODUCT_VIEWS_APP) AS __C0_1,
> SUM(DATA__Z2.PRODUCT_CLICT_APP) AS __C0_2
> FROM (
> SELECT
> GA__Z0.CUSTOMER_ID,
> GA__Z0.PRODUCT_VIEWS_APP,
> GA__Z0.PRODUCT_CLICT_APP
> FROM TABLE(CATEGORY_ID INTEGER=(117930, 175930, 175940, 175945,
> 101450)) CATS__Z1
> INNER JOIN PUBLIC.GA_DATA GA__Z0
> ON 1=1
> WHERE CATS__Z1.CATEGORY_ID = GA__Z0.CATEGORY_ID
> ) DATA__Z2
> /* SELECT
> GA__Z0.CUSTOMER_ID,
> GA__Z0.PRODUCT_VIEWS_APP,
> GA__Z0.PRODUCT_CLICT_APP
> FROM TABLE(CATEGORY_ID INTEGER=(117930, 175930, 175940, 175945,
> 101450)) CATS__Z1
> /++ function ++/
> INNER JOIN PUBLIC.GA_DATA GA__Z0
> /++ PUBLIC.GA_CATEGORY_ID: CATEGORY_ID = CATS__Z1.CATEGORY_ID ++/
> ON 1=1
> WHERE CATS__Z1.CATEGORY_ID = GA__Z0.CATEGORY_ID
>  */
> GROUP BY DATA__Z2.CUSTOMER_ID
>
> PLAN  SELECT
> COUNT(*)
> FROM (
> SELECT
> __C0_0 AS CUSTOMER_ID
> FROM PUBLIC.__T0
> GROUP BY __C0_0
> HAVING (SUM(__C0_1) > 2)
> OR (SUM(__C0_2) > 1)
> ) _18__Z3
> /* SELECT
> __C0_0 AS CUSTOMER_ID
> FROM PUBLIC.__T0
> /++ PUBLIC."merge_scan" ++/
> GROUP BY __C0_0
> HAVING (SUM(__C0_1) > 2)
> OR (SUM(__C0_2) > 1)
>  */
>
> However, I'm not sure it is "optimal" or not since I have no idea if it
> will perform better or worse on real data. That's why I need a subset of
> data which will make query execution speed readily visible. Unfortunately,
> I can't deduce that from query plan alone.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, Sep 24, 2018 at 16:14, eugene miretsky :
>
>> An easy way to reproduce would be to
>>
>> 1. Create table
>>
>> CREATE TABLE GA_DATA (
>> customer_id bigint,
>> dt timestamp,
>> category_id int,
>> product_views_app int,
>> product_clict_app int,
>> product_views_web int,
>> product_clict_web int,
>> PRIMARY KEY (customer_id, dt, category_id)
>> ) WITH "template=ga_template, backups=0, affinityKey=customer_id";
>>
>> 2. Create indexes
>>
>>- CREATE INDEX ga_customer_id ON GA_Data (customer_id)
>>- CREATE INDEX ga_pKey ON GA_Data (customer_id, dt, category_id)
>>- CREATE INDEX ga_category_and_customer_id ON GA_Data (category_id,
>>customer_id)
>>- CREATE INDEX ga_category_id ON GA_Data (category_id)
>>
>> 3. Run Explain on the following queries while trying to force
>> different indexes
>>
>>- Select count(*) FROM(
>>
>> Select customer_id from GA_DATA  use index (ga_category_id)
>> where category_id in (117930, 175930, 175940,175945,101450)
>> group by customer_id having SUM(product_views_app) > 2 OR
>> SUM(product_clicks_app) > 1 )
>>
>>
>>- Select count(*) FROM(
>>
>> Select customer_id from GA_DATA ga use index (ga_pKey)
>> join table(category_id int = ( 117930, 175930, 175940,175945,101450))
>> cats on cats.category_id = ga.category_id
>> group by customer_id having SUM(product_views_app) > 2 OR
>> SUM(product_clicks_app) > 1
>> )
>>
>> The execution plans will be similar to what I have posted earlier. In
>> particular, only one of (a) the affinity key index, (b) the category_id index
>> will be used.
>>
>> On Fri, Sep 21, 2018 at 8:49 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>

Ignite + Spark: json4s versions are incompatible

2018-09-26 Thread eugene miretsky
Hello,

Spark provides json4s 3.2.x, while Ignite pulls in a newer version. This
seems to cause an error when using some Spark SQL commands that call
json4s methods that no longer exist.

Adding Ignite to our existing Spark code bases seems to break things.

How do people work around this issue?

Stack trace:

[info] Caused by: java.lang.NoSuchMethodError:
org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
[info] at
org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:108)
[info] at
org.apache.spark.sql.types.StructType$$anonfun$6.apply(StructType.scala:414)
[info] at
org.apache.spark.sql.types.StructType$$anonfun$6.apply(StructType.scala:414)
[info] at scala.util.Try$.apply(Try.scala:192)
[info] at
org.apache.spark.sql.types.StructType$.fromString(StructType.scala:414)
[info] at
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:80)
[info] at
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:341)
[info] at
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
[info] at
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
[info] at
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:159)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:303)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:312)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
[info] at
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
[info] ... 8 more


Re: Query 3x slower with index

2018-09-24 Thread eugene miretsky
An easy way to reproduce would be to

1. Create table

CREATE TABLE GA_DATA (
customer_id bigint,
dt timestamp,
category_id int,
product_views_app int,
product_clict_app int,
product_views_web int,
product_clict_web int,
PRIMARY KEY (customer_id, dt, category_id)
) WITH "template=ga_template, backups=0, affinityKey=customer_id";

2. Create indexes

   - CREATE INDEX ga_customer_id ON GA_Data (customer_id)
   - CREATE INDEX ga_pKey ON GA_Data (customer_id, dt, category_id)
   - CREATE INDEX ga_category_and_customer_id ON GA_Data (category_id,
   customer_id)
   - CREATE INDEX ga_category_id ON GA_Data (category_id)

3. Run Explain on the following queries while trying to force
different indexes

   - Select count(*) FROM(

Select customer_id from GA_DATA  use index (ga_category_id)
where category_id in (117930, 175930, 175940,175945,101450)
group by customer_id having SUM(product_views_app) > 2 OR
SUM(product_clicks_app) > 1 )


   - Select count(*) FROM(

Select customer_id from GA_DATA ga use index (ga_pKey)
join table(category_id int = ( 117930, 175930, 175940,175945,101450))
cats on cats.category_id = ga.category_id
group by customer_id having SUM(product_views_app) > 2 OR
SUM(product_clicks_app) > 1
)

The execution plans will be similar to what I have posted earlier. In
particular, only one of (a) the affinity key index, (b) the category_id index
will be used.
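The reproduction steps above can be driven from any SQL client; a sketch comparing the plans the two hints produce (hint and index names as defined in the steps above):

```sql
-- Sketch: run EXPLAIN with each hint and compare which index appears
-- in the /* ... */ comments of the generated map plan.
EXPLAIN SELECT customer_id FROM GA_DATA USE INDEX (ga_category_id)
WHERE category_id IN (117930, 175930, 175940, 175945, 101450)
GROUP BY customer_id;

EXPLAIN SELECT customer_id FROM GA_DATA USE INDEX (ga_pKey)
WHERE category_id IN (117930, 175930, 175940, 175945, 101450)
GROUP BY customer_id;
```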

On Fri, Sep 21, 2018 at 8:49 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Can you share a reproducer project which loads (or generates) data for
> caches and then queries them? I could try and debug it if I had the
> reproducer.
>
> Regards.
> --
> Ilya Kasnacheev
>
>
> Thu, Sep 20, 2018 at 21:05, eugene miretsky :
>
>> Thanks Ilya,
>>
>> Tried it, no luck. It performs the same as when using the category_id index
>> alone (slow).
>> Any combination I try either uses AFFINITY_KEY or the category index. When
>> it uses the category index it runs slower.
>>
>> Also, when the AFFINITY_KEY key is used, the job runs on 32 threads (my
>> query parallelism setting); when category_id is used, the job runs on one
>> thread most of the time (for the first few seconds it looks like more
>> threads are doing work).
>>
>> Please help with this. It seems like a very simple use case (using an
>> affinity key and another index); either I am doing something extremely
>> silly, or I stumbled on a bug in Ignite that's affecting a lot of people.
>>
>> Cheers,
>> Eugene
>>
>> On Thu, Sep 20, 2018 at 6:22 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> > 2) ga_customer_and_category_id: on customer_id and category_id
>>>
>>> Have you tried to do an index on category_id first, customer_id second?
>>> Note that Ignite will use only one index when joining two tables and that
>>> in your case it should start with category_id.
>>>
>>> You can also try adding affinity key to this index in various places,
>>> see if it helps further.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Wed, Sep 19, 2018 at 21:27, eugene miretsky :
>>>
>>>> Hi Ilya,
>>>>
>>>> I created 4 indexes on the table:
>>>> 1) ga_pKey: on customer_id, dt, category_id (that's our primary key
>>>> columns)
>>>> 2) ga_customer_and_category_id: on customer_id and category_id
>>>> 3) ga_customer_id: on customer_id
>>>> 4) ga_category_id: on category_id
>>>>
>>>>
>>>> For the first query (category in ()), the execution plan when using the
>>>> first 3 indexes is exactly the same  - using /* PUBLIC.AFFINITY_KEY */
>>>> When using #4 (alone or in combination with any of the other 3)
>>>>
>>>>1. /* PUBLIC.AFFINITY_KEY */ is replaced with  /*
>>>>PUBLIC.GA_CATEGORY_ID: CATEGORY_ID IN(117930, 175930, 175940, 175945,
>>>>101450) */
>>>>2. The query runs slower.
>>>>
>>>> For the second query (join on an inlined table) the behaviour is very
>>>> similar. Using the first 3 indexes results in the same plan - using  /*
>>>> PUBLIC.AFFINITY_KEY */ and  /* function: CATEGORY_ID = GA__Z0.CATEGORY_ID
>>>> */.
>>>> When using #4 (alone or in combination with any of the other 3)
>>>>
>>>>1. /* function */ and /* PUBLIC.GA_CATEGORY_ID: CATEGORY_ID =
>>>>CATS__Z1.CATEGORY_ID */ are used
>>>>2. The query is much slower.
>>>>

Re: Query 3x slower with index

2018-09-20 Thread eugene miretsky
Thanks Ilya,

Tried it, no luck. It performs the same as when using the category_id index
alone (slow).
Any combination I try either uses AFFINITY_KEY or the category index. When
it uses the category index it runs slower.

Also, when the AFFINITY_KEY key is used, the job runs on 32 threads (my query
parallelism setting); when category_id is used, the job runs on one
thread most of the time (for the first few seconds it looks like more threads
are doing work).

Please help with this. It seems like a very simple use case (using an affinity
key and another index); either I am doing something extremely silly, or I
stumbled on a bug in Ignite that's affecting a lot of people.

Cheers,
Eugene

On Thu, Sep 20, 2018 at 6:22 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> > 2) ga_customer_and_category_id: on customer_id and category_id
>
> Have you tried to do an index on category_id first, customer_id second?
> Note that Ignite will use only one index when joining two tables and that
> in your case it should start with category_id.
>
> You can also try adding affinity key to this index in various places, see
> if it helps further.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Wed, Sep 19, 2018 at 21:27, eugene miretsky :
>
>> Hi Ilya,
>>
>> I created 4 indexes on the table:
>> 1) ga_pKey: on customer_id, dt, category_id (that's our primary key
>> columns)
>> 2) ga_customer_and_category_id: on customer_id and category_id
>> 3) ga_customer_id: on customer_id
>> 4) ga_category_id: on category_id
>>
>>
>> For the first query (category in ()), the execution plan when using the
>> first 3 indexes is exactly the same  - using /* PUBLIC.AFFINITY_KEY */
>> When using #4 (alone or in combination with any of the other 3)
>>
>>1. /* PUBLIC.AFFINITY_KEY */ is replaced with  /*
>>PUBLIC.GA_CATEGORY_ID: CATEGORY_ID IN(117930, 175930, 175940, 175945,
>>101450) */
>>2. The query runs slower.
>>
>> For the second query (join on an inlined table) the behaviour is very
>> similar. Using the first 3 indexes results in the same plan - using  /*
>> PUBLIC.AFFINITY_KEY */ and  /* function: CATEGORY_ID = GA__Z0.CATEGORY_ID
>> */.
>> When using #4 (alone or in combination with any of the other 3)
>>
>>1. /* function */ and /* PUBLIC.GA_CATEGORY_ID: CATEGORY_ID =
>>CATS__Z1.CATEGORY_ID */ are used
>>2. The query is much slower.
>>
>>
>> Theoretically the query seems pretty simple
>>
>>1. Use affinity key  to make sure the query runs in parallel and
>>there are no shuffles
>>2. Filter rows that match category_id using the category_id index
>>3. Used customer_id index for the group_by (not sure if this step
>>makes sense)
>>
>> But I cannot get it to work.
>>
>> Cheers,
>> Eugene
>>
>>
>>
>>
>> On Tue, Sep 18, 2018 at 10:56 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> I can see you try to use _key_PK as index. If your primary key is
>>> composite, it won't work properly for you. I recommend creating an explicit
>>> (category_id, customer_id) index.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Tue, Sep 18, 2018 at 17:47, eugene miretsky :
>>>
>>>> Hi Ilya,
>>>>
>>>> The different query result was my mistake - one of the category_ids was
>>>> duplicated, so in the query that used join, it counted rows for that
>>>> category twice. My apologies.
>>>>
>>>> However, we are still having an issue with query time, and the index
>>>> not being applied to category_id. Would appreciate if you could take a
>>>> look.
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>> On Mon, Sep 17, 2018 at 9:15 AM Ilya Kasnacheev <
>>>> ilya.kasnach...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> Why don't you diff the results of those two queries, tell us what the
>>>>> difference is?
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>>
>>>>> Mon, Sep 17, 2018 at 16:08, eugene miretsky <
>>>>> eugene.miret...@gmail.com>:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Just wanted to see if anybody had time to look into this.
>>>>>>
>>>>>> Cheers,

Re: Query 3x slower with index

2018-09-19 Thread eugene miretsky
Hi Ilya,

I created 4 indexes on the table:
1) ga_pKey: on customer_id, dt, category_id (that's our primary key columns)
2) ga_customer_and_category_id: on customer_id and category_id
3) ga_customer_id: on customer_id
4) ga_category_id: on category_id


For the first query (category in ()), the execution plan when using the
first 3 indexes is exactly the same  - using /* PUBLIC.AFFINITY_KEY */
When using #4 (alone or in combination with any of the other 3)

   1. /* PUBLIC.AFFINITY_KEY */ is replaced with  /* PUBLIC.GA_CATEGORY_ID:
   CATEGORY_ID IN(117930, 175930, 175940, 175945, 101450) */
   2. The query runs slower.

For the second query (join on an inlined table) the behaviour is very
similar. Using the first 3 indexes results in the same plan - using  /*
PUBLIC.AFFINITY_KEY */ and  /* function: CATEGORY_ID = GA__Z0.CATEGORY_ID
*/.
When using #4 (alone or in combination with any of the other 3)

   1. /* function */ and /* PUBLIC.GA_CATEGORY_ID: CATEGORY_ID =
   CATS__Z1.CATEGORY_ID */ are used
   2. The query is much slower.


Theoretically the query seems pretty simple

   1. Use affinity key  to make sure the query runs in parallel and there
   are no shuffles
   2. Filter rows that match category_id using the category_id index
   3. Used customer_id index for the group_by (not sure if this step makes
   sense)

But I cannot get it to work.

Cheers,
Eugene
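The three-step plan above can be written out as DDL per the suggestion quoted in the reply below; a sketch only (the index name is illustrative):

```sql
-- Sketch of the column order suggested in the reply below:
-- category_id first (the filter/join column), customer_id second.
CREATE INDEX ga_category_then_customer ON GA_DATA (category_id, customer_id);
```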




On Tue, Sep 18, 2018 at 10:56 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> I can see you try to use _key_PK as index. If your primary key is
> composite, it won't work properly for you. I recommend creating an explicit
> (category_id, customer_id) index.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Tue, Sep 18, 2018 at 17:47, eugene miretsky :
>
>> Hi Ilya,
>>
>> The different query result was my mistake - one of the category_ids was
>> duplicated, so in the query that used join, it counted rows for that
>> category twice. My apologies.
>>
>> However, we are still having an issue with query time, and the index not
>> being applied to category_id. Would appreciate if you could take a look.
>>
>> Cheers,
>> Eugene
>>
>> On Mon, Sep 17, 2018 at 9:15 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Why don't you diff the results of those two queries, tell us what the
>>> difference is?
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Mon, Sep 17, 2018 at 16:08, eugene miretsky :
>>>
>>>> Hello,
>>>>
>>>> Just wanted to see if anybody had time to look into this.
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>> On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky <
>>>> eugene.miret...@gmail.com> wrote:
>>>>
>>>>> Thanks!
>>>>>
>>>>> Tried joining with an inlined table instead of IN as per the second
>>>>> suggestion, and it didn't quite work.
>>>>>
>>>>> Query1:
>>>>>
>>>>>- Select COUNT(*) FROM( Select customer_id from GATABLE3  use
>>>>>Index( ) where category_id in (9005, 175930, 175930, 
>>>>> 175940,175945,101450,
>>>>>6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>>SUM(product_clicks_app) > 1 )
>>>>>- exec time = 17s
>>>>>- *Result: 3105868*
>>>>>- Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>>>>customer_id index
>>>>>- Using an index on category_id increases the query time to 33s
>>>>>
>>>>> Query2:
>>>>>
>>>>>- Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use
>>>>>index (PUBLIC."_key_PK") inner join table(category_id int = (9005, 
>>>>> 175930,
>>>>>175930, 175940,175945,101450, 6453)) cats on cats.category_id =
>>>>>ga.category_id   group by customer_id having SUM(product_views_app) > 
>>>>> 2 OR
>>>>>SUM(product_clicks_app) > 1 )
>>>>>- exec time = 38s
>>>>>- *Result: 3113921*
>>>>>- Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>>>>customer_id index or category_id index
>>>>>- Using an index on category_id doesn't change the run time
>>>>>
>>>>> Query plans are attached.
>>>>>
>>>>> 3 questions:
>>>>>
>>>>>1.

Re: IGNITE-8386 question (composite pKeys)

2018-09-18 Thread eugene miretsky
So how should we work around it now? Just create a new index for
(customer_id, date)?

Cheers,
Eugene
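The workaround implied by the explanation quoted below is to declare the PK columns as an ordinary secondary index. A sketch (the table and index names are borrowed from the related threads in this archive, since this message doesn't name the table):

```sql
-- Sketch: until IGNITE-8386 turns the PK index into a true
-- multi-column index, an explicit index over the same columns
-- gives the query optimizer something it can actually use.
CREATE INDEX ga_customer_dt ON GA_DATA (customer_id, dt);
```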

On Mon, Sep 17, 2018 at 10:52 AM Stanislav Lukyanov 
wrote:

> Hi,
>
>
>
> The thing is that the PK index is currently created roughly as
>
> CREATE INDEX T(_key)
>
> and not
>
> CREATE INDEX T(customer_id, date).
>
>
>
> You can’t use the _key column in the WHERE clause directly, so the query
> optimizer can’t use the index.
>
>
>
> After the IGNITE-8386 is fixed the index will be created as a multi-column
> index, and will behave the way you expect (e.g. it will be used instead of
> the affinity key index).
>
>
>
> Stan
>
>
>
> *From: *eugene miretsky 
> *Sent: *September 12, 2018, 23:45
> *To: *user@ignite.apache.org
> *Subject: *IGNITE-8386 question (composite pKeys)
>
>
>
> Hi,
>
>
>
> A question regarding
> https://issues.apache.org/jira/browse/IGNITE-8386?focusedCommentId=16511394=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16511394
>
>
>
> It states that a pKey index with a composite pKey is "effectively useless".
> Could you please explain why that is? We have a pKey that we are using as
> an index.
>
>
>
> Also, our pKey is (customer_id, date) and the affinity column is
> customer_id. I have noticed that most queries use AFFINITY_KEY index.
> Looking at the source code, AFFINITY_KEY index should not even be created
> since the first field of the pKey  is the affinity key. Any idea what may
> be happening?
>
>
>
> Cheers,
>
> Eugene
>
>
>


Re: Query 3x slower with index

2018-09-18 Thread eugene miretsky
Hi Ilya,

The different query result was my mistake - one of the category_ids was
duplicated, so in the query that used join, it counted rows for that
category twice. My apologies.

However, we are still having an issue with query time, and the index not
being applied to category_id. Would appreciate if you could take a look.

Cheers,
Eugene

On Mon, Sep 17, 2018 at 9:15 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Why don't you diff the results of those two queries, tell us what the
> difference is?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, Sep 17, 2018 at 16:08, eugene miretsky :
>
>> Hello,
>>
>> Just wanted to see if anybody had time to look into this.
>>
>> Cheers,
>> Eugene
>>
>> On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> Tried joining with an inlined table instead of IN as per the second
>>> suggestion, and it didn't quite work.
>>>
>>> Query1:
>>>
>>>- Select COUNT(*) FROM( Select customer_id from GATABLE3  use Index(
>>>) where category_id in (9005, 175930, 175930, 175940,175945,101450, 6453)
>>>group by customer_id having SUM(product_views_app) > 2 OR
>>>SUM(product_clicks_app) > 1 )
>>>- exec time = 17s
>>>- *Result: 3105868*
>>>- Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>>customer_id index
>>>- Using an index on category_id increases the query time to 33s
>>>
>>> Query2:
>>>
>>>- Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use
>>>index (PUBLIC."_key_PK") inner join table(category_id int = (9005, 
>>> 175930,
>>>175930, 175940,175945,101450, 6453)) cats on cats.category_id =
>>>ga.category_id   group by customer_id having SUM(product_views_app) > 2 
>>> OR
>>>SUM(product_clicks_app) > 1 )
>>>- exec time = 38s
>>>- *Result: 3113921*
>>>- Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>>customer_id index or category_id index
>>>- Using an index on category_id doesn't change the run time
>>>
>>> Query plans are attached.
>>>
>>> 3 questions:
>>>
>>>1. Why is the result different for the 2 queries - this is quite
>>>concerning.
>>>2. Why is the 2nd query taking longer
>>>3. Why the category_id index doesn't work in the case of query 2.
>>>
>>>
>>> On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> I don't think that we're able to use an index with IN () clauses. Please
>>>> convert it into OR clauses.
>>>>
>>>> Please see
>>>> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>>
>>>> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov <
>>>> andrey.mashen...@gmail.com>:
>>>>
>>>>> Hi
>>>>>
>>>>> Actually, the first query uses an index on the affinity key, which
>>>>> looks more efficient than an index on the category_id column.
>>>>> The first query can process groups one by one and stream partial
>>>>> results from the map phase to the reduce phase as it uses a sorted index
>>>>> lookup, while the second query must process the full dataset in the map
>>>>> phase before passing it on for reducing.
>>>>>
>>>>> Try to use composite index (customer_id, category_id).
>>>>>
>>>>> Also, the SqlFieldsQuery.setCollocated(true) flag can help Ignite build
>>>>> a more efficient plan when a group by on a collocated column is used.
>>>>>
>>>>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky <
>>>>> eugene.miret...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Schema:
>>>>>>
>>>>>>-
>>>>>>
>>>>>>PUBLIC.GATABLE2.CUSTOMER_ID
>>>>>>
>>>>>>PUBLIC.GATABLE2.DT
>>>>>>
>>>>>>PUBLIC.GATABLE2.CATEGORY_ID
>>>>>>
>>>>>>PUBLIC.GATABLE2.VERTICAL_ID
>>>>>>
>>>>>>PUBLIC.GATABLE2.SERVICE
>>>>>>
>>>>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>>>>
>>>>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>>>>
>>>>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>>>>
>>>>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>>>>
>>>>>>PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>>>>
>>>>>>PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>>>>- pkey = customer_id,dt
>>>>>>- affinityKey = customer
>>>>>>
>>>>>> Query:
>>>>>>
>>>>>>- select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>>>>category_id in (175925, 101450, 9005, 175930, 175930, 
>>>>>> 175940,175945,101450,
>>>>>>6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>>>SUM(product_clicks_app) > 1 )
>>>>>>
>>>>>> The table has 600M rows.
>>>>>> At first, the query took 1m, when we added an index on category_id
>>>>>> the query started taking 3m.
>>>>>>
>>>>>> The SQL execution plan for both queries is attached.
>>>>>>
>>>>>> We are using a single x1.16xlarge instance with query parallelism
>>>>>> set to 32
>>>>>>
>>>>>> Cheers,
>>>>>> Eugene
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Andrey V. Mashenkov
>>>>>
>>>>


Re: How much heap to allocate

2018-09-18 Thread eugene miretsky
My understanding is that lazy loading doesn't work with group_by.

On Tue, Sep 18, 2018 at 10:11 AM Mikhail 
wrote:

> Hi Eugene,
>
> >For #2: wouldn't H2 need to bring the data into the heap to make the
> queries?
> > Or at least some of the data to do the group_by and sum operation?
>
> yes, Ignite will bring data from off-heap to heap; if the data set is
> too big for heap memory you need to set the lazy flag for your query:
>
> https://apacheignite-sql.readme.io/docs/performance-and-debugging#result-set-lazy-load
>
> Thanks,
> Mike.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
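The limitation Eugene mentions can be illustrated with a small sketch (plain Python, illustrative names only, not Ignite internals): a lazy cursor helps a plain scan because rows can stream one at a time, but GROUP BY has to consume the whole input before it can emit its first group, so the partial aggregates stay in memory regardless of the lazy flag.

```python
from collections import defaultdict

def scan(n):
    # A plain "SELECT * FROM t" can be consumed lazily: rows stream one
    # at a time, so peak memory stays small regardless of table size.
    for i in range(n):
        yield ("cust%d" % (i % 3), i)

def group_by_sum(rows):
    # GROUP BY cannot emit its first group until every input row has
    # been seen -- all partial aggregates sit in memory meanwhile.
    acc = defaultdict(int)
    for cid, v in rows:
        acc[cid] += v
    yield from acc.items()

lazy_rows = scan(6)                      # nothing materialized yet
grouped = dict(group_by_sum(lazy_rows))  # but grouping drains the whole input
```

This matches Ilya's remark elsewhere in the archive that lazy evaluation mostly helps "SELECT * FROM table"-style queries with large, boring result sets.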


Re: Configurations precedence and consistency across the cluster

2018-09-18 Thread eugene miretsky
Thanks!

A few clarifications:
1) "The first configuration with given cache name will be applied to all
nodes" - what do you mean by the first configuration? The configuration of
the first node that was started? Is there a gossip/consensus protocol that
syncs the cache configs across the cluster?
2) We are using an xml configuration file instead of IgniteCache.withX -
will it work similarly?

For example, I have the following configuration files on the client and
server node
1)  What settings will be applied when I create a SQL table with
template test_template from the client?
2)  What will happen if I start another  client with different settings?

*Client:*





















*Server*























On Tue, Sep 18, 2018 at 10:01 AM akurbanov  wrote:

> Hello Eugene,
>
> 1. Dynamic cache configuration changes are not supported, except properties
> that may be overridden with IgniteCache.withX.
> 2. The first configuration with given cache name will be applied to all
> nodes. You can use the same IgniteCache.withX to put with different expiry
> policies per each client. Also you can configure different near cache
> configurations that will take effect only for operations on client where it
> was configured.
>
> Regards,
> Anton
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: SQL query and Indexes architecture

2018-09-17 Thread eugene miretsky
Thanks!

I am curious about the process of loading data from Ignite into H2 on the
fly, as H2 creates the indexes but stores them in Ignite. Can you point me to
some JIRAs that discuss it, or which part of the code is responsible for
that?

On Mon, Sep 17, 2018 at 9:18 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1. 1. H2 executes the query, during which it has to load rows from tables,
> and Ignite does the row loading part. Then Ignite will collect query
> results on all nodes and aggregate them on a single node.
> 1. 2. Index is created by H2, but it is stored in Ignite pages (?).
> 2. Maybe you're right, I have to admit I'm unfamiliar with precise details
> here.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 17 сент. 2018 г. в 16:02, eugene miretsky :
>
>> Thanks!
>>
>>
>>1.
>>1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes on
>>   them and executes queries on them." - what exactly do you mean by 
>> that? Do
>>   you mean that all parts of a query that use indexes are executed by H2,
>>   then the actual data is retrieved from Ignite pages, and the final
>>   (non-indexed) parts of the query executed by Ignite?
>>   2.  What happens when I create an index on a new column? Is the
>>   index created in Ignite (and stored in Ignite pages?), or is it 
>> created in
>>   H2?
>>2.  The reason I was asking about AFFINITY_KEY, _key_PK and
>>_key_PK_hash indexed is that in this   code
>>
>> <https://github.com/apache/ignite/blob/56975c266e7019f307bb9da42333a6db4e47365e/modules/indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/H2TableDescriptor.java>
>>  it
>>looks like they are created in H2
>>
>>
>>
>> On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> 1. H2 does not store data but, as far as my understanding goes, it
>>> creates SQL indexes from the data. Ignite feeds H2 rows that it asks for, and
>>> H2 creates indexes on them and executes queries on them.
>>> 2. Ignite always has special index on your key (since it's a key-value
>>> storage it can always find tuple by key). Ignite is also aware of key's
>>> hash code, and affinity key value always maps to one partition of data (of
>>> 1024 by default). Those are not H2 indexes and they're mostly used on
>>> planning stage. E.g. you can map query to one node if affinity key is
>>> present in the request.
>>> 3. Data is brought onto the heap to read any fields from row. GROUP BY
>>> will hold its tuples on heap. Ignite has configurable index inlining where
>>> you can avoid reading objects from heap just to access indexed fields.
>>> 4. With GROUP BY, lazy evaluation will not help you much. It will still
>>> have to hold all data on heap at some point. Lazy evaluation mostly helps
>>> with "SELECT * FROM table" type queries which provide very large and boring
>>> result set.
>>>
>>> Hope this helps.
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пт, 14 сент. 2018 г. в 17:39, eugene miretsky >> >:
>>>
>>>> Hello,
>>>>
>>>> Trying to understand how exactly SQL queries are executed in Ignite. A
>>>> few questions
>>>>
>>>>
>>>>1. To what extent is H2 used? Does it store the data? Does it
>>>>create the indexes? Is it used only for generating execution plans? I
>>>>believe that all the data used to be stored in H2, but with the new 
>>>> durable
>>>>memory architecture, I believe that's no longer the case.
>>>>2. Which indexes are used? Ignite creates  B+ tree indexes and
>>>>stores them in Index pages, but I also see AFFINITY_KEY, _key_PK and
>>>>_key_PK_hash indexes created in H2.
>>>>3. When is data brought onto the heap? I am assuming that groupby
>>>>and aggregate require all the matching queries to first be copied from
>>>>off-heap to heap
>>>>4. How does lazy evaluation work? For example, for group_by, does
>>>>it bring batches of matching records with the same group_by key onto the
>>>>heap?
>>>>
>>>> I am not necessarily looking for the exact answers, but rather pointer
>>>> in the right direction (documentation, code, jiras)
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>
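Ilya's point 2 above — that an affinity key value always maps to one of (by default) 1024 partitions, and that this is mostly used at the planning stage to send a query to a single node — can be sketched as follows. This is a toy model: the hash function and 4-node layout are made up, and real Ignite uses a rendezvous affinity function.

```python
PARTITIONS = 1024                      # Ignite's default partition count

def partition_of(affinity_key):
    # stand-in for Ignite's affinity function (toy deterministic hash)
    return sum(affinity_key.encode()) % PARTITIONS

# toy 4-node cluster: every partition is assigned to a known node
part_to_node = {p: "node%d" % (p % 4) for p in range(PARTITIONS)}

def route(affinity_key):
    # planning stage: a query filtered by the affinity key can be
    # mapped to exactly one node instead of being broadcast
    return part_to_node[partition_of(affinity_key)]
```

The same key always lands on the same partition and hence the same node, which is why an equality predicate on the affinity key lets the planner skip the broadcast.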


Re: Query 3x slower with index

2018-09-17 Thread eugene miretsky
Hello,

Just wanted to see if anybody had time to look into this.

Cheers,
Eugene

On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky 
wrote:

> Thanks!
>
> Tried joining with an inlined table instead of IN as per the second
> suggestion, and it didn't quite work.
>
> Query1:
>
>- Select COUNT(*) FROM( Select customer_id from GATABLE3  use Index( )
>where category_id in (9005, 175930, 175930, 175940,175945,101450, 6453)
>group by customer_id having SUM(product_views_app) > 2 OR
>SUM(product_clicks_app) > 1 )
>- exec time = 17s
>- *Result: 3105868*
>- Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>customer_id index
>- Using an index on category_id increases the query time to 33s
>
> Query2:
>
>- Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use index
>(PUBLIC."_key_PK") inner join table(category_id int = (9005, 175930,
>175930, 175940,175945,101450, 6453)) cats on cats.category_id =
>ga.category_id   group by customer_id having SUM(product_views_app) > 2 OR
>SUM(product_clicks_app) > 1 )
>- exec time = 38s
>- *Result: 3113921*
>- Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>customer_id index or category_id index
>- Using an index on category_id doesn't change the run time
>
> Query plans are attached.
>
> 3 questions:
>
>1. Why is the result different for the 2 queries - this is quite
>concerning.
>2. Why is the 2nd query taking longer?
>3. Why doesn't the category_id index work in the case of query 2?
>
>
> On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> I don't think that we're able to use an index with IN () clauses. Please
>> convert it into OR clauses.
>>
>> Please see
>> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov > >:
>>
>>> Hi
>>>
>>> Actually, the first query uses an index on the affinity key, which looks
>>> more efficient than an index on the category_id column.
>>> The first query can process groups one by one and stream partial results
>>> from the map phase to the reduce phase as it uses a sorted index lookup,
>>> while the second query must process the full dataset in the map phase
>>> before passing it on for reducing.
>>>
>>> Try to use composite index (customer_id, category_id).
>>>
>>> Also, the SqlFieldsQuery.setCollocated(true) flag can help Ignite build
>>> a more efficient plan when a group by on a collocated column is used.
>>>
>>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Schema:
>>>>
>>>>-
>>>>
>>>>PUBLIC.GATABLE2.CUSTOMER_ID
>>>>
>>>>PUBLIC.GATABLE2.DT
>>>>
>>>>PUBLIC.GATABLE2.CATEGORY_ID
>>>>
>>>>PUBLIC.GATABLE2.VERTICAL_ID
>>>>
>>>>PUBLIC.GATABLE2.SERVICE
>>>>
>>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>>
>>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>>
>>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>>
>>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>>
>>>>PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>>
>>>>PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>>- pkey = customer_id,dt
>>>>- affinityKey = customer
>>>>
>>>> Query:
>>>>
>>>>- select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>>category_id in (175925, 101450, 9005, 175930, 175930, 
>>>> 175940,175945,101450,
>>>>6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>SUM(product_clicks_app) > 1 )
>>>>
>>>> The table has 600M rows.
>>>> At first, the query took 1m, when we added an index on category_id the
>>>> query started taking 3m.
>>>>
>>>> The SQL execution plan for both queries is attached.
>>>>
>>>> We are using a single x1.16xlarge instance with query parallelism set
>>>> to 32
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>


Re: Backup failover with persistence

2018-09-17 Thread eugene miretsky
Thanks Ilya,


   1. "So all nodes will know when node A begins hosting that partition as
   primary" - how is that consensus achieved? Will it result in partition map
   exchange and new topology version?
   2. What I actually meant is that it is impossible to know when Node A is
   fully caught up to node B unless you stop all the writes to Node B while
   node A is catching up. So how does Ignite know that it is safe to set A to
   primary again?


On Mon, Sep 17, 2018 at 8:48 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Apache Ignite is NOT "eventually consistent" if you ask that. Apache
> Ignite is strongly consistent. It has discovery ring (or discovery star
> with Zk) which allows messages to be sent and acknowledged by all nodes.
>
> So all nodes will know when node A begins hosting that partition as
> primary.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 17 сент. 2018 г. в 15:45, eugene miretsky :
>
>> How is "finish syncing" defined? Since it is a distributed system there is
>> no way to guarantee that node A is 100% caught up to node B. In Kafka there
>> is a replica.lag.time.max.ms settings, is there something similar in
>> Ignite?
>>
>>
>>
>> On Mon, Sep 17, 2018 at 8:37 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Node A will have two choices: either drop partition completely and
>>> re-download it from B, or replicate recent changes on it. Either one will
>>> be chosen internally.
>>> Node A will only become primary again when it finishes syncing that
>>> partition.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пт, 14 сент. 2018 г. в 22:23, eugene miretsky >> >:
>>>
>>>> What is the process when a node goes down and then restarts?
>>>>
>>>> Say backups = 1. We have node A that is primary for some key, and node
>>>> B that is back up.
>>>>
>>>> Node A goes down and then restarts after 5 min. What are the steps?
>>>> 1) Node A is servicing all traffic for key X
>>>> 2) Node A goes down
>>>> 3) Node B starts serving all traffic for key X (I guess the clients
>>>> detect the failover and start calling node B )
>>>> 4) Node A comes back up
>>>> 5) WAL replication is initiated
>>>>
>>>> What happens next? When does node A become the primary again? How are
>>>> in-flight updates handled?
>>>>
>>>>
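Ilya's two choices — replicate only the recent changes, or drop the partition and re-download it — can be modeled with per-partition update counters. This is a toy model with made-up names, not Ignite code; the actual decision also depends on whether WAL history covering the gap is still available on the up-to-date node.

```python
def rejoin(a_counter, b_counter, b_history_start):
    # Restarted node A catches up to backup B. a_counter is where A
    # stopped; b_history_start is the oldest update B can still replay.
    if a_counter >= b_history_start:
        mode = "historical"              # replay only the missed updates
        transferred = b_counter - a_counter
    else:
        mode = "full"                    # drop and re-download the partition
        transferred = b_counter
    # either way, A may become primary again only once the transfer
    # completes and its counter matches B's
    return mode, transferred

mode, delta = rejoin(a_counter=100, b_counter=140, b_history_start=90)
```

Under this model a short outage pays only for the delta (40 updates above), while an outage long enough to outrun the retained history forces a full partition transfer.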


Re: SQL query and Indexes architecture

2018-09-17 Thread eugene miretsky
Thanks!


   1.
   1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes on
  them and executes queries on them." - what exactly do you mean
by that? Do
  you mean that all parts of a query that use indexes are executed by H2,
  then the actual data is retrieved from Ignite pages, and the final
  (non-indexed) parts of the query executed by Ignite?
  2.  What happens when I create an index on a new column? Is the index
  created in Ignite (and stored in Ignite pages?), or is it created in H2?
   2.  The reason I was asking about AFFINITY_KEY, _key_PK and _key_PK_hash
   indexed is that in this   code
   
<https://github.com/apache/ignite/blob/56975c266e7019f307bb9da42333a6db4e47365e/modules/indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/H2TableDescriptor.java>
it
   looks like they are created in H2



On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1. H2 does not store data but, as far as my understanding goes, it creates
> SQL indexes from the data. Ignite feeds H2 rows that it asks for, and H2
> creates indexes on them and executes queries on them.
> 2. Ignite always has special index on your key (since it's a key-value
> storage it can always find tuple by key). Ignite is also aware of key's
> hash code, and affinity key value always maps to one partition of data (of
> 1024 by default). Those are not H2 indexes and they're mostly used on
> planning stage. E.g. you can map query to one node if affinity key is
> present in the request.
> 3. Data is brought onto the heap to read any fields from row. GROUP BY
> will hold its tuples on heap. Ignite has configurable index inlining where
> you can avoid reading objects from heap just to access indexed fields.
> 4. With GROUP BY, lazy evaluation will not help you much. It will still
> have to hold all data on heap at some point. Lazy evaluation mostly helps
> with "SELECT * FROM table" type queries which provide very large and boring
> result set.
>
> Hope this helps.
> --
> Ilya Kasnacheev
>
>
> пт, 14 сент. 2018 г. в 17:39, eugene miretsky :
>
>> Hello,
>>
>> Trying to understand how exactly SQL queries are executed in Ignite. A
>> few questions
>>
>>
>>1. To what extent is H2 used? Does it store the data? Does it create
>>the indexes? Is it used only for generating execution plans? I believe 
>> that
>>all the data used to be stored in H2, but with the new durable memory
>>architecture, I believe that's no longer the case.
>>2. Which indexes are used? Ignite creates  B+ tree indexes and stores
>>them in Index pages, but I also see AFFINITY_KEY, _key_PK and _key_PK_hash
>>indexes created in H2.
>>3. When is data brought onto the heap? I am assuming that groupby and
>>aggregate require all the matching queries to first be copied from 
>> off-heap
>>to heap
>>4. How does lazy evaluation work? For example, for group_by, does it
>>bring batches of matching records with the same group_by key onto the 
>> heap?
>>
>> I am not necessarily looking for the exact answers, but rather pointer in
>> the right direction (documentation, code, jiras)
>>
>> Cheers,
>> Eugene
>>
>


Re: Backup failover with persistence

2018-09-17 Thread eugene miretsky
How is "finish syncing" defined? Since it is a distributed system there is
no way to guarantee that node A is 100% caught up to node B. In Kafka there
is a replica.lag.time.max.ms settings, is there something similar in
Ignite?



On Mon, Sep 17, 2018 at 8:37 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Node A will have two choices: either drop partition completely and
> re-download it from B, or replicate recent changes on it. Either one will
> be chosen internally.
> Node A will only become primary again when it finishes syncing that
> partition.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пт, 14 сент. 2018 г. в 22:23, eugene miretsky :
>
>> What is the process when a node goes down and then restarts?
>>
>> Say backups = 1. We have node A that is primary for some key, and node B
>> that is back up.
>>
>> Node A goes down and then restarts after 5 min. What are the steps?
>> 1) Node A is servicing all traffic for key X
>> 2) Node A goes down
>> 3) Node B starts serving all traffic for key X (I guess the clients
>> detect the failover and start calling node B )
>> 4) Node A comes back up
>> 5) WAL replication is initiated
>>
>> What happens next? When does node A become the primary again? How are
>> in-flight updates handled?
>>
>>


Handling split brain with Zookeeper and persistence

2018-09-14 Thread eugene miretsky
Hi,

What are best practices for handling split brain with persistence?

1) Does the Zookeeper split brain resolver consider all nodes as the same
(client, memory only, persistent)? Ideally, we want to shut down persistent
nodes only as a last resort.
2) If a persistent node is shut down, we need to remove it from baseline
topology. Are there events we can subscribe to?

Cheers,
Eugene


Backup failover with persistence

2018-09-14 Thread eugene miretsky
What is the process when a node goes down and then restarts?

Say backups = 1. We have node A that is primary for some key, and node B
that is back up.

Node A goes down and then restarts after 5 min. What are the steps?
1) Node A is servicing all traffic for key X
2) Node A goes down
3) Node B starts serving all traffic for key X (I guess the clients detect
the failover and start calling node B )
4) Node A comes back up
5) WAL replication is initiated

What happens next? When does node A become the primary again? How are
in-flight updates handled?


Re: Configurations precedence and consistency across the cluster

2018-09-14 Thread eugene miretsky
Thanks for the response!

A few more follow up questions:
1) How can we change configurations of persistent caches (replication and
recovery settings for example)?
2) For client related settings, are the settings taken from the server
config, or client config (partitionLossPolicy or SQL table templates)? Can
I have different policies per client?

Cheers,
Eugene

On Tue, Aug 21, 2018 at 3:26 PM akurbanov  wrote:

> Hello,
>
> 1. No. Cache with configuration specified in xml file will be pre-created
> and won't change if you try to get it using a changed configuration.
> 2. If you are asking about a single node, yes, that is correct.
> 3. You will get an exception on startup in case you have different
> configurations for the same cache.
>
> Caches were not designed to support configuration changes on the fly, you
> should destroy/recreate cache in order to apply configuration change which
> means you should also delete data from the cache. In the future, there will
> be some possibilities to change some configuration parameters in runtime,
> but definitely not the ones responsible for data flow and distribution in
> cluster.
>
> Could you please clarify what are your requirements for cache configuration
> updates?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


SQL query and Indexes architecture

2018-09-14 Thread eugene miretsky
Hello,

Trying to understand how exactly SQL queries are executed in Ignite. A few
questions


   1. To what extent is H2 used? Does it store the data? Does it create the
   indexes? Is it used only for generating execution plans? I believe that all
   the data used to be stored in H2, but with the new durable memory
   architecture, I believe that's no longer the case.
   2. Which indexes are used? Ignite creates  B+ tree indexes and stores
   them in Index pages, but I also see AFFINITY_KEY, _key_PK and _key_PK_hash
   indexes created in H2.
   3. When is data brought onto the heap? I am assuming that groupby and
   aggregate require all the matching queries to first be copied from off-heap
   to heap
   4. How does lazy evaluation work? For example, for group_by, does it
   bring batches of matching records with the same group_by key onto the heap?

I am not necessarily looking for the exact answers, but rather pointer in
the right direction (documentation, code, jiras)

Cheers,
Eugene


Re: Network Segmentation

2018-09-14 Thread eugene miretsky
What does it provide on top of the Zookeeper split brain resolver?
https://apacheignite.readme.io/docs/zookeeper-discovery

On Fri, Sep 14, 2018 at 3:59 AM Kopilov  wrote:

> luqmanahmad, what does this plugin definitely do?
> Can it help to avoid segmentation or only to detect them?
> Can it be useful for my current problem:
>
> http://apache-ignite-users.70518.x6.nabble.com/Unreasonable-segmentation-in-Kubernetes-on-one-node-reboot-tp24102.html
> ?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Failed to wait for initial partition map exchange

2018-09-12 Thread eugene miretsky
Do you have persistence enabled?

On Wed, Sep 12, 2018 at 6:31 PM ndipiazza3565 <
nicholas.dipia...@lucidworks.com> wrote:

> I'm trying to build up a list of possible causes for this issue.
>
> I'm only really interested in the issues that occur after successful
> production deployments. Meaning the environment has been up for some time
> successfully, but then later on our ignite nodes will not start and stick
>
> But as of now, a certain bad behavior from a single node in the ignite
> cluster can cause a deadlock
>
> * Anything that causes one of the ignite nodes to become unresponsive
>   * oom
>   * high gc
>   * high cpu
>   * high disk usage
> * Network issues?
>
> I'm trying to get a list of the causes for this issue so I can troubleshoot
> further.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Query 3x slower with index

2018-09-12 Thread eugene miretsky
Thanks!

Tried joining with an inlined table instead of IN as per the second
suggestion, and it didn't quite work.

Query1:

   - Select COUNT(*) FROM( Select customer_id from GATABLE3  use Index( )
   where category_id in (9005, 175930, 175930, 175940,175945,101450, 6453)
   group by customer_id having SUM(product_views_app) > 2 OR
   SUM(product_clicks_app) > 1 )
   - exec time = 17s
   - *Result: 3105868*
   - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
   customer_id index
   - Using an index on category_id increases the query time to 33s

Query2:

   - Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use index
   (PUBLIC."_key_PK") inner join table(category_id int = (9005, 175930,
   175930, 175940,175945,101450, 6453)) cats on cats.category_id =
   ga.category_id   group by customer_id having SUM(product_views_app) > 2 OR
   SUM(product_clicks_app) > 1 )
   - exec time = 38s
   - *Result: 3113921*
   - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
   customer_id index or category_id index
   - Using an index on category_id doesn't change the run time

Query plans are attached.

3 questions:

   1. Why is the result different for the 2 queries - this is quite
   concerning.
   2. Why is the 2nd query taking longer?
   3. Why doesn't the category_id index work in the case of query 2?
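One hedged guess at question 1: the IN list contains 175930 twice. IN treats duplicate values as one, but the inlined table() in query 2 keeps the duplicate row, so every source row with category 175930 joins twice. That doubles its contribution to SUM(product_views_app)/SUM(product_clicks_app) and lets extra customers pass the HAVING filter — which would also explain why query 2's count is the larger one. The effect reproduces in any SQL engine; here with SQLite and tiny made-up data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ga (customer_id, category_id, views)")
con.executemany("INSERT INTO ga VALUES (?,?,?)",
                [("c1", 175930, 2),    # SUM(views)=2: fails HAVING > 2 under IN
                 ("c2", 9005, 3)])     # SUM(views)=3: passes either way
con.execute("CREATE TABLE cats (category_id)")
# note 175930 appears twice, mirroring the original IN list
con.executemany("INSERT INTO cats VALUES (?)", [(9005,), (175930,), (175930,)])

q_in = """SELECT COUNT(*) FROM (SELECT customer_id FROM ga
          WHERE category_id IN (9005, 175930, 175930)
          GROUP BY customer_id HAVING SUM(views) > 2)"""
q_join = """SELECT COUNT(*) FROM (SELECT ga.customer_id FROM ga
            JOIN cats ON cats.category_id = ga.category_id
            GROUP BY ga.customer_id HAVING SUM(views) > 2)"""

n_in = con.execute(q_in).fetchone()[0]      # c1's SUM(views)=2 -> excluded
n_join = con.execute(q_join).fetchone()[0]  # c1 matched twice -> SUM=4 -> included
```

If this is the cause, deduplicating the values in the inlined table should make the two counts agree.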


On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> I don't think that we're able to use an index with IN () clauses. Please
> convert it into OR clauses.
>
> Please see
> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov  >:
>
>> Hi
>>
>> Actually, the first query uses an index on the affinity key, which looks
>> more efficient than an index on the category_id column.
>> The first query can process groups one by one and stream partial results
>> from the map phase to the reduce phase as it uses a sorted index lookup,
>> while the second query must process the full dataset in the map phase
>> before passing it on for reducing.
>>
>> Try to use composite index (customer_id, category_id).
>>
>> Also, the SqlFieldsQuery.setCollocated(true) flag can help Ignite build
>> a more efficient plan when a group by on a collocated column is used.
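Andrey's streaming point can be sketched with a toy map-reduce model (plain Python, made-up data and partition function — not Ignite code). When the GROUP BY column is the affinity key, every row of a group lives on one node, so each map node can finalize its own groups and the reduce phase degenerates to a union of disjoint partial results:

```python
from collections import defaultdict

rows = [("c1", 3), ("c2", 1), ("c1", 2), ("c3", 5), ("c2", 4)]  # (customer_id, views)

def partition(key, n=2):
    # toy affinity function: all rows of one customer land on one node
    return sum(key.encode()) % n

partitions = defaultdict(list)
for cid, views in rows:
    partitions[partition(cid)].append((cid, views))

def map_node(part_rows):
    # each node can fully aggregate its groups: no other node holds
    # rows for these customers (the collocated case)
    agg = defaultdict(int)
    for cid, views in part_rows:
        agg[cid] += views
    return dict(agg)

# reduce phase: a plain union, since the per-node group sets are disjoint
result = {}
for part_rows in partitions.values():
    result.update(map_node(part_rows))
```

Without collocation, the reducer would instead have to merge partial aggregates for every group across nodes — the extra work that the collocated flag lets the planner skip.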
>>
>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky 
>> wrote:
>>
>>> Hello,
>>>
>>> Schema:
>>>
>>>-
>>>
>>>PUBLIC.GATABLE2.CUSTOMER_ID
>>>
>>>PUBLIC.GATABLE2.DT
>>>
>>>PUBLIC.GATABLE2.CATEGORY_ID
>>>
>>>PUBLIC.GATABLE2.VERTICAL_ID
>>>
>>>PUBLIC.GATABLE2.SERVICE
>>>
>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>
>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>
>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>
>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>
>>>PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>
>>>PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>- pkey = customer_id,dt
>>>- affinityKey = customer
>>>
>>> Query:
>>>
>>>- select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>category_id in (175925, 101450, 9005, 175930, 175930, 
>>> 175940,175945,101450,
>>>6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>SUM(product_clicks_app) > 1 )
>>>
>>> The table has 600M rows.
>>> At first, the query took 1m, when we added an index on category_id the
>>> query started taking 3m.
>>>
>>> The SQL execution plan for both queries is attached.
>>>
>>> We are using a single x1.16xlarge instance with query parallelism set
>>> to 32
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>


Query1_pKeyIdx
Description: Binary data


Query1_categoryIdIdx
Description: Binary data


Query2_categoryIdx
Description: Binary data


Query2_pKeyIdx
Description: Binary data


Re: How much heap to allocate

2018-09-12 Thread eugene miretsky
Thanks!

For #2: wouldn't H2 need to bring the data into the heap to make the
queries? Or at least some of the data to do the group_by and sum operation?

On Mon, Sep 10, 2018 at 6:19 AM Vladimir Ozerov 
wrote:

> Hi Eugene,
>
> Answering your questions:
> 1) Grouping is performed on both mapper and reducer (coordinator). If you
> group by affinity key, you may try setting "SqlFieldsQuery.colocated=true"
> to bypass grouping on reducer
> 2) For this specific query H2 will store (customer_id, count(*),
> sum(views)) for every customer_id. It is hard to guess how much space it
> would take in heap, but I think it would be ~50-100 bytes per customer_id.
> So if you have N customers, it would be (100 * N) bytes
> 3) Please see
> https://apacheignite-sql.readme.io/docs/performance-and-debugging
>
> Vladimir.
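Vladimir's (100 * N) bytes figure as a quick back-of-envelope calculation — the per-group constant is his rough upper bound, not a measured number:

```python
BYTES_PER_GROUP = 100      # Vladimir's rough upper bound per customer_id

def reducer_heap_mb(distinct_customers):
    # the reducer keeps one (customer_id, count, sum) entry per distinct
    # customer -- heap grows with distinct groups, not with table rows
    return distinct_customers * BYTES_PER_GROUP / 1024 / 1024

mb = reducer_heap_mb(3_100_000)   # ~3.1M distinct customers, as in the other thread
```

So even with millions of distinct customers the grouped state itself is a few hundred MB; it is the table rows feeding the aggregation, not the group entries, that dominate heap pressure.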
>
> On Thu, Aug 30, 2018 at 5:57 PM eugene miretsky 
> wrote:
>
>> Thanks against for the detailed response!
>>
>> Our main use case is performing large SQL queries over tables with 200M+
>> rows  - wanted to give you a bit more details and context you can pass along
>>
>> A simple example would be:
>>
>>- Table: customer_id, date, category, views, clicks ( pkey =
>>"customer_id, date", affinity key = date )
>>    - Query: SELECT count(*) where date < X AND category in (C1, C2, C3)
>>GROUP BY customer_id HAVING SUM(views) > 20
>>
>> My main concerns are
>> 1) How is the group by performed? You mentioned that it is performed on
>> the coordinator; I was hoping that since we are grouping on a column
>> that is an affinity key, each node would be able to do its own group by
>> 2) How much heap should I allocate for the group by stage
>> 3) General performance tips
>>
>> Cheers,
>> Eugene
>>
>>
>> On Thu, Aug 30, 2018 at 1:32 AM Denis Magda  wrote:
>>
>>> Eugene,
>>>
>>> Just want to be sure you know about the existence of the following pages
>>> which elaborate on Ignite memory architecture in details:
>>>
>>>-
>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory
>>>-
>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>>
>>>
>>>
>>>> 1) Are indexes loaded into heap (when used)?
>>>>
>>>
>>> Something might be copied to disk but in most of the cases we perform
>>> comparisons and other operations directly off-heap.
>>> See 
>>> org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper
>>> and related classes.
>>>
>>> 2) Are full pages loaded into heap, or only the matching records?
>>>>
>>>
>>> Matching records (the result set) are loaded onto the heap. The pages are not.
>>>
>>>
>>>> 3) When the query needs more processing than the existing index
>>>> (non-indexed columns, group by, aggregates) where/how does it happen?
>>>>
>>>
>>> We will be doing a full scan. Grouping and aggregations are finalized on
>>> the query coordinator which needs to get a full result set.
>>>
>>> 4) How is the query coordinator chosen? Is it the client node? How about
>>>> when using the web console?
>>>>
>>>
>>> That's your application. Web Console uses Ignite SQL APIs as well.
>>>
>>>
>>>> 5) What parallelism settings would you recommend? We were thinking to
>>>> set parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
>>>> this way we can make sure that each job gets all the heap memory instead of
>>>> all jobs fighting each other. Not sure if it makes sense, and it will also
>>>> prevent us from making real time transactional queries. (we
>>>> are hoping to use Ignite for both OLAP and simple real time queries)
>>>
>>>
>>> I would start a separate discussion for this bringing this question to
>>> the attention of our SQL experts. I'm not the one of them.
>>>
>>> --
>>> Denis
>>>
>>> On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
>>>> Denis, thanks for the detailed response.
>>>>
>>>> A few more follow up questions
>>>> 1) Are indexes loaded into the heap (when used)?
>>>> 2) Are full pages loaded in

IGNITE-8386 question (composite pKeys)

2018-09-12 Thread eugene miretsky
Hi,

A question regarding
https://issues.apache.org/jira/browse/IGNITE-8386?focusedCommentId=16511394=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16511394

It states that a pkey index with a composite pKey is "effectively useless".
Could you please explain why that is? We have a pKey that we are using as
an index.

Also, our pKey is (customer_id, date) and the affinity column is
customer_id. I have noticed that most queries use the AFFINITY_KEY index.
Looking at the source code, the AFFINITY_KEY index should not even be created
since the first field of the pKey is the affinity key. Any idea what may
be happening?
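For context, a hedged DDL sketch of how such a table is typically declared in Ignite SQL (column names come from this thread; the types and template are assumptions). The PK index is built over (customer_id, dt), and because the affinity column is the first PK field, a separate AFFINITY_KEY index should be redundant:

```sql
-- Hedged sketch (types/template are assumptions; column names from this
-- thread). The PK index covers (customer_id, dt); with the affinity
-- column as the first PK field, a separate AFFINITY_KEY index should
-- not be needed.
CREATE TABLE GATable (
    customer_id BIGINT,
    dt          DATE,
    category_id INT,
    PRIMARY KEY (customer_id, dt)
) WITH "template=partitioned,affinity_key=CUSTOMER_ID";
```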

Cheers,
Eugene


Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
Makes sense.
I think the actual issue that was affecting me is
https://issues.apache.org/jira/browse/IGNITE-9562. (which IEP-25 should
solve).

Final 2 questions:
1) If PME waits for all pending transactions
  a) What constitutes a transaction in this context? (any query, a SQL
transaction, etc.)
  b) Does it mean that if the cluster constantly receives transaction
requests, PME will never happen? (Or will all transactions that were
received after the PME request wait for the PME to complete?)
2) Any other advice on how to avoid PME hang-ups? (transaction timeouts,
graceful shutdown/restart of nodes, etc.)
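One concrete lever for the transaction-timeout question is TransactionConfiguration. A hedged Spring XML sketch (the values are illustrative assumptions; txTimeoutOnPartitionMapExchange requires Ignite 2.5+):

```xml
<!-- Hedged sketch: timeouts that keep stuck transactions from blocking PME.
     txTimeoutOnPartitionMapExchange exists since Ignite 2.5; the values
     here are illustrative assumptions. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="transactionConfiguration">
        <bean class="org.apache.ignite.configuration.TransactionConfiguration">
            <!-- Default timeout for application transactions (ms). -->
            <property name="defaultTxTimeout" value="30000"/>
            <!-- Transactions still pending when PME starts are rolled
                 back after this timeout so the exchange can proceed. -->
            <property name="txTimeoutOnPartitionMapExchange" value="20000"/>
        </bean>
    </property>
</bean>
```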

Cheers,
Eugene





On Wed, Sep 12, 2018 at 12:18 PM Pavel Kovalenko  wrote:

> Eugene,
>
> When Zookeeper Discovery is enabled and there is a communication problem
> between some nodes, a subset of the problem nodes will be automatically killed
> to reach a cluster state where every node can communicate with each other
> without problems. So, you're absolutely right, dead nodes will be removed
> from a cluster and will not participate in PME.
> IEP-25 is trying to solve a more general problem related only to PME.
> Network problems are only part of what can happen during PME. A node
> may break down before it even tries to send a message because of unexpected
> exceptions (e.g. NullPointer, Runtime, Assertion). In general, IEP-25
> tries to defend us from any kind of unexpected problems to make sure that
> PME will not be blocked in that case and the cluster will continue to live.
>
>
> ср, 12 сент. 2018 г. в 18:53, eugene miretsky :
>
>> Hi Pavel,
>>
>> The issue we are discussing is PME failing because one node cannot
>> communicate with another node, which is what IEP-25 is trying to solve. But in
>> that case (where one node is either down, or there is a communication
>> problem between two nodes) I would expect the split brain resolver to kick
>> in, and shut down one of the nodes. I would also expect the dead node to be
>> removed from the cluster, and no longer take part in PME.
>>
>>
>>
>> On Wed, Sep 12, 2018 at 11:25 AM Pavel Kovalenko 
>> wrote:
>>
>>> Hi Eugene,
>>>
>>> Sorry, but I didn't catch the meaning of your question about Zookeeper
>>> Discovery. Could you please re-phrase it?
>>>
>>> ср, 12 сент. 2018 г. в 17:54, Ilya Lantukh :
>>>
>>>> Pavel K., can you please answer about Zookeeper discovery?
>>>>
>>>> On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
>>>> eugene.miret...@gmail.com> wrote:
>>>>
>>>>> Thanks for the patience with my questions - just trying to understand
>>>>> the system better.
>>>>>
>>>>> 3) I was referring to
>>>>> https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
>>>>> How come it doesn't get the node to shut down?
>>>>> 4) Are there any docs/JIRAs that explain how counters are used, and
>>>>> why they are required in the state?
>>>>>
>>>>> Cheers,
>>>>> Eugene
>>>>>
>>>>>
>>>>> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
>>>>> wrote:
>>>>>
>>>>>> 3) Such mechanics will be implemented in IEP-25 (linked above).
>>>>>> 4) Partition map states include update counters, which are
>>>>>> incremented on every cache update and play important role in new state
>>>>>> calculation. So, technically, every cache operation can lead to partition
>>>>>> map change, and for obvious reasons we can't route them through
>>>>>> coordinator. Ignite is a more complex system than Akka or Kafka and such
>>>>>> simple solutions won't work here (in general case). However, it is true
>>>>>> that PME could be simplified or completely avoided for certain cases and 
>>>>>> the
>>>>>> community is currently working on such optimizations (
>>>>>> https://issues.apache.org/jira/browse/IGNITE-9558 for example).
>>>>>>
>>>>>> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
>>>>>> eugene.miret...@gmail.com> wrote:
>>>>>>
>>>>>>> 2b) I had a few situations where the cluster went into a state where
>>>>>>> PME constantly failed, and could never recover. I think the root cause 
>>>>>>> was
>>>>>>> that a transaction got stuck and didn't timeout/rollback.  I will try to
>>>>>>> repr

Re: Node keeps crashing under load

2018-09-12 Thread eugene miretsky
Good question :)
yardstick does this, but not sure if it is a valid prod solution.
https://github.com/apache/ignite/blob/3307a8b26ccb5f0bb7e9c387c73fd221b98ab668/modules/yardstick/src/main/java/org/apache/ignite/yardstick/jdbc/AbstractJdbcBenchmark.java

We have set preferIPv4Stack=true and provided localAddress in the config -
it seems to have solved the problem. (Didn't run it enough to be 100% sure)
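For completeness, the localAddress fix can be expressed in Spring XML roughly like this (the IP is an example taken from this thread's logs; each node must be configured with its own address):

```xml
<!-- Hedged sketch: pin the communication SPI to the node's private address
     so Ignite does not advertise the docker bridge (172.17.0.1). The IP is
     an example from this thread; each node must use its own. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <property name="localAddress" value="172.21.86.7"/>
        </bean>
    </property>
</bean>
```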

On Wed, Sep 12, 2018 at 10:59 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> How would you distinguish the wrong interface (172.17.0.1) from the right
> one if you were Ignite?
>
> I think it's not the first time I have seen this problem but I have
> positively no idea how to tackle it.
> Maybe Docker experts could chime in?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> ср, 12 сент. 2018 г. в 3:29, eugene miretsky :
>
>> Thanks Ilya,
>>
>> We are writing to Ignite from Spark running in EMR. We don't know the
>> address of the node in advance, we have tried
>> 1) Set localHost in Ignite configuration to 127.0.0.1, as per the example
>> online
>> 2) Leave localHost unset, and let ignite figure out the host
>>
>> I have attached more logs at the end.
>>
>> My understanding is that Ignite should pick the first non-local address
>> to publish, however, it seems like it picks randomly one of (a) proper
>> address, (b) ipv6 address, (c) 127.0.0.1, (d)  172.17.0.1.
>>
>> A few questions:
>> 1) How do we force Spark client to use the proper address
>> 2) Where is 172.17.0.1 coming from? It is usually the default docker
>> network host address, and it seems like Ignite creates a network interface
>> for it on the instance. (otherwise I have no idea where the interface is
>> coming from)
>> 3) If there are communication errors, shouldn't the Zookeeper split brain
>> resolver kick in and shut down the dead node. Or shouldn't at least the
>> initiating node mark the remote node as dead?
>>
>> [19:36:26,189][INFO][grid-nio-worker-tcp-comm-15-#88%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
>> rmtAddr=/172.21.86.7:41648]
>>
>> [19:36:26,190][INFO][grid-nio-worker-tcp-comm-3-#76%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
>> rmtAddr=/0:0:0:0:0:0:0:1:52484]
>>
>> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-5-#78%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
>> rmtAddr=/127.0.0.1:37656]
>>
>> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-1-#74%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/172.21.86.7:53272,
>> rmtAddr=ip-172-21-86-175.ap-south-1.compute.internal/172.21.86.175:47100]
>>
>> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-0-#73%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/172.17.0.1:41648,
>> rmtAddr=ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100]
>>
>> [19:36:26,193][INFO][grid-nio-worker-tcp-comm-4-#77%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/127.0.0.1:37656,
>> rmtAddr=/127.0.0.1:47100]
>>
>> [19:36:26,193][INFO][grid-nio-worker-tcp-comm-2-#75%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection
>> [locAddr=/0:0:0:0:0:0:0:1:52484, rmtAddr=/0:0:0:0:0:0:0:1%lo:47100]
>>
>> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-8-#81%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
>> rmtAddr=/172.21.86.7:41656]
>>
>> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-10-#83%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
>> rmtAddr=/0:0:0:0:0:0:0:1:52492]
>>
>> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-12-#85%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
>> rmtAddr=/127.0.0.1:37664]
>>
>> [19:36:26,196][INFO][grid-nio-worker-tcp-comm-7-#80%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/172.21.86.7:41076,
>> rmtAddr=ip-172-21-86-229.ap-south-1.compute.internal/172.21.86.229:47100]
>>
>>
>>
>>
>> On Mon, Sep 10, 2018 at 12:04 PM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> I can see a lot of errors like this one:
>>>
>>> [04:05:29,268][INFO][tcp-comm-worker-#1%Server%][ZookeeperDiscoveryImpl]
>>&

Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
Hi Pavel,

The issue we are discussing is PME failing because one node cannot
communicate with another node, which is what IEP-25 is trying to solve. But in
that case (where one node is either down, or there is a communication
problem between two nodes) I would expect the split brain resolver to kick
in, and shut down one of the nodes. I would also expect the dead node to be
removed from the cluster, and no longer take part in PME.



On Wed, Sep 12, 2018 at 11:25 AM Pavel Kovalenko  wrote:

> Hi Eugene,
>
> Sorry, but I didn't catch the meaning of your question about Zookeeper
> Discovery. Could you please re-phrase it?
>
> ср, 12 сент. 2018 г. в 17:54, Ilya Lantukh :
>
>> Pavel K., can you please answer about Zookeeper discovery?
>>
>> On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks for the patience with my questions - just trying to understand
>>> the system better.
>>>
>>> 3) I was referring to
>>> https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
>>> How come it doesn't get the node to shut down?
>>> 4) Are there any docs/JIRAs that explain how counters are used, and why
>>> they are required in the state?
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
>>> wrote:
>>>
>>>> 3) Such mechanics will be implemented in IEP-25 (linked above).
>>>> 4) Partition map states include update counters, which are incremented
>>>> on every cache update and play important role in new state calculation. So,
>>>> technically, every cache operation can lead to partition map change, and
>>>> for obvious reasons we can't route them through coordinator. Ignite is a
>>>> more complex system than Akka or Kafka and such simple solutions won't work
>>>> here (in general case). However, it is true that PME could be simplified or
>>>> completely avoided for certain cases and the community is currently working
>>>> on such optimizations (
>>>> https://issues.apache.org/jira/browse/IGNITE-9558 for example).
>>>>
>>>> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
>>>> eugene.miret...@gmail.com> wrote:
>>>>
>>>>> 2b) I had a few situations where the cluster went into a state where
>>>>> PME constantly failed, and could never recover. I think the root cause was
>>>>> that a transaction got stuck and didn't timeout/rollback.  I will try to
>>>>> reproduce it again and get back to you
>>>>> 3) If a node is down, I would expect it to get detected and the node
>>>>> to get removed from the cluster. In such case, PME should not even be
>>>>> attempted with that node. Hence you would expect PME to fail very rarely
>>>>> (any faulty node will be removed before it has a chance to fail PME)
>>>>> 4) Don't all partition map changes go through the coordinator? I
>>>>> believe a lot of distributed systems work in this way (all decisions are
>>>>> made by the coordinator/leader) - In Akka the leader is responsible for
>>>>> making all cluster membership changes, in Kafka the controller does the
>>>>> leader election.
>>>>>
>>>>> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
>>>>> wrote:
>>>>>
>>>>>> 1) It is.
>>>>>> 2a) Ignite has retry mechanics for all messages, including
>>>>>> PME-related ones.
>>>>>> 2b) In this situation PME will hang, but it isn't a "deadlock".
>>>>>> 3) Sorry, I didn't understand your question. If a node is down, but
>>>>>> DiscoverySpi doesn't detect it, it isn't PME-related problem.
>>>>>> 4) How can you ensure that partition maps on coordinator are *latest
>>>>>> *without "freezing" cluster state for some time?
>>>>>>
>>>>>> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
>>>>>> eugene.miret...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> We are using persistence, so I am not sure if shutting down nodes
>>>>>>> will be the desired outcome for us since we would need to modify the
>>>>>>> baseline topology.
>>>>>>>
>>>>>>> A couple more follow up questions
>>&

Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
Thanks for the patience with my questions - just trying to understand the
system better.

3) I was referring to
https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
How come it doesn't get the node to shut down?
4) Are there any docs/JIRAs that explain how counters are used, and why
they are required in the state?

Cheers,
Eugene


On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh  wrote:

> 3) Such mechanics will be implemented in IEP-25 (linked above).
> 4) Partition map states include update counters, which are incremented on
> every cache update and play an important role in new state calculation. So,
> technically, every cache operation can lead to partition map change, and
> for obvious reasons we can't route them through coordinator. Ignite is a
> more complex system than Akka or Kafka and such simple solutions won't work
> here (in general case). However, it is true that PME could be simplified or
> completely avoided for certain cases and the community is currently working
> on such optimizations (https://issues.apache.org/jira/browse/IGNITE-9558
> for example).
>
> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> 2b) I had a few situations where the cluster went into a state where PME
>> constantly failed, and could never recover. I think the root cause was that
>> a transaction got stuck and didn't timeout/rollback.  I will try to
>> reproduce it again and get back to you
>> 3) If a node is down, I would expect it to get detected and the node to
>> get removed from the cluster. In such case, PME should not even be
>> attempted with that node. Hence you would expect PME to fail very rarely
>> (any faulty node will be removed before it has a chance to fail PME)
>> 4) Don't all partition map changes go through the coordinator? I believe
>> a lot of distributed systems work in this way (all decisions are made by
>> the coordinator/leader) - In Akka the leader is responsible for making all
>> cluster membership changes, in Kafka the controller does the leader
>> election.
>>
>> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
>> wrote:
>>
>>> 1) It is.
>>> 2a) Ignite has retry mechanics for all messages, including PME-related
>>> ones.
>>> 2b) In this situation PME will hang, but it isn't a "deadlock".
>>> 3) Sorry, I didn't understand your question. If a node is down, but
>>> DiscoverySpi doesn't detect it, it isn't PME-related problem.
>>> 4) How can you ensure that partition maps on coordinator are *latest 
>>> *without
>>> "freezing" cluster state for some time?
>>>
>>> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
>>>> Thanks!
>>>>
>>>> We are using persistence, so I am not sure if shutting down nodes will
>>>> be the desired outcome for us since we would need to modify the baseline
>>>> topology.
>>>>
>>>> A couple more follow-up questions
>>>>
>>>> 1) Is PME triggered when client nodes join as well? We are using the
>>>> Spark client, so new nodes are created/destroyed every time.
>>>> 2) It sounds to me like there is a potential for the cluster to get
>>>> into a deadlock if
>>>>    a) a single PME message is lost (PME never finishes, there are no
>>>> retries, and all future operations are blocked on the pending PME)
>>>>    b) one of the nodes has a long-running/stuck pending operation
>>>> 3) Under what circumstances can PME fail while DiscoverySpi fails to
>>>> detect the node being down? We are using ZookeeperSpi, so I would expect
>>>> the split brain resolver to shut down the node.
>>>> 4) Why is PME needed? Doesn't the coordinator know the latest
>>>> topology/partition map of the cluster through regular gossip?
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
>>>> wrote:
>>>>
>>>>> Hi Eugene,
>>>>>
>>>>> 1) PME happens when topology is modified (TopologyVersion is
>>>>> incremented). The most common events that trigger it are: node
>>>>> start/stop/fail, cluster activation/deactivation, dynamic cache 
>>>>> start/stop.
>>>>> 2) It is done by a separate ExchangeWorker. Events that trigger PME
>>>>> are transferred using DiscoverySpi instead of CommunicationSpi.
>>>>> 3) All nodes wait for

Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
2b) I had a few situations where the cluster went into a state where PME
constantly failed, and could never recover. I think the root cause was that
a transaction got stuck and didn't timeout/rollback.  I will try to
reproduce it again and get back to you
3) If a node is down, I would expect it to get detected and the node to get
removed from the cluster. In such case, PME should not even be attempted
with that node. Hence you would expect PME to fail very rarely (any faulty
node will be removed before it has a chance to fail PME)
4) Don't all partition map changes go through the coordinator? I believe a
lot of distributed systems work in this way (all decisions are made by the
coordinator/leader) - In Akka the leader is responsible for making all
cluster membership changes, in Kafka the controller does the leader
election.

On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh  wrote:

> 1) It is.
> 2a) Ignite has retry mechanics for all messages, including PME-related
> ones.
> 2b) In this situation PME will hang, but it isn't a "deadlock".
> 3) Sorry, I didn't understand your question. If a node is down, but
> DiscoverySpi doesn't detect it, it isn't PME-related problem.
> 4) How can you ensure that partition maps on coordinator are *latest *without
> "freezing" cluster state for some time?
>
> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky wrote:
>
>> Thanks!
>>
>> We are using persistence, so I am not sure if shutting down nodes will be
>> the desired outcome for us since we would need to modify the baseline
>> topology.
>>
>> A couple more follow-up questions
>>
>> 1) Is PME triggered when client nodes join as well? We are using the Spark
>> client, so new nodes are created/destroyed every time.
>> 2) It sounds to me like there is a potential for the cluster to get into
>> a deadlock if
>>    a) a single PME message is lost (PME never finishes, there are no
>> retries, and all future operations are blocked on the pending PME)
>>    b) one of the nodes has a long-running/stuck pending operation
>> 3) Under what circumstances can PME fail while DiscoverySpi fails to
>> detect the node being down? We are using ZookeeperSpi, so I would expect the
>> split brain resolver to shut down the node.
>> 4) Why is PME needed? Doesn't the coordinator know the latest
>> topology/partition map of the cluster through regular gossip?
>>
>> Cheers,
>> Eugene
>>
>> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
>> wrote:
>>
>>> Hi Eugene,
>>>
>>> 1) PME happens when topology is modified (TopologyVersion is
>>> incremented). The most common events that trigger it are: node
>>> start/stop/fail, cluster activation/deactivation, dynamic cache start/stop.
>>> 2) It is done by a separate ExchangeWorker. Events that trigger PME are
>>> transferred using DiscoverySpi instead of CommunicationSpi.
>>> 3) All nodes wait for all pending cache operations to finish and then
>>> send their local partition maps to the coordinator (oldest node). Then
>>> coordinator calculates new global partition maps and sends them to every
>>> node.
>>> 4) All cache operations.
>>> 5) Exchange is never retried. Ignite community is currently working on
>>> PME failure handling that should kick all problematic nodes after timeout
>>> is reached (see
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-25%3A+Partition+Map+Exchange+hangs+resolving
>>> for details), but it isn't done yet.
>>> 6) You shouldn't consider PME failure as an error by itself, but rather
>>> as a result of some other error. The most common reason of PME hang-up is
>>> pending cache operation that couldn't finish. Check your logs - it should
>>> list pending transactions and atomic updates. Search for "Found long
>>> running" substring.
>>>
>>> Hope this helps.
>>>
>>> On Fri, Sep 7, 2018 at 11:45 PM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Our cluster occasionally fails with "partition map exchange failure"
>>>> errors, I have searched around and it seems that a lot of people have had a
>>>> similar issue in the past. My high-level understanding is that when one of
>>>> the nodes fails (out of memory, exception, GC etc.) nodes fail to exchange
>>>> partition maps. However, I have a few questions
>>>> 1) When does partition map exchange happen? Periodically, when a node
>>>> joins, etc.
>>>> 2) Is it done in the same thread as communication SPI, or is a separate
>>>> worker?
>>>> 3) How does the exchange happen? Via a coordinator, peer to peer, etc?
>>>> 4) What does the exchange block?
>>>> 5) When is the exchange retried?
>>>> 6) How to resolve the error? The only thing I have seen online is to
>>>> decrease failureDetectionTimeout
>>>>
>>>> Our settings are
>>>> - Zookeeper SPI
>>>> - Persistence enabled
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ilya
>>>
>>
>
>
> --
> Best regards,
> Ilya
>


Re: Node keeps crashing under load

2018-09-11 Thread eugene miretsky
Thanks Ilya,

We are writing to Ignite from Spark running in EMR. We don't know the
address of the node in advance, we have tried
1) Set localHost in Ignite configuration to 127.0.0.1, as per the example
online
2) Leave localHost unset, and let ignite figure out the host

I have attached more logs at the end.

My understanding is that Ignite should pick the first non-local address to
publish, however, it seems like it picks randomly one of (a) proper
address, (b) ipv6 address, (c) 127.0.0.1, (d)  172.17.0.1.

A few questions:
1) How do we force the Spark client to use the proper address?
2) Where is 172.17.0.1 coming from? It is usually the default docker
network host address, and it seems like Ignite creates a network interface
for it on the instance. (otherwise I have no idea where the interface is
coming from)
3) If there are communication errors, shouldn't the Zookeeper split brain
resolver kick in and shut down the dead node. Or shouldn't at least the
initiating node mark the remote node as dead?

[19:36:26,189][INFO][grid-nio-worker-tcp-comm-15-#88%Server%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
rmtAddr=/172.21.86.7:41648]

[19:36:26,190][INFO][grid-nio-worker-tcp-comm-3-#76%Server%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
rmtAddr=/0:0:0:0:0:0:0:1:52484]

[19:36:26,191][INFO][grid-nio-worker-tcp-comm-5-#78%Server%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:37656]

[19:36:26,191][INFO][grid-nio-worker-tcp-comm-1-#74%Server%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/172.21.86.7:53272,
rmtAddr=ip-172-21-86-175.ap-south-1.compute.internal/172.21.86.175:47100]

[19:36:26,191][INFO][grid-nio-worker-tcp-comm-0-#73%Server%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/172.17.0.1:41648,
rmtAddr=ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100]

[19:36:26,193][INFO][grid-nio-worker-tcp-comm-4-#77%Server%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/127.0.0.1:37656,
rmtAddr=/127.0.0.1:47100]

[19:36:26,193][INFO][grid-nio-worker-tcp-comm-2-#75%Server%][TcpCommunicationSpi]
Established outgoing communication connection
[locAddr=/0:0:0:0:0:0:0:1:52484, rmtAddr=/0:0:0:0:0:0:0:1%lo:47100]

[19:36:26,195][INFO][grid-nio-worker-tcp-comm-8-#81%Server%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
rmtAddr=/172.21.86.7:41656]

[19:36:26,195][INFO][grid-nio-worker-tcp-comm-10-#83%Server%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
rmtAddr=/0:0:0:0:0:0:0:1:52492]

[19:36:26,195][INFO][grid-nio-worker-tcp-comm-12-#85%Server%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:37664]

[19:36:26,196][INFO][grid-nio-worker-tcp-comm-7-#80%Server%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/172.21.86.7:41076,
rmtAddr=ip-172-21-86-229.ap-south-1.compute.internal/172.21.86.229:47100]




On Mon, Sep 10, 2018 at 12:04 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> I can see a lot of errors like this one:
>
> [04:05:29,268][INFO][tcp-comm-worker-#1%Server%][ZookeeperDiscoveryImpl]
> Created new communication error process future
> [errNode=598e3ead-99b8-4c49-b7df-04d578dcbf5f, err=class
> org.apache.ignite.IgniteCheckedException: Failed to connect to node (is
> node still alive?). Make sure that each ComputeTask and cache Transaction
> has a timeout set in order to prevent parties from waiting forever in case
> of network issues [nodeId=598e3ead-99b8-4c49-b7df-04d578dcbf5f,
> addrs=[ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100,
> ip-172-21-85-213.ap-south-1.compute.internal/172.21.85.213:47100,
> /0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]]
>
> I think the problem is, you have two nodes, they both have 172.17.0.1
> address but it's the different address (totally unrelated private nets).
>
> Try to specify your external address (such as 172.21.85.213) with
> TcpCommunicationSpi.setLocalAddress() on each node.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пт, 7 сент. 2018 г. в 20:01, eugene miretsky :
>
>> Hi all,
>>
>> Can somebody please provide some pointers on what could be the issue or
>> how to debug it? We have a fairly large Ignite use case, but cannot go
>> ahead with a POC because of these crashes.
>>
>> Cheers,
>> Eugene
>>
>>
>>
>> On Fri, Aug 31, 2018 at 11:52 AM eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Also, don't want to spam the mailing list with more threads, but I get
>>> the same stability issue when writing to Ignite from Spar

Re: Partition map exchange in detail

2018-09-07 Thread eugene miretsky
Thanks!

We are using persistence, so I am not sure if shutting down nodes will be
the desired outcome for us since we would need to modify the baseline
topology.

A couple more follow-up questions

1) Is PME triggered when client nodes join as well? We are using the Spark
client, so new nodes are created/destroyed every time.
2) It sounds to me like there is a potential for the cluster to get into a
deadlock if
   a) a single PME message is lost (PME never finishes, there are no retries,
and all future operations are blocked on the pending PME)
   b) one of the nodes has a long-running/stuck pending operation
3) Under what circumstances can PME fail while DiscoverySpi fails to
detect the node being down? We are using ZookeeperSpi, so I would expect the
split brain resolver to shut down the node.
4) Why is PME needed? Doesn't the coordinator know the latest
topology/partition map of the cluster through regular gossip?

Cheers,
Eugene

On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh  wrote:

> Hi Eugene,
>
> 1) PME happens when topology is modified (TopologyVersion is incremented).
> The most common events that trigger it are: node start/stop/fail, cluster
> activation/deactivation, dynamic cache start/stop.
> 2) It is done by a separate ExchangeWorker. Events that trigger PME are
> transferred using DiscoverySpi instead of CommunicationSpi.
> 3) All nodes wait for all pending cache operations to finish and then send
> their local partition maps to the coordinator (oldest node). Then
> coordinator calculates new global partition maps and sends them to every
> node.
> 4) All cache operations.
> 5) Exchange is never retried. Ignite community is currently working on PME
> failure handling that should kick all problematic nodes after timeout is
> reached (see
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-25%3A+Partition+Map+Exchange+hangs+resolving
> for details), but it isn't done yet.
> 6) You shouldn't consider PME failure as an error by itself, but rather as
> a result of some other error. The most common reason of PME hang-up is
> pending cache operation that couldn't finish. Check your logs - it should
> list pending transactions and atomic updates. Search for "Found long
> running" substring.
>
> Hope this helps.
>
> On Fri, Sep 7, 2018 at 11:45 PM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> Hello,
>>
>> Our cluster occasionally fails with "partition map exchange failure"
>> errors, I have searched around and it seems that a lot of people have had a
>> similar issue in the past. My high-level understanding is that when one of
>> the nodes fails (out of memory, exception, GC etc.) nodes fail to exchange
>> partition maps. However, I have a few questions
>> 1) When does partition map exchange happen? Periodically, when a node
>> joins, etc.
>> 2) Is it done in the same thread as communication SPI, or is a separate
>> worker?
>> 3) How does the exchange happen? Via a coordinator, peer to peer, etc?
>> 4) What does the exchange block?
>> 5) When is the exchange retried?
>> 6) How to resolve the error? The only thing I have seen online is to
>> decrease failureDetectionTimeout
>>
>> Our settings are
>> - Zookeeper SPI
>> - Persistence enabled
>>
>> Cheers,
>> Eugene
>>
>
>
>
> --
> Best regards,
> Ilya
>


Partition map exchange in detail

2018-09-07 Thread eugene miretsky
Hello,

Our cluster occasionally fails with "partition map exchange failure"
errors, I have searched around and it seems that a lot of people have had a
similar issue in the past. My high-level understanding is that when one of
the nodes fails (out of memory, exception, GC etc.) nodes fail to exchange
partition maps. However, I have a few questions
1) When does partition map exchange happen? Periodically, when a node
joins, etc.
2) Is it done in the same thread as communication SPI, or is a separate
worker?
3) How does the exchange happen? Via a coordinator, peer to peer, etc?
4) What does the exchange block?
5) When is the exchange retried?
6) How to resolve the error? The only thing I have seen online is to
decrease failureDetectionTimeout

Our settings are
- Zookeeper SPI
- Persistence enabled

Cheers,
Eugene
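[Editor's note, not from the thread: on question 6, failureDetectionTimeout is a property of IgniteConfiguration, so tuning it is a one-line change in the Spring config. A minimal sketch — the value shown is Ignite's documented default of 10 seconds; pick your own:]

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Milliseconds before an unresponsive node is considered failed.
         10000 is the default; the thread discusses lowering it. -->
    <property name="failureDetectionTimeout" value="10000"/>
</bean>
```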


Re: ignite cluster hang with GridCachePartitionExchangeManager

2018-09-07 Thread eugene miretsky
Hi Wangsan,
So what was the original cause of the issue? Was it blocking the listening
thread in your test code or something else?

We are having similar issues

Cheers,
Eugene

On Mon, Sep 3, 2018 at 1:23 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> The operation will execute after partition map exchange (or maybe several
> ones). Just be sure to avoid waiting on operation from discovery event
> listener.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 3 сент. 2018 г. в 17:37, wangsan :
>
>> Thanks!
>>
>> Can I do cache operations (update a cache item) in another thread from
>> discovery event listeners? And will the operation execute concurrently,
>> or execute before the partition map exchange?
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>
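[Editor's sketch, not from the thread: Ilya's advice boils down to returning from the discovery listener immediately and running the cache update on a separate worker. Plain Java with a stand-in for the listener and the cache operation — no Ignite APIs, all names are hypothetical:]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ListenerOffload {
    // Simulates handling two discovery events: the "listener" body only
    // submits work and returns; the (potentially blocking) cache update
    // runs on the worker thread after the listener has returned.
    static int runDemo() throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        AtomicInteger cacheUpdates = new AtomicInteger();
        Runnable cacheOp = cacheUpdates::incrementAndGet; // stand-in for cache.put(...)

        worker.submit(cacheOp); // "listener" reacting to a node-join event
        worker.submit(cacheOp); // "listener" reacting to a node-leave event

        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
        return cacheUpdates.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo()); // 2
    }
}
```

The point of the design is that the listener never waits on the cache operation, which is exactly the deadlock Ilya warns about.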


Query 3x slower with index

2018-09-01 Thread eugene miretsky
Hello,

Schema:

   -

   PUBLIC.GATABLE2.CUSTOMER_ID

   PUBLIC.GATABLE2.DT

   PUBLIC.GATABLE2.CATEGORY_ID

   PUBLIC.GATABLE2.VERTICAL_ID

   PUBLIC.GATABLE2.SERVICE

   PUBLIC.GATABLE2.PRODUCT_VIEWS_APP

   PUBLIC.GATABLE2.PRODUCT_CLICKS_APP

   PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB

   PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB

   PUBLIC.GATABLE2.PDP_SESSIONS_APP

   PUBLIC.GATABLE2.PDP_SESSIONS_WEB
   - pkey = customer_id,dt
   - affinityKey = customer

Query:

   - select COUNT(*) FROM( Select customer_id from GATABLE2 where
   category_id in (175925, 101450, 9005, 175930, 175930, 175940,175945,101450,
   6453) group by customer_id having SUM(product_views_app) > 2 OR
   SUM(product_clicks_app) > 1 )

The table has 600M rows.
At first the query took 1m; after we added an index on category_id, the
query started taking 3m.

The SQL execution plan for both queries is attached.

We are using a single x1.16xlarge instance with query parallelism set to 32.

Cheers,
Eugene


QueryWithoutIndex
Description: Binary data


QueryWithIndex
Description: Binary data
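[Editor's note, not part of the thread: the attached plans can also be reproduced in place, since Ignite's H2-based SQL supports EXPLAIN and explicit index hints, which makes the with/without-index comparison repeatable. A sketch against the schema above — CATEGORY_ID_IDX is a placeholder for the real index name:]

```sql
-- Show the plan the optimizer picks for the original query:
EXPLAIN
SELECT COUNT(*) FROM (
  SELECT customer_id FROM GATABLE2
  WHERE category_id IN (175925, 101450, 9005)
  GROUP BY customer_id
  HAVING SUM(product_views_app) > 2 OR SUM(product_clicks_app) > 1
);

-- Pin a specific index to compare timings directly
-- (CATEGORY_ID_IDX is an assumed name; check your actual index):
SELECT customer_id FROM GATABLE2 USE INDEX (CATEGORY_ID_IDX)
WHERE category_id IN (175925, 101450, 9005)
GROUP BY customer_id;
```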


Re: How to set node Id?

2018-08-31 Thread eugene miretsky
Thanks Denis,

The version is 2.5 and we are indeed using persistence. Full config
attached.

Another weird thing that happened is that after restarting the node a few
times it starts properly and joins the cluster. However, when I try to
create a SQL table (from Spark) using template "ga_template" (defined in
the config), I get the error *Cache doesn't exist*. After some time, with
the same code and settings, creating the table starts working.

I know it sounds odd, but I have observed both a few times.

Based on the link you sent, the node Id should be automatically picked up
from the name of the file in persistence directory. Is it possible that the
pst lock was not being released properly, so when the node was restarted it
tried to create a new UUID?

Cheers,
Eugene

On Fri, Aug 31, 2018 at 1:57 AM Denis Magda  wrote:

> Strange, I've never seen consistent ID collisions before. Are you using
> Ignite persistence and what's your version? If you scroll to the end of
> this paragraph, you'll find an explanation on how the IDs are generated:
>
> https://apacheignite.readme.io/docs/distributed-persistent-store#section-usage
>
> --
> Denis
>
> On Thu, Aug 30, 2018 at 9:10 PM eugene miretsky 
> wrote:
>
>> Hello,
>>
>> Is it possible to set a nodeId when restarting a node? How is the id
>> generated?
>>
>> Sometimes after the cluster crashes, when I restart a node I get the
>> following error: Caused by: class
>> org.apache.ignite.spi.IgniteSpiException: Failed to add node to topology
>> because it has the same hash code for partitioned affinity as one of
>> existing nodes [cacheName=SQL_PUBLIC_GAL3EC2,
>> existingNodeId=598e3ead-99b8-4c49-b7df-04d578dcbf5f]
>>
>> It looks like the node is trying to start with another nodeId, and cannot
>> because its old nodeId owns the same partitions.
>>
>> Cheers,
>> Eugene
>>
>
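[Editor's note: one way to take the ID question out of Ignite's hands is to pin the consistent ID per node; IgniteConfiguration exposes a consistentId property. A minimal Spring XML sketch — the value is illustrative and must be unique per node:]

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- A fixed identity reused across restarts for persistence
         and partitioned affinity, instead of a generated UUID. -->
    <property name="consistentId" value="node-1"/>
</bean>
```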




[attachment: Spring XML Ignite configuration — markup stripped by the mail archive]

How to set node Id?

2018-08-30 Thread eugene miretsky
Hello,

Is it possible to set a nodeId when restarting a node? How is the id
generated?

Sometimes after the cluster crashes, when I restart a node I get the
following error: Caused by: class org.apache.ignite.spi.IgniteSpiException:
Failed to add node to topology because it has the same hash code for
partitioned affinity as one of existing nodes
[cacheName=SQL_PUBLIC_GAL3EC2,
existingNodeId=598e3ead-99b8-4c49-b7df-04d578dcbf5f]

It looks like the node is trying to start with another nodeId, and cannot
because its old nodeId owns the same partitions.

Cheers,
Eugene


Node keeps crashing under load

2018-08-30 Thread eugene miretsky
Hello,

I have a medium cluster set up for testing - 3 x r4.8xlarge EC2 nodes. It
has persistence enabled, and zero backup.
- Full configs are attached.
- JVM settings are: JVM_OPTS="-Xms16g -Xmx64g -server -XX:+AggressiveOpts
-XX:MaxMetaspaceSize=256m  -XX:+AlwaysPreTouch -XX:+UseG1GC
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"

The table has 145M rows and takes up about 180G of memory.
I am testing 2 things:
1) Writing SQL tables from Spark
2) Performing large SQL queries (from the web console): for example Select
COUNT (*) FROM (SELECT customer_id FROM MyTable where dt > '2018-05-12'
GROUP BY customer_id having SUM(column1) > 2 AND MAX(column2) < 1)

Most of the times I run the query, it fails after one of the nodes crashes
(it has finished a few times, and then crashed the next time). I also have
similar stability issues when writing from Spark - at some point, one of
the nodes crashes. All I can see in the logs is

[21:51:58,548][SEVERE][disco-event-worker-#101%Server%][] Critical system
error detected. Will be handled accordingly to configured handler
[hnd=class o.a.i.failure.StopNodeFailureHandler, failureCtx=FailureContext
[type=SEGMENTATION, err=null]]

[21:51:58,549][SEVERE][disco-event-worker-#101%Server%][FailureProcessor]
Ignite node is in invalid state due to a critical failure.

[21:51:58,549][SEVERE][node-stopper][] Stopping local node on Ignite
failure: [failureCtx=FailureContext [type=SEGMENTATION, err=null]]

[21:52:03] Ignite node stopped OK [name=Server, uptime=00:07:06.780]

My questions are:
1) What is causing the issue?
2) How can I debug it better?

The rate of crashes and our lack of ability to debug them is becoming quite
a concern.

Cheers,
Eugene
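[Editor's note: one suspect here is the heap configuration itself - "-Xms16g -Xmx64g" lets the heap balloon, and long GC pauses are a common cause of the SEGMENTATION failure shown above. Ignite's general JVM tuning advice favors a fixed, modest heap, since the data lives off-heap. A hedged sketch, sizes illustrative only, not a recommendation for this specific cluster:]

```shell
# Equal -Xms/-Xmx avoids heap resizing; the heap mostly serves query
# result sets and internal structures, not the cached data itself.
JVM_OPTS="-Xms16g -Xmx16g -server -XX:MaxMetaspaceSize=256m \
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC \
-XX:+DisableExplicitGC"
export JVM_OPTS
```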




[attachment: Spring XML Ignite configuration — markup stripped by the mail archive]

Re: How much heap to allocate

2018-08-30 Thread eugene miretsky
Thanks again for the detailed response!

Our main use case is performing large SQL queries over tables with 200M+
rows - I wanted to give you a bit more detail and context you can pass along.

A simple example would be:

   - Table: customer_id, date, category, views, clicks ( pkey =
   "customer_id, date", affinity key = date )
   - Query: SELECT count(*) where date < X AND categroy in (C1, C2, C3)
   GROUP BY customer_id HAVING SUM(views) > 20

My main concerns are:
1) How is the GROUP BY performed? You mentioned that it is performed on
the coordinator; I was hoping that since we are grouping on a column
that is an affinity key, each node would be able to do its own group by.
2) How much heap should I allocate for the group-by stage?
3) General performance tips

Cheers,
Eugene


On Thu, Aug 30, 2018 at 1:32 AM Denis Magda  wrote:

> Eugene,
>
> Just want to be sure you know about the existence of the following pages
> which elaborate on Ignite memory architecture in details:
>
>-
>
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory
>-
>
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>
>
>
>> 1) Are indexes loaded into heap (when used)?
>>
>
> Something might be copied to disk but in most of the cases we perform
> comparisons and other operations directly off-heap.
> See org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper
> and related classes.
>
> 2) Are full pages loaded into heap, or only the matching records?
>>
>
> Matching records (result set) are presently loaded. The pages are not.
>
>
>> 3) When the query needs more processing than the existing index
>> (non-indexed columns, groupBy, aggregation), where/how does it happen?
>>
>
> We will be doing a full scan. Grouping and aggregations are finalized on
> the query coordinator which needs to get a full result set.
>
> 4) How is the query coordinator chosen? Is it the client node? How about
>> when using the web console?
>>
>
> That's your application. Web Console uses Ignite SQL APIs as well.
>
>
>> 5) What parallelism settings would you recommend? We were thinking to set
>> parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
>> this way we can make sure that each job gets all the heap memory instead of
>> all jobs fighting each other. Not sure if it makes sense, and it will also
>> prevent us from making real-time transactional queries. (We are
>> hoping to use Ignite for both OLAP and simple real-time queries.)
>
>
> I would start a separate discussion for this bringing this question to the
> attention of our SQL experts. I'm not the one of them.
>
> --
> Denis
>
> On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky 
> wrote:
>
>> Denis, thanks for the detailed response.
>>
>> A few more follow up questions
>> 1) Are indexes loaded into heap (when used)?
>> 2) Are full pages loaded into heap, or only the matching records?
>> 3) When the query needs more processing than the existing index
>> (non-indexed columns, groupBy, aggregation), where/how does it happen?
>> 4) How is the query coordinator chosen? Is it the client node? How about
>> when using the web console?
>> 5) What parallelism settings would you recommend? We were thinking to set
>> parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
>> this way we can make sure that each job gets all the heap memory instead of
>> all jobs fighting each other. Not sure if it makes sense, and it will also
>> prevent us from making real-time transactional queries. (We are
>> hoping to use Ignite for both OLAP and simple real-time queries.)
>>
>> Cheers,
>> Eugene
>>
>>
>> On Sat, Aug 25, 2018 at 3:25 AM Denis Magda  wrote:
>>
>>> Hello Eugene,
>>>
>>> 1) In what format is data stored off heap?
>>>
>>>
>>> Data is always stored in the binary format let it be on-heap, off-heap
>>> or Ignite persistence.
>>> https://apacheignite.readme.io/docs/binary-marshaller
>>>
>>> 2) What happens when a SQL query is executed, in particular
>>>
>>>>
>>>>- How is H2 used? How is data loaded in H2? What if some of the
>>>>data is on disk?
>>>>
>>>> H2 is used to build execution plans for SELECTs. H2 calls Ignite's
>>> B+Tree based indexing implementation to see which indexes are set. All the
>>> data and indexes are always stored in Ignite (off-heap + disk).

Re: How much heap to allocate

2018-08-27 Thread eugene miretsky
Denis, thanks for the detailed response.

A few more follow up questions
1) Are indexes loaded into heap (when used)?
2) Are full pages loaded into heap, or only the matching records?
3) When the query needs more processing than the existing index
(non-indexed columns, groupBy, aggregation), where/how does it happen?
4) How is the query coordinator chosen? Is it the client node? How about
when using the web console?
5) What parallelism settings would you recommend? We were thinking to set
parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
this way we can make sure that each job gets all the heap memory instead of
all jobs fighting each other. Not sure if it makes sense, and it will also
prevent us from making real-time transactional queries. (We are
hoping to use Ignite for both OLAP and simple real-time queries.)

Cheers,
Eugene


On Sat, Aug 25, 2018 at 3:25 AM Denis Magda  wrote:

> Hello Eugene,
>
> 1) In what format is data stored off heap?
>
>
> Data is always stored in the binary format let it be on-heap, off-heap or
> Ignite persistence.
> https://apacheignite.readme.io/docs/binary-marshaller
>
> 2) What happens when a SQL query is executed, in particular
>
>>
>>- How is H2 used? How is data loaded in H2? What if some of the  data
>>is on disk?
>>
>> H2 is used to build execution plans for SELECTs. H2 calls Ignite's B+Tree
> based indexing implementation to see which indexes are set. All the data
> and indexes are always stored in Ignite (off-heap + disk).
>
>>
>>- When is data loaded into heap, and how much? Is only the output of
>>H2 loaded, or everything?
>>
>> Queries results are stored in Java heap temporarily. Once the result set
> is read by your application, it will be garbage collected.
>
>>
>>- How is the reduce stage performed? Is it performed only on one node
>>(hence that node needs to load all the data into memory)
>>
>> Correct, the final result set is reduced on a query coordinator - your
> application that executed a SELECT.
>
> 3) What happens when Ignite runs out of memory during execution? Is data
>> evicted to disk (if persistence is enabled)?
>
>
> I guess you mean what happens if a result set doesn't fit in RAM during
> the execution, right? If so, then OOM will occur. We're working on an
> improvement that will offload the result set to disk to avoid OOM for all
> the scenarios:
> https://issues.apache.org/jira/browse/IGNITE-7526
>
>
>
>> 4) Based on the code, it looks like I need to set my data region size to
>> at most 50% of available memory (to avoid the warning); this seems a bit
>> wasteful.
>
>
> There is no such a requirement. I know many deployments use cases when one
> data region is given 20% of RAM, the other is given 40% and everything else
> is persisted to disk.
>
> 5) Do you have any general advice on benchmarking the memory requirement?
>> So far I have not been able to find a way to check how much memory each
>> table takes on and off heap, and how much memory each query takes.
>
>
> We use Yardstick for performance benchmarking:
> https://apacheignite.readme.io/docs/perfomance-benchmarking
>
> --
> Denis
>
> On Fri, Aug 24, 2018 at 7:06 AM eugene miretsky 
> wrote:
>
>> Thanks!
>>
>> I am trying to understand when and how data is moved from off-heap to on
>> heap, particularly when using SQL.  I took a look at the wiki
>> <https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood>
>>  but
>> still have a few questions
>>
>> My understanding is that data is always store off-heap
>>
>> 1) In what format is data stored off heap?
>> 2) What happens when a SQL query is executed, in particular
>>
>>- How is H2 used? How is data loaded in H2? What if some of the  data
>>is on disk?
>>- When is data loaded into heap, and how much? Is only the output of
>>H2 loaded, or everything?
>>- How is the reduce stage performed? Is it performed only on one node
>>(hence that node needs to load all the data into memory)
>>
>> 3) What happens when Ignite runs out of memory during execution? Is data
>> evicted to disk (if persistence is enabled)?
>> 4) Based on the code, it looks like I need to set my data region size to
>> at most 50% of available memory (to avoid the warning); this seems a bit
>> wasteful.
>> 5) Do you have any general advice on benchmarking the memory
>> requirement? So far I have not been able to find a way to check how much
>> memory each table takes on and off heap, and how much memory each query
>> takes.
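[Editor's sketch: the map-reduce flow Denis describes - each node aggregates its local rows, then the coordinator merges the partial results and applies HAVING - can be illustrated with plain Java maps. No Ignite APIs; the data is made up, and the HAVING SUM(views) > 20 threshold echoes the example earlier in the thread:]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByReduce {
    // Map stage, run on every node: customer_id -> partial SUM(views)
    // over that node's local rows (each row is {customer_id, views}).
    static Map<String, Integer> mapStage(List<String[]> localRows) {
        Map<String, Integer> partial = new HashMap<>();
        for (String[] row : localRows)
            partial.merge(row[0], Integer.parseInt(row[1]), Integer::sum);
        return partial;
    }

    // Reduce stage, run on the query coordinator: merge partials from
    // all nodes, then apply HAVING SUM(views) > threshold.
    static Map<String, Integer> reduceStage(List<Map<String, Integer>> partials, int threshold) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> p : partials)
            p.forEach((k, v) -> total.merge(k, v, Integer::sum));
        total.values().removeIf(sum -> sum <= threshold);
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> node1 = mapStage(List.of(
            new String[]{"c1", "15"}, new String[]{"c2", "5"}));
        Map<String, Integer> node2 = mapStage(List.of(new String[]{"c1", "10"}));
        // c1: 15 + 10 = 25 > 20 survives HAVING; c2: 5 does not.
        System.out.println(reduceStage(List.of(node1, node2), 20)); // {c1=25}
    }
}
```

This also shows why the coordinator's heap matters: it must hold the merged result set, which is the concern raised in the thread.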

Re: Thin client vs client node performance in Spark

2018-08-24 Thread eugene miretsky
Attached is the error I get from ignitevisorcmd.sh after calling the cache
command (the command just hangs).
To me it looks like all the Spark executors (10 in my test) start a new
client node, and some of those nodes get terminated and restarted as the
executors die. This seems to really confuse Ignite.

[15:45:10,741][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/127.0.0.1:40984,
rmtAddr=/127.0.0.1:47101]

[15:45:10,741][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/127.0.0.1:49872,
rmtAddr=/127.0.0.1:47100]

[15:45:10,742][INFO][grid-nio-worker-tcp-comm-3-#26%console%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/127.0.0.1:40988,
rmtAddr=/127.0.0.1:47101]

[15:45:10,743][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:47101,
rmtAddr=/127.0.0.1:40992]

[15:45:10,745][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi]
Established outgoing communication connection [locAddr=/127.0.0.1:49876,
rmtAddr=/127.0.0.1:47100]

[15:45:11,725][SEVERE][grid-nio-worker-tcp-comm-2-#25%console%][TcpCommunicationSpi]
Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2,
bytesRcvd=180, bytesSent=18, bytesRcvd0=18, bytesSent0=0, select=true,
super=GridWorker [name=grid-nio-worker-tcp-comm-2,
igniteInstanceName=console, finished=false, hashCode=1827979135,
interrupted=false, runner=grid-nio-worker-tcp-comm-2-#25%console%]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=166400 cap=166400],
readBuf=java.nio.DirectByteBuffer[pos=18 lim=18 cap=117948],
inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/
172.21.85.37:39942, rmtAddr=ip-172-21-85-213.ap-south-1.compute.internal/
172.21.85.213:47100, createTime=1535125510724, closeTime=0, bytesSent=0,
bytesRcvd=18, bytesSent0=0, bytesRcvd0=18, sndSchedTime=1535125510724,
lastSndTime=1535125510724, lastRcvTime=1535125510724, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@7ae6182a, directMode=true],
GridConnectionBytesVerifyFilter], accepted=false]]]

java.lang.NullPointerException

at
org.apache.ignite.internal.util.nio.GridNioServer.cancelConnect(GridNioServer.java:885)

at
org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.cancel(TcpCommunicationConnectionCheckFuture.java:338)

at
org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.cancelFutures(TcpCommunicationConnectionCheckFuture.java:475)

at
org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.receivedAddressStatus(TcpCommunicationConnectionCheckFuture.java:494)

at
org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture$1.onStatusReceived(TcpCommunicationConnectionCheckFuture.java:433)

at
org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.finish(TcpCommunicationConnectionCheckFuture.java:362)

at
org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.onConnected(TcpCommunicationConnectionCheckFuture.java:348)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:773)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:383)

at
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)

at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)

at
org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:117)

at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)

at
org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:88)

at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)

at
org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3490)


On Fri, Aug 24, 2018 at 11:18 AM, eugene miretsky  wrote:

>  Thanks,
>
> So the way I understand it, the thick client will use the affinity key to
> send data to the right node, and hence will split the traffic between all
> the nodes; the thin client will just send the data to one node, and that
> node will be responsible for sending it to the actual node that owns the
> 'shard'?

Re: Thin client vs client node performance in Spark

2018-08-24 Thread eugene miretsky
 Thanks,

So the way I understand it, the thick client will use the affinity key to
send data to the right node, and hence will split the traffic between all
the nodes; the thin client will just send the data to one node, and that
node will be responsible for sending it to the actual node that owns the
'shard'?
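[Editor's sketch of the routing difference described above: a thick client computes the partition of each key itself and talks to the owner directly, while a thin client pushes everything through its single connection. Plain Java, with a bare-modulo stand-in for Ignite's rendezvous affinity — 1024 is Ignite's default partition count, everything else is hypothetical:]

```java
public class PartitionRouting {
    static final int PARTITIONS = 1024; // Ignite's default partition count

    // Simplified stand-in for the affinity function: key -> partition.
    static int partition(Object affinityKey) {
        return Math.floorMod(affinityKey.hashCode(), PARTITIONS);
    }

    // Thick client: knows the partition map, sends straight to the owner,
    // so load spreads across all server nodes.
    static String thickClientTarget(Object affinityKey, String[] nodes) {
        return nodes[partition(affinityKey) % nodes.length];
    }

    // Thin client: always sends to the one server it is connected to,
    // which then forwards the update to the real owner.
    static String thinClientTarget(String connectedServer) {
        return connectedServer;
    }

    public static void main(String[] args) {
        String[] nodes = {"A", "B", "C"};
        System.out.println(thickClientTarget("customer-42", nodes));
        System.out.println(thinClientTarget("A")); // always "A"
    }
}
```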

I keep getting the following error when using the Spark driver; the driver
keeps writing, but very slowly. Any idea what is causing the error, or how
to fix it?

Cheers,
Eugene

"

[15:04:58,030][SEVERE][data-streamer-stripe-10-#43%Server%][DataStreamProcessor]
Failed to respond to node [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf,
res=DataStreamerResponse [reqId=192, forceLocDep=true]]

class org.apache.ignite.IgniteCheckedException: Failed to send message
(node may have left the grid or TCP connection cannot be established due to
firewall issues) [node=ZookeeperClusterNode
[id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377,
loc=false, client=true], topic=T1 [topic=TOPIC_DATASTREAM,
id=b8d675c6561-78af5d88-cbfa-4529-aaee-ff4982985cdf],
msg=DataStreamerResponse [reqId=192, forceLocDep=true], policy=9]

at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)

at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1703)

at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1673)

at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.sendResponse(DataStreamProcessor.java:440)

at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:402)

at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:305)

at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:60)

at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:90)

at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)

at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)

at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)

at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)

at
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)

at java.lang.Thread.run(Thread.java:748)

Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: ZookeeperClusterNode
[id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377,
loc=false, client=true]

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2718)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2651)

at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)

... 13 more

Caused by: class org.apache.ignite.IgniteCheckedException: Failed to
connect to node (is node still alive?). Make sure that each ComputeTask and
cache Transaction has a timeout set in order to prevent parties from
waiting forever in case of network issues
[nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[/127.0.0.1:47101]]

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3422)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2958)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2841)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2692)

... 15 more

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=/127.0.0.1:47101, err=Connection refused]

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)

... 18 more

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)

at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3262)

... 18 more

"

On Tue, Aug 14, 2018 at 4:39 PM, akurbanov  wrote:

> Hi,
>
> Spark integration was implemented before java thin 

Re: Recommended HW on AWS EC2 - vertical vs horizontal scaling

2018-08-24 Thread eugene miretsky
Thanks Andrei,

For our use case, please see my email ("Data modeling for segmenting a huge
data set: precomputing vs real time computations").

I think our main confusion right now is trying to understand how exactly
SQL queries work (when memory is moved to heap, when/how H2 is used, how
the reduce step is performed, etc.). Because of that, we don't really
understand when data is moved between heap, off-heap and disk, and hence
have a hard time sizing it properly (I have crashed Ignite many times during
testing). There is already a similar email thread ("How much heap to
allocate").

We also cannot get some of our OLAP queries to execute in parallel (see
"Slow SQL query uses only a single CPU"), which again makes it harder to
size the HW (no point using huge instances if only a single CPU is going to
be used per query).

Cheers,
Eugene



On Fri, Aug 24, 2018 at 6:16 AM, aealexsandrov 
wrote:

> Hi,
>
> Ignite doesn't have such kind of benchmarks because they are very specific
> for every case and setup.
>
> However, exists several common tips:
>
> 1) If you use EBS, try to avoid NVMe. It is fast but does not seem to
> provide durability guarantees for your data; we have faced corruption of
> the working directory on this type of device.
> 2) To get the best performance you should have enough RAM to store your
> data in Ignite off-heap.
> 3) Volume Type - EBS Provisioned IOPS SSD (io1)
>
> I suggest using x1 or x1e instances from
> https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/
> memory-optimized-instances.html
> list.
>
> Your choice will depend on your case and expectations. But for example:
>
> x1e.32xlarge
> ESB = io1
> 2 disks with 2 TB each
>
> It will provide the capability to store your data in memory on one
> node, and the disk speed will be around 14000 MB/sec.
>
> Is it possible to describe your case in more detail?
>
> BR,
> Andrei
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: How much heap to allocate

2018-08-24 Thread eugene miretsky
Thanks!

I am trying to understand when and how data is moved from off-heap to on
heap, particularly when using SQL. I took a look at the wiki but
still have a few questions.

My understanding is that data is always stored off-heap.

1) In what format is data stored off-heap?
2) What happens when a SQL query is executed, in particular:

   - How is H2 used? How is data loaded in H2? What if some of the data is
   on disk?
   - When is data loaded into heap, and how much? Is only the output of H2
   loaded, or everything?
   - How is the reduce stage performed? Is it performed only on one node
   (hence that node needs to load all the data into memory)?

3) What happens when Ignite runs out of memory during execution? Is data
evicted to disk (if persistence is enabled)?
4) Based on the code, it looks like I need to set my data region size to at
most 50% of available memory (to avoid the warning); this seems a bit
wasteful.
5) Do you have any general advice on benchmarking the memory requirement?
So far I have not been able to find a way to check how much memory each
table takes on and off heap, and how much memory each query takes.

Cheers,
Eugene

On Fri, Aug 24, 2018 at 8:06 AM, NSAmelchev  wrote:

> Hi Eugene,
>
> Yes, it's a misprint as Dmitry wrote.
>
> Ignite print this warning if nodes on local machine require more than 80%
> of
> physical RAM.
>
> From code, you can see that total heap/offheap memory summing
> from nodes having the same mac address. This way calculates total memory
> used
> by the local machine.
>
> --
> Best wishes,
> Amelchev Nikita
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


How much heap to allocate

2018-08-22 Thread eugene miretsky
Hi,

I am getting the following warning when starting Ignite - "

Nodes started on local machine require more than 20% of physical RAM what
can lead to significant slowdown due to swapping
"

The 20% is a typo in version 2.5; it should be 80%.

We have increased the max size of the default region to 70% of the
available memory on the instance (since that's the only region we use at
the moment).

From reading the code that generates the error, it seems like
1) Ignite adds all the memory across all nodes to check if it is above the
safeToUse threshold. I would expect the check to be done per node
2) totalOffheap seems to be the sum of the maxSizes of all regions, and
totalHeap retrieved from the JVM configs. ignite.sh sets -Xmx200g.

Assuming we are not enabling on-heap caching, what should we set the heap
size to?

Cheers,
Eugene
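[Editor's sketch of the check being discussed: sum the heap and off-heap (data region max) sizes of all local nodes and compare against ~80% of physical RAM — the 20% in the 2.5 message being the acknowledged typo. Plain-Java arithmetic, all numbers illustrative:]

```java
public class HeapCheck {
    // Mirrors the warning logic described above: warn when
    // totalHeap + totalOffheap exceeds ~80% of physical RAM.
    static boolean safeToUse(long totalHeap, long totalOffheap, long physicalRam) {
        return totalHeap + totalOffheap <= (long) (physicalRam * 0.8);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 64 GB box with ignite.sh's -Xmx200g counted in full: warning.
        System.out.println(safeToUse(200 * gb, 45 * gb, 64 * gb)); // false
        // Modest fixed heap + 70% data region: 8 + 45 = 53 > 51.2, still warns.
        System.out.println(safeToUse(8 * gb, 45 * gb, 64 * gb));   // false
        // Shrinking the region: 8 + 40 = 48 <= 51.2, no warning.
        System.out.println(safeToUse(8 * gb, 40 * gb, 64 * gb));   // true
    }
}
```

This is why a 70% data region plus a large -Xmx trips the warning even though the heap is never actually filled: the check uses configured maximums, not live usage.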


Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Thanks,

I tried the composite key and provided the index hint in the query, but the
query plan and execution time stayed the same. So really I am back to where
we started.

Right now I suspect that I am not setting the affinity key properly - I
intend the key to be customer_id, but maybe Ignite sets it to the _key
(composite of customer_id, dt). Is there a way to check it?

Also, is there documentation of how exactly data is stored, and how SQL
queries are performed? Where is the data stored (off-heap?)? When is the
data loaded to heap? How do the Ignite pages/segments get to H2?
Are Ignite and H2 indexes the same thing?

Cheers,
Eugene

On Wed, Aug 22, 2018 at 10:36 AM, Andrey Mashenkov <
andrey.mashen...@gmail.com> wrote:

> 1. /* PUBLIC.AFFINITY_KEY */ means the index on the affinity column is used.
> The full index will be scanned against the date condition.
> As I wrote, you can create a composite index to speed up the index scan.
> 2. "group sorted" means the index is used for grouping. It looks like H2 has
> an optimization for this, and grouping can be applied on the fly.
> Unsorted grouping would mean that we have to fetch the full dataset and only
> then group.
>
> On Wed, Aug 22, 2018 at 5:21 PM eugene miretsky 
> wrote:
>
>> Just as a reference, below are 2 execution plans, with and without the
>> index, on a very similar table.
>>
>> Adding the index removes /* PUBLIC.AFFINITY_KEY */ and /* group sorted
>> */.
>> 1) Does PUBLIC.AFFINITY_KEY mean that DT is the affinity key? We are
>> setting customer_id as the affinity key. Is there a way to verify that?
>> 2) Is it possible that the removal of /* group sorted */ indicates that
>> the result of group_by must be sorted? (hence taking a long time)
>>
>> *Query*
>> Select COUNT (*) FROM (SELECT customer_id FROM GAL2RU where dt >
>> '2018-06-12' GROUP BY customer_id having SUM(ru_total_app_sessions_count)
>> > 2 AND MAX(ru_total_web_sessions_count) < 1)
>>
>> *Without an index*
>>
>> SELECT
>>
>> __Z0.CUSTOMER_ID AS __C0_0,
>>
>> SUM(__Z0.RU_TOTAL_APP_SESSIONS_COUNT) AS __C0_1,
>>
>> MAX(__Z0.RU_TOTAL_WEB_SESSIONS_COUNT) AS __C0_2
>>
>> FROM PUBLIC.GAL2RU __Z0
>>
>> /* PUBLIC.AFFINITY_KEY */
>>
>> WHERE __Z0.DT > '2018-06-12'
>>
>> GROUP BY __Z0.CUSTOMER_ID
>>
>> /* group sorted */
>>
>>
>> SELECT
>>
>> COUNT(*)
>>
>> FROM (
>>
>> SELECT
>>
>> __C0_0 AS CUSTOMER_ID
>>
>> FROM PUBLIC.__T0
>>
>> GROUP BY __C0_0
>>
>> HAVING (SUM(__C0_1) > 2)
>>
>> AND (MAX(__C0_2) < 1)
>>
>>
>> *With an index*
>>
>> SELECT
>>
>> __Z0.CUSTOMER_ID AS __C0_0,
>>
>> SUM(__Z0.RU_TOTAL_APP_SESSIONS_COUNT) AS __C0_1,
>>
>> MAX(__Z0.RU_TOTAL_WEB_SESSIONS_COUNT) AS __C0_2
>>
>> FROM PUBLIC.GAL2RU __Z0
>>
>> /* PUBLIC.DT_IDX2: DT > '2018-06-12' */
>>
>> WHERE __Z0.DT > '2018-06-12'
>>
>> GROUP BY __Z0.CUSTOMER_ID
>>
>>
>> SELECT
>>
>> COUNT(*)
>>
>> FROM (
>>
>> SELECT
>>
>> __C0_0 AS CUSTOMER_ID
>>
>> FROM PUBLIC.__T0
>>
>> GROUP BY __C0_0
>>
>> HAVING (SUM(__C0_1) > 2)
>>
>> AND (MAX(__C0_2) < 1)
>>
>> ) _0__Z1
>>
>>
>> On Wed, Aug 22, 2018 at 9:43 AM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks Andrey,
>>>
>>> We are using the Ignite notebook, any idea if there is a way to provide
>>> these flags and hints directly from SQL?
>>>
>>> From your description, it seems like the query is executed in the
>>> following order
>>> 1) Group by customer_id
>>> 2) For each group, perform the filtering on date using the index and
>>> aggregates
>>>
>>> My impression was that the order is
>>> 1)  On each node, filter rows by date (using the index)
>>> 2)  On each node, group the remaining rows by customer id, and then
>>> perform the aggregate
>>>
>>> That's why we created the index on the dt field, as opposed to
>>> customer_id field.
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>> On Wed, Aug 22, 2018 at 8:44 AM, Andrey Mashenkov <
>>> andrey.mashen...@gmail.com> wrote:
>>>
>>>> Eugene,
>>>>
>>>> 1. Note that queryParallelism splits indices, and Ignite works in a similar
>>>> way as if the index data resided on several nodes. These index parts can be
>>>> looked up 

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Just as a reference, below are 2 execution plans, with and without the
index, on a very similar table.

Adding the index removes /* PUBLIC.AFFINITY_KEY */ and /* group sorted */.
1) Does PUBLIC.AFFINITY_KEY mean that DT is the affinity key? We are
setting customer_id as the affinity key. Is there a way to verify that?
2) Is it possible that the removal of /* group sorted */ indicates that the
result of group_by must be sorted? (hence taking a long time)

*Query*
Select COUNT (*) FROM (SELECT customer_id FROM GAL2RU where dt >
'2018-06-12' GROUP BY customer_id having SUM(ru_total_app_sessions_count) >
2 AND MAX(ru_total_web_sessions_count) < 1)
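As context for question 1, the composite index plus an explicit hint suggested in this thread would look something like this (index name hypothetical):

```sql
-- Hypothetical composite index covering both the affinity column and the filter column.
CREATE INDEX cust_dt_idx ON gal2ru (customer_id, dt);

-- Force its use with an index hint:
SELECT customer_id
FROM gal2ru USE INDEX (cust_dt_idx)
WHERE dt > '2018-06-12'
GROUP BY customer_id
HAVING SUM(ru_total_app_sessions_count) > 2
   AND MAX(ru_total_web_sessions_count) < 1;
```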

*Without an index*

SELECT

__Z0.CUSTOMER_ID AS __C0_0,

SUM(__Z0.RU_TOTAL_APP_SESSIONS_COUNT) AS __C0_1,

MAX(__Z0.RU_TOTAL_WEB_SESSIONS_COUNT) AS __C0_2

FROM PUBLIC.GAL2RU __Z0

/* PUBLIC.AFFINITY_KEY */

WHERE __Z0.DT > '2018-06-12'

GROUP BY __Z0.CUSTOMER_ID

/* group sorted */


SELECT

COUNT(*)

FROM (

SELECT

__C0_0 AS CUSTOMER_ID

FROM PUBLIC.__T0

GROUP BY __C0_0

HAVING (SUM(__C0_1) > 2)

AND (MAX(__C0_2) < 1)


*With an index*

SELECT

__Z0.CUSTOMER_ID AS __C0_0,

SUM(__Z0.RU_TOTAL_APP_SESSIONS_COUNT) AS __C0_1,

MAX(__Z0.RU_TOTAL_WEB_SESSIONS_COUNT) AS __C0_2

FROM PUBLIC.GAL2RU __Z0

/* PUBLIC.DT_IDX2: DT > '2018-06-12' */

WHERE __Z0.DT > '2018-06-12'

GROUP BY __Z0.CUSTOMER_ID


SELECT

COUNT(*)

FROM (

SELECT

__C0_0 AS CUSTOMER_ID

FROM PUBLIC.__T0

GROUP BY __C0_0

HAVING (SUM(__C0_1) > 2)

AND (MAX(__C0_2) < 1)

) _0__Z1


On Wed, Aug 22, 2018 at 9:43 AM, eugene miretsky 
wrote:

> Thanks Andrey,
>
> We are using the Ignite notebook, any idea if there is a way to provide
> these flags and hints directly from SQL?
>
> From your description, it seems like the query is executed in the
> following order
> 1) Group by customer_id
> 2) For each group, perform the filtering on date using the index and
> aggregates
>
> My impression was that the order is
> 1)  On each node, filter rows by date (using the index)
> 2)  On each node, group the remaining rows by customer id, and then
> perform the aggregate
>
> That's why we created the index on the dt field, as opposed to customer_id
> field.
>
> Cheers,
> Eugene
>
>
> On Wed, Aug 22, 2018 at 8:44 AM, Andrey Mashenkov <
> andrey.mashen...@gmail.com> wrote:
>
>> Eugene,
>>
>> 1. Note that queryParallelism splits indices, and Ignite works in a similar way
>> as if the index data resided on several nodes. These index parts can be looked
>> up in parallel threads.
>> 2. It is not a simple query, as your data is distributed among partitions, is
>> not collocated, and aggregate functions are used.
>> The HAVING clause here is the reason: Ignite can apply it on the reduce phase
>> only, as HAVING requires the aggregate value from all index parts.
>> 3. If your data is already collocated on customer_id, then you can hint Ignite
>> by setting SqlFieldsQuery.setCollocated(true). This should force Ignite to
>> optimize grouping and push down aggregates to the map phase.
>> 4. In the query plan you attached you can see H2 uses DT_IDX:
>> /* PUBLIC.DT_IDX: DT > '2018-05-12' */
>> It is not effective. With this index H2 has to process all the data to
>> calculate the aggregate for a group. An index on the affinity field may be
>> more effective, as data can be processed group by group:
>> once all of a group's data is processed, the result can be passed to the
>> reducer. Hopefully H2 is smart enough to do such streaming.
>>
>> Also, you can try to use a composite index on (customer_id, date) columns.
>> Most likely, a hint will be needed [2].
>>
>> See also about the collocated flag [1] and hints [2]
>>
>> [1] https://ignite.apache.org/releases/latest/javadoc/org/apache
>> /ignite/cache/query/SqlFieldsQuery.html#setCollocated-boolean-
>> [2] https://apacheignite.readme.io/v2.0/docs/sql-performance-and
>> -debugging#index-hints
>>
>>
>> On Wed, Aug 22, 2018 at 3:10 PM eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks Andrey,
>>>
>>> Right now we are testing with only one big node, so the reduce step
>>> should not take any time.
>>>
>>> 1) We already set parallelism to 32, and I can still see only 1 core
>>> working. Anything else could be preventing multiple cores from working on
>>> the job?
>>> 2) Why would the reduce phase need to look at all the data? It seems
>>> like a fairly simple query
>>> 3) We are already collocating data  by customer_id (though as I
>>> mentioned, right now there is only one node)
>>> 4) We already using collocation and tried using an index, and other
>>> advice? Is there a way to check what Ignite is actually doi

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Thanks Andrey,

We are using the Ignite notebook, any idea if there is a way to provide
these flags and hints directly from SQL?

From your description, it seems like the query is executed in the following
order
1) Group by customer_id
2) For each group, perform the filtering on date using the index and
aggregates

My impression was that the order is
1)  On each node, filter rows by date (using the index)
2)  On each node, group the remaining rows by customer id, and then
perform the aggregate

That's why we created the index on the dt field, as opposed to customer_id
field.
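One way to check which phase does what is to prefix the query with EXPLAIN; Ignite returns the map query (or queries) followed by the reduce query, as in the plans quoted in this thread:

```sql
-- Sketch only; table and column names taken from this thread's example.
EXPLAIN
SELECT customer_id
FROM gal2ru
WHERE dt > '2018-06-12'
GROUP BY customer_id
HAVING SUM(ru_total_app_sessions_count) > 2;
```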

Cheers,
Eugene


On Wed, Aug 22, 2018 at 8:44 AM, Andrey Mashenkov <
andrey.mashen...@gmail.com> wrote:

> Eugene,
>
> 1. Note that queryParallelism splits indices, and Ignite works in a similar way
> as if the index data resided on several nodes. These index parts can be looked
> up in parallel threads.
> 2. It is not a simple query, as your data is distributed among partitions, is
> not collocated, and aggregate functions are used.
> The HAVING clause here is the reason: Ignite can apply it on the reduce phase
> only, as HAVING requires the aggregate value from all index parts.
> 3. If your data is already collocated on customer_id, then you can hint Ignite
> by setting SqlFieldsQuery.setCollocated(true). This should force Ignite to
> optimize grouping and push down aggregates to the map phase.
> 4. In the query plan you attached you can see H2 uses DT_IDX:
> /* PUBLIC.DT_IDX: DT > '2018-05-12' */
> It is not effective. With this index H2 has to process all the data to
> calculate the aggregate for a group. An index on the affinity field may be more
> effective, as data can be processed group by group:
> once all of a group's data is processed, the result can be passed to the
> reducer. Hopefully H2 is smart enough to do such streaming.
>
> Also, you can try to use a composite index on (customer_id, date) columns.
> Most likely, a hint will be needed [2].
>
> See also about the collocated flag [1] and hints [2]
>
> [1] https://ignite.apache.org/releases/latest/javadoc/org/
> apache/ignite/cache/query/SqlFieldsQuery.html#setCollocated-boolean-
> [2] https://apacheignite.readme.io/v2.0/docs/sql-performance-
> and-debugging#index-hints
>
>
> On Wed, Aug 22, 2018 at 3:10 PM eugene miretsky 
> wrote:
>
>> Thanks Andrey,
>>
>> Right now we are testing with only one big node, so the reduce step
>> should not take any time.
>>
>> 1) We already set parallelism to 32, and I can still see only 1 core
>> working. Anything else could be preventing multiple cores from working on
>> the job?
>> 2) Why would the reduce phase need to look at all the data? It seems like
>> a fairly simple query
>> 3) We are already collocating data  by customer_id (though as I
>> mentioned, right now there is only one node)
>> 4) We are already using collocation and tried using an index; any other
>> advice? Is there a way to check what Ignite is actually doing? How are
>> indexes used (by Ignite or H2)?
>>
>> Cheers,
>> Eugene
>>
>> On Wed, Aug 22, 2018 at 3:54 AM, Andrey Mashenkov <
>> andrey.mashen...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> 1. Possibly there is too much data to be scanned for the query. With a
>>> single node and parallelism=1, the query will always run in a single thread.
>>>  You can try to add more nodes or increase query parallelism to utilize
>>> more CPU cores.
>>>
>>> 2. An index on the date field may not be effective, as the reduce phase has
>>> to look at all the data for further grouping.
>>> Try adding an index on customer_id, or use collocation on customer_id
>>> (usually the preferable way).
>>>
>>> Also, it is possible the bottleneck is the reduce phase.
>>> Is it possible to collocate data by the group-by column (customer_id)? This
>>> collocation will allow you to use the collocated flag [1], and Ignite will
>>> use a more optimal plan.
>>>
>>> 4. The main technique is to reduce the amount of data to be looked
>>> up in every phase by using data collocation and indices.
>>> Ignite provides 2 plans for distributed queries: map and reduce. You can
>>> analyse and check these queries separately to understand how much data is
>>> processed on the map phase and on reduce.
>>> The map query processes node-local data (unless distributed joins are on),
>>> while the reduce phase fetches data from remote nodes, which may be costly.
>>>
>>>
>>> On Wed, Aug 22, 2018 at 6:07 AM eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
>>>> Here is the result of EXPLAIN for the aforementioned query:
>>>>
>>>> SELECT
>>>> __Z0.CUSTOMER_ID

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Thanks Andrey,

Right now we are testing with only one big node, so the reduce step should
not take any time.

1) We already set parallelism to 32, and I can still see only 1 core
working. Anything else could be preventing multiple cores from working on
the job?
2) Why would the reduce phase need to look at all the data? It seems like a
fairly simple query
3) We are already collocating data  by customer_id (though as I mentioned,
right now there is only one node)
4) We are already using collocation and tried using an index; any other advice?
Is there a way to check what Ignite is actually doing? How are indexes used
(by Ignite or H2)?
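Regarding question 1, queryParallelism is configured per cache; a config sketch (cache name and value illustrative):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="SQL_PUBLIC_GAL3EC1"/>
    <!-- Splits each local index into N segments scanned by parallel threads. -->
    <property name="queryParallelism" value="32"/>
</bean>
```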

Cheers,
Eugene

On Wed, Aug 22, 2018 at 3:54 AM, Andrey Mashenkov <
andrey.mashen...@gmail.com> wrote:

> Hi,
>
> 1. Possibly there is too much data to be scanned for the query. With a
> single node and parallelism=1, the query will always run in a single thread.
>  You can try to add more nodes or increase query parallelism to utilize
> more CPU cores.
>
> 2. An index on the date field may not be effective, as the reduce phase has
> to look at all the data for further grouping.
> Try adding an index on customer_id, or use collocation on customer_id
> (usually the preferable way).
>
> Also, it is possible the bottleneck is the reduce phase.
> Is it possible to collocate data by the group-by column (customer_id)? This
> collocation will allow you to use the collocated flag [1], and Ignite will
> use a more optimal plan.
>
> 4. The main technique is to reduce the amount of data to be looked up
> in every phase by using data collocation and indices.
> Ignite provides 2 plans for distributed queries: map and reduce. You can
> analyse and check these queries separately to understand how much data is
> processed on the map phase and on reduce.
> The map query processes node-local data (unless distributed joins are on),
> while the reduce phase fetches data from remote nodes, which may be costly.
>
>
> On Wed, Aug 22, 2018 at 6:07 AM eugene miretsky 
> wrote:
>
>> Here is the result of EXPLAIN for the aforementioned query:
>>
>> SELECT
>> __Z0.CUSTOMER_ID AS __C0_0,
>> SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1,
>> MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2
>> FROM PUBLIC.GAL3EC1 __Z0
>> /* PUBLIC.DT_IDX: DT > '2018-05-12' */
>> WHERE __Z0.DT > '2018-05-12'
>> GROUP BY __Z0.CUSTOMER_ID
>> SELECT
>> COUNT(*)
>> FROM (
>> SELECT
>> __C0_0 AS CUSTOMER_ID
>> FROM PUBLIC.__T0
>> GROUP BY __C0_0
>> HAVING (SUM(__C0_1) > 2)
>> AND (MAX(__C0_2) < 1)
>> ) _0__Z1
>>
>>
>>
>> On Tue, Aug 21, 2018 at 8:18 PM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a cache called GAL3EC1, it has
>>>
>>>1. A composite pKey consisting of customer_id and date
>>>2. An Index on the date column
>>>3. 300 sparse columns
>>>
>>> We are running a single EC2 4x8xlarge node.
>>>
>>> The following query takes 8min to finish
>>> Select COUNT (*) FROM (SELECT customer_id FROM GAl3ec1 where dt >
>>> '2018-05-12' GROUP BY customer_id having 
>>> SUM(ec1_bknt_total_product_views_app)
>>> > 2 AND MAX(ec1_hnk_total_product_clicks_app) < 1)
>>>
>>> I have a few questions:
>>>
>>>1. The 'top' command shows 100% CPU utilization (i.e. only one of the 32
>>>CPUs is used). How can I get the query to use all 32 CPUs? I have tried
>>>setting Query Parallelism to 32, but it didn't help.
>>>2. Adding the index on date column seems to have slowed down the
>>>query. The 8min time from above was without the index, with the index the
>>>query doesn't finish (I gave up after 30min). A similar query on a
>>>smaller date range showed a 10x slow down with the index. Why?
>>>3. Our loads from Spark are very slow as well, and also seem to not
>>>use the system resource properly, can that be related?
>>>4. What are some good tools and techniques to troubleshoot these
>>>problems in Ignite?
>>>
>>>
>>> All the relevant info is attached (configs, cache stats, node stats,
>>> etc.).
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


Re: Slow SQL query uses only a single CPU

2018-08-21 Thread eugene miretsky
Here is the result of EXPLAIN for the aforementioned query:

SELECT
__Z0.CUSTOMER_ID AS __C0_0,
SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1,
MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2
FROM PUBLIC.GAL3EC1 __Z0
/* PUBLIC.DT_IDX: DT > '2018-05-12' */
WHERE __Z0.DT > '2018-05-12'
GROUP BY __Z0.CUSTOMER_ID
SELECT
COUNT(*)
FROM (
SELECT
__C0_0 AS CUSTOMER_ID
FROM PUBLIC.__T0
GROUP BY __C0_0
HAVING (SUM(__C0_1) > 2)
AND (MAX(__C0_2) < 1)
) _0__Z1



On Tue, Aug 21, 2018 at 8:18 PM, eugene miretsky 
wrote:

> Hi,
>
> We have a cache called GAL3EC1, it has
>
>1. A composite pKey consisting of customer_id and date
>2. An Index on the date column
>3. 300 sparse columns
>
> We are running a single EC2 4x8xlarge node.
>
> The following query takes 8min to finish
> Select COUNT (*) FROM (SELECT customer_id FROM GAl3ec1 where dt >
> '2018-05-12' GROUP BY customer_id having SUM(ec1_bknt_total_product_views_app)
> > 2 AND MAX(ec1_hnk_total_product_clicks_app) < 1)
>
> I have a few questions:
>
>1. The 'top' command shows 100% CPU utilization (i.e. only one of the 32
>CPUs is used). How can I get the query to use all 32 CPUs? I have tried
>setting Query Parallelism to 32, but it didn't help.
>2. Adding the index on date column seems to have slowed down the
>query. The 8min time from above was without the index, with the index the
>query doesn't finish (I gave up after 30min). A similar query on a
>smaller date range showed a 10x slow down with the index. Why?
>3. Our loads from Spark are very slow as well, and also seem to not
>use the system resource properly, can that be related?
>4. What are some good tools and techniques to troubleshoot these
>problems in Ignite?
>
>
> All the relevant info is attached (configs, cache stats, node stats,
> etc.).
>
> Cheers,
> Eugene
>
>
>
>
>
>
>


Slow SQL query uses only a single CPU

2018-08-21 Thread eugene miretsky
Hi,

We have a cache called GAL3EC1, it has

   1. A composite pKey consisting of customer_id and date
   2. An Index on the date column
   3. 300 sparse columns

We are running a single EC2 4x8xlarge node.

The following query takes 8min to finish
Select COUNT (*) FROM (SELECT customer_id FROM GAl3ec1 where dt >
'2018-05-12' GROUP BY customer_id having
SUM(ec1_bknt_total_product_views_app) > 2 AND
MAX(ec1_hnk_total_product_clicks_app) < 1)

I have a few questions:

   1. The 'top' command shows 100% CPU utilization (i.e. only one of the 32 CPUs
   is used). How can I get the query to use all 32 CPUs? I have tried setting
   Query Parallelism to 32, but it didn't help.
   2. Adding the index on date column seems to have slowed down the query.
   The 8min time from above was without the index, with the index the query
   doesn't finish (I gave up after 30min). A similar query on a smaller date
   range showed a 10x slow down with the index. Why?
   3. Our loads from Spark are very slow as well, and also seem to not use
   the system resource properly, can that be related?
   4. What are some good tools and techniques to troubleshoot these
   problems in Ignite?


All the relevant info is attached (configs, cache stats, node stats, etc.).

Cheers,
Eugene
Cache

++
| Name(@) |Mode | Nodes |   Entries (Heap / 
Off-heap)   |   Hits|Misses|Reads |  Writes  |
++
| cache1(@c0) | PARTITIONED | 1 | min: 0 (0 / 0)
| min: 0| min: 0   | min: 0   | min: 0   |
| | |   | avg: 0.00 (0.00 / 0.00)   
| avg: 0.00 | avg: 0.00| avg: 0.00| avg: 0.00|
| | |   | max: 0 (0 / 0)
| max: 0| max: 0   | max: 0   | max: 0   |
+-+-+---+---+---+--+--+--+
| cache2(@c1) | PARTITIONED | 1 | min: 0 (0 / 0)
| min: 0| min: 0   | min: 0   | min: 0   |
| | |   | avg: 0.00 (0.00 / 0.00)   
| avg: 0.00 | avg: 0.00| avg: 0.00| avg: 0.00|
| | |   | max: 0 (0 / 0)
| max: 0| max: 0   | max: 0   | max: 0   |
+-+-+---+---+---+--+--+--+
| SQL_PUBLIC_GAL2RU(@c2)  | PARTITIONED | 1 | min: 8229539 (0 / 8229539)
| min: 0| min: 0   | min: 0   | min: 8229539 |
| | |   | avg: 8229539.00 (0.00 / 
8229539.00)   | avg: 0.00 | avg: 0.00| avg: 0.00| avg: 8229539.00  |
| | |   | max: 8229539 (0 / 8229539)
| max: 0| max: 0   | max: 0   | max: 8229539 |
+-+-+---+---+---+--+--+--+
| SQL_PUBLIC_GAL3EC1(@c3) | PARTITIONED | 1 | min: 63991599 (0 / 63991599)  
| min: 0| min: 9028| min: 9028| min: 83335247|
| | |   | avg: 63991599.00 (0.00 / 
63991599.00) | avg: 0.00 | avg: 9028.00 | avg: 9028.00 | avg: 83335247.00 |
| | |   | max: 63991599 (0 / 63991599)  
| max: 0| max: 9028| max: 9028| max: 83335247|
+-+-+---+---+---+--+--+--+
| SQL_PUBLIC_PERSON(@c4)  | PARTITIONED | 1 | min: 0 (0 / 0)
| min: 0| min: 0   | min: 0   | min: 0   |
| | |   | avg: 0.00 (0.00 / 0.00)   
| avg: 0.00 | avg: 0.00| avg: 0.00| avg: 0.00|
| | |   | max: 0 (0 / 0)
| max: 0| max: 0   | max: 0   | max: 0   |
++

Node

+---+
| ID  | 2dff6868-b253-48d1-aa7b-128082a26332
|
| ID8 | 

Recommended HW on AWS EC2 - vertical vs horizontal scaling

2018-08-21 Thread eugene miretsky
Hello,

We are looking to set up a fairly large (a few TB) cluster in AWS for OLAP
and transactional use cases.


   1. Are there any Ignite benchmarks on horizontally vs vertical scaling?
   2. What EC2 instances are other people using in prod? (I am assuming
   one of the memory-optimized instance types)


Cheers,
Eugene


Spark Dataframe write is hanging

2018-08-21 Thread eugene miretsky
Hi,


When I am saving a Spark Dataframe to Ignite, the job sometimes gets
stuck with the attached error.

It seems to happen at random, and usually at the end of the job (all
the data has already been written to Ignite).


Has anybody encountered this before?


java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processConnect(GridNioServer.java:2714)
> at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2333)
> at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2110)
> at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at java.lang.Thread.run(Thread.java:748)


Configurations precedence and consistency across the cluster

2018-08-21 Thread eugene miretsky
Hello,

It looks like there are several ways to set cluster configuration (I am
currently looking at CacheConfiguration in particular):

   1. Via the configuration XML file provided at startup
   2. Via the Java client (CacheConfiguration object)
   3. Via SQL commands

I have a few questions about how these configurations work:

   1. What is the precedence of configurations? What if I specify configs in an
   xml file, and then create a cache via SQL or the Java client - will the config
   get overwritten?
   2. What happens when a node restarts? Where does it get the config?
   Would it pick up the configs from the XML config?
   3. Are configs synchronized across the cluster in any way? What happens
   if 2 nodes have different XML configs? What happens if the CacheConfiguration
   template used in a SQL create table request is available only on one node?
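For context on question 3, a cache template is referenced by name in DDL; a sketch (the template name is hypothetical, and assumes a matching CacheConfiguration was registered on the node):

```sql
-- Assumes a CacheConfiguration named "myTemplate" was registered on the node;
-- the name here is hypothetical.
CREATE TABLE city (
  id BIGINT PRIMARY KEY,
  name VARCHAR
) WITH "template=myTemplate";
```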

Cheers,
Eugene


Data modeling for segmenting a huge data set: precomputing vs real time computations

2018-08-21 Thread eugene miretsky
Hello,

We have a very big data set (200M+ users), for each user we store their
activity (views, sales, etc.) and we need to be able to segment the users
based on that data.

One (very simplified) use case looks something like:
- Data: customer_id, date, category, sub_category, action
- Query: All customers that had X views in last Y days in sub_category Z.

The problem becomes huge very fast since we have hundreds of categories
and actions and several aggregation functions (total views, min purchase
amount, etc.)

There are 2 ways to go about it:
1) Pre-compute everything (memory intensive):
-  Very wide rows for each user: sub_categories x features x aggregation
functions (we don't need every single permutation, but we still end up
with 1000+ columns).
- The tables are very sparse since most of the columns are null for most
users
2) Store the raw data and perform the grouping in real time (compute
intensive)

Currently we are trying a middle version
-  Precompute all the features every day: (customer_id, day,
man_fashion_views, electronics_views, electronics_purchase_amount)
-  Filter by date and perform the aggregation in real time:  (SELECT
customer_id FROM testTable where date > '2018-08-02' GROUP BY customer_id
HAVING SUM(man_fashion_views) > 2 AND MAX(electronics_purchase_amount) <
100) )
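The daily pre-aggregated table described above might be declared like this (column names from the example; template and affinity choices illustrative):

```sql
CREATE TABLE daily_features (
  customer_id BIGINT,
  dt DATE,
  man_fashion_views INT,
  electronics_views INT,
  electronics_purchase_amount DECIMAL,
  PRIMARY KEY (customer_id, dt)
) WITH "template=partitioned,affinity_key=CUSTOMER_ID";
```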

Currently we are trying to understand the internals of Ignite a bit better
in order to understand how to model and optimize the above use case
1) How are sparse rows stored? Do nulls take the same amount of space as
actual values?
2) What is the cost of bringing data from off-heap memory to the heap?
3) How are filtering, group_by and aggregates performed?
 -- Does all the data get moved from off-heap to heap on every query?
 -- Is any filtering performed off-heap?
 -- If we need to calculate  aggregate A over a 3 day period and aggregate
B over 2 weeks period. Should we do it in 2 queries and then join the
results or in 1 query?
 -- Are indexes used during group_by? (say we are grouping by category,
would an index on category column help significantly?)
 -- Are there any optimization tricks to speed group_by?
4) One big table vs many smaller tables (all with customer_id as
affinityKey)
-- What is the cost of joining across colocated tables?

We are going to benchmark most of these, but we just started testing Ignite
and don't have a good intuition about what is likely to work.

Would appreciate any tips you can provide.

Cheers,
Eugene





Confusion/inaccurate visor stats

2018-08-17 Thread eugene miretsky
Hello,

I am running a single Ignite node on a r4.8xlarge EC2 node. I am using the
default settings with 132G allocated for the default memory region. So far
I have uploaded 1 large table of 60M rows using Spark.

The output of the node and cache commands is pasted below.

A few questions
1) In Data region metrics, why is everything 0?
2) How come Non-heap memory usage is minimal?
3) How can I tell how much memory the table is consuming?
4) Total busy time is 15s, but the upload took longer than that.

Cheers,
Eugene


*Visor output*

Time of the snapshot: 2018-08-17 15:06:37

+---+

| ID  | b19229cd-c6c2-4cfe-ba82-3fb0ac0946b9
|

| ID8 | B19229CD
|

| Node Type   | Server
|

| Order   | 1
|

| Address (0) | 172.17.0.1
|

| Address (1) | 172.21.85.213
|

| Address (2) | 127.0.0.1
|

| Address (3) | 0:0:0:0:0:0:0:1%lo
|

| OS info | Linux amd64 4.4.0-1062-aws
|

| OS user | root
|

| Deployment mode | SHARED
|

| Language runtime| Java Platform API Specification ver. 1.8
|

| Ignite version  | 2.5.0
|

| Ignite instance name| 
|

| JRE information | HotSpot 64-Bit Tiered Compilers
|

| JVM start time  | 2018-08-17 13:52:05
|

| Node start time | 2018-08-17 13:52:07
|

| Up time | 01:02:17.619
|

| CPUs| 32
|

| Last metric update  | 2018-08-17 14:54:23
|

| Non-loopback IPs| 172.17.0.1, 172.21.85.213,
fe80:0:0:0:59:71ff:fe32:36e%ens3 |

| Enabled MACs| 0242652B992A, 02597132036E
|

| Maximum active jobs | 1
|

| Current active jobs | 0
|

| Average active jobs | 0.00
|

| Maximum waiting jobs| 0
|

| Current waiting jobs| 0
|

| Average waiting jobs| 0.00
|

| Maximum rejected jobs   | 0
|

| Current rejected jobs   | 0
|

| Average rejected jobs   | 0.00
|

| Maximum cancelled jobs  | 0
|

| Current cancelled jobs  | 0
|

| Average cancelled jobs  | 0.00
|

| Total rejected jobs | 0
|

| Total executed jobs | 9
|

| Total cancelled jobs| 0
|

| Maximum job wait time   | 0ms
|

| Current job wait time   | 0ms
|

| Average job wait time   | 0.00ms
|

| Maximum job execute time| 0ms
|

| Current job execute time| 0ms
|

| Average job execute time| 0.00ms
|

| Total busy time | 15724ms
|

| Busy time % | 0.42%
|

| Current CPU load %  | 0.03%
|

| Average CPU load %  | 5.03%
|

| Heap memory initialized | 32gb
|

| Heap memory used| 57gb
|

| Heap memory committed   | 113gb
|

| Heap memory maximum | 178gb
|

| Non-heap memory initialized | 2mb
|

| Non-heap memory used| 75mb
|

| Non-heap memory committed   | 76mb
|

| Non-heap memory maximum | 496mb
|

| Current thread count| 230
|

| Maximum thread count| 233
|

| Total started thread count  | 391
|

| Current daemon thread count | 13
|

+---+


Data region metrics:

++

|  Name  | Page size |   Pages|  Memory   |  Rates
| Checkpoint buffer | Large entries |

++

| Default_Region | 0 | Total:  0  | Total:  0 | Allocation:
0.00 | Pages: 0  | 0.00% |

||   | Dirty:  0  | In RAM: 0 |
Eviction:   0.00
| Size:  0  |   |

||   | Memory: 0  |   |
Replace:

Source code of latest benchmarks

2018-08-16 Thread eugene miretsky
Hi,

I am trying to write my own benchmark. The Ignite benchmarks GitHub repo linked
on the website is very old; is there a newer version I could work off?

Cheers,
Eugene


Thin client vs client node performance in Spark

2018-08-14 Thread eugene miretsky
Hello,

What are the tradeoffs of using the thin client vs client node? Are there
any benchmarks?

The Spark client is using the latter (client node) - is that for
performance reasons or just legacy?

Cheers,
Eugene