Ignite Cluster: Cache Misses happening for existing keys

2018-09-12 Thread HEWA WIDANA GAMAGE, SUBASH
Hi all,
We're observing this in a 3-node server cluster (3 separate JVMs, in 3 separate 
VMs within the same datacenter; node-to-node latency is 2-3 milliseconds within 
the network).

The following code is wrapped inside an HTTP API, and that API is being called 
by 5000 users ramping up over 60 seconds and then running continuously, hitting 
the 3 nodes in a round-robin manner.

With this, for the same cache key, I can see more than one "Cache put key=" log 
entry appearing within the 15-minute window (in fact, I start getting these 
duplicate put logs after 2-3 minutes of the load test).

For the SAME cache key, there should not be more than one put within 15 minutes. 
The cache size is well below the eviction limit, and the key is well within the 
expiry window, so this looks to me like a timing issue when replicating the 
cache between the nodes.

The time between cache put logs for the same key is usually about 8-12 seconds. 
Am I doing something wrong here? Is there any way to make a cache.put operation 
complete synchronously only once it has been fully replicated to all nodes (not 
quite sure whether that would help, though)?

Version: Ignite 1.9

Code to create the cache

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDiscoverySpi(getDiscoverySpi()); // static ip list on tcp discovery
cfg.setClientMode(false);
cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED, EventType.EVT_NODE_FAILED);
Ignite ignite = Ignition.start(cfg);

ignite.events().localListen(event -> {
    LOG.info("Cache event received: {}", event);
    return true;
}, EventType.EVT_NODE_SEGMENTED, EventType.EVT_NODE_FAILED);


CacheConfiguration<String, String> cc = new CacheConfiguration<>();
cc.setName("mycache1");
cc.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 15)));
cc.setCacheMode(CacheMode.REPLICATED);

LruEvictionPolicy<String, String> evictionPolicy = new LruEvictionPolicy<>();
evictionPolicy.setMaxMemorySize(500 * 1024 * 1024);
cc.setEvictionPolicy(evictionPolicy);

IgniteCache<String, String> cache = ignite.getOrCreateCache(cc);


Code for the cache operations (the following method can be accessed by multiple 
threads at the same time):


private static String processKey(String key) {
    String value = cache.get(key);
    if (value == null) {
        // Note: this get-then-put is not atomic, so two concurrent callers
        // (or two nodes) can both see a miss and both put for the same key.
        value = getNewValue();
        cache.put(key, value);
        LOG.info("Cache put key={}", key);
    } else {
        LOG.info("Cache hit key={}", key);
    }
    return value;
}
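One way to get closer to the synchronous behaviour asked about above, sketched
here under the assumption of the same cache, LOG and getNewValue() as in the
snippets: CacheWriteSynchronizationMode.FULL_SYNC makes each put wait for all
replicas (by default only the primary copy is awaited), and getAndPutIfAbsent()
turns the separate get-then-put into a single atomic operation, so two callers
cannot both observe a miss for the same key.

// Sketch only - added to the cache configuration above:
cc.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

// Sketch only - an atomic variant of processKey():
private static String processKeyAtomically(String key) {
    String newValue = getNewValue();
    // Returns the previously mapped value, or null if the new value was stored.
    String existing = cache.getAndPutIfAbsent(key, newValue);
    if (existing == null) {
        LOG.info("Cache put key={}", key);
        return newValue;
    }
    LOG.info("Cache hit key={}", key);
    return existing;
}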





Re: how to configure apache ignite as cache api and as spring cache provider

2018-09-12 Thread vkulichenko
Ignition.start is supposed to start an Ignite instance, so passing a
spring-cache.xml file that doesn't contain any Ignite configuration doesn't
make sense. The SpringCacheManager bean should be part of the Spring
Application Context; it will then be used as an entry point to the Ignite
cluster. It looks like you're using annotation-based config for Spring, so
that's where you need to configure the bean. I believe you don't need this
second XML at all.

-Val
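
A minimal annotation-based sketch of the setup Val describes, assuming the
Ignite configuration lives in ignite-config.xml (the class name is just a
placeholder):

import org.apache.ignite.cache.spring.SpringCacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig {
    // The cache manager starts (or attaches to) the Ignite node described by
    // ignite-config.xml; no second Spring XML file is needed.
    @Bean
    public SpringCacheManager cacheManager() {
        SpringCacheManager mgr = new SpringCacheManager();
        mgr.setConfigurationPath("ignite-config.xml");
        return mgr;
    }
}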



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Failed to wait for initial partition map exchange

2018-09-12 Thread ndipiazza3565
No. Persistence is disabled in my case. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Failed to wait for initial partition map exchange

2018-09-12 Thread eugene miretsky
Do you have persistence enabled?

On Wed, Sep 12, 2018 at 6:31 PM ndipiazza3565 <
nicholas.dipia...@lucidworks.com> wrote:

> I'm trying to build up a list of possible causes for this issue.
>
> I'm only really interested in the issues that occur after successful
> production deployments. Meaning the environment has been up for some time
> successfully, but then later on our ignite nodes will not start and stick
>
> But as of now, a certain bad behavior from a single node in the ignite
> cluster can cause a deadlock
>
> * Anything that causes one of the ignite nodes to become unresponsive
>   * oom
>   * high gc
>   * high cpu
>   * high disk usage
> * Network issues?
>
> I'm trying to get a list of the causes for this issue so I can troubleshoot
> further.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Failed to wait for initial partition map exchange

2018-09-12 Thread ndipiazza3565
I'm trying to build up a list of possible causes for this issue.

I'm only really interested in the issues that occur after successful
production deployments - meaning the environment has been up successfully for
some time, but then later on our Ignite nodes will not start and get stuck.

But as of now, certain bad behavior from a single node in the Ignite
cluster can cause a deadlock:

* Anything that causes one of the ignite nodes to become unresponsive 
  * oom
  * high gc
  * high cpu
  * high disk usage
* Network issues?

I'm trying to get a list of the causes for this issue so I can troubleshoot
further. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Query 3x slower with index

2018-09-12 Thread eugene miretsky
Thanks!

Tried joining with an inlined table instead of IN as per the second
suggestion, and it didn't quite work.

Query1:

   - Select COUNT(*) FROM( Select customer_id from GATABLE3 use Index()
   where category_id in (9005, 175930, 175930, 175940, 175945, 101450, 6453)
   group by customer_id having SUM(product_views_app) > 2 OR
   SUM(product_clicks_app) > 1 )
   - exec time = 17s
   - *Result: 3105868*
   - Same exec time when using the AFFINITY_KEY index, the "_key_PK_hash" index, or the
   customer_id index
   - Using an index on category_id increases the query time to 33s

Query2:

   - Select COUNT(*) FROM( Select customer_id from GATABLE3 ga use index
   (PUBLIC."_key_PK") inner join table(category_id int = (9005, 175930,
   175930, 175940, 175945, 101450, 6453)) cats on cats.category_id =
   ga.category_id group by customer_id having SUM(product_views_app) > 2 OR
   SUM(product_clicks_app) > 1 )
   - exec time = 38s
   - *Result: 3113921*
   - Same exec time when using the AFFINITY_KEY index, the "_key_PK_hash" index, the
   customer_id index, or the category_id index
   - Using an index on category_id doesn't change the run time

Query plans are attached.

3 questions:

   1. Why is the result different for the 2 queries? This is quite
   concerning.
   2. Why is the 2nd query taking longer?
   3. Why doesn't the category_id index help in the case of query 2?
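
(For context, a rough sketch of the earlier suggestions from this thread - the
composite (customer_id, category_id) index and the collocated flag - expressed
through the Java SQL API; the cache name is a placeholder and the literals are
taken from Query1 above.)

IgniteCache<?, ?> cache = ignite.cache("SQL_PUBLIC_GATABLE3"); // placeholder cache name

// Composite index with the affinity/grouping column first, as suggested earlier.
cache.query(new SqlFieldsQuery(
    "CREATE INDEX IF NOT EXISTS ga_customer_category_idx " +
    "ON GATABLE3 (customer_id, category_id)")).getAll();

SqlFieldsQuery qry = new SqlFieldsQuery(
    "SELECT COUNT(*) FROM (" +
    "  SELECT customer_id FROM GATABLE3" +
    "  WHERE category_id IN (9005, 175930, 175940, 175945, 101450, 6453)" +
    "  GROUP BY customer_id" +
    "  HAVING SUM(product_views_app) > 2 OR SUM(product_clicks_app) > 1)");

// The GROUP BY column is the affinity key here, so reduce-side re-grouping can be skipped.
qry.setCollocated(true);

List<List<?>> rows = cache.query(qry).getAll();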


On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> I don't think that we're able to use index with IN () clauses. Please
> convert it into OR clauses.
>
> Please see
> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, Sep 3, 2018 at 12:46, Andrey Mashenkov  >:
>
>> Hi
>>
>> Actually, the first query uses the index on the affinity key, which looks more
>> efficient than the index on the category_id column.
>> The first query can process groups one by one and stream partial results
>> from the map phase to the reduce phase, as it uses a sorted index lookup,
>> while the second query has to process the full dataset in the map phase before
>> passing it on for reducing.
>>
>> Try to use a composite index (customer_id, category_id).
>>
>> Also, the SqlFieldsQuery.setCollocated(true) flag can help Ignite to build a
>> more efficient plan when the group by is on a collocated column.
>>
>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky 
>> wrote:
>>
>>> Hello,
>>>
>>> Schema:
>>>
>>>-
>>>
>>>PUBLIC.GATABLE2.CUSTOMER_ID
>>>
>>>PUBLIC.GATABLE2.DT
>>>
>>>PUBLIC.GATABLE2.CATEGORY_ID
>>>
>>>PUBLIC.GATABLE2.VERTICAL_ID
>>>
>>>PUBLIC.GATABLE2.SERVICE
>>>
>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>
>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>
>>>PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>
>>>PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>
>>>PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>
>>>PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>- pkey = customer_id,dt
>>>- affinityKey = customer
>>>
>>> Query:
>>>
>>>- select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>category_id in (175925, 101450, 9005, 175930, 175930, 
>>> 175940,175945,101450,
>>>6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>SUM(product_clicks_app) > 1 )
>>>
>>> The table has 600M rows.
>>> At first, the query took 1m, when we added an index on category_id the
>>> query started taking 3m.
>>>
>>> The SQL execution plan for both queries is attached.
>>>
>>> We are using a single x1.16xlarge instance with query parallelism set
>>> to 32
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>


Query1_pKeyIdx
Description: Binary data


Query1_categoryIdIdx
Description: Binary data


Query2_categoryIdx
Description: Binary data


Query2_pKeyIdx
Description: Binary data


Re: How much heap to allocate

2018-09-12 Thread eugene miretsky
Thanks!

For #2: wouldn't H2 need to bring the data into the heap to execute the
queries? Or at least some of the data, to do the group_by and sum operations?
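
(As a reference point, a rough sketch of two SqlFieldsQuery flags that bear on
this: the collocated flag Vladimir mentions below, and lazy result streaming,
which keeps the reducer from materializing the whole result set in heap. The
query and table are illustrative, and flag availability depends on the Ignite
version.)

SqlFieldsQuery qry = new SqlFieldsQuery(
    "SELECT customer_id, SUM(views) FROM mytable " +
    "GROUP BY customer_id HAVING SUM(views) > 20");

qry.setLazy(true);        // stream rows instead of holding the full result set in heap
qry.setCollocated(true);  // valid only when the GROUP BY column is the affinity key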

On Mon, Sep 10, 2018 at 6:19 AM Vladimir Ozerov 
wrote:

> Hi Eugene,
>
> Answering your questions:
> 1) Grouping is performed on both mapper and reducer (coordinator). If you
> group by affinity key, you may try setting "SqlFieldsQuery.colocated=true"
> to bypass grouping on reducer
> 2) For this specific query H2 will store (customer_id, count(*),
> sum(views)) for every customer_id. It is hard to guess how much space it
> would take in heap, but I think it would be ~50-100 bytes per customer_id.
> So if you have N customers, it would be (100 * N) bytes
> 3) Please see
> https://apacheignite-sql.readme.io/docs/performance-and-debugging
>
> Vladimir.
>
> On Thu, Aug 30, 2018 at 5:57 PM eugene miretsky 
> wrote:
>
>> Thanks again for the detailed response!
>>
>> Our main use case is performing large SQL queries over tables with 200M+
>> rows - wanted to give you a bit more detail and context that you can pass along
>>
>> A simple example would be:
>>
>>- Table: customer_id, date, category, views, clicks ( pkey =
>>"customer_id, date", affinity key = date )
>>- Query: SELECT count(*) where date < X AND categroy in (C1, C2, C3)
>>GROUP BY customer_id HAVING SUM(views) > 20
>>
>> My main concerns are
>> 1) How is the group by performed? You mentioned that it is performed on
>> the coordinator; I was hoping that since we are grouping on a column
>> that is an affinity key, each node would be able to do its own group by
>> 2) How much heap should I allocate for the group by stage
>> 3) General performance tips
>>
>> Cheers,
>> Eugene
>>
>>
>> On Thu, Aug 30, 2018 at 1:32 AM Denis Magda  wrote:
>>
>>> Eugene,
>>>
>>> Just want to be sure you know about the existence of the following pages
>>> which elaborate on Ignite memory architecture in details:
>>>
>>>-
>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory
>>>-
>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>>
>>>
>>>
 1) Are indexes loaded into heap (when used)?

>>>
>>> Something might be copied to disk but in most of the cases we perform
>>> comparisons and other operations directly off-heap.
>>> See 
>>> org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper
>>> and related classes.
>>>
>>> 2) Are full pages loaded into heap, or only the matching records?

>>>
>>> Matching records (result set) are presently loaded. The pages are not.
>>>
>>>
 3) When the query needs more processing than the existing index
 (non-indexed columns, groupBy, aggregation) where/how does it happen?

>>>
>>> We will be doing a full scan. Grouping and aggregations are finalized on
>>> the query coordinator which needs to get a full result set.
>>>
>>> 4) How is the query coordinator chosen? Is it the client node? How about
 when using the web console?

>>>
>>> That's your application. Web Console uses Ignite SQL APIs as well.
>>>
>>>
 5) What parallelism settings would you recommend? We were thinking to
 set parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
 this way we can make sure that each job gets all the heap memory instead of
 all jobs fighting each other. Not sure if it makes sense, and it will also
 prevent us from making real time transactional queries. (we
 are hoping to use ignite for both olap and simple real time queries)
>>>
>>>
>>> I would start a separate discussion for this bringing this question to
>>> the attention of our SQL experts. I'm not the one of them.
>>>
>>> --
>>> Denis
>>>
>>> On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 Denis, thanks for the detailed response.

 A few more follow up questions
 1) Are indexes loaded into heap (when used)?
 2) Are full pages loaded into heap, or only the matching records?
 3) When the query needs more processing than the existing index
 (non-indexed columns, groupBy, aggregation) where/how does it happen?
 4) How is the query coordinator chosen? Is it the client node? How
 about when using the web console?
 5) What parallelism settings would you recommend? We were thinking to
 set parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
 this way we can make sure that each job gets all the heap memory instead of
 all jobs fighting each other. Not sure if it makes sense, and it will also
 prevent us from making real time transactional queries. (we
 are hoping to use ignite for both olap and simple real time queries)

 Cheers,
 Eugene


 On Sat, Aug 25, 2018 at 3:25 AM Denis Magda  wrote:

> Hello Eugene,

how to configure apache ignite as cache api and as spring cache provider

2018-09-12 Thread mshah
I want to use Ignite both through the cache API and as a Spring cache provider.
I am running the following configuration in the ignite-config.xml file:

[ignite-config.xml - the XML markup was stripped by the mailing-list archive;
only the <beans> schema declarations and a discovery address entry
(10.80.211.76) are still recoverable.]

I am initializing the ignite node as below.

ApplicationContext context = SpringApplication.run(Application.class, args);
Ignite ignite = IgniteSpring.start("ignite-config.xml", context);

The node comes up and runs. Now I am using a second configuration file,
spring-cache.xml, for the Spring cache manager:


[spring-cache.xml - likewise stripped by the archive; per the surrounding text
it defined a SpringCacheManager bean (with the gridName/instanceName property
set) but no IgniteConfiguration bean.]

and I am initializing this file as below

Ignite ignite = Ignition.start("spring-cache.xml");

I am getting the below exception

Caused by: class org.apache.ignite.IgniteCheckedException: Failed to find
configuration in: file:/E:/Workspace/ms.claim/spring-cache.xml at
org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadConfigurations(IgniteSpringHelperImpl.java:116)
at
org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadConfigurations(IgniteSpringHelperImpl.java:98)
at
org.apache.ignite.internal.IgnitionEx.loadConfigurations(IgnitionEx.java:744)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:945) at
org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:854) at
org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:724) at
org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:693) at
org.apache.ignite.Ignition.start(Ignition.java:352) ... 2 more

While debugging the Ignite source code I found that Ignite internally uses
SpringHelper to get the Spring Application Context; after that it tries to
fetch the IgniteConfiguration instance as a Spring bean using the
getBeansOfType() method in one of the classes, does not find the
configuration instance there, and throws the above exception.

As mentioned in the Ignite documentation, I am using the same gridName or
instanceName property on the SpringCacheManager in spring-cache.xml.

Can somebody please check the issue? Also, I am not sure I am following the
correct method. First I am starting the ignite node with 

Re: Partition map exchange in detail

2018-09-12 Thread Вячеслав Коптилин
Hello Eugene,

I hope you meant PME (partitions map exchange) instead of NPE :)

> What constitutes a transaction in this context?
If I am not mistaken, it is about Ignite transactions.
Please take a look at this page
https://apacheignite.readme.io/docs/transactions

> Does it mean that if the cluster constantly receives transaction
requests, NPE will never happen?
> Or will all transactions that were received after the NPE request wait
for the NPE to complete?
Transactions that were initiated after the PME request will wait until the
PME is completed.

Thanks,
S.
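
For the question quoted below about avoiding PME problems via transaction
timeouts, a minimal sketch of the relevant settings; the values are purely
illustrative, and the PME-specific timeout only exists in recent Ignite
versions (2.5+):

TransactionConfiguration txCfg = new TransactionConfiguration();

// Roll back any transaction that runs longer than 30 seconds, so a stuck
// transaction cannot hold up an exchange indefinitely.
txCfg.setDefaultTxTimeout(30_000);

// Additionally roll back transactions that would block a pending partition
// map exchange once this timeout elapses.
txCfg.setTxTimeoutOnPartitionMapExchange(20_000);

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setTransactionConfiguration(txCfg);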

On Wed, Sep 12, 2018 at 22:51, eugene miretsky :

> Make sense
> I think the actual issue that was affecting me is
> https://issues.apache.org/jira/browse/IGNITE-9562. (which IEP-25 should
> solve).
>
> Final 2 questions:
> 1) If all NPE waits for all pending transactions
>   a) What constitutes a transaction in this context? (any query, a SQL
> transaction, etc)
>   b) Does it mean that if the cluster constantly receives transaction
> requests, NPE will never happen? (Or will all transactions that were
> received after the NPE request wait for the NPE to complete?)
> 2) Any other advice on how to avoid NPE? (transaction timeouts, graceful
> shutdown/restart of nodes, etc)
>
> Cheers,
> Eugene
>
>
>
>
>
> On Wed, Sep 12, 2018 at 12:18 PM Pavel Kovalenko 
> wrote:
>
>> Eugene,
>>
>> In the case of Zookeeper Discovery is enabled and communication problem
>> between some nodes, a subset of problem nodes will be automatically killed
>> to reach cluster state where each node can communicate with each other
>> without problems. So, you're absolutely right, dead nodes will be removed
>> from a cluster and will not participate in PME.
>> IEP-25 is trying to solve a more general problem related only to PME.
>> Network problems are only part of the problem can happen during PME. A node
>> may break down before it even tried to send a message because of unexpected
>> exceptions (e.g. NullPointer, Runtime, Assertion e.g.). In general, IEP-25
>> tries to defend us from any kind of unexpected problems to make sure that
>> PME will not be blocked in that case and the cluster will continue to live.
>>
>>
>> On Wed, Sep 12, 2018 at 18:53, eugene miretsky > >:
>>
>>> Hi Pavel,
>>>
>>> The issue we are discussing is PME failing because one node cannot
>>> communicate to another node, that's what IEP-25 is trying to solve. But in
>>> that case (where one node is either down, or there is a communication
>>> problem between two nodes) I would expect the split brain resolver to kick
>>> in, and shut down one of the nodes. I would also expect the dead node to be
>>> removed from the cluster, and no longer take part in PME.
>>>
>>>
>>>
>>> On Wed, Sep 12, 2018 at 11:25 AM Pavel Kovalenko 
>>> wrote:
>>>
 Hi Eugene,

 Sorry, but I didn't catch the meaning of your question about Zookeeper
 Discovery. Could you please re-phrase it?

 On Wed, Sep 12, 2018 at 17:54, Ilya Lantukh :

> Pavel K., can you please answer about Zookeeper discovery?
>
> On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> Thanks for the patience with my questions - just trying to understand
>> the system better.
>>
>> 3) I was referring to
>> https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
>> How come it doesn't get the node to shut down?
>> 4) Are there any docs/JIRAs that explain how counters are used, and
>> why they are required in the state?
>>
>> Cheers,
>> Eugene
>>
>>
>> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
>> wrote:
>>
>>> 3) Such mechanics will be implemented in IEP-25 (linked above).
>>> 4) Partition map states include update counters, which are
>>> incremented on every cache update and play important role in new state
>>> calculation. So, technically, every cache operation can lead to 
>>> partition
>>> map change, and for obvious reasons we can't route them through
>>> coordinator. Ignite is a more complex system than Akka or Kafka and such
>>> simple solutions won't work here (in general case). However, it is true
>>> that PME could be simplified or completely avoid for certain cases and 
>>> the
>>> community is currently working on such optimizations (
>>> https://issues.apache.org/jira/browse/IGNITE-9558 for example).
>>>
>>> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 2b) I had a few situations where the cluster went into a state
 where PME constantly failed, and could never recover. I think the root
 cause was that a transaction got stuck and didn't timeout/rollback.  I 
 will
 try to reproduce it again and get back to you
 3) If a node is down, I would expect it to get detected and the
 node to get removed 

IGNITE-8386 question (composite pKeys)

2018-09-12 Thread eugene miretsky
Hi,

A question regarding
https://issues.apache.org/jira/browse/IGNITE-8386?focusedCommentId=16511394=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16511394

It states that a pkey index with a composite pKey is "effectively useless".
Could you please explain why that is? We have a pKey that we are using as
an index.

Also, our pKey is (customer_id, date) and the affinity column is
customer_id. I have noticed that most queries use the AFFINITY_KEY index.
Looking at the source code, the AFFINITY_KEY index should not even be created,
since the first field of the pKey is the affinity key. Any idea what may
be happening?

Cheers,
Eugene


Re: POJO field having wrapper type, mapped to cassandra table are getting initialized to respective default value of primitive type instead of null if column value is null.

2018-09-12 Thread Denis Magda
Igor R,

Could you please review this C* contribution?

--
Denis

On Wed, Sep 12, 2018 at 6:33 AM Dmitriy Pavlov 
wrote:

> Hi Igniters,
>
> I can see that ticket is still in patch available state.
>
> Denis M.
>
> could you please review the patch?
>
> Sincerely,
> Dmitriy Pavlov
>
> On Tue, Sep 26, 2017 at 12:10, Denis Mekhanikov :
>
>> There is a page in confluence with description of the process:
>> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>>
>> To get permission to assign tickets to yourself you should write a letter
>> to dev list and ask for it by telling your JIRA username.
>>
>> While waiting for JIRA permissions you can start configuring your work
>> environment as described in the article and working on the fix.
>>
>> Denis
>>
>> On Tue, Sep 26, 2017 at 12:00, kotamrajuyashasvi <
>> kotamrajuyasha...@gmail.com>:
>>
>>> Hi Denis
>>>
>>> I want to work on this issue. But I'm a newbie and I do not know the
>>> actual
>>> process like, How to assign the ticket to myself(JIRA username:
>>> kotamrajuyashasvi) and who would review the code , merging the code etc.
>>> Do
>>> you know any links which explain the process?
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>


Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
Makes sense.
I think the actual issue that was affecting me is
https://issues.apache.org/jira/browse/IGNITE-9562. (which IEP-25 should
solve).

Final 2 questions:
1) If all NPE waits for all pending transactions
  a) What constitutes a transaction in this context? (any query, a SQL
transaction, etc)
  b) Does it mean that if the cluster constantly receives transaction
requests, NPE will never happen? (Or will all transactions that were
received after the NPE request wait for the NPE to complete?)
2) Any other advice on how to avoid NPE? (transaction timeouts, graceful
shutdown/restart of nodes, etc)

Cheers,
Eugene





On Wed, Sep 12, 2018 at 12:18 PM Pavel Kovalenko  wrote:

> Eugene,
>
> In the case of Zookeeper Discovery is enabled and communication problem
> between some nodes, a subset of problem nodes will be automatically killed
> to reach cluster state where each node can communicate with each other
> without problems. So, you're absolutely right, dead nodes will be removed
> from a cluster and will not participate in PME.
> IEP-25 is trying to solve a more general problem related only to PME.
> Network problems are only part of the problem can happen during PME. A node
> may break down before it even tried to send a message because of unexpected
> exceptions (e.g. NullPointer, Runtime, Assertion e.g.). In general, IEP-25
> tries to defend us from any kind of unexpected problems to make sure that
> PME will not be blocked in that case and the cluster will continue to live.
>
>
> On Wed, Sep 12, 2018 at 18:53, eugene miretsky :
>
>> Hi Pavel,
>>
>> The issue we are discussing is PME failing because one node cannot
>> communicate to another node, that's what IEP-25 is trying to solve. But in
>> that case (where one node is either down, or there is a communication
>> problem between two nodes) I would expect the split brain resolver to kick
>> in, and shut down one of the nodes. I would also expect the dead node to be
>> removed from the cluster, and no longer take part in PME.
>>
>>
>>
>> On Wed, Sep 12, 2018 at 11:25 AM Pavel Kovalenko 
>> wrote:
>>
>>> Hi Eugene,
>>>
>>> Sorry, but I didn't catch the meaning of your question about Zookeeper
>>> Discovery. Could you please re-phrase it?
>>>
>>> On Wed, Sep 12, 2018 at 17:54, Ilya Lantukh :
>>>
 Pavel K., can you please answer about Zookeeper discovery?

 On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
 eugene.miret...@gmail.com> wrote:

> Thanks for the patience with my questions - just trying to understand
> the system better.
>
> 3) I was referring to
> https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
> How come it doesn't get the node to shut down?
> 4) Are there any docs/JIRAs that explain how counters are used, and
> why they are required in the state?
>
> Cheers,
> Eugene
>
>
> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
> wrote:
>
>> 3) Such mechanics will be implemented in IEP-25 (linked above).
>> 4) Partition map states include update counters, which are
>> incremented on every cache update and play important role in new state
>> calculation. So, technically, every cache operation can lead to partition
>> map change, and for obvious reasons we can't route them through
>> coordinator. Ignite is a more complex system than Akka or Kafka and such
>> simple solutions won't work here (in general case). However, it is true
>> that PME could be simplified or completely avoid for certain cases and 
>> the
>> community is currently working on such optimizations (
>> https://issues.apache.org/jira/browse/IGNITE-9558 for example).
>>
>> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> 2b) I had a few situations where the cluster went into a state where
>>> PME constantly failed, and could never recover. I think the root cause 
>>> was
>>> that a transaction got stuck and didn't timeout/rollback.  I will try to
>>> reproduce it again and get back to you
>>> 3) If a node is down, I would expect it to get detected and the node
>>> to get removed from the cluster. In such case, PME should not even be
>>> attempted with that node. Hence you would expect PME to fail very rarely
>>> (any faulty node will be removed before it has a chance to fail PME)
>>> 4) Don't all partition map changes go through the coordinator? I
>>> believe a lot of distributed systems work in this way (all decisions are
>>> made by the coordinator/leader) - In Akka the leader is responsible for
>>> making all cluster membership changes, in Kafka the controller does the
>>> leader election.
>>>
>>> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
>>> wrote:
>>>
 1) It is.
 2a) Ignite has retry mechanics for all messages, including
 PME-related ones.

Re: Unable to connect ignite pods in Kubernetes using Ip-finder

2018-09-12 Thread Denis Magda
Hi Rishikesh,

Thanks for pointing this out. I've updated the stateless deployment doc.

--
Denis

On Tue, Sep 11, 2018 at 7:06 AM rishi007bansod 
wrote:

> "serviceAccountName: ignite" should be present in Pod Deployment
> specification as mentioned by Anton in post
>
> https://stackoverflow.com/questions/49395481/how-to-setmasterurl-in-ignite-xml-config-for-kubernetes-ipfinder/49405879#49405879
> <
> https://stackoverflow.com/questions/49395481/how-to-setmasterurl-in-ignite-xml-config-for-kubernetes-ipfinder/49405879#49405879>
>
> .  It is currently absent in
> https://apacheignite.readme.io/docs/stateless-deployment
> 
> "ignite-deployment.yaml" file
>
> Thanks,
> Rishikesh
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: How to throttle/limit the cache-store read threads?

2018-09-12 Thread Denis Magda
The system thread pool is used for cache and cache store operations. However, I
would discourage you from limiting it to 1 thread.

In general, extend the cache store implementation you use and forward all
CacheStore.read() operations to your single-threaded pool.

--
Denis
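
A rough sketch of that approach, assuming a read-through-only store; the class
name, pool size and fetchFromRestService() helper are placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.cache.Cache;
import javax.cache.integration.CacheLoaderException;
import org.apache.ignite.cache.store.CacheStoreAdapter;

public class ThrottledRestCacheStore extends CacheStoreAdapter<String, String> {
    // Dedicated pool that bounds concurrent calls to the remote web service.
    private static final ExecutorService REST_POOL = Executors.newFixedThreadPool(2);

    @Override public String load(String key) {
        try {
            // Ignite system threads block here, but at most 2 REST calls run at once.
            return REST_POOL.submit(() -> fetchFromRestService(key)).get();
        }
        catch (Exception e) {
            throw new CacheLoaderException(e);
        }
    }

    @Override public void write(Cache.Entry<? extends String, ? extends String> entry) {
        // Read-through only in this sketch; write-through is not used.
    }

    @Override public void delete(Object key) {
        // No-op for a read-only store.
    }

    private String fetchFromRestService(String key) {
        return "value-for-" + key; // placeholder for the actual REST call
    }
}

The store would then be plugged in with
CacheConfiguration.setCacheStoreFactory(FactoryBuilder.factoryOf(ThrottledRestCacheStore.class))
and setReadThrough(true), while publicThreadPoolSize stays at its default.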


On Sun, Sep 9, 2018 at 11:41 AM Saikat Maitra 
wrote:

> Hi Mridul,
>
> Have you considered the option of creating a dedicated Threadpool for the
> cache store operations. Here is an example
>
> http://commons.apache.org/dormant/threadpool/
>
> Another option would be to consider Hystrix Command Threadpool since you
> mentioned that the cache store is remote web service.
>
> https://github.com/Netflix/Hystrix/wiki/How-To-Use#Command%20Thread-Pool
>
> HTH
> Regards
> Saikat
>
> On Sun, Sep 9, 2018 at 6:04 AM, Mridul Chopra 
> wrote:
>
>> Hello,
>>
>> I have implemented a cache store for read-through ignite caching, whenever
>> there is a read-through operation through the cache-store, I want to limit
>> this to single thread or at max 2. The reason behind the same is due to
>> the
>> fact that the underlying cache store is a remote Rest based webservice
>> that
>> can only support limited number of connections. Hence I want to have
>> limited number of requests being sent for read-through. Please note that
>> if
>> I configure publicThreadPoolSize =1 , this would mean all cache.get and
>> cache.getAll operations would be single threaded. Hence this behaviour is
>> not desired, I want to have throttling at the cache store level only.
>> Thanks,
>> Mridul
>>
>
>


Re: Node keeps crashing under load

2018-09-12 Thread eugene miretsky
Good question :)
yardstick does this, but not sure if it is a valid prod solution.
https://github.com/apache/ignite/blob/3307a8b26ccb5f0bb7e9c387c73fd221b98ab668/modules/yardstick/src/main/java/org/apache/ignite/yardstick/jdbc/AbstractJdbcBenchmark.java

We have set preferIPv4Stack=true and provided localAddress in the config -
it seems to have solved the problem. (Didn't run it enough to be 100% sure)
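
(For reference, a minimal sketch of those settings; the address is just an
example taken from the logs below:)

// JVM flag on every node: -Djava.net.preferIPv4Stack=true

IgniteConfiguration cfg = new IgniteConfiguration();

// Bind to the node's routable address instead of letting Ignite pick
// one of the local interfaces (e.g. docker0's 172.17.0.1).
cfg.setLocalHost("172.21.86.7"); // example address

// Alternatively, pin the communication SPI explicitly, as Ilya suggested.
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setLocalAddress("172.21.86.7");
cfg.setCommunicationSpi(commSpi);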

On Wed, Sep 12, 2018 at 10:59 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> How would you distinguish the wrong interface (172.17.0.1) from the right
> one if you were Ignite?
>
> I think it's not the first time I have seen this problem but I have
> positively no idea how to tackle it.
> Maybe Docker experts could chime in?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Wed, Sep 12, 2018 at 3:29, eugene miretsky :
>
>> Thanks Ilya,
>>
>> We are writing to Ignite from Spark running in EMR. We don't know the
>> address of the node in advance, we have tried
>> 1) Set localHost in Ignite configuration to 127.0.0.1, as per the example
>> online
>> 2) Leave localHost unset, and let ignite figure out the host
>>
>> I have attached more logs at the end.
>>
>> My understanding is that Ignite should pick the first non-local address
>> to publish, however, it seems like it picks randomly one of (a) proper
>> address, (b) ipv6 address, (c) 127.0.0.1, (d)  172.17.0.1.
>>
>> A few questions:
>> 1) How do we force Spark client to use the proper address
>> 2) Where is 172.17.0.1 coming from? It is usually the default docker
>> network host address, and it seems like Ignite creates a network interface
>> for it on the instance. (otherwise I have no idea where the interface is
>> coming from)
>> 3) If there are communication errors, shouldn't the Zookeeper split brain
>> resolver kick in and shut down the dead node. Or shouldn't at least the
>> initiating node mark the remote node as dead?
>>
>> [19:36:26,189][INFO][grid-nio-worker-tcp-comm-15-#88%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
>> rmtAddr=/172.21.86.7:41648]
>>
>> [19:36:26,190][INFO][grid-nio-worker-tcp-comm-3-#76%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
>> rmtAddr=/0:0:0:0:0:0:0:1:52484]
>>
>> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-5-#78%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
>> rmtAddr=/127.0.0.1:37656]
>>
>> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-1-#74%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/172.21.86.7:53272,
>> rmtAddr=ip-172-21-86-175.ap-south-1.compute.internal/172.21.86.175:47100]
>>
>> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-0-#73%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/172.17.0.1:41648,
>> rmtAddr=ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100]
>>
>> [19:36:26,193][INFO][grid-nio-worker-tcp-comm-4-#77%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/127.0.0.1:37656,
>> rmtAddr=/127.0.0.1:47100]
>>
>> [19:36:26,193][INFO][grid-nio-worker-tcp-comm-2-#75%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection
>> [locAddr=/0:0:0:0:0:0:0:1:52484, rmtAddr=/0:0:0:0:0:0:0:1%lo:47100]
>>
>> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-8-#81%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
>> rmtAddr=/172.21.86.7:41656]
>>
>> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-10-#83%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
>> rmtAddr=/0:0:0:0:0:0:0:1:52492]
>>
>> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-12-#85%Server%][TcpCommunicationSpi]
>> Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
>> rmtAddr=/127.0.0.1:37664]
>>
>> [19:36:26,196][INFO][grid-nio-worker-tcp-comm-7-#80%Server%][TcpCommunicationSpi]
>> Established outgoing communication connection [locAddr=/172.21.86.7:41076,
>> rmtAddr=ip-172-21-86-229.ap-south-1.compute.internal/172.21.86.229:47100]
>>
>>
>>
>>
>> On Mon, Sep 10, 2018 at 12:04 PM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> I can see a lot of errors like this one:
>>>
>>> [04:05:29,268][INFO][tcp-comm-worker-#1%Server%][ZookeeperDiscoveryImpl]
>>> Created new communication error process future
>>> [errNode=598e3ead-99b8-4c49-b7df-04d578dcbf5f, err=class
>>> org.apache.ignite.IgniteCheckedException: Failed to connect to node (is
>>> node still alive?). Make sure that each ComputeTask and cache Transaction
>>> has a timeout set in order to prevent parties from waiting forever in case
>>> of network issues [nodeId=598e3ead-99b8-4c49-b7df-04d578dcbf5f,
>>> addrs=[ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100,
>>> ip-172-21-85-213.ap-south-1.compute.internal/172.21.85.213:47100,
>>> 

Re: Partition map exchange in detail

2018-09-12 Thread Pavel Kovalenko
Eugene,

In the case where Zookeeper Discovery is enabled and there is a communication
problem between some nodes, a subset of the problem nodes will be automatically
killed to reach a cluster state where each node can communicate with every
other node without problems. So, you're absolutely right, dead nodes will be
removed from the cluster and will not participate in PME.
IEP-25 is trying to solve a more general problem related to PME. Network
problems are only part of the problems that can happen during PME. A node
may break down before it even tries to send a message, because of unexpected
exceptions (e.g. NullPointer, Runtime, Assertion errors). In general, IEP-25
tries to defend us from any kind of unexpected problem, to make sure that
PME will not be blocked in such cases and the cluster will continue to live.


On Wed, Sep 12, 2018 at 18:53, eugene miretsky :

> Hi Pavel,
>
> The issue we are discussing is PME failing because one node cannot
> communicate to another node, that's what IEP-25 is trying to solve. But in
> that case (where one node is either down, or there is a communication
> problem between two nodes) I would expect the split brain resolver to kick
> in, and shut down one of the nodes. I would also expect the dead node to be
> removed from the cluster, and no longer take part in PME.
>
>
>
> On Wed, Sep 12, 2018 at 11:25 AM Pavel Kovalenko 
> wrote:
>
>> Hi Eugene,
>>
>> Sorry, but I didn't catch the meaning of your question about Zookeeper
>> Discovery. Could you please re-phrase it?
>>
>> On Wed, Sep 12, 2018 at 17:54, Ilya Lantukh :
>>
>>> Pavel K., can you please answer about Zookeeper discovery?
>>>
>>> On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 Thanks for the patience with my questions - just trying to understand
 the system better.

 3) I was referring to
 https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
 How come it doesn't get the node to shut down?
 4) Are there any docs/JIRAs that explain how counters are used, and why
 they are required in the state?

 Cheers,
 Eugene


 On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
 wrote:

> 3) Such mechanics will be implemented in IEP-25 (linked above).
> 4) Partition map states include update counters, which are incremented
> on every cache update and play important role in new state calculation. 
> So,
> technically, every cache operation can lead to partition map change, and
> for obvious reasons we can't route them through coordinator. Ignite is a
> more complex system than Akka or Kafka and such simple solutions won't 
> work
> here (in general case). However, it is true that PME could be simplified 
> or
> completely avoid for certain cases and the community is currently working
> on such optimizations (
> https://issues.apache.org/jira/browse/IGNITE-9558 for example).
>
> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> 2b) I had a few situations where the cluster went into a state where
>> PME constantly failed, and could never recover. I think the root cause 
>> was
>> that a transaction got stuck and didn't timeout/rollback.  I will try to
>> reproduce it again and get back to you
>> 3) If a node is down, I would expect it to get detected and the node
>> to get removed from the cluster. In such case, PME should not even be
>> attempted with that node. Hence you would expect PME to fail very rarely
>> (any faulty node will be removed before it has a chance to fail PME)
>> 4) Don't all partition map changes go through the coordinator? I
>> believe a lot of distributed systems work in this way (all decisions are
>> made by the coordinator/leader) - In Akka the leader is responsible for
>> making all cluster membership changes, in Kafka the controller does the
>> leader election.
>>
>> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
>> wrote:
>>
>>> 1) It is.
>>> 2a) Ignite has retry mechanics for all messages, including
>>> PME-related ones.
>>> 2b) In this situation PME will hang, but it isn't a "deadlock".
>>> 3) Sorry, I didn't understand your question. If a node is down, but
>>> DiscoverySpi doesn't detect it, it isn't PME-related problem.
>>> 4) How can you ensure that partition maps on coordinator are *latest
>>> *without "freezing" cluster state for some time?
>>>
>>> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 Thanks!

 We are using persistence, so I am not sure if shutting down nodes
 will be the desired outcome for us since we would need to modify the
 baseline topology.

 A couple more follow up questions

 1) Is PME triggered when client nodes 

Re: a node fails and restarts in a cluster

2018-09-12 Thread Pavel Kovalenko
Hi Eugene,

I've reproduced your problem and filed a ticket for that:
https://issues.apache.org/jira/browse/IGNITE-9562

As a temporary workaround, I can suggest deleting the persistence data
(cache.dat and partition files) related to that cache in the starting node's
work directory, or not destroying caches unnecessarily while your baseline is
not complete.

On Tue, Sep 11, 2018 at 16:50, es70 :

> Hi Pavel
>
> I've  prepared the logs you requested. Please download it from this link
>
> https://cloud.mail.ru/public/A9wK/bKGEXK397
>
> hope this will help
>
> regards,
> Evgeny
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
Hi Pavel,

The issue we are discussing is PME failing because one node cannot
communicate to another node, that's what IEP-25 is trying to solve. But in
that case (where one node is either down, or there is a communication
problem between two nodes) I would expect the split brain resolver to kick
in, and shut down one of the nodes. I would also expect the dead node to be
removed from the cluster, and no longer take part in PME.



On Wed, Sep 12, 2018 at 11:25 AM Pavel Kovalenko  wrote:

> Hi Eugene,
>
> Sorry, but I didn't catch the meaning of your question about Zookeeper
> Discovery. Could you please re-phrase it?
>
> On Wed, Sep 12, 2018 at 17:54, Ilya Lantukh :
>
>> Pavel K., can you please answer about Zookeeper discovery?
>>
>> On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks for the patience with my questions - just trying to understand
>>> the system better.
>>>
>>> 3) I was referring to
>>> https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
>>> How come it doesn't get the node to shut down?
>>> 4) Are there any docs/JIRAs that explain how counters are used, and why
>>> they are required in the state?
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
>>> wrote:
>>>
 3) Such mechanics will be implemented in IEP-25 (linked above).
 4) Partition map states include update counters, which are incremented
 on every cache update and play important role in new state calculation. So,
 technically, every cache operation can lead to partition map change, and
 for obvious reasons we can't route them through coordinator. Ignite is a
 more complex system than Akka or Kafka and such simple solutions won't work
 here (in general case). However, it is true that PME could be simplified or
 completely avoid for certain cases and the community is currently working
 on such optimizations (
 https://issues.apache.org/jira/browse/IGNITE-9558 for example).

 On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
 eugene.miret...@gmail.com> wrote:

> 2b) I had a few situations where the cluster went into a state where
> PME constantly failed, and could never recover. I think the root cause was
> that a transaction got stuck and didn't timeout/rollback.  I will try to
> reproduce it again and get back to you
> 3) If a node is down, I would expect it to get detected and the node
> to get removed from the cluster. In such case, PME should not even be
> attempted with that node. Hence you would expect PME to fail very rarely
> (any faulty node will be removed before it has a chance to fail PME)
> 4) Don't all partition map changes go through the coordinator? I
> believe a lot of distributed systems work in this way (all decisions are
> made by the coordinator/leader) - In Akka the leader is responsible for
> making all cluster membership changes, in Kafka the controller does the
> leader election.
>
> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
> wrote:
>
>> 1) It is.
>> 2a) Ignite has retry mechanics for all messages, including
>> PME-related ones.
>> 2b) In this situation PME will hang, but it isn't a "deadlock".
>> 3) Sorry, I didn't understand your question. If a node is down, but
>> DiscoverySpi doesn't detect it, it isn't PME-related problem.
>> 4) How can you ensure that partition maps on coordinator are *latest
>> *without "freezing" cluster state for some time?
>>
>> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> We are using persistence, so I am not sure if shutting down nodes
>>> will be the desired outcome for us, since we would need to modify the
>>> baseline topology.
>>>
>>> A couple more follow up questions
>>>
>>> 1) Is PME triggered when client nodes join as well? We are using the
>>> Spark client, so new nodes are created/destroyed every time.
>>> 2) It sounds to me like there is a potential for the cluster to get
>>> into a deadlock if
>>>a) a single PME message is lost (PME never finishes, there are no
>>> retries, and all future operations are blocked on the pending PME)
>>>b) one of the nodes has a long running/stuck pending operation
>>> 3) Under what circumstance can PME fail, while DiscoverySpi fails
>>> to detect the node being down? We are using ZookeeperSpi so I would
>>> expect
>>> the split brain resolver to shut down the node.
>>> 4) Why is PME needed? Doesn't the coordinator know the latest
>>> topology/partition map of the cluster through regular gossip?
>>>
>>> Cheers,
>>> Eugene
>>>
>>> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
>>> wrote:
>>>
 Hi Eugene,

 1) PME happens when topology is 

Re: Partition map exchange in detail

2018-09-12 Thread Pavel Kovalenko
Hi Eugene,

Sorry, but I didn't catch the meaning of your question about Zookeeper
Discovery. Could you please re-phrase it?

On Wed, Sep 12, 2018 at 17:54, Ilya Lantukh :

> Pavel K., can you please answer about Zookeeper discovery?
>
> On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> Thanks for the patience with my questions - just trying to understand the
>> system better.
>>
>> 3) I was referring to
>> https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
>> How come it doesn't get the node to shut down?
>> 4) Are there any docs/JIRAs that explain how counters are used, and why
>> they are required in the state?
>>
>> Cheers,
>> Eugene
>>
>>
>> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
>> wrote:
>>
>>> 3) Such mechanics will be implemented in IEP-25 (linked above).
>>> 4) Partition map states include update counters, which are incremented
>>> on every cache update and play important role in new state calculation. So,
>>> technically, every cache operation can lead to partition map change, and
>>> for obvious reasons we can't route them through coordinator. Ignite is a
>>> more complex system than Akka or Kafka and such simple solutions won't work
>>> here (in general case). However, it is true that PME could be simplified or
>>> completely avoid for certain cases and the community is currently working
>>> on such optimizations (https://issues.apache.org/jira/browse/IGNITE-9558
>>> for example).
>>>
>>> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 2b) I had a few situations where the cluster went into a state where
 PME constantly failed, and could never recover. I think the root cause was
 that a transaction got stuck and didn't timeout/rollback.  I will try to
 reproduce it again and get back to you
 3) If a node is down, I would expect it to get detected and the node to
 get removed from the cluster. In such case, PME should not even be
 attempted with that node. Hence you would expect PME to fail very rarely
 (any faulty node will be removed before it has a chance to fail PME)
 4) Don't all partition map changes go through the coordinator? I
 believe a lot of distributed systems work in this way (all decisions are
 made by the coordinator/leader) - In Akka the leader is responsible for
 making all cluster membership changes, in Kafka the controller does the
 leader election.

 On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
 wrote:

> 1) It is.
> 2a) Ignite has retry mechanics for all messages, including PME-related
> ones.
> 2b) In this situation PME will hang, but it isn't a "deadlock".
> 3) Sorry, I didn't understand your question. If a node is down, but
> DiscoverySpi doesn't detect it, it isn't PME-related problem.
> 4) How can you ensure that partition maps on coordinator are *latest 
> *without
> "freezing" cluster state for some time?
>
> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> Thanks!
>>
>> We are using persistence, so I am not sure if shutting down nodes
>> will be the desired outcome for us, since we would need to modify the
>> baseline topology.
>>
>> A couple more follow up questions
>>
>> 1) Is PME triggered when client nodes join as well? We are using the
>> Spark client, so new nodes are created/destroyed every time.
>> 2) It sounds to me like there is a potential for the cluster to get
>> into a deadlock if
>>a) a single PME message is lost (PME never finishes, there are no
>> retries, and all future operations are blocked on the pending PME)
>>b) one of the nodes has a long running/stuck pending operation
>> 3) Under what circumstance can PME fail, while DiscoverySpi fails to
>> detect the node being down? We are using ZookeeperSpi so I would expect
>> the
>> split brain resolver to shut down the node.
>> 4) Why is PME needed? Doesn't the coordinator know the latest
>> topology/partition map of the cluster through regular gossip?
>>
>> Cheers,
>> Eugene
>>
>> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
>> wrote:
>>
>>> Hi Eugene,
>>>
>>> 1) PME happens when topology is modified (TopologyVersion is
>>> incremented). The most common events that trigger it are: node
>>> start/stop/fail, cluster activation/deactivation, dynamic cache 
>>> start/stop.
>>> 2) It is done by a separate ExchangeWorker. Events that trigger PME
>>> are transferred using DiscoverySpi instead of CommunicationSpi.
>>> 3) All nodes wait for all pending cache operations to finish and
>>> then send their local partition maps to the coordinator (oldest node). 
>>> Then
>>> coordinator calculates new global partition maps and sends them to every

Re: Node keeps crashing under load

2018-09-12 Thread Ilya Kasnacheev
Hello!

How would you distinguish the wrong interface (172.17.0.1) from the right
one if you were Ignite?

I think it's not the first time I have seen this problem but I have
positively no idea how to tackle it.
Maybe Docker experts could chime in?

Regards,
-- 
Ilya Kasnacheev


On Wed, Sep 12, 2018 at 3:29, eugene miretsky :

> Thanks Ilya,
>
> We are writing to Ignite from Spark running in EMR. We don't know the
> address of the node in advance, we have tried
> 1) Set localHost in Ignite configuration to 127.0.0.1, as per the example
> online
> 2) Leave localHost unset, and let ignite figure out the host
>
> I have attached more logs at the end.
>
> My understanding is that Ignite should pick the first non-local address to
> publish, however, it seems like it picks randomly one of (a) proper
> address, (b) ipv6 address, (c) 127.0.0.1, (d)  172.17.0.1.
>
> A few questions:
> 1) How do we force Spark client to use the proper address
> 2) Where is 172.17.0.1 coming from? It is usually the default docker
> network host address, and it seems like Ignite creates a network interface
> for it on the instance. (otherwise I have no idea where the interface is
> coming from)
> 3) If there are communication errors, shouldn't the Zookeeper split brain
> resolver kick in and shut down the dead node. Or shouldn't at least the
> initiating node mark the remote node as dead?
>
> [19:36:26,189][INFO][grid-nio-worker-tcp-comm-15-#88%Server%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
> rmtAddr=/172.21.86.7:41648]
>
> [19:36:26,190][INFO][grid-nio-worker-tcp-comm-3-#76%Server%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
> rmtAddr=/0:0:0:0:0:0:0:1:52484]
>
> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-5-#78%Server%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
> rmtAddr=/127.0.0.1:37656]
>
> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-1-#74%Server%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/172.21.86.7:53272,
> rmtAddr=ip-172-21-86-175.ap-south-1.compute.internal/172.21.86.175:47100]
>
> [19:36:26,191][INFO][grid-nio-worker-tcp-comm-0-#73%Server%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/172.17.0.1:41648,
> rmtAddr=ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100]
>
> [19:36:26,193][INFO][grid-nio-worker-tcp-comm-4-#77%Server%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/127.0.0.1:37656,
> rmtAddr=/127.0.0.1:47100]
>
> [19:36:26,193][INFO][grid-nio-worker-tcp-comm-2-#75%Server%][TcpCommunicationSpi]
> Established outgoing communication connection
> [locAddr=/0:0:0:0:0:0:0:1:52484, rmtAddr=/0:0:0:0:0:0:0:1%lo:47100]
>
> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-8-#81%Server%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/172.17.0.1:47100,
> rmtAddr=/172.21.86.7:41656]
>
> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-10-#83%Server%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100,
> rmtAddr=/0:0:0:0:0:0:0:1:52492]
>
> [19:36:26,195][INFO][grid-nio-worker-tcp-comm-12-#85%Server%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/127.0.0.1:47100,
> rmtAddr=/127.0.0.1:37664]
>
> [19:36:26,196][INFO][grid-nio-worker-tcp-comm-7-#80%Server%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/172.21.86.7:41076,
> rmtAddr=ip-172-21-86-229.ap-south-1.compute.internal/172.21.86.229:47100]
>
>
>
>
> On Mon, Sep 10, 2018 at 12:04 PM Ilya Kasnacheev <
> ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> I can see a lot of errors like this one:
>>
>> [04:05:29,268][INFO][tcp-comm-worker-#1%Server%][ZookeeperDiscoveryImpl]
>> Created new communication error process future
>> [errNode=598e3ead-99b8-4c49-b7df-04d578dcbf5f, err=class
>> org.apache.ignite.IgniteCheckedException: Failed to connect to node (is
>> node still alive?). Make sure that each ComputeTask and cache Transaction
>> has a timeout set in order to prevent parties from waiting forever in case
>> of network issues [nodeId=598e3ead-99b8-4c49-b7df-04d578dcbf5f,
>> addrs=[ip-172-17-0-1.ap-south-1.compute.internal/172.17.0.1:47100,
>> ip-172-21-85-213.ap-south-1.compute.internal/172.21.85.213:47100,
>> /0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]]
>>
>> I think the problem is, you have two nodes, they both have 172.17.0.1
>> address but it's the different address (totally unrelated private nets).
>>
>> Try to specify your external address (such as 172.21.85.213) with
>> TcpCommunicationSpi.setLocalAddress() on each node.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> On Fri, Sep 7, 2018 at 20:01, eugene miretsky :
>>
>>> Hi all,
>>>
>>> Can somebody please provide some pointers on what could be the issue or
>>> how to debug it? We have a fairly large Ignite use case, 

Re: Partition map exchange in detail

2018-09-12 Thread Ilya Lantukh
Pavel K., can you please answer about Zookeeper discovery?

On Wed, Sep 12, 2018 at 5:49 PM, eugene miretsky 
wrote:

> Thanks for the patience with my questions - just trying to understand the
> system better.
>
> 3) I was referring to https://apacheignite.readme.io/docs/
> zookeeper-discovery#section-failures-and-split-brain-handling. How come
> it doesn't get the node to shut down?
> 4) Are there any docs/JIRAs that explain how counters are used, and why
> they are required in the state?
>
> Cheers,
> Eugene
>
>
> On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh 
> wrote:
>
>> 3) Such mechanics will be implemented in IEP-25 (linked above).
>> 4) Partition map states include update counters, which are incremented on
>> every cache update and play important role in new state calculation. So,
>> technically, every cache operation can lead to partition map change, and
>> for obvious reasons we can't route them through coordinator. Ignite is a
>> more complex system than Akka or Kafka and such simple solutions won't work
>> here (in general case). However, it is true that PME could be simplified or
>> completely avoid for certain cases and the community is currently working
>> on such optimizations (https://issues.apache.org/jira/browse/IGNITE-9558
>> for example).
>>
>> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> 2b) I had a few situations where the cluster went into a state where PME
>>> constantly failed, and could never recover. I think the root cause was that
>>> a transaction got stuck and didn't timeout/rollback.  I will try to
>>> reproduce it again and get back to you
>>> 3) If a node is down, I would expect it to get detected and the node to
>>> get removed from the cluster. In such case, PME should not even be
>>> attempted with that node. Hence you would expect PME to fail very rarely
>>> (any faulty node will be removed before it has a chance to fail PME)
>>> 4) Don't all partition map changes go through the coordinator? I believe
>>> a lot of distributed systems work in this way (all decisions are made by
>>> the coordinator/leader) - In Akka the leader is responsible for making all
>>> cluster membership changes, in Kafka the controller does the leader
>>> election.
>>>
>>> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
>>> wrote:
>>>
 1) It is.
 2a) Ignite has retry mechanics for all messages, including PME-related
 ones.
 2b) In this situation PME will hang, but it isn't a "deadlock".
 3) Sorry, I didn't understand your question. If a node is down, but
 DiscoverySpi doesn't detect it, it isn't PME-related problem.
 4) How can you ensure that partition maps on coordinator are *latest 
 *without
 "freezing" cluster state for some time?

 On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
 eugene.miret...@gmail.com> wrote:

> Thanks!
>
> We are using persistence, so I am not sure if shutting down nodes will
> be the desired outcome for us since we would need to modify the baseline
> topolgy.
>
> A couple more follow up questions
>
> 1) Is PME triggered when client nodes join us well? We are using Spark
> client, so new nodes are created/destroy every time.
> 2) It sounds to me like there is a pontential for the cluster to get
> into a deadlock if
>a) single PME message is lost (PME never finishes, there are no
> retries, and all future operations are blocked on the pending PME)
>b) one of the nodes has a  long running/stuck pending operation
> 3) Under what circumastance can PME fail, while DiscoverySpi fails to
> detect the node being down? We are using ZookeeperSpi so I would expect 
> the
> split brain resolver to shut down the node.
> 4) Why is PME needed? Doesn't the coordinator know the altest
> toplogy/pertition map of the cluster through regualr gossip?
>
> Cheers,
> Eugene
>
> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
> wrote:
>
>> Hi Eugene,
>>
>> 1) PME happens when topology is modified (TopologyVersion is
>> incremented). The most common events that trigger it are: node
>> start/stop/fail, cluster activation/deactivation, dynamic cache 
>> start/stop.
>> 2) It is done by a separate ExchangeWorker. Events that trigger PME
>> are transferred using DiscoverySpi instead of CommunicationSpi.
>> 3) All nodes wait for all pending cache operations to finish and then
>> send their local partition maps to the coordinator (oldest node). Then
>> coordinator calculates new global partition maps and sends them to every
>> node.
>> 4) All cache operations.
>> 5) Exchange is never retried. Ignite community is currently working
>> on PME failure handling that should kick all problematic nodes after
>> timeout is reached (see https://cwiki.apache.org/
>> confluence/display/IGNITE/IEP-25%3A+Partition+Map+Exchange+

Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
Thanks for the patience with my questions - just trying to understand the
system better.

3) I was referring to
https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling.
How come it doesn't get the node to shut down?
4) Are there any docs/JIRAs that explain how counters are used, and why
they are required in the state?

Cheers,
Eugene


On Wed, Sep 12, 2018 at 10:04 AM Ilya Lantukh  wrote:

> 3) Such mechanics will be implemented in IEP-25 (linked above).
> 4) Partition map states include update counters, which are incremented on
> every cache update and play important role in new state calculation. So,
> technically, every cache operation can lead to partition map change, and
> for obvious reasons we can't route them through coordinator. Ignite is a
> more complex system than Akka or Kafka and such simple solutions won't work
> here (in general case). However, it is true that PME could be simplified or
> completely avoid for certain cases and the community is currently working
> on such optimizations (https://issues.apache.org/jira/browse/IGNITE-9558
> for example).
>
> On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> 2b) I had a few situations where the cluster went into a state where PME
>> constantly failed, and could never recover. I think the root cause was that
>> a transaction got stuck and didn't timeout/rollback.  I will try to
>> reproduce it again and get back to you
>> 3) If a node is down, I would expect it to get detected and the node to
>> get removed from the cluster. In such case, PME should not even be
>> attempted with that node. Hence you would expect PME to fail very rarely
>> (any faulty node will be removed before it has a chance to fail PME)
>> 4) Don't all partition map changes go through the coordinator? I believe
>> a lot of distributed systems work in this way (all decisions are made by
>> the coordinator/leader) - In Akka the leader is responsible for making all
>> cluster membership changes, in Kafka the controller does the leader
>> election.
>>
>> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
>> wrote:
>>
>>> 1) It is.
>>> 2a) Ignite has retry mechanics for all messages, including PME-related
>>> ones.
>>> 2b) In this situation PME will hang, but it isn't a "deadlock".
>>> 3) Sorry, I didn't understand your question. If a node is down, but
>>> DiscoverySpi doesn't detect it, it isn't PME-related problem.
>>> 4) How can you ensure that partition maps on coordinator are *latest 
>>> *without
>>> "freezing" cluster state for some time?
>>>
>>> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 Thanks!

 We are using persistence, so I am not sure if shutting down nodes will
 be the desired outcome for us since we would need to modify the baseline
 topolgy.

 A couple more follow up questions

 1) Is PME triggered when client nodes join us well? We are using Spark
 client, so new nodes are created/destroy every time.
 2) It sounds to me like there is a pontential for the cluster to get
 into a deadlock if
a) single PME message is lost (PME never finishes, there are no
 retries, and all future operations are blocked on the pending PME)
b) one of the nodes has a  long running/stuck pending operation
 3) Under what circumastance can PME fail, while DiscoverySpi fails to
 detect the node being down? We are using ZookeeperSpi so I would expect the
 split brain resolver to shut down the node.
 4) Why is PME needed? Doesn't the coordinator know the altest
 toplogy/pertition map of the cluster through regualr gossip?

 Cheers,
 Eugene

 On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
 wrote:

> Hi Eugene,
>
> 1) PME happens when topology is modified (TopologyVersion is
> incremented). The most common events that trigger it are: node
> start/stop/fail, cluster activation/deactivation, dynamic cache 
> start/stop.
> 2) It is done by a separate ExchangeWorker. Events that trigger PME
> are transferred using DiscoverySpi instead of CommunicationSpi.
> 3) All nodes wait for all pending cache operations to finish and then
> send their local partition maps to the coordinator (oldest node). Then
> coordinator calculates new global partition maps and sends them to every
> node.
> 4) All cache operations.
> 5) Exchange is never retried. Ignite community is currently working on
> PME failure handling that should kick all problematic nodes after timeout
> is reached (see
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-25%3A+Partition+Map+Exchange+hangs+resolving
> for details), but it isn't done yet.
> 6) You shouldn't consider PME failure as a error by itself, but rather
> as a result of some other error. The most common reason of PME hang-up is
> pending cache 

Does ignite support calculated columns?

2018-09-12 Thread wengyao04
Hi, in our cache we have a column which is calculated from other columns;
for example, column-C is calculated as column-A / column-B.

When there is an update to column-A or column-B, can Ignite handle the
recalculation of column-C? Thanks
 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Partition map exchange in detail

2018-09-12 Thread Ilya Lantukh
3) Such mechanics will be implemented in IEP-25 (linked above).
4) Partition map states include update counters, which are incremented on
every cache update and play an important role in the calculation of the new
state. So, technically, every cache operation can lead to a partition map
change, and for obvious reasons we can't route them all through the
coordinator. Ignite is a more complex system than Akka or Kafka and such
simple solutions won't work here (in the general case). However, it is true
that PME could be simplified or completely avoided in certain cases, and the
community is currently working on such optimizations
(https://issues.apache.org/jira/browse/IGNITE-9558 for example).

On Wed, Sep 12, 2018 at 9:08 AM, eugene miretsky 
wrote:

> 2b) I had a few situations where the cluster went into a state where PME
> constantly failed, and could never recover. I think the root cause was that
> a transaction got stuck and didn't timeout/rollback.  I will try to
> reproduce it again and get back to you
> 3) If a node is down, I would expect it to get detected and the node to
> get removed from the cluster. In such case, PME should not even be
> attempted with that node. Hence you would expect PME to fail very rarely
> (any faulty node will be removed before it has a chance to fail PME)
> 4) Don't all partition map changes go through the coordinator? I believe a
> lot of distributed systems work in this way (all decisions are made by the
> coordinator/leader) - In Akka the leader is responsible for making all
> cluster membership changes, in Kafka the controller does the leader
> election.
>
> On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh 
> wrote:
>
>> 1) It is.
>> 2a) Ignite has retry mechanics for all messages, including PME-related
>> ones.
>> 2b) In this situation PME will hang, but it isn't a "deadlock".
>> 3) Sorry, I didn't understand your question. If a node is down, but
>> DiscoverySpi doesn't detect it, it isn't PME-related problem.
>> 4) How can you ensure that partition maps on coordinator are *latest *without
>> "freezing" cluster state for some time?
>>
>> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> We are using persistence, so I am not sure if shutting down nodes will
>>> be the desired outcome for us since we would need to modify the baseline
>>> topolgy.
>>>
>>> A couple more follow up questions
>>>
>>> 1) Is PME triggered when client nodes join us well? We are using Spark
>>> client, so new nodes are created/destroy every time.
>>> 2) It sounds to me like there is a pontential for the cluster to get
>>> into a deadlock if
>>>a) single PME message is lost (PME never finishes, there are no
>>> retries, and all future operations are blocked on the pending PME)
>>>b) one of the nodes has a  long running/stuck pending operation
>>> 3) Under what circumastance can PME fail, while DiscoverySpi fails to
>>> detect the node being down? We are using ZookeeperSpi so I would expect the
>>> split brain resolver to shut down the node.
>>> 4) Why is PME needed? Doesn't the coordinator know the altest
>>> toplogy/pertition map of the cluster through regualr gossip?
>>>
>>> Cheers,
>>> Eugene
>>>
>>> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
>>> wrote:
>>>
 Hi Eugene,

 1) PME happens when topology is modified (TopologyVersion is
 incremented). The most common events that trigger it are: node
 start/stop/fail, cluster activation/deactivation, dynamic cache start/stop.
 2) It is done by a separate ExchangeWorker. Events that trigger PME are
 transferred using DiscoverySpi instead of CommunicationSpi.
 3) All nodes wait for all pending cache operations to finish and then
 send their local partition maps to the coordinator (oldest node). Then
 coordinator calculates new global partition maps and sends them to every
 node.
 4) All cache operations.
 5) Exchange is never retried. Ignite community is currently working on
 PME failure handling that should kick all problematic nodes after timeout
 is reached (see https://cwiki.apache.org/confluence/display/IGNITE/IEP-
 25%3A+Partition+Map+Exchange+hangs+resolving for details), but it
 isn't done yet.
 6) You shouldn't consider PME failure as a error by itself, but rather
 as a result of some other error. The most common reason of PME hang-up is
 pending cache operation that couldn't finish. Check your logs - it should
 list pending transactions and atomic updates. Search for "Found long
 running" substring.

 Hope this helps.

 On Fri, Sep 7, 2018 at 11:45 PM, eugene miretsky <
 eugene.miret...@gmail.com> wrote:

> Hello,
>
> Out cluster occasionally fails with "partition map exchange failure"
> errors, I have searched around and it seems that a lot of people have had 
> a
> similar issue in the past. My high-level understanding is that when one of
> the nodes fails (out of 

Re: ignite multi thread stop bug

2018-09-12 Thread ezhuravlev
Hi, 

It's a really bad idea to do cache operations in the discovery thread. If you
want to add something to the cache, start a new thread (or submit a task to an
executor) and do the operation there.
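For illustration, a minimal Java sketch of that pattern follows; the cache name
"myCache" and the chosen event types are placeholders, and the event types must
also be enabled in the node configuration for the listener to fire.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;

public class ListenerOffloadSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Events are disabled by default; enable the ones we want to listen to.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_JOINED, EventType.EVT_NODE_LEFT);

        Ignite ignite = Ignition.start(cfg);
        ExecutorService exec = Executors.newSingleThreadExecutor();

        ignite.events().localListen(evt -> {
            // This callback runs in the discovery thread: do NOT touch caches here.
            // Hand the cache update off to another thread instead.
            exec.submit(() ->
                ignite.getOrCreateCache("myCache").put(evt.node().id().toString(), evt.name()));

            return true; // keep the listener registered
        }, EventType.EVT_NODE_JOINED, EventType.EVT_NODE_LEFT);
    }
}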

Evgenii



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Data migration using persistent Queues

2018-09-12 Thread ezhuravlev
Sorry, here is the link for the BinaryObject doc:
https://apacheignite.readme.io/docs/binary-marshaller



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Data migration using persistent Queues

2018-09-12 Thread ezhuravlev
Hi,

Do you mean Ignite Queue? Or just your own data structure stored in a simple
Ignite cache?

> Store data in another format let's say JSON and provide backwards
> compatibility at code level.
There is no need for this: Ignite internally stores everything as a
BinaryObject, and this format supports schema changes.
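For illustration, a minimal Java sketch of how a schema can evolve when working
with BinaryObject directly; the type name com.example.Order and the cache name
"orders" are placeholders.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class BinarySchemaChangeSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Work with the cache in binary form, without deploying value classes.
        IgniteCache<Integer, BinaryObject> cache =
            ignite.getOrCreateCache("orders").<Integer, BinaryObject>withKeepBinary();

        // "Old" schema: two fields.
        BinaryObjectBuilder b1 = ignite.binary().builder("com.example.Order");
        b1.setField("id", 1);
        b1.setField("amount", 10.0);
        cache.put(1, b1.build());

        // "New" schema: an extra field; entries written with the old schema stay readable.
        BinaryObjectBuilder b2 = ignite.binary().builder("com.example.Order");
        b2.setField("id", 2);
        b2.setField("amount", 20.0);
        b2.setField("currency", "USD");
        cache.put(2, b2.build());

        // Reading an old entry simply returns null for the field it never had.
        BinaryObject old = cache.get(1);
        System.out.println(old.<String>field("currency")); // null
    }
}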



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: cache doesn't exist

2018-09-12 Thread Igor Sapego
Hi,

Did your thick .NET client connect to the cluster?
It should be visible in the server node's logs.

Best Regards,
Igor


On Wed, Sep 12, 2018 at 1:58 PM Som Som <2av10...@gmail.com> wrote:

> the cache names list is empty then i call the method of non thin client
> butmy TestCache is stil exists and i can see it under the thin client. i
> cant see the cache from NON thin client if it is created from thin client.
>
> ср, 12 сент. 2018 г., 13:29 Evgenii Zhuravlev :
>
>> Hi,
>>
>> You can always get the list of the all caches in cluster using
>> Ignite.cacheNames() method.
>>
>> Evgenii
>>
>> ср, 12 сент. 2018 г. в 13:26, Som Som <2av10...@gmail.com>:
>>
>>> Hello
>>>
>>>
>>>
>>> I created new cache using thin .net client:
>>>
>>>
>>>
>>> public class TestEntity
>>>
>>> {
>>>
>>> public string ValueString { get; set; }
>>>
>>>
>>>
>>> public DateTime ValueDateTime { get; set; }
>>>
>>> }
>>>
>>>
>>>
>>> class Program
>>>
>>> {
>>>
>>> static void Main(string[] args)
>>>
>>> {
>>>
>>> var ignite = Ignition.StartClient(new
>>> IgniteClientConfiguration
>>>
>>> {
>>>
>>> Host = "127.0.0.1"
>>>
>>> ,
>>>
>>> BinaryConfiguration = newApache.Ignite.Core.Binary.
>>> BinaryConfiguration { Serializer = new Apache.Ignite.Core.Binary.
>>> BinaryReflectiveSerializer { ForceTimestamp = true } }
>>>
>>> });
>>>
>>>
>>>
>>> var queryEntity = new QueryEntity();
>>>
>>> queryEntity.KeyTypeName =typeof(int).FullName;
>>>
>>> queryEntity.KeyType = typeof(int);
>>>
>>>
>>>
>>> queryEntity.ValueTypeName =typeof(TestEntity).FullName;
>>>
>>> queryEntity.ValueType = typeof(TestEntity);
>>>
>>>
>>>
>>> queryEntity.Fields = new QueryField[]
>>>
>>> { new QueryField("ValueString", typeof(string))
>>>
>>> , new QueryField("ValueDateTime",typeof(DateTime))
>>>
>>> };
>>>
>>>
>>>
>>> var cache = ignite.GetOrCreateCache(
>>>
>>>newCacheClientConfiguration(
>>> "TestEntity", queryEntity) { SqlSchema = "PUBLIC" });
>>>
>>>
>>>
>>>
>>>
>>> cache.Put(1, new TestEntity { ValueString ="test",
>>> ValueDateTime = DateTime.UtcNow });
>>>
>>>
>>>
>>> ignite.Dispose();
>>>
>>> }
>>>
>>> }
>>>
>>>
>>>
>>> Then i tried  to get this cache using typical .net client but instead of
>>> cache I got an error “Additional information: Cache doesn't exist:
>>> TestEntity”:
>>>
>>>
>>>
>>> class Program
>>>
>>> {
>>>
>>> static void Main(string[] args)
>>>
>>> {
>>>
>>> var ignite = Ignition.Start(newIgniteConfiguration
>>>
>>> {
>>>
>>> DiscoverySpi = new TcpDiscoverySpi
>>>
>>> {
>>>
>>> IpFinder = new TcpDiscoveryStaticIpFinder
>>>
>>> {
>>>
>>> Endpoints = new[] { "127.0.0.1" }
>>>
>>> }
>>>
>>> }
>>>
>>> });
>>>
>>>
>>>
>>>
>>>
>>> var cache = ignite.GetCache("TestEntity");
>>>
>>>
>>>
>>> ignite.Dispose();
>>>
>>> }
>>>
>>> }
>>>
>>>
>>>
>>> How can I get a cache correctly?
>>>
>>


Re: cache doesn't exist

2018-09-12 Thread Som Som
The cache names list is empty when I call the method from the non-thin client,
but my TestCache still exists and I can see it from the thin client. I can't
see the cache from the NON-thin client if it was created from the thin client.

ср, 12 сент. 2018 г., 13:29 Evgenii Zhuravlev :

> Hi,
>
> You can always get the list of the all caches in cluster using
> Ignite.cacheNames() method.
>
> Evgenii
>
> ср, 12 сент. 2018 г. в 13:26, Som Som <2av10...@gmail.com>:
>
>> Hello
>>
>>
>>
>> I created new cache using thin .net client:
>>
>>
>>
>> public class TestEntity
>>
>> {
>>
>> public string ValueString { get; set; }
>>
>>
>>
>> public DateTime ValueDateTime { get; set; }
>>
>> }
>>
>>
>>
>> class Program
>>
>> {
>>
>> static void Main(string[] args)
>>
>> {
>>
>> var ignite = Ignition.StartClient(new
>> IgniteClientConfiguration
>>
>> {
>>
>> Host = "127.0.0.1"
>>
>> ,
>>
>> BinaryConfiguration = newApache.Ignite.Core.Binary.
>> BinaryConfiguration { Serializer = new Apache.Ignite.Core.Binary.
>> BinaryReflectiveSerializer { ForceTimestamp = true } }
>>
>> });
>>
>>
>>
>> var queryEntity = new QueryEntity();
>>
>> queryEntity.KeyTypeName =typeof(int).FullName;
>>
>> queryEntity.KeyType = typeof(int);
>>
>>
>>
>> queryEntity.ValueTypeName =typeof(TestEntity).FullName;
>>
>> queryEntity.ValueType = typeof(TestEntity);
>>
>>
>>
>> queryEntity.Fields = new QueryField[]
>>
>> { new QueryField("ValueString", typeof(string))
>>
>> , new QueryField("ValueDateTime",typeof(DateTime))
>>
>> };
>>
>>
>>
>> var cache = ignite.GetOrCreateCache(
>>
>>newCacheClientConfiguration(
>> "TestEntity", queryEntity) { SqlSchema = "PUBLIC" });
>>
>>
>>
>>
>>
>> cache.Put(1, new TestEntity { ValueString ="test",
>> ValueDateTime = DateTime.UtcNow });
>>
>>
>>
>> ignite.Dispose();
>>
>> }
>>
>> }
>>
>>
>>
>> Then i tried  to get this cache using typical .net client but instead of
>> cache I got an error “Additional information: Cache doesn't exist:
>> TestEntity”:
>>
>>
>>
>> class Program
>>
>> {
>>
>> static void Main(string[] args)
>>
>> {
>>
>> var ignite = Ignition.Start(newIgniteConfiguration
>>
>> {
>>
>> DiscoverySpi = new TcpDiscoverySpi
>>
>> {
>>
>> IpFinder = new TcpDiscoveryStaticIpFinder
>>
>> {
>>
>> Endpoints = new[] { "127.0.0.1" }
>>
>> }
>>
>> }
>>
>> });
>>
>>
>>
>>
>>
>> var cache = ignite.GetCache("TestEntity");
>>
>>
>>
>> ignite.Dispose();
>>
>> }
>>
>> }
>>
>>
>>
>> How can I get a cache correctly?
>>
>


Re: POJO field having wrapper type, mapped to cassandra table are getting initialized to respective default value of primitive type instead of null if column value is null.

2018-09-12 Thread Dmitriy Pavlov
Hi Igniters,

I can see that the ticket is still in Patch Available state.

Denis M., could you please review the patch?

Sincerely,
Dmitriy Pavlov

вт, 26 сент. 2017 г. в 12:10, Denis Mekhanikov :

> There is a page in confluence with description of the process:
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>
> To get permission to assign tickets to yourself you should write a letter
> to dev list and ask for it by telling your JIRA username.
>
> While waiting for JIRA permissions you can start configuring your work
> environment as described in the article and working on the fix.
>
> Denis
>
> вт, 26 сент. 2017 г. в 12:00, kotamrajuyashasvi <
> kotamrajuyasha...@gmail.com>:
>
>> Hi Denis
>>
>> I want to work on this issue. But I'm a newbie and I do not know the
>> actual
>> process like, How to assign the ticket to myself(JIRA username:
>> kotamrajuyashasvi) and who would review the code , merging the code etc.
>> Do
>> you know any links which explain the process?
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: cache doesn't exist

2018-09-12 Thread Evgenii Zhuravlev
Hi,

You can always get the list of all caches in the cluster using the
Ignite.cacheNames() method.
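For illustration, a minimal Java sketch (the .NET client exposes the equivalent
information):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ListCachesSketch {
    public static void main(String[] args) {
        // Start a node (a server by default) and print every cache name known to the cluster.
        Ignite ignite = Ignition.start();

        for (String name : ignite.cacheNames())
            System.out.println(name);
    }
}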

Evgenii

ср, 12 сент. 2018 г. в 13:26, Som Som <2av10...@gmail.com>:

> Hello
>
>
>
> I created new cache using thin .net client:
>
>
>
> public class TestEntity
>
> {
>
> public string ValueString { get; set; }
>
>
>
> public DateTime ValueDateTime { get; set; }
>
> }
>
>
>
> class Program
>
> {
>
> static void Main(string[] args)
>
> {
>
> var ignite = Ignition.StartClient(newIgniteClientConfiguration
>
> {
>
> Host = "127.0.0.1"
>
> ,
>
> BinaryConfiguration = newApache.Ignite.Core.Binary.
> BinaryConfiguration { Serializer = new Apache.Ignite.Core.Binary.
> BinaryReflectiveSerializer { ForceTimestamp = true } }
>
> });
>
>
>
> var queryEntity = new QueryEntity();
>
> queryEntity.KeyTypeName =typeof(int).FullName;
>
> queryEntity.KeyType = typeof(int);
>
>
>
> queryEntity.ValueTypeName =typeof(TestEntity).FullName;
>
> queryEntity.ValueType = typeof(TestEntity);
>
>
>
> queryEntity.Fields = new QueryField[]
>
> { new QueryField("ValueString", typeof(string))
>
> , new QueryField("ValueDateTime",typeof(DateTime))
>
> };
>
>
>
> var cache = ignite.GetOrCreateCache(
>
>newCacheClientConfiguration(
> "TestEntity", queryEntity) { SqlSchema = "PUBLIC" });
>
>
>
>
>
> cache.Put(1, new TestEntity { ValueString ="test",
> ValueDateTime = DateTime.UtcNow });
>
>
>
> ignite.Dispose();
>
> }
>
> }
>
>
>
> Then i tried  to get this cache using typical .net client but instead of
> cache I got an error “Additional information: Cache doesn't exist:
> TestEntity”:
>
>
>
> class Program
>
> {
>
> static void Main(string[] args)
>
> {
>
> var ignite = Ignition.Start(newIgniteConfiguration
>
> {
>
> DiscoverySpi = new TcpDiscoverySpi
>
> {
>
> IpFinder = new TcpDiscoveryStaticIpFinder
>
> {
>
> Endpoints = new[] { "127.0.0.1" }
>
> }
>
> }
>
> });
>
>
>
>
>
> var cache = ignite.GetCache("TestEntity");
>
>
>
> ignite.Dispose();
>
> }
>
> }
>
>
>
> How can I get a cache correctly?
>


cache doesn't exist

2018-09-12 Thread Som Som
Hello



I created a new cache using the thin .NET client:



public class TestEntity

{

public string ValueString { get; set; }



public DateTime ValueDateTime { get; set; }

}



class Program

{

static void Main(string[] args)

{

var ignite = Ignition.StartClient(new IgniteClientConfiguration

{

Host = "127.0.0.1"

,

BinaryConfiguration = new Apache.Ignite.Core.Binary.
BinaryConfiguration { Serializer = new Apache.Ignite.Core.Binary.
BinaryReflectiveSerializer { ForceTimestamp = true } }

});



var queryEntity = new QueryEntity();

queryEntity.KeyTypeName = typeof(int).FullName;

queryEntity.KeyType = typeof(int);



queryEntity.ValueTypeName = typeof(TestEntity).FullName;

queryEntity.ValueType = typeof(TestEntity);



queryEntity.Fields = new QueryField[]

{ new QueryField("ValueString", typeof(string))

, new QueryField("ValueDateTime",typeof(DateTime))

};



var cache = ignite.GetOrCreateCache<int, TestEntity>(
    new CacheClientConfiguration(
        "TestEntity", queryEntity) { SqlSchema = "PUBLIC" });





cache.Put(1, new TestEntity { ValueString = "test",
ValueDateTime = DateTime.UtcNow });



ignite.Dispose();

}

}



Then I tried to get this cache using the typical (thick) .NET client, but instead
of the cache I got an error “Additional information: Cache doesn't exist:
TestEntity”:



class Program

{

static void Main(string[] args)

{

var ignite = Ignition.Start(new IgniteConfiguration

{

DiscoverySpi = new TcpDiscoverySpi

{

IpFinder = new TcpDiscoveryStaticIpFinder

{

Endpoints = new[] { "127.0.0.1" }

}

}

});





var cache = ignite.GetCache<int, TestEntity>("TestEntity");



ignite.Dispose();

}

}



How can I get a cache correctly?


Re: Looking for information on these methods

2018-09-12 Thread wt
How's it going with this, Luqman?

I am still exploring an approach to single sign-on using Kerberos without
needing config files. I think I have made progress, but I need to register a
service principal name first.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: IgniteInterruptedException: Got interrupted while waiting for future to complete

2018-09-12 Thread Maxim.Pudov
Hello.
Do you really need to add all this data in a single transaction? That's not
a typical use case. Do you have any other operations running in parallel?
Unfortunately, the stack trace alone is not enough to go on. Could you
provide the full Ignite log and your configuration as well?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: The system cache size was slowly increased

2018-09-12 Thread Justin Ji
Evgenii - 

Thanks for your reply!

"system memory cache" means the result printed by the 'free -m' command.

After some testing, I solved the problem; it was caused by my Linux settings, not
Ignite.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Partition map exchange in detail

2018-09-12 Thread eugene miretsky
2b) I had a few situations where the cluster went into a state where PME
constantly failed and could never recover. I think the root cause was that
a transaction got stuck and didn't time out / roll back (see the
transaction-timeout sketch below). I will try to reproduce it again and get
back to you.
3) If a node is down, I would expect it to get detected and the node to get
removed from the cluster. In such case, PME should not even be attempted
with that node. Hence you would expect PME to fail very rarely (any faulty
node will be removed before it has a chance to fail PME)
4) Don't all partition map changes go through the coordinator? I believe a
lot of distributed systems work in this way (all decisions are made by the
coordinator/leader) - In Akka the leader is responsible for making all
cluster membership changes, in Kafka the controller does the leader
election.
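For illustration, a minimal Java sketch of giving transactions explicit timeouts
so a stuck transaction cannot block PME indefinitely; the cache name and timeout
values are placeholders, and setTxTimeoutOnPartitionMapExchange is only available
in recent Ignite versions.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxTimeoutSketch {
    public static void main(String[] args) {
        TransactionConfiguration txCfg = new TransactionConfiguration();
        txCfg.setDefaultTxTimeout(30_000);                 // default timeout for all transactions
        txCfg.setTxTimeoutOnPartitionMapExchange(20_000);  // roll back transactions that block PME

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setTransactionConfiguration(txCfg);

        Ignite ignite = Ignition.start(cfg);

        // Transactions only cover caches with TRANSACTIONAL atomicity mode.
        CacheConfiguration<String, String> ccfg = new CacheConfiguration<>("myCache");
        ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        IgniteCache<String, String> cache = ignite.getOrCreateCache(ccfg);

        // Explicit per-transaction timeout: 10 seconds, unlimited number of entries.
        try (Transaction tx = ignite.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ, 10_000, 0)) {
            cache.put("key", "value");
            tx.commit();
        }
    }
}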

On Tue, Sep 11, 2018 at 11:11 AM Ilya Lantukh  wrote:

> 1) It is.
> 2a) Ignite has retry mechanics for all messages, including PME-related
> ones.
> 2b) In this situation PME will hang, but it isn't a "deadlock".
> 3) Sorry, I didn't understand your question. If a node is down, but
> DiscoverySpi doesn't detect it, it isn't PME-related problem.
> 4) How can you ensure that partition maps on coordinator are *latest *without
> "freezing" cluster state for some time?
>
> On Sat, Sep 8, 2018 at 3:21 AM, eugene miretsky  > wrote:
>
>> Thanks!
>>
>> We are using persistence, so I am not sure if shutting down nodes will be
>> the desired outcome for us since we would need to modify the baseline
>> topolgy.
>>
>> A couple more follow up questions
>>
>> 1) Is PME triggered when client nodes join us well? We are using Spark
>> client, so new nodes are created/destroy every time.
>> 2) It sounds to me like there is a pontential for the cluster to get into
>> a deadlock if
>>a) single PME message is lost (PME never finishes, there are no
>> retries, and all future operations are blocked on the pending PME)
>>b) one of the nodes has a  long running/stuck pending operation
>> 3) Under what circumastance can PME fail, while DiscoverySpi fails to
>> detect the node being down? We are using ZookeeperSpi so I would expect the
>> split brain resolver to shut down the node.
>> 4) Why is PME needed? Doesn't the coordinator know the altest
>> toplogy/pertition map of the cluster through regualr gossip?
>>
>> Cheers,
>> Eugene
>>
>> On Fri, Sep 7, 2018 at 5:18 PM Ilya Lantukh 
>> wrote:
>>
>>> Hi Eugene,
>>>
>>> 1) PME happens when topology is modified (TopologyVersion is
>>> incremented). The most common events that trigger it are: node
>>> start/stop/fail, cluster activation/deactivation, dynamic cache start/stop.
>>> 2) It is done by a separate ExchangeWorker. Events that trigger PME are
>>> transferred using DiscoverySpi instead of CommunicationSpi.
>>> 3) All nodes wait for all pending cache operations to finish and then
>>> send their local partition maps to the coordinator (oldest node). Then
>>> coordinator calculates new global partition maps and sends them to every
>>> node.
>>> 4) All cache operations.
>>> 5) Exchange is never retried. Ignite community is currently working on
>>> PME failure handling that should kick all problematic nodes after timeout
>>> is reached (see
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-25%3A+Partition+Map+Exchange+hangs+resolving
>>> for details), but it isn't done yet.
>>> 6) You shouldn't consider PME failure as a error by itself, but rather
>>> as a result of some other error. The most common reason of PME hang-up is
>>> pending cache operation that couldn't finish. Check your logs - it should
>>> list pending transactions and atomic updates. Search for "Found long
>>> running" substring.
>>>
>>> Hope this helps.
>>>
>>> On Fri, Sep 7, 2018 at 11:45 PM, eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
 Hello,

 Out cluster occasionally fails with "partition map exchange failure"
 errors, I have searched around and it seems that a lot of people have had a
 similar issue in the past. My high-level understanding is that when one of
 the nodes fails (out of memory, exception, GC etc.) nodes fail to exchange
 partition maps. However, I have a few questions
 1) When does partition map exchange happen? Periodically, when a node
 joins, etc.
 2) Is it done in the same thread as communication SPI, or is a separate
 worker?
 3) How does the exchange happen? Via a coordinator, peer to peer, etc?
 4) What does the exchange block?
 5) When is the exchange retried?
 5) How to resolve the error? The only thing I have seen online is to
 decrease failureDetectionTimeout

 Our settings are
 - Zookeeper SPI
 - Persistence enabled

 Cheers,
 Eugene

>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ilya
>>>
>>
>
>
> --
> Best regards,
> Ilya
>