Events and selective marshalling between nodes?

2019-10-10 Thread KJQ
Hello everyone,

I am not even sure how to phrase this so I will try to describe the
environment...

We have numerous embedded Ignite instances (server mode) that deploy into a
K8S cluster.  Between those instances we have Ignite events being raised for
PUT and EXPIRED.  We have at least one instance, also part of the Ignite
cluster, which uses Drools and distributed compute (this works well so far).

The problem we are having is that when the distributed compute node raises
cache events to other nodes, those nodes do not have the corresponding
classes.  For example, Drools caches some objects specific to it, and when
such an event gets raised on other nodes (which have no knowledge of Drools)
an exception is of course thrown.

So my questions are:

- Can I selectively choose which events to raise based on the value's type?
For example, inspect the event "value" and simply not raise the event for
anything I am not concerned with (see the rough sketch after these questions).

- Is peer class loading (or downloading a jar) the only way to share classes
between nodes whose Java applications may have no knowledge of each other?
For example, projectB depends on projectA, so it is aware of A's classes, but
projectA has no knowledge of projectB when B raises an event for one of its
cached objects on A.

- How does all of this play into Spring?  If I do somehow get the classes
into each node, do they have to be within the Spring context (IoC configured)
or is having the classes on the classpath good enough?  This applies to using
@SpringResource in something like a callable.
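
Roughly what I have in mind for the first question (a sketch only: MyDomainValue
stands in for a value type this side does know about, "myCache" is a
placeholder, and I realize the remote filter class itself still has to be
available on, or peer-class-loaded to, the remote nodes):

import java.util.UUID;
import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgniteBiPredicate;
import org.apache.ignite.lang.IgnitePredicate;

public class SelectiveCacheEvents {
    public static void listen(Ignite ignite) {
        // Remote filter: runs on the nodes where the events are generated, so
        // events for values we do not care about are never shipped to us.
        IgnitePredicate<CacheEvent> remoteFilter = evt ->
            evt.newValue() instanceof MyDomainValue;

        // Local listener: receives only the events that passed the remote filter.
        IgniteBiPredicate<UUID, CacheEvent> localListener = (nodeId, evt) -> {
            System.out.println("Got event " + evt.name() + " for key " + evt.key());
            return true; // keep listening
        };

        ignite.events(ignite.cluster().forCacheNodes("myCache")).remoteListen(
            localListener,
            remoteFilter,
            EventType.EVT_CACHE_OBJECT_PUT,
            EventType.EVT_CACHE_OBJECT_EXPIRED);
    }

    /** Placeholder for a value type this node actually has on its classpath. */
    static class MyDomainValue { }
}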

Any help or guidance is greatly appreciated.



-
KJQ
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


RE: REST API: passing custom POJOs

2019-10-10 Thread alexanderkor12
Hi
REST can accommodate the following types:
https://apacheignite.readme.io/docs/rest-api#section-data-types

You can use a ConnectorMessageInterceptor to intercept all incoming/outgoing
REST requests:
https://apacheignite.readme.io/docs/rest-api#section-general-configuration
(https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/ConnectorConfiguration.html#setMessageInterceptor-org.apache.ignite.configuration.ConnectorMessageInterceptor-)

You can serialize your POJO before sending, and on the receiving end use
ConnectorMessageInterceptor#onReceive to deserialize it accordingly.
The interceptor also provides ConnectorMessageInterceptor#onSend to
transform outgoing REST messages.
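
For example, a bare-bones interceptor could look like the sketch below. It
assumes, purely for illustration, that your POJOs live in a com.example.model
package and travel over REST as Base64-encoded JDK-serialized strings prefixed
with "pojo:":

import java.io.*;
import java.util.Base64;
import org.apache.ignite.configuration.ConnectorConfiguration;
import org.apache.ignite.configuration.ConnectorMessageInterceptor;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PojoInterceptorConfig {

    /** Converts Base64-encoded, JDK-serialized POJOs to objects and back. */
    static class PojoInterceptor implements ConnectorMessageInterceptor {
        @Override public Object onReceive(Object obj) {
            if (obj instanceof String && ((String)obj).startsWith("pojo:")) {
                byte[] bytes = Base64.getDecoder().decode(((String)obj).substring(5));

                try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                    return in.readObject(); // the real POJO is used inside the cluster
                }
                catch (Exception e) {
                    throw new RuntimeException("Failed to decode POJO", e);
                }
            }
            return obj;
        }

        @Override public Object onSend(Object obj) {
            // Only encode our own domain objects; leave strings/numbers alone.
            if (obj != null && obj.getClass().getName().startsWith("com.example.model.")) {
                try {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();

                    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                        out.writeObject(obj);
                    }
                    return "pojo:" + Base64.getEncoder().encodeToString(buf.toByteArray());
                }
                catch (IOException e) {
                    throw new RuntimeException("Failed to encode POJO", e);
                }
            }
            return obj;
        }
    }

    public static IgniteConfiguration configure() {
        ConnectorConfiguration connCfg = new ConnectorConfiguration();
        connCfg.setMessageInterceptor(new PojoInterceptor());

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setConnectorConfiguration(connCfg);
        return cfg;
    }
}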

Thanks, Alex

-Original Message-
From: captcha  
Sent: Thursday, October 10, 2019 4:51 PM
To: user@ignite.apache.org
Subject: REST API: passing custom POJOs

Hi Ignite Users,

For the REST API, is it possible to pass custom POJO objects? An example
use case:
Let's say I have a Person class:
class Person {
   String name;
   UUID id;
   Location location;
}
class Location {
   long lat;
   long lon; // "long" is a reserved word in Java, so the field is renamed
}

Let's say I cache some Person objects in the Ignite cluster, keyed on their
UUIDs. Now I want to pass in a Location object and a List of Person UUIDs
and then return the distance between the given location and each person in
the given list.

This requires passing POJOs to and back from the Ignite compute grid. Is it
possible to use the REST API to do this? If not, what's the recommended way
for this? Are there any examples I could check?

Thanks.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/



Re: Issue with adding nested index dynamically

2019-10-10 Thread Hemambara
Hello Ivan, to be more specific: if I want to add the field

"Users.userName", the current Ignite version (2.7.6) sets the field name to
"Users.userName" and the default alias to "userName", which makes it
non-queryable and non-indexable and causes issues during restart due to a
mismatch in the configuration. But with this fix, for the field name
"Users.userName" the alias will also be set to "Users.userName", and the same
name can be used to add an index and everywhere else, so there won't be any issues.
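
For context, this is roughly the kind of configuration involved (a sketch with
illustrative type and cache names; whether the index should reference the raw
field name or the alias is exactly the part the fix touches):

import java.util.Collections;
import java.util.LinkedHashMap;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.configuration.CacheConfiguration;

public class NestedFieldConfig {
    public static CacheConfiguration<Long, Object> cacheConfig() {
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("Users.userName", String.class.getName()); // nested field, dot notation

        QueryEntity entity = new QueryEntity(Long.class.getName(), "com.example.Account");
        entity.setFields(fields);
        // The alias gives the nested field a plain SQL column name that can be
        // referenced in queries and when creating an index.
        entity.setAliases(Collections.singletonMap("Users.userName", "USER_NAME"));
        entity.setIndexes(Collections.singletonList(new QueryIndex("Users.userName")));

        CacheConfiguration<Long, Object> ccfg = new CacheConfiguration<>("accounts");
        ccfg.setQueryEntities(Collections.singletonList(entity));
        return ccfg;
    }
}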



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: DataLoad using IgniteDataStreamer

2019-10-10 Thread ravichandra
I will look into it and get back.

Thanks Saikat.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Issue with adding nested index dynamically

2019-10-10 Thread Hemambara
Thank you. One more thing to add: it is able to add the index dynamically;
the only thing is, when I restart the client it is not able to join the
cluster due to the incorrect alias.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: DataLoad using IgniteDataStreamer

2019-10-10 Thread Saikat Maitra
Hello Ravichandra,

Here are the docs for Data streamers

https://apacheignite.readme.io/docs/streaming--cep

Also, here is a sample implementation of data streaming using Apache Flink.

https://dzone.com/articles/data-streaming-using-apache-flink-and-apache-ignit

Please review and let us know if you have any questions.
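
Roughly, a sketch of the XML-free setup you describe could look like this
(Spring Java config plus IgniteDataStreamer; the cache name, key/value types
and sizes are placeholders, and the class is called IgniteConfig to avoid
clashing with Ignite's own IgniteConfiguration):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class IgniteConfig {

    @Bean(destroyMethod = "close")
    public Ignite igniteInstance() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true); // this node only streams data into the cluster
        cfg.setCacheConfiguration(new CacheConfiguration<Long, String>("personCache"));

        return Ignition.getOrStart(cfg);
    }

    /** Preloads data through IgniteDataStreamer; call once on startup. */
    public void preload(Ignite ignite) {
        try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("personCache")) {
            streamer.allowOverwrite(true);

            for (long i = 0; i < 1_000_000; i++)
                streamer.addData(i, "person-" + i);
        } // close() flushes any remaining buffered entries
    }
}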

Regards,
Saikat



On Thu, Oct 10, 2019 at 4:10 PM ravichandra <
ravichandra.polemre...@gmail.com> wrote:

> Hi,
>
> I am working on a POC to preload data into the database with
> IgniteDataStreamer using a client node
> and do the rest of the functionality using server nodes.
>
> Is there any sample code for reference?
>
> I am using Spring Boot, extending IgniteRepository for the repository layer.
> Instead of using config.xml, can I configure the required details in a
> configuration class like the one below?
>
> @Configuration
> public class IgniteConfiguration
> {
>  public Ignite igniteInstance()
>  {
>IgniteConfiguration and CacheConfiguration details goes here
>
> return Ignition.getOrStart(igniteConfiguration)
>  }
> }
> I am trying to understand whether the best way is to configure with XML or without XML.
> Any sample code would really help me in proceeding further.
> I appreciate the help.
>
> Thanks,
> Ravichandra
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Exception - Method rawReader can be called only once

2019-10-10 Thread javastuff....@gmail.com
The interface does not have two sets of methods, so the second method with a
"raw" reader means extra code and special programmatic changes in the
extending class in order to call the "raw" method from the superclass. This is
how I am using it for now; however, it needs extra code and special handling
in the extended class, which feels like a workaround to traverse the class
chain and does not sit well conceptually.

I have raised a ticket to remove this limitation: IGNITE-12280. I hope it gets
through and is fixed soon.

Thanks,
-Sam



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


DataLoad using IgniteDataStreamer

2019-10-10 Thread ravichandra
Hi,

I am working on a POC to preload data into the database with
IgniteDataStreamer using a client node
and do the rest of the functionality using server nodes.

Is there any sample code for reference?

I am using Spring Boot, extending IgniteRepository for the repository layer.
Instead of using config.xml, can I configure the required details in a
configuration class like the one below?

@Configuration
public class IgniteConfiguration
{
 public Ignite igniteInstance()
 {
   IgniteConfiguration and CacheConfiguration details goes here

return Ignition.getOrStart(igniteConfiguration)
 }
}
I am trying to understand whether the best way is to configure with XML or without XML.
Any sample code would really help me in proceeding further.
I appreciate the help.

Thanks,
Ravichandra



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


REST API: passing custom POJOs

2019-10-10 Thread captcha
Hi Ignite Users,

For the REST API, is it possible to pass custom POJO objects? An example
use case:
Let's say I have a Person class:
class Person {
   String name;
   UUID id;
   Location location;
}
class Location {
   long lat;
   long lon; // "long" is a reserved word in Java, so the field is renamed
}

Let's say I cache some Person objects in the Ignite cluster, keyed on their
UUIDs. Now I want to pass in a Location object and a List of Person UUIDs
and then return the distance between the given location and each person in
the given list.

This requires passing POJOs to and back from the Ignite compute grid. Is it
possible to use the REST API to do this? If not, what's the recommended way
for this? Are there any examples I could check?

Thanks.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Partition loss due to node segmentation

2019-10-10 Thread Denis Magda
Akash,

Network segmentation is one of the causes of the partition loss. The latter
can also be triggered if two or more nodes go down at the same time.

This page explains how to process the partition loss events and available
remediation techniques:
https://apacheignite.readme.io/docs/partition-loss-policies
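
As a rough sketch of processing those events (assuming
EVT_CACHE_REBALANCE_PART_DATA_LOST is enabled via
IgniteConfiguration.setIncludeEventTypes on the server nodes and the lost data
can be reloaded from an external store):

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class PartitionLossHandler {
    public static void register(Ignite ignite, String cacheName) {
        IgnitePredicate<Event> lsnr = evt -> {
            CacheRebalancingEvent e = (CacheRebalancingEvent)evt;

            if (cacheName.equals(e.cacheName())) {
                System.out.println("Lost partition " + e.partition() + " of " + e.cacheName());

                // Reload the lost entries from the external store here, then
                // tell Ignite the cache is usable again.
                ignite.resetLostPartitions(Collections.singleton(cacheName));
            }
            return true; // keep listening
        };

        ignite.events().localListen(lsnr, EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST);
    }
}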

-
Denis


On Wed, Oct 9, 2019 at 2:42 AM Akash Shinde  wrote:

> Looks like I have not asked the correct question. Let me correct my
> question.
>
> If the cluster is segmented and 2 or more nodes out of a total of 5 nodes are
> thrown out of the cluster, then the data is lost (backup is set to 1).
>
> Now my question is: does Ignite fire the event
> "EVT_CACHE_REBALANCE_PART_DATA_LOST" in this case of partition loss due to
> segmentation?
> I am trying to implement a recovery/fallback mechanism in case of
> partition loss due to any reason. Using this event I am reloading the lost
> partitions.
>
> Thanks,
> A.
>
>
> On Wed, Oct 9, 2019 at 1:07 PM Gaurav Bajaj 
> wrote:
>
>> Hello Akash,
>>
>> Yes, of course. As the cluster is segmented, it is very well possible that
>> a cluster segment doesn't have all partitions in that segment.
>> Only if you have the backup partitions in the same segment do you have
>> complete partitions.
>>
>> Best Regards,
>> Gaurav
>>
>> On Wed, Oct 9, 2019 at 7:20 AM Akash Shinde 
>> wrote:
>>
>>> Hi,
>>>  Are there any possibilities of partition loss in case of node
>>> segmentation?
>>>
>>> Thanks,
>>> Akash
>>>
>>


Re: Ignite SQL table ALTER COLUMN and RENAME COLUMN

2019-10-10 Thread Denis Magda
Please check out this note:
https://www.gridgain.com/docs/8.7.6/perf-troubleshooting-guide/troubleshooting#cluster-doesnt-start-after-field-type-changes

There is a hacky way to change the type if you're in development but if
you're in prod then you need to add new columns instead.

-
Denis


On Thu, Oct 10, 2019 at 5:01 AM Denis Mekhanikov 
wrote:

> Favas,
>
> It’s possible to remove a column and add another one using the ALTER TABLE
> SQL statement, but
> currently you can't change a column’s type.
> Note, that removing a column and adding another one with the same name but
> with a different type can lead to data corruption.
>
> Denis
> On 10 Oct 2019, 09:51 +0300, Muhammed Favas <
> favas.muham...@expeedsoftware.com>, wrote:
>
> Hi,
>
>
>
> Is there a way in Ignite to ALTER a column to change the data
> type/nullable property and also RENAME a column?
>
>
>
> I saw in the Ignite documentation that this will be added in upcoming releases. Which
> release is it planned for?
>
>
>
> *Regards,*
>
> *Favas *
>
>
>
>


Re: Issue with adding nested index dynamically

2019-10-10 Thread Denis Magda
Ivan,

As a member of the Ignite SQL group, could you please check this thread and
help Hemambara with a proper fix?


-
Denis


On Thu, Oct 10, 2019 at 11:50 AM Hemambara  wrote:

> Sorry for the push. This is a major blocker for us. We have Coherence
> clients that can add indexes dynamically, and we want to move them to
> Ignite in the next 3 months. Coherence has a way to add indexes dynamically
> and Ignite also supports it. This issue exists even with QuerySqlField. If
> this fix cannot be merged, I am not sure how to proceed further. Do you see
> any issue with putting in this fix, or is it just that we are not ready yet?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Issue with adding nested index dynamically

2019-10-10 Thread Hemambara
Sorry for the push. This is a major blocker for us. We have Coherence clients
that can add indexes dynamically, and we want to move them to Ignite in the
next 3 months. Coherence has a way to add indexes dynamically and Ignite also
supports it. This issue exists even with QuerySqlField. If this fix cannot be
merged, I am not sure how to proceed further. Do you see any issue with
putting in this fix, or is it just that we are not ready yet?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Apache Ignite Change data capture functionality

2019-10-10 Thread ravichandra
Thanks Denis.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node stopped.

2019-10-10 Thread Denis Mekhanikov
Unfortunately, I don’t.
You can ask the VM vendor or the cloud provider (if you use any) for proper
tooling or logs.
Make sure that there is no step in the VM’s lifecycle that makes it
freeze for a minute.
Also make sure that the physical CPU is not overutilized and no VMs that run on
it are starving.

Denis
On 10 Oct 2019, 19:03 +0300, John Smith , wrote:
> Do you know of any good tools I can use to check the VM?
>
> > On Thu, 10 Oct 2019 at 11:38, Denis Mekhanikov  
> > wrote:
> > > > Hi Dennis, so are you saying I should enable GC logs + the safe point 
> > > > logs as well?
> > >
> > > Having safepoint statistics in your GC logs may be useful, so I recommend 
> > > enabling them for troubleshooting purposes.
> > > Check the lifecycle of your virtual machines. There is a high chance that 
> > > the whole machine is frozen, not just the Ignite node.
> > >
> > > Denis
> > > On 10 Oct 2019, 18:25 +0300, John Smith , wrote:
> > > > Hi Dennis, so are you saying I should enable GC logs + the safe point 
> > > > logs as well?
> > > >
> > > > > On Thu, 10 Oct 2019 at 11:22, John Smith  
> > > > > wrote:
> > > > > > You are correct, it is running in a VM.
> > > > > >
> > > > > > > On Thu, 10 Oct 2019 at 10:11, Denis Mekhanikov 
> > > > > > >  wrote:
> > > > > > > > Hi!
> > > > > > > >
> > > > > > > > There are the following messages in the logs:
> > > > > > > >
> > > > > > > > [22:26:21,816][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
> > > > > > > >  Possible too long JVM pause: 55705 milliseconds.
> > > > > > > > ...
> > > > > > > > [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] 
> > > > > > > > Blocked system-critical thread has been detected. This can lead 
> > > > > > > > to cluster-wide undefined behaviour 
> > > > > > > > [threadName=partition-exchanger, blockedFor=57s]
> > > > > > > >
> > > > > > > > Looks like the JVM was paused for almost a minute. It doesn’t 
> > > > > > > > seem to be caused by a garbage collection, since there is no 
> > > > > > > > evidence of GC pressure in the GC log. Usually such big pauses 
> > > > > > > > happen in virtualised environments when backups are captured 
> > > > > > > > from machines or they just don’t have enough CPU time.
> > > > > > > >
> > > > > > > > Looking at safepoint statistics may also reveal some 
> > > > > > > > interesting details. You can learn about safepoints here: 
> > > > > > > > https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
> > > > > > > >
> > > > > > > > Denis
> > > > > > > > On 9 Oct 2019, 23:14 +0300, John Smith 
> > > > > > > > , wrote:
> > > > > > > > > So the error says to set clientFailureDetectionTimeout=3
> > > > > > > > >
> > > > > > > > > 1- Do I put a higher value than 3?
> > > > > > > > > 2- Do I do it on the client or the server nodes or all nodes?
> > > > > > > > > 3- Also if a client is misbehaving why shutoff the server 
> > > > > > > > > node?
> > > > > > > > >
> > > > > > > > > > On Thu, 3 Oct 2019 at 21:02, John Smith 
> > > > > > > > > >  wrote:
> > > > > > > > > > > But if it's the client node that's failing why is the 
> > > > > > > > > > > server node stopping? I'm pretty sure we do very simple 
> > > > > > > > > > > put and get operations. All the client nodes are started 
> > > > > > > > > > > as client=true
> > > > > > > > > > >
> > > > > > > > > > > > On Thu., Oct. 3, 2019, 4:18 p.m. Denis Magda, 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > Hi John,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't see any GC pressure or STW pauses either. If 
> > > > > > > > > > > > > not GC then it might have been caused by a network 
> > > > > > > > > > > > > glitch or some long-running operation started by the 
> > > > > > > > > > > > > app. These logs statement
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > [22:26:21,827][WARNING][tcp-disco-client-message-worker-#10%xx%][TcpDiscoverySpi]
> > > > > > > > > > > > >  Client node considered as unreachable and will be 
> > > > > > > > > > > > > dropped from cluster, because no metrics update 
> > > > > > > > > > > > > messages received in interval: 
> > > > > > > > > > > > > TcpDiscoverySpi.clientFailureDetectionTimeout() ms. 
> > > > > > > > > > > > > It may be caused by network problems or long GC pause 
> > > > > > > > > > > > > on client node, try to increase this parameter. 
> > > > > > > > > > > > > [nodeId=b07182d0-bf70-4318-9fe3-d7d5228bd6ef, 
> > > > > > > > > > > > > clientFailureDetectionTimeout=3]
> > > > > > > > > > > > >
> > > > > > > > > > > > > [22:26:21,839][WARNING][tcp-disco-client-message-worker-#12%xx%][TcpDiscoverySpi]
> > > > > > > > > > > > >  Client node considered as unreachable and will be 
> > > > > > > > > > > > > dropped from cluster, because no metrics update 
> > > > > > > > > > > > > messages received in interval: 
> > > > > > > > > > > > > TcpDiscoverySpi.clientFailureDetectionTimeout() ms. 
> > > > 

Re: Freeing up RAM/cache

2019-10-10 Thread Denis Mekhanikov
There is no manual way to evict data from memory.
You can limit the size of your data region, so that if this limit is reached, 
then some pages will be dropped from memory automatically and replaced with new 
ones.
This process is called page replacement. You can read about it here: 
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement%28rotationwithdisk%29
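
For example, a persistent data region limited like this will start rotating
cold pages to disk once the limit is reached (the region name and the 4 GB
figure below are arbitrary):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class LimitedRegionConfig {
    public static IgniteConfiguration configure() {
        DataRegionConfiguration region = new DataRegionConfiguration();
        region.setName("limited_region");
        region.setPersistenceEnabled(true);          // data stays available on disk
        region.setMaxSize(4L * 1024 * 1024 * 1024);  // 4 GB of RAM for this region

        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setDefaultDataRegionConfiguration(region);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);
        return cfg;
    }
}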

Denis
On 10 Oct 2019, 18:41 +0300, Kamlesh Joshi , wrote:
> Hi Igniters,
>
> Is there any way to release main memory? I just want to remove a few entries
> from RAM (and not from persistence). If I need the data again I should be
> able to retrieve it from persistence.
>
> I have already gone through the expiration policy for this. But can we
> set an expiration policy at runtime or do we need to re-create the caches?
>
> Thanks and Regards,
> Kamlesh Joshi
>
>
> "Confidentiality Warning: This message and any attachments are intended only 
> for the use of the intended recipient(s), are confidential and may be 
> privileged. If you are not the intended recipient, you are hereby notified 
> that any review, re-transmission, conversion to hard copy, copying, 
> circulation or other use of this message and any attachments is strictly 
> prohibited. If you are not the intended recipient, please notify the sender 
> immediately by return email and delete this message and any attachments from 
> your system.
> Virus Warning: Although the company has taken reasonable precautions to 
> ensure no viruses are present in this email. The company cannot accept 
> responsibility for any loss or damage arising from the use of this email or 
> attachment."


Re: Issue with adding nested index dynamically

2019-10-10 Thread Ilya Kasnacheev
Hello!

Currently we do not support altering tables which are created with
indexedTypes/queryEntities.

There are plans to implement it, but your fix, while solving your
particular problem, doesn't fit into this scheme IMHO.

Regards,
-- 
Ilya Kasnacheev


ср, 9 окт. 2019 г. в 22:45, Hemambara :

> My apologies for multiple replies. Please consider latest reply
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Starvation in striped pool

2019-10-10 Thread Ilya Kasnacheev
Hello!

I'm almost certain that the problem is that the server node cannot open a
connection to the client node (and while it tries, it will reject connection
attempts from the client node).

clientReconnectDisabled=true only concerns discovery. In your case,
there are no problems with discovery; the problem is with communication (port
47100). Unless discovery fails, the node will not be dropped.

Regards,
-- 
Ilya Kasnacheev


чт, 10 окт. 2019 г. в 08:48, maheshkr76private :

> Ilya.
>
> What is most mysterious to me is that I disabled reconnect of the thick client
> (clientReconnectDisabled=true). Still the server prints the below, where
> the same thick client is making an immediate attempt to reconnect back to
> the cluster, while the previous connection still isn't successful.
>
> [00:47:46,004][INFO][grid-nio-worker-tcp-comm-1-#29][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/192.168.1.6:47100,
> rmtAddr=/192.168.1.71:2007]
> *[00:47:46,004][INFO][grid-nio-worker-tcp-comm-1-#29][TcpCommunicationSpi]
> Received incoming connection from remote node while connecting to this
> node,
> rejecting [locNode=3240558e-72b4-4314-a970-0965654e7e6f, locNodeOrder=1,
> rmtNode=e1be7b6f-8691-4f81-a03f-c77ff14843ef, rmtNodeOrder=675]*
>
> This, I feel, is internal Ignite code logic, which does not seem to be
> influenced by the user configuration of IgniteConfiguration params. Can you
> cross-check this? My point is, the same node should not attempt a
> reconnect when the user has set clientReconnectDisabled = true in
> IgniteConfiguration.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-10 Thread Pavel Kovalenko
Ibrahim,

Could you please also share the cache configuration that is used for
dynamic creation?

чт, 10 окт. 2019 г. в 19:09, Pavel Kovalenko :

> Hi Ibrahim,
>
> I see that one node didn't send acknowledgment during cache creation:
> [2019-09-27T15:00:17,727][WARN
> ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await
> partitions release latch within timeout: ServerLatch [permits=1,
> pendingAcks=[*3561ac09-6752-4e2e-8279-d975c268d045*],
> super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion
> [topVer=92, minorTopVer=2]]]
>
> Do you have any logs from a node with id =
> "3561ac09-6752-4e2e-8279-d975c268d045".
> You can find this node by grepping the following
> "locNodeId=3561ac09-6752-4e2e-8279-d975c268d045" like in line:
> [2019-09-27T15:24:03,532][INFO ][main][TcpDiscoverySpi] Successfully bound
> to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0,*
> locNodeId=70b49e00-5b9f-4459-9055-a05ce358be10*]
>
>
> ср, 9 окт. 2019 г. в 17:34, ihalilaltun :
>
>> Hi There Igniters,
>>
>> We had a very strange cluster behaviour while creating new caches on the
>> fly.
>> Just after the caches are created we start getting the following warnings from all
>> cluster nodes, including the coordinator node;
>>
>> [2019-09-27T15:00:17,727][WARN
>> ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await
>> partitions release latch within timeout: ServerLatch [permits=1,
>> pendingAcks=[3561ac09-6752-4e2e-8279-d975c268d045], super=CompletableLatch
>> [id=exchange, topVer=AffinityTopologyVersion [topVer=92, minorTopVer=2]]]
>>
>> After a while all client nodes seem to be disconnected from the cluster
>> with
>> no logs on the clients' side.
>>
>> Coordinator node has many logs like;
>> 2019-09-27T15:00:03,124][WARN
>> ][sys-#337823][GridDhtPartitionsExchangeFuture] Partition states
>> validation
>> has failed for group: acc_1306acd07be78000_userPriceDrop. Partitions cache
>> sizes are inconsistent for Part 129:
>> [9497f1c4-13bd-4f90-bbf7-be7371cea22f=757
>> 1486cd47-7d40-400c-8e36-b66947865602=2427 ] Part 138:
>> [1486cd47-7d40-400c-8e36-b66947865602=2463
>> f9cf594b-24f2-4a91-8d84-298c97eb0f98=736 ] Part 156:
>> [b7782803-10da-45d8-b042-b5b4a880eb07=672
>> 9f0c2155-50a4-4147-b444-5cc002cf6f5d=2414 ] Part 284:
>> [b7782803-10da-45d8-b042-b5b4a880eb07=690
>> 1486cd47-7d40-400c-8e36-b66947865602=1539 ] Part 308:
>> [1486cd47-7d40-400c-8e36-b66947865602=2401
>> 7750e2f1-7102-4da2-9a9d-ea202f73905a=706 ] Part 362:
>> [1486cd47-7d40-400c-8e36-b66947865602=2387
>> 7750e2f1-7102-4da2-9a9d-ea202f73905a=697 ] Part 434:
>> [53c253e1-ccbe-4af1-a3d6-178523023c8b=681
>> 1486cd47-7d40-400c-8e36-b66947865602=1541 ] Part 499:
>> [1486cd47-7d40-400c-8e36-b66947865602=2505
>> 7750e2f1-7102-4da2-9a9d-ea202f73905a=699 ] Part 622:
>> [1486cd47-7d40-400c-8e36-b66947865602=2436
>> e97a0f3f-3175-49f7-a476-54eddd59d493=662 ] Part 662:
>> [b7782803-10da-45d8-b042-b5b4a880eb07=686
>> 1486cd47-7d40-400c-8e36-b66947865602=2445 ] Part 699:
>> [1486cd47-7d40-400c-8e36-b66947865602=2427
>> f9cf594b-24f2-4a91-8d84-298c97eb0f98=646 ] Part 827:
>> [62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=703
>> 1486cd47-7d40-400c-8e36-b66947865602=1549 ] Part 923:
>> [1486cd47-7d40-400c-8e36-b66947865602=2434
>> a9e9eaba-d227-4687-8c6c-7ed522e6c342=706 ] Part 967:
>> [62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=673
>> 1486cd47-7d40-400c-8e36-b66947865602=1595 ] Part 976:
>> [33301384-3293-417f-b94a-ed36ebc82583=666
>> 1486cd47-7d40-400c-8e36-b66947865602=2384 ]
>>
>> Coordinator's log and one of the cluster node's log is attached.
>> coordinator_log.gz
>> <
>> http://apache-ignite-users.70518.x6.nabble.com/file/t2515/coordinator_log.gz>
>>
>> cluster_node_log.gz
>> <
>> http://apache-ignite-users.70518.x6.nabble.com/file/t2515/cluster_node_log.gz>
>>
>>
>> Any help/comment is appreciated.
>>
>> Thanks.
>>
>>
>>
>>
>>
>> -
>> İbrahim Halil Altun
>> Senior Software Engineer @ Segmentify
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-10 Thread Pavel Kovalenko
Hi Ibrahim,

I see that one node didn't send acknowledgment during cache creation:
[2019-09-27T15:00:17,727][WARN
][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await
partitions release latch within timeout: ServerLatch [permits=1,
pendingAcks=[*3561ac09-6752-4e2e-8279-d975c268d045*],
super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion
[topVer=92, minorTopVer=2]]]

Do you have any logs from a node with id =
"3561ac09-6752-4e2e-8279-d975c268d045".
You can find this node by grepping the following
"locNodeId=3561ac09-6752-4e2e-8279-d975c268d045" like in line:
[2019-09-27T15:24:03,532][INFO ][main][TcpDiscoverySpi] Successfully bound
to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0,*
locNodeId=70b49e00-5b9f-4459-9055-a05ce358be10*]


ср, 9 окт. 2019 г. в 17:34, ihalilaltun :

> Hi There Igniters,
>
> We had a very strange cluster behaviour while creating new caches on the
> fly.
> Just after the caches are created we start getting the following warnings from all
> cluster nodes, including the coordinator node;
>
> [2019-09-27T15:00:17,727][WARN
> ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await
> partitions release latch within timeout: ServerLatch [permits=1,
> pendingAcks=[3561ac09-6752-4e2e-8279-d975c268d045], super=CompletableLatch
> [id=exchange, topVer=AffinityTopologyVersion [topVer=92, minorTopVer=2]]]
>
> After a while all client nodes seem to be disconnected from the cluster with
> no logs on the clients' side.
>
> Coordinator node has many logs like;
> 2019-09-27T15:00:03,124][WARN
> ][sys-#337823][GridDhtPartitionsExchangeFuture] Partition states validation
> has failed for group: acc_1306acd07be78000_userPriceDrop. Partitions cache
> sizes are inconsistent for Part 129:
> [9497f1c4-13bd-4f90-bbf7-be7371cea22f=757
> 1486cd47-7d40-400c-8e36-b66947865602=2427 ] Part 138:
> [1486cd47-7d40-400c-8e36-b66947865602=2463
> f9cf594b-24f2-4a91-8d84-298c97eb0f98=736 ] Part 156:
> [b7782803-10da-45d8-b042-b5b4a880eb07=672
> 9f0c2155-50a4-4147-b444-5cc002cf6f5d=2414 ] Part 284:
> [b7782803-10da-45d8-b042-b5b4a880eb07=690
> 1486cd47-7d40-400c-8e36-b66947865602=1539 ] Part 308:
> [1486cd47-7d40-400c-8e36-b66947865602=2401
> 7750e2f1-7102-4da2-9a9d-ea202f73905a=706 ] Part 362:
> [1486cd47-7d40-400c-8e36-b66947865602=2387
> 7750e2f1-7102-4da2-9a9d-ea202f73905a=697 ] Part 434:
> [53c253e1-ccbe-4af1-a3d6-178523023c8b=681
> 1486cd47-7d40-400c-8e36-b66947865602=1541 ] Part 499:
> [1486cd47-7d40-400c-8e36-b66947865602=2505
> 7750e2f1-7102-4da2-9a9d-ea202f73905a=699 ] Part 622:
> [1486cd47-7d40-400c-8e36-b66947865602=2436
> e97a0f3f-3175-49f7-a476-54eddd59d493=662 ] Part 662:
> [b7782803-10da-45d8-b042-b5b4a880eb07=686
> 1486cd47-7d40-400c-8e36-b66947865602=2445 ] Part 699:
> [1486cd47-7d40-400c-8e36-b66947865602=2427
> f9cf594b-24f2-4a91-8d84-298c97eb0f98=646 ] Part 827:
> [62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=703
> 1486cd47-7d40-400c-8e36-b66947865602=1549 ] Part 923:
> [1486cd47-7d40-400c-8e36-b66947865602=2434
> a9e9eaba-d227-4687-8c6c-7ed522e6c342=706 ] Part 967:
> [62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=673
> 1486cd47-7d40-400c-8e36-b66947865602=1595 ] Part 976:
> [33301384-3293-417f-b94a-ed36ebc82583=666
> 1486cd47-7d40-400c-8e36-b66947865602=2384 ]
>
> Coordinator's log and one of the cluster node's log is attached.
> coordinator_log.gz
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t2515/coordinator_log.gz>
>
> cluster_node_log.gz
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t2515/cluster_node_log.gz>
>
>
> Any help/comment is appreciated.
>
> Thanks.
>
>
>
>
>
> -
> İbrahim Halil Altun
> Senior Software Engineer @ Segmentify
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Node stopped.

2019-10-10 Thread John Smith
Do you know of any good tools I can use to check the VM?

On Thu, 10 Oct 2019 at 11:38, Denis Mekhanikov 
wrote:

> > Hi Dennis, so are you saying I should enable GC logs + the safe point
> logs as well?
>
> Having safepoint statistics in your GC logs may be useful, so I recommend
> enabling them for troubleshooting purposes.
> Check the lifecycle of your virtual machines. There is a high chance that
> the whole machine is frozen, not just the Ignite node.
>
> Denis
> On 10 Oct 2019, 18:25 +0300, John Smith , wrote:
>
> Hi Dennis, so are you saying I should enable GC logs + the safe point logs
> as well?
>
> On Thu, 10 Oct 2019 at 11:22, John Smith  wrote:
>
>> You are correct, it is running in a VM.
>>
>> On Thu, 10 Oct 2019 at 10:11, Denis Mekhanikov 
>> wrote:
>>
>>> Hi!
>>>
>>> There are the following messages in the logs:
>>>
>>> [22:26:21,816][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
>>> Possible too long JVM pause: *55705 milliseconds*.
>>> ...
>>> [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked
>>> system-critical thread has been detected. This can lead to cluster-wide
>>> undefined behaviour [threadName=partition-exchanger, blockedFor=*57s*]
>>>
>>> Looks like the JVM was paused for almost a minute. It doesn’t seem to be
>>> caused by a garbage collection, since there is no evidence of GC pressure
>>> in the GC log. Usually such big pauses happen in virtualised environments
>>> when backups are captured from machines or they just don’t have enough CPU
>>> time.
>>>
>>> Looking at safepoint statistics may also reveal some interesting
>>> details. You can learn about safepoints here:
>>> https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
>>>
>>> Denis
>>> On 9 Oct 2019, 23:14 +0300, John Smith , wrote:
>>>
>>> So the error says to set clientFailureDetectionTimeout=3
>>>
>>> 1- Do I put a higher value than 3?
>>> 2- Do I do it on the client or the server nodes or all nodes?
>>> 3- Also if a client is misbehaving why shutoff the server node?
>>>
>>> On Thu, 3 Oct 2019 at 21:02, John Smith  wrote:
>>>
 But if it's the client node that's failing why is the server node
 stopping? I'm pretty sure we do very simple put and get operations. All
 the client nodes are started as client=true

 On Thu., Oct. 3, 2019, 4:18 p.m. Denis Magda, 
 wrote:

> Hi John,
>
> I don't see any GC pressure or STW pauses either. If not GC then it
> might have been caused by a network glitch or some long-running operation
> started by the app. These logs statement
>
>
> [22:26:21,827][WARNING][tcp-disco-client-message-worker-#10%xx%][TcpDiscoverySpi]
> Client node considered as unreachable and will be dropped from cluster,
> because no metrics update messages received in interval:
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
> network problems or long GC pause on client node, try to increase this
> parameter. [nodeId=b07182d0-bf70-4318-9fe3-d7d5228bd6ef,
> clientFailureDetectionTimeout=3]
>
>
> [22:26:21,839][WARNING][tcp-disco-client-message-worker-#12%xx%][TcpDiscoverySpi]
> Client node considered as unreachable and will be dropped from cluster,
> because no metrics update messages received in interval:
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
> network problems or long GC pause on client node, try to increase this
> parameter. [nodeId=302cff60-b88d-40da-9e12-b955e6bf973d,
> clientFailureDetectionTimeout=3]
>
>
> [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [threadName=partition-exchanger, blockedFor=57s]
>
>
> 22:26:21,954][SEVERE][ttl-cleanup-worker-#48%xx%][] Critical
> system error detected. Will be handled accordingly to configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
> [name=partition-exchanger, igniteInstanceName=xx, finished=false,
> heartbeatTs=1568931981805]]]
>
>
>
>
> -
> Denis
>
>
> On Thu, Oct 3, 2019 at 11:50 AM John Smith 
> wrote:
>
>> So I have been monitoring my node and the same one seems to stop once
>> in a while.
>>
>> https://www.dropbox.com/s/7n5qfsl5uyi1obt/ignite-logs.zip?dl=0
>>
>> I have attached the GC logs and the ignite logs. From what I see from
>> gc.logs I don't see big pauses. I could be wrong.
>>
>> The machine is 16GB and I have the configs here:
>> https://www.dropbox.com/s/hkv38s3vce5a4sk/ignite-config.xml?dl=0
>>
>> Here are the 

Freeing up RAM/cache

2019-10-10 Thread Kamlesh Joshi
Hi Igniters,

Is there any way to release main memory? I just want to remove a few entries
from RAM (and not from persistence). If I need the data again I should be able
to retrieve it from persistence.

I have already gone through the expiration policy for this. But can we set
an expiration policy at runtime or do we need to re-create the caches?

Thanks and Regards,
Kamlesh Joshi

"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachment."


Re: Node stopped.

2019-10-10 Thread Denis Mekhanikov
> Hi Dennis, so are you saying I should enable GC logs + the safe point logs as 
> well?

Having safepoint statistics in your GC logs may be useful, so I recommend 
enabling them for troubleshooting purposes.
Check the lifecycle of your virtual machines. There is a high chance that the 
whole machine is frozen, not just the Ignite node.
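
If it helps, on JDK 8 (which the GC flags earlier in this thread suggest)
safepoint statistics can be added alongside the existing JVM_OPTS roughly like
this; note they are printed to stdout rather than the GC log, and newer JDKs
expose the same data via -Xlog:safepoint instead:

JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"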

Denis
On 10 Oct 2019, 18:25 +0300, John Smith , wrote:
> Hi Dennis, so are you saying I should enable GC logs + the safe point logs as 
> well?
>
> > On Thu, 10 Oct 2019 at 11:22, John Smith  wrote:
> > > You are correct, it is running in a VM.
> > >
> > > > On Thu, 10 Oct 2019 at 10:11, Denis Mekhanikov  
> > > > wrote:
> > > > > Hi!
> > > > >
> > > > > There are the following messages in the logs:
> > > > >
> > > > > [22:26:21,816][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
> > > > >  Possible too long JVM pause: 55705 milliseconds.
> > > > > ...
> > > > > [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked 
> > > > > system-critical thread has been detected. This can lead to 
> > > > > cluster-wide undefined behaviour [threadName=partition-exchanger, 
> > > > > blockedFor=57s]
> > > > >
> > > > > Looks like the JVM was paused for almost a minute. It doesn’t seem to 
> > > > > be caused by a garbage collection, since there is no evidence of GC 
> > > > > pressure in the GC log. Usually such big pauses happen in virtualised 
> > > > > environments when backups are captured from machines or they just 
> > > > > don’t have enough CPU time.
> > > > >
> > > > > Looking at safepoint statistics may also reveal some interesting 
> > > > > details. You can learn about safepoints here: 
> > > > > https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
> > > > >
> > > > > Denis
> > > > > On 9 Oct 2019, 23:14 +0300, John Smith , 
> > > > > wrote:
> > > > > > So the error says to set clientFailureDetectionTimeout=3
> > > > > >
> > > > > > 1- Do I put a higher value than 3?
> > > > > > 2- Do I do it on the client or the server nodes or all nodes?
> > > > > > 3- Also if a client is misbehaving why shutoff the server node?
> > > > > >
> > > > > > > On Thu, 3 Oct 2019 at 21:02, John Smith  
> > > > > > > wrote:
> > > > > > > > But if it's the client node that's failing why is the server 
> > > > > > > > node stopping? I'm pretty sure we do very simple put and get 
> > > > > > > > operations. All the client nodes are started as client=true
> > > > > > > >
> > > > > > > > > On Thu., Oct. 3, 2019, 4:18 p.m. Denis Magda, 
> > > > > > > > >  wrote:
> > > > > > > > > > Hi John,
> > > > > > > > > >
> > > > > > > > > > I don't see any GC pressure or STW pauses either. If not GC 
> > > > > > > > > > then it might have been caused by a network glitch or some 
> > > > > > > > > > long-running operation started by the app. These logs 
> > > > > > > > > > statement
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [22:26:21,827][WARNING][tcp-disco-client-message-worker-#10%xx%][TcpDiscoverySpi]
> > > > > > > > > >  Client node considered as unreachable and will be dropped 
> > > > > > > > > > from cluster, because no metrics update messages received 
> > > > > > > > > > in interval: 
> > > > > > > > > > TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may 
> > > > > > > > > > be caused by network problems or long GC pause on client 
> > > > > > > > > > node, try to increase this parameter. 
> > > > > > > > > > [nodeId=b07182d0-bf70-4318-9fe3-d7d5228bd6ef, 
> > > > > > > > > > clientFailureDetectionTimeout=3]
> > > > > > > > > >
> > > > > > > > > > [22:26:21,839][WARNING][tcp-disco-client-message-worker-#12%xx%][TcpDiscoverySpi]
> > > > > > > > > >  Client node considered as unreachable and will be dropped 
> > > > > > > > > > from cluster, because no metrics update messages received 
> > > > > > > > > > in interval: 
> > > > > > > > > > TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may 
> > > > > > > > > > be caused by network problems or long GC pause on client 
> > > > > > > > > > node, try to increase this parameter. 
> > > > > > > > > > [nodeId=302cff60-b88d-40da-9e12-b955e6bf973d, 
> > > > > > > > > > clientFailureDetectionTimeout=3]
> > > > > > > > > >
> > > > > > > > > > [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] 
> > > > > > > > > > Blocked system-critical thread has been detected. This can 
> > > > > > > > > > lead to cluster-wide undefined behaviour 
> > > > > > > > > > [threadName=partition-exchanger, blockedFor=57s]
> > > > > > > > > >
> > > > > > > > > > 22:26:21,954][SEVERE][ttl-cleanup-worker-#48%xx%][] 
> > > > > > > > > > Critical system error detected. Will be handled accordingly 
> > > > > > > > > > to configured handler [hnd=StopNodeOrHaltFailureHandler 
> > > > > > > > > > [tryStop=false, timeout=0, super=AbstractFailureHandler 
> > > > > > > > > > [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], 
> > > > > > > > > > failureCtx=FailureContext 

Re: Node stopped.

2019-10-10 Thread John Smith
You are correct, it is running in a VM.

On Thu, 10 Oct 2019 at 10:11, Denis Mekhanikov 
wrote:

> Hi!
>
> There are the following messages in the logs:
>
> [22:26:21,816][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
> Possible too long JVM pause: *55705 milliseconds*.
> ...
> [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [threadName=partition-exchanger, blockedFor=*57s*]
>
> Looks like the JVM was paused for almost a minute. It doesn’t seem to be
> caused by a garbage collection, since there is no evidence of GC pressure
> in the GC log. Usually such big pauses happen in virtualised environments
> when backups are captured from machines or they just don’t have enough CPU
> time.
>
> Looking at safepoint statistics may also reveal some interesting details.
> You can learn about safepoints here:
> https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
>
> Denis
> On 9 Oct 2019, 23:14 +0300, John Smith , wrote:
>
> So the error says to set clientFailureDetectionTimeout=3
>
> 1- Do I put a higher value than 3?
> 2- Do I do it on the client or the server nodes or all nodes?
> 3- Also if a client is misbehaving why shutoff the server node?
>
> On Thu, 3 Oct 2019 at 21:02, John Smith  wrote:
>
>> But if it's the client node that's failing why is the server node
>> stopping? I'm pretty sure we do very simple put and get operations. All
>> the client nodes are started as client=true
>>
>> On Thu., Oct. 3, 2019, 4:18 p.m. Denis Magda,  wrote:
>>
>>> Hi John,
>>>
>>> I don't see any GC pressure or STW pauses either. If not GC then it
>>> might have been caused by a network glitch or some long-running operation
>>> started by the app. These logs statement
>>>
>>>
>>> [22:26:21,827][WARNING][tcp-disco-client-message-worker-#10%xx%][TcpDiscoverySpi]
>>> Client node considered as unreachable and will be dropped from cluster,
>>> because no metrics update messages received in interval:
>>> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
>>> network problems or long GC pause on client node, try to increase this
>>> parameter. [nodeId=b07182d0-bf70-4318-9fe3-d7d5228bd6ef,
>>> clientFailureDetectionTimeout=3]
>>>
>>>
>>> [22:26:21,839][WARNING][tcp-disco-client-message-worker-#12%xx%][TcpDiscoverySpi]
>>> Client node considered as unreachable and will be dropped from cluster,
>>> because no metrics update messages received in interval:
>>> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
>>> network problems or long GC pause on client node, try to increase this
>>> parameter. [nodeId=302cff60-b88d-40da-9e12-b955e6bf973d,
>>> clientFailureDetectionTimeout=3]
>>>
>>>
>>> [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked
>>> system-critical thread has been detected. This can lead to cluster-wide
>>> undefined behaviour [threadName=partition-exchanger, blockedFor=57s]
>>>
>>>
>>> 22:26:21,954][SEVERE][ttl-cleanup-worker-#48%xx%][] Critical system
>>> error detected. Will be handled accordingly to configured handler
>>> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
>>> super=AbstractFailureHandler
>>> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
>>> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
>>> [name=partition-exchanger, igniteInstanceName=xx, finished=false,
>>> heartbeatTs=1568931981805]]]
>>>
>>>
>>>
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Thu, Oct 3, 2019 at 11:50 AM John Smith 
>>> wrote:
>>>
 So I have been monitoring my node and the same one seems to stop once
 in a while.

 https://www.dropbox.com/s/7n5qfsl5uyi1obt/ignite-logs.zip?dl=0

 I have attached the GC logs and the ignite logs. From what I see from
 gc.logs I don't see big pauses. I could be wrong.

 The machine is 16GB and I have the configs here:
 https://www.dropbox.com/s/hkv38s3vce5a4sk/ignite-config.xml?dl=0

 Here are the JVM settings...

 if [ -z "$JVM_OPTS" ] ; then
 JVM_OPTS="-Xms2g -Xmx2g -server -XX:MaxMetaspaceSize=256m"
 fi

 JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails
 -Xloggc:/var/log/apache-ignite/gc.log"

 JVM_OPTS="${JVM_OPTS} -Xss16m"

>>>


Re: Node stopped.

2019-10-10 Thread John Smith
Hi Dennis, so are you saying I should enable GC logs + the safe point logs
as well?

On Thu, 10 Oct 2019 at 11:22, John Smith  wrote:

> You are correct, it is running in a VM.
>
> On Thu, 10 Oct 2019 at 10:11, Denis Mekhanikov 
> wrote:
>
>> Hi!
>>
>> There are the following messages in the logs:
>>
>> [22:26:21,816][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
>> Possible too long JVM pause: *55705 milliseconds*.
>> ...
>> [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked
>> system-critical thread has been detected. This can lead to cluster-wide
>> undefined behaviour [threadName=partition-exchanger, blockedFor=*57s*]
>>
>> Looks like the JVM was paused for almost a minute. It doesn’t seem to be
>> caused by a garbage collection, since there is no evidence of GC pressure
>> in the GC log. Usually such big pauses happen in virtualised environments
>> when backups are captured from machines or they just don’t have enough CPU
>> time.
>>
>> Looking at safepoint statistics may also reveal some interesting details.
>> You can learn about safepoints here:
>> https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
>>
>> Denis
>> On 9 Oct 2019, 23:14 +0300, John Smith , wrote:
>>
>> So the error says to set clientFailureDetectionTimeout=3
>>
>> 1- Do I put a higher value than 3?
>> 2- Do I do it on the client or the server nodes or all nodes?
>> 3- Also if a client is misbehaving why shutoff the server node?
>>
>> On Thu, 3 Oct 2019 at 21:02, John Smith  wrote:
>>
>>> But if it's the client node that's failing why is the server node
>>> stopping? I'm pretty sure we do very simple put and get operations. All
>>> the client nodes are started as client=true
>>>
>>> On Thu., Oct. 3, 2019, 4:18 p.m. Denis Magda,  wrote:
>>>
 Hi John,

 I don't see any GC pressure or STW pauses either. If not GC then it
 might have been caused by a network glitch or some long-running operation
 started by the app. These logs statement


 [22:26:21,827][WARNING][tcp-disco-client-message-worker-#10%xx%][TcpDiscoverySpi]
 Client node considered as unreachable and will be dropped from cluster,
 because no metrics update messages received in interval:
 TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
 network problems or long GC pause on client node, try to increase this
 parameter. [nodeId=b07182d0-bf70-4318-9fe3-d7d5228bd6ef,
 clientFailureDetectionTimeout=3]


 [22:26:21,839][WARNING][tcp-disco-client-message-worker-#12%xx%][TcpDiscoverySpi]
 Client node considered as unreachable and will be dropped from cluster,
 because no metrics update messages received in interval:
 TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
 network problems or long GC pause on client node, try to increase this
 parameter. [nodeId=302cff60-b88d-40da-9e12-b955e6bf973d,
 clientFailureDetectionTimeout=3]


 [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked
 system-critical thread has been detected. This can lead to cluster-wide
 undefined behaviour [threadName=partition-exchanger, blockedFor=57s]


 22:26:21,954][SEVERE][ttl-cleanup-worker-#48%xx%][] Critical system
 error detected. Will be handled accordingly to configured handler
 [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
 super=AbstractFailureHandler
 [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
 [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
 [name=partition-exchanger, igniteInstanceName=xx, finished=false,
 heartbeatTs=1568931981805]]]




 -
 Denis


 On Thu, Oct 3, 2019 at 11:50 AM John Smith 
 wrote:

> So I have been monitoring my node and the same one seems to stop once
> in a while.
>
> https://www.dropbox.com/s/7n5qfsl5uyi1obt/ignite-logs.zip?dl=0
>
> I have attached the GC logs and the ignite logs. From what I see from
> gc.logs I don't see big pauses. I could be wrong.
>
> The machine is 16GB and I have the configs here:
> https://www.dropbox.com/s/hkv38s3vce5a4sk/ignite-config.xml?dl=0
>
> Here are the JVM settings...
>
> if [ -z "$JVM_OPTS" ] ; then
> JVM_OPTS="-Xms2g -Xmx2g -server -XX:MaxMetaspaceSize=256m"
> fi
>
> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails
> -Xloggc:/var/log/apache-ignite/gc.log"
>
> JVM_OPTS="${JVM_OPTS} -Xss16m"
>



Re: Apache Ignite Change data capture functionality

2019-10-10 Thread Denis Mekhanikov
Ravichandra,

There is no integration for Striim in the Apache Ignite codebase, so you need to
check the Striim documentation to see whether it can be configured for Ignite.
If you want to use Ignite as a target, then a JDBC adapter should work, if 
there is any available.
If Ignite should work as a source, then you’ll probably need to implement the 
adapter yourself by using continuous queries or cache events.
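
A continuous-query-based change feed can be as small as the sketch below (the
cache name and types are placeholders; the listener body is where you would
hand changes over to Striim):

import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.ContinuousQuery;

public class IgniteChangeFeed {
    public static void start(Ignite ignite) {
        IgniteCache<Long, String> cache = ignite.cache("sourceCache");

        ContinuousQuery<Long, String> qry = new ContinuousQuery<>();

        // Called for every create/update on the cache; forward to the external system.
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Long, ? extends String> e : events)
                System.out.println(e.getEventType() + ": " + e.getKey() + " -> " + e.getValue());
        });

        // Keep the returned cursor open for as long as the feed should run.
        cache.query(qry);
    }
}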

Denis
On 9 Oct 2019, 21:13 +0300, ravichandra , 
wrote:
> As part of change data capture functionality, can Apache Ignite be integrated
> with Striim, which is real-time data integration software? I read that the
> same functionality can be achieved by integrating with Oracle GoldenGate,
> but I am curious to know whether CDC is available via Striim.
>
> Thanks,
> Ravichandra
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node stopped.

2019-10-10 Thread Denis Mekhanikov
Hi!

There are the following messages in the logs:

[22:26:21,816][WARNING][jvm-pause-detector-worker][IgniteKernal%xx] 
Possible too long JVM pause: 55705 milliseconds.
...
[22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=partition-exchanger, blockedFor=57s]

Looks like the JVM was paused for almost a minute. It doesn’t seem to be caused 
by a garbage collection, since there is no evidence of GC pressure in the GC 
log. Usually such big pauses happen in virtualised environments when backups 
are captured from machines or they just don’t have enough CPU time.

Looking at safepoint statistics may also reveal some interesting details. You 
can learn about safepoints here: 
https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/

Denis
On 9 Oct 2019, 23:14 +0300, John Smith , wrote:
> So the error says to set clientFailureDetectionTimeout=3
>
> 1- Do I put a higher value than 3?
> 2- Do I do it on the client or the server nodes or all nodes?
> 3- Also if a client is misbehaving why shutoff the server node?
>
> > On Thu, 3 Oct 2019 at 21:02, John Smith  wrote:
> > > But if it's the client node that's failing why is the server node 
> > > stopping? I'm pretty sure we do very simple put and get operations. All 
> > > the client nodes are started as client=true
> > >
> > > > On Thu., Oct. 3, 2019, 4:18 p.m. Denis Magda,  wrote:
> > > > > Hi John,
> > > > >
> > > > > I don't see any GC pressure or STW pauses either. If not GC then it 
> > > > > might have been caused by a network glitch or some long-running 
> > > > > operation started by the app. These logs statement
> > > > >
> > > > >
> > > > > [22:26:21,827][WARNING][tcp-disco-client-message-worker-#10%xx%][TcpDiscoverySpi]
> > > > >  Client node considered as unreachable and will be dropped from 
> > > > > cluster, because no metrics update messages received in interval: 
> > > > > TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused 
> > > > > by network problems or long GC pause on client node, try to increase 
> > > > > this parameter. [nodeId=b07182d0-bf70-4318-9fe3-d7d5228bd6ef, 
> > > > > clientFailureDetectionTimeout=3]
> > > > >
> > > > > [22:26:21,839][WARNING][tcp-disco-client-message-worker-#12%xx%][TcpDiscoverySpi]
> > > > >  Client node considered as unreachable and will be dropped from 
> > > > > cluster, because no metrics update messages received in interval: 
> > > > > TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused 
> > > > > by network problems or long GC pause on client node, try to increase 
> > > > > this parameter. [nodeId=302cff60-b88d-40da-9e12-b955e6bf973d, 
> > > > > clientFailureDetectionTimeout=3]
> > > > >
> > > > > [22:26:21,847][SEVERE][ttl-cleanup-worker-#48%xx%][G] Blocked 
> > > > > system-critical thread has been detected. This can lead to 
> > > > > cluster-wide undefined behaviour [threadName=partition-exchanger, 
> > > > > blockedFor=57s]
> > > > >
> > > > > 22:26:21,954][SEVERE][ttl-cleanup-worker-#48%xx%][] Critical 
> > > > > system error detected. Will be handled accordingly to configured 
> > > > > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> > > > > super=AbstractFailureHandler 
> > > > > [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], 
> > > > > failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> > > > > o.a.i.IgniteException: GridWorker [name=partition-exchanger, 
> > > > > igniteInstanceName=xx, finished=false, 
> > > > > heartbeatTs=1568931981805]]]
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > > On Thu, Oct 3, 2019 at 11:50 AM John Smith  
> > > > > > wrote:
> > > > > > > So I have been monitoring my node and the same one seems to stop 
> > > > > > > once in a while.
> > > > > > >
> > > > > > > https://www.dropbox.com/s/7n5qfsl5uyi1obt/ignite-logs.zip?dl=0
> > > > > > >
> > > > > > > I have attached the GC logs and the ignite logs. From what I see 
> > > > > > > from gc.logs I don't see big pauses. I could be wrong.
> > > > > > >
> > > > > > > The machine is 16GB and I have the configs here: 
> > > > > > > https://www.dropbox.com/s/hkv38s3vce5a4sk/ignite-config.xml?dl=0
> > > > > > >
> > > > > > > Here are the JVM settings...
> > > > > > >
> > > > > > > if [ -z "$JVM_OPTS" ] ; then
> > > > > > >     JVM_OPTS="-Xms2g -Xmx2g -server -XX:MaxMetaspaceSize=256m"
> > > > > > > fi
> > > > > > >
> > > > > > > JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails 
> > > > > > > -Xloggc:/var/log/apache-ignite/gc.log"
> > > > > > >
> > > > > > > JVM_OPTS="${JVM_OPTS} -Xss16m"


Re: Ignite SQL table ALTER COLUMN and RENAME COLUMN

2019-10-10 Thread Denis Mekhanikov
Favas,

It’s possible to remove a column and add another one using the ALTER TABLE SQL
statement, but currently you can't change a column’s type.
Note that removing a column and adding another one with the same name but with
a different type can lead to data corruption.
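
As a sketch of the usual workaround (add a column with a different name and
migrate the data; this assumes a PERSON table created through SQL, whose
underlying cache is SQL_PUBLIC_PERSON, and an AGE column being converted to
VARCHAR):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class AlterColumnWorkaround {
    public static void migrate(Ignite ignite) {
        // Any cache handle can be used to run DDL/DML against the SQL schema.
        IgniteCache<?, ?> cache = ignite.cache("SQL_PUBLIC_PERSON");

        // Add a column of the new type, copy the data over, then drop the old column.
        cache.query(new SqlFieldsQuery("ALTER TABLE Person ADD COLUMN age_str VARCHAR")).getAll();
        cache.query(new SqlFieldsQuery("UPDATE Person SET age_str = CAST(age AS VARCHAR)")).getAll();
        cache.query(new SqlFieldsQuery("ALTER TABLE Person DROP COLUMN age")).getAll();
    }
}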

Denis
On 10 Oct 2019, 09:51 +0300, Muhammed Favas 
, wrote:
> Hi,
>
> Is there a way in Ignite to ALTER a column to change the data type/nullable
> property and also RENAME a column?
>
> I saw in the Ignite documentation that this will be added in upcoming releases. Which
> release is it planned for?
>
> Regards,
> Favas
>


Re: Authenticating communication between nodes using Ignite.Net

2019-10-10 Thread Pavel Tupitsyn
1. Thick client/server nodes use a different communication mechanism than
Thin client nodes,
and network ports are different. Security mechanism is also different.

E.g. you can have a cluster of server nodes running in a controlled
environment, with only 10800
port open to the outside. This way only Thin Client nodes can connect from
outside,
and server-to-server connections do not need authentication.

2. Ignite does not provide ready to use server-to-server auth out of the
box (neither Java nor .NET),
some third party vendors provide this via plugins.
If you have to stick with Ignite, you'll have to write a plugin, part of
which has to be in Java.
See Ignite.NET plugin system:
https://apacheignite-net.readme.io/docs/plugins

On Thu, Oct 10, 2019 at 2:07 AM alokyadav12  wrote:

> We are new to Ignite.Net and trying to implement a few security features
> before
> deciding on the final implementation in the product.
>
> We had implemented authentication on the Ignite server, and when connecting a thin
> client it uses a user id and password and works as expected.
> We noticed that if we spin off another node, it connects
> automatically to the running node and doesn't need a username and password.
>
> Question 1: Do thick clients and nodes not authenticate when
> connecting to other nodes?
>
> Question 2: We found an article on creating a custom plugin for authentication:
> http://smartkey.co.uk/development/securing-an-apache-ignite-cluster/. This
> article focuses on a Java implementation, but we are using Ignite.Net and
> didn't find the DiscoverySpiNodeAuthenticator and GridSecurityProcessor
> interfaces to create a plugin. Are these classes available to use in
> Ignite.Net? Is there any other alternative available?
>
> Is there any other way we can authenticate thick clients and nodes when
> connecting, as we need to secure nodes so that only authenticated nodes and
> thick
> clients can connect?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Ignite SQL table ALTER COLUMN and RENAME COLUMN

2019-10-10 Thread Muhammed Favas
Hi,

Is there a way in Ignite to ALTER a column to change the data type/nullable
property and also RENAME a column?

I saw in the Ignite documentation that this will be added in upcoming releases. Which
release is it planned for?

Regards,
Favas