Re: Ignite on AKS and RBAC issue

2020-07-10 Thread steve.hostettler
Hello Alex, thanks for the tip, but putting everything in the ignite namespace
does not help.
I also rechecked the documentation. I still get the 403.

Additional question: how do the service account and the service relate?

So I have:

1) service account
kubectl describe serviceaccount ignite -n ignite
Name:                ignite
Namespace:   ignite
Labels:  app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: pe-v1
 meta.helm.sh/release-namespace: ignite
Image pull secrets:  
Mountable secrets:   ignite-token-htqrp
Tokens:  ignite-token-htqrp
Events:  

2) a clusterrole
kubectl describe clusterrole ignite -n ignite
Name: ignite
Labels:   app.kubernetes.io/managed-by=Helm
  release=pe-v1
Annotations:  meta.helm.sh/release-name: pe-v1
  meta.helm.sh/release-namespace: ignite
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  -  -  --  -
  endpoints  [] []  [get list watch]
  pods   [] []  [get list watch]

3) a clusterrolebinding
kubectl describe clusterrolebinding ignite -n ignite
Name: ignite
Labels:   app.kubernetes.io/managed-by=Helm
  release=pe-v1
Annotations:  meta.helm.sh/release-name: pe-v1
  meta.helm.sh/release-namespace: ignite
Role:
  Kind:  ClusterRole
  Name:  ignite
Subjects:
  KindNameNamespace
  -
  ServiceAccount  ignite  ignite

4)a service
kubectl describe svc processing-engine-pe-v1-ignite -n ignite
Name:  processing-engine-pe-v1-ignite
Namespace: ignite
Labels:            app.kubernetes.io/managed-by=Helm
Annotations:   meta.helm.sh/release-name: pe-v1
   meta.helm.sh/release-namespace: ignite
Selector:  type=processing-engine-pe-v1.node
Type:  ClusterIP
IP:                None
Port:  service-discovery  47500/TCP
TargetPort:        47500/TCP
Endpoints: 10.244.0.34:47500,10.244.1.31:47500
Session Affinity:  None
Events:

But somehow I still get a 403
2020-07-10 22:08:51,837 INFO 
[org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi] (ServerService Thread
Pool -- 15) Successfully bound to TCP port [port=47500,
localHost=0.0.0.0/0.0.0.0, locNodeId=c651239a-2964-4b8b-915b-c055bcf410ed]
2020-07-10 22:08:52,029 ERROR
[org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi] (ServerService Thread
Pool -- 15) Failed to get registered addresses from IP finder on start
(retrying every 2000ms; change 'reconnectDelay' to configure the frequency
of retries).: class org.apache.ignite.spi.IgniteSpiException: Failed to
retrieve Ignite pods IP addresses.
at
org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder.getRegisteredAddresses(TcpDiscoveryKubernetesIpFinder.java:172)
at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1900)
at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.resolvedAddresses(TcpDiscoverySpi.java:1848)

at
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.lang.Thread.run(Thread.java:748)
at org.jboss.threads.JBossThread.run(JBossThread.java:485)
Caused by: java.io.IOException: Server returned HTTP response code: 403 for
URL:
https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/ignite/endpoints/processing-engine-pe-v1-ignite
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
2020-07-10 22:13:47,219 ERROR [org.jboss.as.controller.management-operation]
(main) WFLYCTL0190: Step handler
org.jboss.as.server.deployment.DeploymentHandlerUtil$1@63778853 for
operation add at address [("deployment" => "reg.war")] failed handling
operation rollback -- java.util.concurrent.TimeoutException:
java.util.concurrent.TimeoutException
at
org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:523)
at org.wildfly.swarm.bootstrap.Main.main(Main.java:87)

2020-07-10 22:13:52,220 ERROR [org.jboss.as.controller.management-operation]
(main) WFLYCTL0349: Timeout after [5] seconds waiting for service container
stability while finalizing an operation. Process must be restarted. Step
that first updated the service container was 'add' at address
'[("deployment" => "reg.war")]'
2020-07-10 22:13:52,225 ERROR [stderr] (main)
org.wildfly.swarm.container.DeploymentException:
org.wildfly.swarm.container.DeploymentException: THORN0004: Deployment
failed: WFLYCTL0344: Operation timed out awaiting service container
stability
2020-07-10 22:13:52,226 ERROR [stderr] (main)   at
org.wildfly.swarm.container.runtime.RuntimeDeployer.deploy(RuntimeDeployer.java:301)
2020-07-10 22:13:52,230 ERROR [stderr] (main)   at
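
For context on separating RBAC from Ignite configuration: the 403 above comes from the single Kubernetes API call that TcpDiscoveryKubernetesIpFinder performs. Below is a minimal sketch (an editor's illustration, not code from this thread) that reproduces that exact request from inside the pod using the mounted service-account token; the URL is the one from the log, the token path is the standard in-pod mount, and it assumes the cluster CA (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) has been imported into the JVM truststore. A 200 means the RBAC binding works; a 403 points at the ServiceAccount/ClusterRole/ClusterRoleBinding wiring.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EndpointsRbacCheck {
    public static void main(String[] args) throws Exception {
        // Standard in-pod location of the service account token.
        String token = new String(Files.readAllBytes(
            Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/token"))).trim();

        // The exact URL the IP finder tried, taken from the 403 in the log above.
        URL url = new URL("https://kubernetes.default.svc.cluster.local:443"
            + "/api/v1/namespaces/ignite/endpoints/processing-engine-pe-v1-ignite");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + token);

        // 200 -> the service account may read the endpoints object;
        // 403 -> the ClusterRoleBinding is not effective for this service account.
        int code = conn.getResponseCode();
        System.out.println("HTTP " + code);
        if (code == 200) {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                r.lines().forEach(System.out::println);
            }
        }
    }
}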

Re: Using SORT BY and ORDER BY

2020-07-10 Thread Surkov.Aleksandr
Here is the question: it turns out that I need to collocate the table,
because the field COL will have different values? But how can I do this?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Using SORT BY and ORDER BY

2020-07-10 Thread Surkov.Aleksandr
Thank you!



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Using SORT BY and ORDER BY

2020-07-10 Thread Surkov.Aleksandr
The org.apache.ignite.internal.processors.cache.query.CacheQuery interface
has a comment:

 * {@code Group by} and {@code sort by} statements are applied separately
 * on each node, so result set will likely be incorrectly grouped or sorted
 * after results from multiple remote nodes are grouped together.
As far as I understand:
1. {@code sort by} is not supported
2. ORDER BY returns a sorted list even if items are on different nodes

Is that right?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Remote Filter Execution

2020-07-10 Thread VeenaMithare
Hi Ilya, 

Yes. So the filter gets created but not executed on more than one node: 'Entered
Remote Filter' happens only on one node. The 'Filter created' log is printed in
the create() method of the Factory, and
'Entered Remote Filter' is printed in the evaluate() method of the
CacheEntryEventFilter.

That is what I was saying in my earlier post - 

You will notice that this log is only on server 1  - "projectname LISTENS:
Entered Remote Filter ."

You can also see that the filter has been created on all the three nodes. (
LOG : projectname LISTENS: Filter created  )

regards,
Veena.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: DataStreamer. allowOverwrite(false) - Will it slow the writes?

2020-07-10 Thread krkumar24061...@gmail.com
Hi - Yes, Ignite native persistence is enabled.

Thanx and Regards,
KR Kumar



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite on AKS and RBAC issue

2020-07-10 Thread akorensh
Hi,
  I see that you've bound everything to the "default" namespace.
  kubectl describe serviceaccount ignite
  Name:                ignite
 Namespace:   default

  Create everything in the "ignite" namespace as described here:
  https://apacheignite.readme.io/docs/rbac-authorization
  Follow their recommendations to deploy on Kubernetes:
  https://apacheignite.readme.io/docs/stateless-deployment

  If that doesn't work, send over all your YAML files and I'll take a look.
Thanks, Alex



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
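
For reference, a minimal sketch of pointing the Kubernetes IP finder at the namespace and headless service discussed in this thread (names are taken from the thread; it assumes Ignite 2.8.x, where these setters exist on TcpDiscoveryKubernetesIpFinder). This is also how the service account and the service relate: the service account mounted into the pod authenticates the API call, while the namespace and service name determine which endpoints object the finder queries.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder;

public class KubernetesDiscoveryConfig {
    public static Ignite start() {
        TcpDiscoveryKubernetesIpFinder ipFinder = new TcpDiscoveryKubernetesIpFinder();
        ipFinder.setNamespace("ignite");                           // namespace holding the RBAC objects and pods
        ipFinder.setServiceName("processing-engine-pe-v1-ignite"); // the headless discovery service

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoSpi);

        return Ignition.start(cfg);
    }
}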


Ignite on AKS and RBAC issue

2020-07-10 Thread steve.hostettler
Hello,

I am deploying an embedded version of Ignite on AKS and I am getting this
error:
Caused by: java.io.IOException: Server returned HTTP response code: 403 for
URL:
https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/default/endpoints/processing-engine-pe-v1-ignite
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)


That sounds like an RBAC problem to me, but I cannot nail it down,
so let me give my current configuration:

NAME  READY   STATUSRESTARTS  
AGE
processing-engine-pe-v1.master-69668fcb5b-zm7m8   1/1 Running   0 
9m6s
processing-engine-pe-v1.worker-7598949c5d-pkbfg   1/1 Running   0 
9m6s

As you can see, there are 2 pods in the default namespace.

So the configuration is





The service is there
kubectl describe  svc processing-engine-pe-v1-ignite
Name:  processing-engine-pe-v1-ignite
Namespace: default
Labels:            app.kubernetes.io/managed-by=Helm
Annotations:   meta.helm.sh/release-name: pe-v1
   meta.helm.sh/release-namespace: default
Selector:  type=processing-engine-pe-v1.node
Type:  ClusterIP
IP:                None
Port:  service-discovery  47500/TCP
TargetPort:        47500/TCP
Endpoints: 10.244.0.31:47500,10.244.1.28:47500
Session Affinity:  None
Events:

The service account
kubectl describe serviceaccount ignite
Name:                ignite
Namespace:   default
Labels:  app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: pe-v1
 meta.helm.sh/release-namespace: default
Image pull secrets:  
Mountable secrets:   **
Tokens:  **
Events:  


The role
kubectl describe clusterrole ignite
Name: ignite
Labels:   app.kubernetes.io/managed-by=Helm
  release=pe-v1
Annotations:  meta.helm.sh/release-name: pe-v1
  meta.helm.sh/release-namespace: default
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  -  -  --  -
  endpoints  [] []  [get list watch]
  pods   [] []  [get list watch]

The role binding
kubectl describe clusterrolebinding ignite
Name: ignite
Labels:   app.kubernetes.io/managed-by=Helm
  release=pe-v1
Annotations:  meta.helm.sh/release-name: pe-v1
  meta.helm.sh/release-namespace: default
Role:
  Kind:  ClusterRole
  Name:  ignite
Subjects:
  KindNameNamespace
  -
  ServiceAccount  ignite  default


Any idea of what I am missing?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Using SORT BY and ORDER BY

2020-07-10 Thread Ilya Kasnacheev
Hello!

1. I guess it is an honest mistake.
2. The idea here is that you can't expect GROUP BY COL to return
anything relevant if the table is not collocated by COL.

Regards,
-- 
Ilya Kasnacheev


Fri, 10 Jul 2020 at 17:34, Surkov.Aleksandr :

> The org.apache.ignite.internal.processors.cache.query.CacheQuery interface
> has a comment:
>
>  * {@code Group by} and {@code sort by} statements are applied separately
>  * on each node, so result set will likely be incorrectly grouped or sorted
>  * after results from multiple remote nodes are grouped together.
>
> As far as I understand:
> 1. {@code sort by} is not supported
> 2. ORDER BY returns a sorted list even if items are on different nodes
>
> Is that right?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
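
To make the collocation question above concrete, here is a sketch (an editor's illustration with hypothetical table and column names, not from the thread): collocate the table by COL via the AFFINITY_KEY option so that rows sharing a COL value land on the same node, and mark the query as collocated so grouping can run on the data nodes.

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class CollocatedGroupBy {
    static void example(Ignite ignite) {
        // Any cache handle can be used to run SQL DDL and queries.
        IgniteCache<?, ?> cache = ignite.getOrCreateCache("utilityCache");

        // Rows with the same COL are stored on the same node, so per-node grouping is complete.
        cache.query(new SqlFieldsQuery(
            "CREATE TABLE T (ID BIGINT, COL VARCHAR, VAL INT, PRIMARY KEY (ID, COL)) " +
            "WITH \"template=partitioned, affinity_key=COL\"")).getAll();

        // Telling the SQL engine the data is collocated lets GROUP BY run on the data nodes.
        List<List<?>> rows = cache.query(new SqlFieldsQuery(
            "SELECT COL, COUNT(*) FROM T GROUP BY COL").setCollocated(true)).getAll();
        rows.forEach(System.out::println);
    }
}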


Re: How do I know the cache rebalance is finished?

2020-07-10 Thread Vladislav Pyatkov
Hi,
I don't think it is a priority issue, because in general the behavior is correct.
The EVT_CACHE_REBALANCE_STOPPED event is fired when all data has been loaded to a node,
but the affinity switch happens only after all caches have been rebalanced.
First of all, why do you need to know when affinity changes after rebalance? From my
point of view, rebalancing is a process that does not affect user load.
Another option: you can wait for all caches that are rebalancing and be sure all
data was transferred.

In the log you can see messages like:

Rebalancing scheduled [order=[ignite-sys-cache, ON_HEAP_CACHE], 
top=AffinityTopologyVersion [topVer=2, minorTopVer=0], rebalanceId=1, 
evt=NODE_JOINED, node=8138d15d-1606-4eb1-8359-d5637d52]

This means ignite-sys-cache will be rebalanced first, and ON_HEAP_CACHE after it.

After each rebalance future completes:

Completed rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=ignite-sys-cache] ...

Here the node reports that rebalancing has finished for ignite-sys-cache.

Completed rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=ON_HEAP_CACHE] ...

And here rebalancing has finished for ON_HEAP_CACHE.

Then you will see a topology switch on a minor version:

Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]
...
Completed partition exchange [localNode=8138d15d-1606-4eb1-8359-d5637d52, 
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
[topVer=2, minorTopVer=1]...

Only after this exchange has completed will you see new primary partitions on the
joined node.

That is what happens now.
I don't really know how to change this behavior to make it more convenient for the
user.
If you still have a use case where you need to know the exact moment of the affinity
switch, could you move this discussion to the developer list?
I hope the developers can help us.

On 2020/07/08 21:21:37, Humphrey  wrote: 
> Bumping this topic: the ticket is still open (almost 4 years).
> Any progress / priority on this ticket, or a workaround?
> https://issues.apache.org/jira/browse/IGNITE-3362.
> 
> 
> 
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> 
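
A minimal sketch of a log-independent way to be notified, per cache, when the rebalance data transfer finishes (an editor's illustration; cache names and setup are hypothetical). As explained above, the affinity switch on the minor topology version still happens slightly later, so this only tells you that the data has arrived.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class RebalanceListener {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Cache rebalance events are disabled by default and must be enabled explicitly.
        cfg.setIncludeEventTypes(EventType.EVT_CACHE_REBALANCE_STOPPED);

        Ignite ignite = Ignition.start(cfg);

        IgnitePredicate<Event> lsnr = evt -> {
            CacheRebalancingEvent e = (CacheRebalancingEvent) evt;
            System.out.println("Rebalance finished for cache: " + e.cacheName());
            return true; // keep the listener registered
        };
        ignite.events().localListen(lsnr, EventType.EVT_CACHE_REBALANCE_STOPPED);
    }
}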


Re: Remote Filter Execution

2020-07-10 Thread Ilya Kasnacheev
Hello!

I see the following lines on all 3 nodes:



2020-07-09 16:41:02,957 [disco-notifier-worker-#101] DEBUG
com.COMPANYNAME.prophet.configstore.common.remote.ConfigStoreTableRemoteFilterFactory
[] - projectname LISTENS: Filter creation started with
andPredicateMap:{NAME=(EQUAL,CENTRE)},orPredicateMap:null,
Remotefilterfactory:com.COMPANYNAME.prophet.configstore.common.remote.ConfigStoreTableRemoteFilterFactory@14bf552f,
Service:client Instance:null, HostName:null
2020-07-09 16:41:02,958 [disco-notifier-worker-#101] DEBUG
com.COMPANYNAME.prophet.configstore.common.remote.ConfigStoreTableRemoteFilterFactory
[] - projectname LISTENS: Filter created with
andPredicateMap{NAME=(EQUAL,CENTRE)},orPredicateMapnull,
Remotefilterfactory:com.COMPANYNAME.prophet.configstore.common.remote.ConfigStoreTableRemoteFilterFactory@14bf552f,


Is it relevant?


Regards,
-- 
Ilya Kasnacheev


Fri, 10 Jul 2020 at 16:02, VeenaMithare :

> HI Ilya,
>
> Please find the attached logs.
>
> You will notice that this log is only on server 1  - "projectname LISTENS:
> Entered Remote Filter ."
>
> You can also see that the filter has been created on all the three nodes. (
> LOG : projectname LISTENS: Filter created  )
>
> server1RemoteFilterExecuted.txt
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t2757/server1RemoteFilterExecuted.txt>
>
> server2.txt
> 
> server3.txt
> 
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: [External]Re: Ignite cluster became unresponsive

2020-07-10 Thread Ilya Kasnacheev
Hello!

It seems that communication connections were closed after the GC pause, and then
you ended up with half-open connections. It is recommended to keep
socketWriteTimeout and the failure detection timeout roughly in sync.

The default socketWriteTimeout on TcpCommunicationSpi is very low, while your
failure detection timeout is rather high, which leads to this kind of issue.

It is also possible that client nodes can connect to a server node but not
vice versa, leading to failure of opening connections once they are closed:

Thread [name="sys-stripe-12-#13%EDIFCustomerCC%", id=45, state=RUNNABLE,
blockCnt=851, waitCnt=27526057]
at sun.nio.ch.Net.poll(Native Method)
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
at
o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
at
o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
at
o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
at
o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
at
o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)

Regards,
-- 
Ilya Kasnacheev
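
A hedged sketch of the tuning described above (the values are placeholders, not a recommendation): keep socketWriteTimeout on the communication SPI in the same order of magnitude as the failure detection timeouts, so that a long GC pause does not leave half-open connections behind.

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class TimeoutTuning {
    static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Failure detection timeouts in milliseconds -- illustrative values only.
        cfg.setFailureDetectionTimeout(30_000);
        cfg.setClientFailureDetectionTimeout(60_000);

        // socketWriteTimeout defaults to just a few seconds; raise it to stay
        // roughly in sync with the failure detection timeout.
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setSocketWriteTimeout(30_000);
        cfg.setCommunicationSpi(commSpi);

        return cfg;
    }
}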


Fri, 10 Jul 2020 at 16:32, Kamlesh Joshi :

> Hi Ilya,
>
>
>
> PFA the entire node logs, which contains thread dump as well. Let us know
> if any findings.
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Ilya Kasnacheev 
> *Sent:* 10 July 2020 17:51
> *To:* user@ignite.apache.org
> *Subject:* Re: [External]Re: Ignite cluster became unresponsive
>
>
>
> The e-mail below is from an external source. Please do not open
> attachments or click links from an unknown or suspicious origin.
>
> Hello!
>
>
>
> Can you provide full thread dump (jstack) after you see these messages?
>
>
>
> Regards,
>
> --
>
> Ilya Kasnacheev
>
>
>
>
>
Wed, 8 Jul 2020 at 15:57, Kamlesh Joshi :
>
> Hi Stephen/Team,
>
>
>
> Did you got any chance to look into this?
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Kamlesh Joshi
> *Sent:* 06 July 2020 14:50
> *To:* user@ignite.apache.org
> *Subject:* RE: [External]Re: Ignite cluster became unresponsive
>
>
>
> Hi Stephen,
>
>
>
> We have started our node with below JVM parameters. Also, we have
> increased these timeouts *failureDetectionTimeout*/
> *clientFailureDetectionTimeout*/*networkTimeout to 48*.
>
>
>
> *-XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
> -XX:+UnlockCommercialFeatures -Djava.net.preferIPv4Stack=true
> -DIGNITE_LONG_OPERATIONS_DUMP_TIMEOUT=60
> -DIGNITE_THREAD_DUMP_ON_EXCHANGE_TIMEOUT=true -Dfile.encoding=UTF-8
> -DIGNITE_QUIET=false*
>
>
>
> Is there anything else that we have to tune ?
>
>
>
> And I think JVM pause is introduced as a result of the error that we
> encountered right? Correct me if am wrong.
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Stephen Darlington 
> *Sent:* 06 July 2020 14:09
> *To:* user 
> *Subject:* [External]Re: Ignite cluster became unresponsive
>
>
>
> The e-mail below is from an external source. Please do not open
> attachments or click links from an unknown or suspicious origin.
>
> There are a few issues here — the blocked thread, the communication error
> — but I possibly the key one is the JVM pause:
>
>
>
> *[2020-07-03T18:17:21,793][WARN
> ][jvm-pause-detector-worker][IgniteKernal%CustomerCC] Possible too long JVM
> pause: 10133 milliseconds.*
>
>
>
> This is usually due to garbage collection, but there are a number of other
> possibilities such as slow I/O. Suggest you start with the recommendations
> on the GC tuning documentation page:
> https://apacheignite.readme.io/docs/jvm-and-system-tuning
>
>
>
> Regards,
>
> Stephen
>
>
>
> On 4 Jul 2020, at 12:44, Kamlesh Joshi  wrote:
>
>
>
> Hi Team,
>
>
>
> We have encountered following defect in PROD environment. After which
> entire traffic got halted for around 10 minutes, we recently upgraded our
> cluster to Ignite 2.7.6 from 2.6.0.
>
> Is this related to any existing open defect in this version? Has anyone
> observed the same defect earlier ?
>
>
>
> Any help or pointers around this will be appreciated.
>
>
>
>
>
> *[2020-07-03T18:17:11,613][ERROR][sys-stripe-36-#37%CustomerCC%][G]
> Blocked system-critical thread has been detected. This can lead to
> cluster-wide undefined behaviour*
>
> *[threadName=partition-exchanger, blockedFor=480s]*
>
> *[2020-07-03T18:17:11,613][WARN ][sys-stripe-36-#37%CustomerCC%][G] Thread
> [name="exchange-worker-#344%CustomerCC%", id=391, state=TIMED_WAITING,
> blockCnt=1, waitCnt=2049782]*
>
> *Lock
> [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6bf9f3a4,
> ownerName=null, ownerId=-1]*
>
>
>
> *[2020-07-03T18:17:11,620][ERROR][sys-stripe-36-#37%CustomerCC%][]
> 

Re: Remote Filter Execution

2020-07-10 Thread VeenaMithare
HI Ilya,

Please find the attached logs. 

You will notice that this log is only on server 1  - "projectname LISTENS:
Entered Remote Filter ."

You can also see that the filter has been created on all the three nodes. (
LOG : projectname LISTENS: Filter created  )

server1RemoteFilterExecuted.txt

  
server2.txt
  
server3.txt
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite Backup and Restore

2020-07-10 Thread Ilya Kasnacheev
Hello!

I guess that Apache Ignite 2.9 will have snapshotting capabilities.

Traditionally we rely more on backup nodes/partitions than on off-line
backups.

Regards,
-- 
Ilya Kasnacheev


Wed, 8 Jul 2020 at 12:23, marble.zh...@coinflex.com <
marble.zh...@coinflex.com>:

> Hi, any suggestions?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
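
A minimal sketch of the backup-partitions approach mentioned above (hypothetical cache name): each partition of a PARTITIONED cache keeps one extra copy on another node, so losing a single node does not lose data.

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class BackupCacheConfig {
    static CacheConfiguration<Long, String> create() {
        CacheConfiguration<Long, String> ccfg = new CacheConfiguration<>("myCache");
        ccfg.setCacheMode(CacheMode.PARTITIONED);
        ccfg.setBackups(1); // one backup copy of every primary partition on another node
        return ccfg;
    }
}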


Re: ignite 2.8.1: Failed to resolve node topology

2020-07-10 Thread Ilya Kasnacheev
Hello!

It seems that the node fell behind in switching to the new topology. Can you
provide the full log of this node? I guess that something was blocking PME.

Regards,
-- 
Ilya Kasnacheev


Thu, 9 Jul 2020 at 09:44, Mahesh Renduchintala <
mahesh.renduchint...@aline-consulting.com>:

>  Hi,
>
> We have a crash in our environment with the below error.
> Any insight into what might have gone wrong?
>
> regards
> Mahesh
>
>
> ^-- Heap [used=17211MB, free=64.98%, comm=49152MB]
> ^-- Off-heap [used=48448MB, free=26.41%, comm=65736MB]
> ^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
> ^--   default region [used=48447MB, free=26.07%, comm=65536MB]
> ^--   metastoreMemPlc region [used=0MB, free=99.21%, comm=0MB]
> ^--   TxLog region [used=0MB, free=100%, comm=100MB]
> ^-- Ignite persistence [used=48647MB]
> ^--   sysMemPlc region [used=0MB]
> ^--   default region [used=48646MB]
> ^--   metastoreMemPlc region [used=0MB]
> ^--   TxLog region [used=0MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=4, qSize=0]
> ^-- System thread pool [active=11, idle=12, qSize=0]
> [18:44:06,313][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture]
> Finish exchange future [startVer=AffinityTopologyVersion [topVer=83,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=83, minorTopVer=0],
> err=null, rebalanced=true, wasRebalanced=true]
> [18:44:06,316][WARNING][sys-stripe-21-#22][finish] Received finish request
> for completed transaction (the message may be too late)
> [txId=GridCacheVersion [topVer=205532475, order=1594223109470,
> nodeOrder=51], dhtTxId=null, node=35c37b55-ec30-4457-b861-403bcfc20c12,
> commit=false]
> [18:44:06,317][WARNING][sys-stripe-18-#19][finish] Received finish request
> for completed transaction (the message may be too late)
> [txId=GridCacheVersion [topVer=205532475, order=1594223109273,
> nodeOrder=51], dhtTxId=null, node=35c37b55-ec30-4457-b861-403bcfc20c12,
> commit=false]
> [18:44:06,317][WARNING][sys-stripe-23-#24][finish] Received finish request
> for completed transaction (the message may be too late)
> [txId=GridCacheVersion [topVer=205532475, order=1594223109436,
> nodeOrder=51], dhtTxId=null, node=35c37b55-ec30-4457-b861-403bcfc20c12,
> commit=false]
> [18:44:06,321][WARNING][sys-stripe-21-#22][finish] Received finish request
> for completed transaction (the message may be too late)
> [txId=GridCacheVersion [topVer=205532475, order=1594223109502,
> nodeOrder=51], dhtTxId=null, node=35c37b55-ec30-4457-b861-403bcfc20c12,
> commit=false]
> [18:44:06,327][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture]
> Completed partition exchange
> [localNode=d38d1293-9dd5-4b4e-9934-97ba3fbafa62,
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
> [topVer=83, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
> [id=c19d4735-2b52-487f-9d6d-0574b8f15858,
> consistentId=L1APISERVICE_7aa60a92371, addrs=ArrayList [10.244.0.93,
> 127.0.0.1], sockAddrs=HashSet [/10.244.0.93:0, /127.0.0.1:0], discPort=0,
> order=68, intOrder=39, lastExchangeTime=1594131247817, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=true], done=true,
> newCrdFut=null], topVer=AffinityTopologyVersion [topVer=83, minorTopVer=0]]
> [18:44:06,327][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture]
> Exchange timings [startVer=AffinityTopologyVersion [topVer=83,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=83, minorTopVer=0],
> stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
> initialization" (0 ms), stage="Determine exchange type" (16 ms),
> stage="Exchange done" (71053 ms), stage="Total time" (71069 ms)]
> [18:44:06,327][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture]
> Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=83,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=83, minorTopVer=0]]
> [18:44:06,327][INFO][exchange-worker-#70][time] Finished exchange init
> [topVer=AffinityTopologyVersion [topVer=83, minorTopVer=0], crd=false]
> [18:44:06,345][INFO][db-checkpoint-thread-#112][GridCacheDatabaseSharedManager]
> Checkpoint started [checkpointId=92718d25-2db4-4691-886c-ec26b8b6ecba,
> startPtr=FileWALPointer [idx=389, fileOff=2151853, len=17162371],
> checkpointBeforeLockTime=869ms, checkpointLockWait=86163ms,
> checkpointListenersExecuteTime=784ms, checkpointLockHoldTime=859ms,
> walCpRecordFsyncDuration=20ms, writeCheckpointEntryDuration=1ms,
> splitAndSortCpPagesDuration=24ms,  pages=28028, reason='timeout']
> [18:44:06,388][SEVERE][sys-stripe-3-#4][] Critical system error detected.
> Will be handled accordingly to configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=c*lass
> 

Re: [External]Re: Ignite cluster became unresponsive

2020-07-10 Thread Ilya Kasnacheev
Hello!

Can you provide full thread dump (jstack) after you see these messages?

Regards,
-- 
Ilya Kasnacheev


Wed, 8 Jul 2020 at 15:57, Kamlesh Joshi :

> Hi Stephen/Team,
>
>
>
> Did you got any chance to look into this?
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Kamlesh Joshi
> *Sent:* 06 July 2020 14:50
> *To:* user@ignite.apache.org
> *Subject:* RE: [External]Re: Ignite cluster became unresponsive
>
>
>
> Hi Stephen,
>
>
>
> We have started our node with below JVM parameters. Also, we have
> increased these timeouts *failureDetectionTimeout*/
> *clientFailureDetectionTimeout*/*networkTimeout to 48*.
>
>
>
> *-XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
> -XX:+UnlockCommercialFeatures -Djava.net.preferIPv4Stack=true
> -DIGNITE_LONG_OPERATIONS_DUMP_TIMEOUT=60
> -DIGNITE_THREAD_DUMP_ON_EXCHANGE_TIMEOUT=true -Dfile.encoding=UTF-8
> -DIGNITE_QUIET=false*
>
>
>
> Is there anything else that we have to tune ?
>
>
>
> And I think JVM pause is introduced as a result of the error that we
> encountered right? Correct me if am wrong.
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Stephen Darlington 
> *Sent:* 06 July 2020 14:09
> *To:* user 
> *Subject:* [External]Re: Ignite cluster became unresponsive
>
>
>
> The e-mail below is from an external source. Please do not open
> attachments or click links from an unknown or suspicious origin.
>
> There are a few issues here — the blocked thread, the communication error
> — but I possibly the key one is the JVM pause:
>
>
>
> *[2020-07-03T18:17:21,793][WARN
> ][jvm-pause-detector-worker][IgniteKernal%CustomerCC] Possible too long JVM
> pause: 10133 milliseconds.*
>
>
>
> This is usually due to garbage collection, but there are a number of other
> possibilities such as slow I/O. Suggest you start with the recommendations
> on the GC tuning documentation page:
> https://apacheignite.readme.io/docs/jvm-and-system-tuning
>
>
>
> Regards,
>
> Stephen
>
>
>
> On 4 Jul 2020, at 12:44, Kamlesh Joshi  wrote:
>
>
>
> Hi Team,
>
>
>
> We have encountered following defect in PROD environment. After which
> entire traffic got halted for around 10 minutes, we recently upgraded our
> cluster to Ignite 2.7.6 from 2.6.0.
>
> Is this related to any existing open defect in this version? Has anyone
> observed the same defect earlier ?
>
>
>
> Any help or pointers around this will be appreciated.
>
>
>
>
>
> *[2020-07-03T18:17:11,613][ERROR][sys-stripe-36-#37%CustomerCC%][G]
> Blocked system-critical thread has been detected. This can lead to
> cluster-wide undefined behaviour*
>
> *[threadName=partition-exchanger, blockedFor=480s]*
>
> *[2020-07-03T18:17:11,613][WARN ][sys-stripe-36-#37%CustomerCC%][G] Thread
> [name="exchange-worker-#344%CustomerCC%", id=391, state=TIMED_WAITING,
> blockCnt=1, waitCnt=2049782]*
>
> *Lock
> [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6bf9f3a4,
> ownerName=null, ownerId=-1]*
>
>
>
> *[2020-07-03T18:17:11,620][ERROR][sys-stripe-36-#37%CustomerCC%][]
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
> [name=partition-exchanger, igniteInstanceName=CustomerCC, finished=false,
> heartbeatTs=1593780431612]]]*
>
> *org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger,
> igniteInstanceName=CustomerCC, finished=false, heartbeatTs=1593780431612]*
>
> *at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
> [ignite-core-2.7.6.jar:2.7.6]*
>
> *at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
> [ignite-core-2.7.6.jar:2.7.6]*
>
> *at
> org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
> [ignite-core-2.7.6.jar:2.7.6]*
>
> *at
> org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
> [ignite-core-2.7.6.jar:2.7.6]*
>
> *at
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:513)
> [ignite-core-2.7.6.jar:2.7.6]*
>
> *at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> [ignite-core-2.7.6.jar:2.7.6]*
>
> *at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]*
>
> *[2020-07-03T18:17:11,625][WARN
> ][sys-stripe-36-#37%CustomerCC%][FailureProcessor] No deadlocked threads
> detected.*
>
> *[2020-07-03T18:17:21,790][INFO
> ][tcp-disco-sock-reader-#201%CustomerCC%][TcpDiscoverySpi] Finished serving
> remote node connection [rmtAddr=/xx.xx.xx.xx:46416, rmtPort=46416*
>
> *[2020-07-03T18:17:21,793][WARN
> ][jvm-pause-detector-worker][IgniteKernal%CustomerCC] Possible too long JVM
> 

Re: Block until partition map exchange is complete

2020-07-10 Thread Ilya Kasnacheev
Hello!

Can you throw together a reproducer project which shows this behavior? I
would check.

Regards,
-- 
Ilya Kasnacheev


Fri, 3 Jul 2020 at 13:14, ssansoy :

> Thanks - the issue I have now is how can I confirm that the local listen
> has
> returned before executing my code?
> e.g. in the local listen I can set a flag, and then the local listen
> returns
> - but the thread that detects this flag and runs the task could still be
> scheduled to run before the local listen has returned.
> Is there a callback I can register which is triggered after the local
> listen
> returns so I can guarantee I am executing in the correct order (e.g. after
> whatever needs to be committed has been committed)?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Ignite server of perNodeParallelOperations ?

2020-07-10 Thread Ilya Kasnacheev
Hello!

Yes, there is a data streamer thread pool size on the server, which corresponds
to perNodeParallelOperations.

The server will not run more parallel operations than the thread pool size.

Regards,
-- 
Ilya Kasnacheev


Tue, 7 Jul 2020 at 23:18, Edward Chen :

> Hello,
>
> In IgniteDataStreamer there is a setting, perNodeParallelOperations
> (int), which is configured on the client side. Does the server side
> have a similar configuration? Otherwise, if clients are free to set any
> perNodeParallelOperations value they want, how does the server prevent
> itself from crashing?
>
> Thanks. Ed
>
>
>
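
A sketch of both sides of this (illustrative values, hypothetical cache name; assumes a recent Ignite 2.x where IgniteConfiguration#setDataStreamerThreadPoolSize is available): the client caps in-flight batches per node with perNodeParallelOperations, while the server bounds the work it accepts through its data streamer thread pool size.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StreamerConcurrency {
    // Server side: the data streamer thread pool bounds how many streamer
    // batches a server node processes in parallel.
    static IgniteConfiguration serverConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStreamerThreadPoolSize(8); // illustrative value
        return cfg;
    }

    // Client side: cap the number of parallel batches sent to each node.
    static void stream(Ignite client) {
        try (IgniteDataStreamer<Long, String> streamer = client.dataStreamer("myCache")) {
            streamer.perNodeParallelOperations(16); // illustrative value
            for (long i = 0; i < 100_000; i++)
                streamer.addData(i, "value-" + i);
        }
    }
}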


Re: enum behavior in REST

2020-07-10 Thread Ilya Kasnacheev
Hello!

Can you provide an example of REST response? I don't understand how this
issue manifests.

Regards,
-- 
Ilya Kasnacheev


Wed, 8 Jul 2020 at 01:25, Maxim Volkomorov <2201...@gmail.com>:

> I have "type":{"platformType":false}" in REST response for my
> enum object property.
>
> Object property:
> public class Organization {
> //...
> private OrganizationType type;
> //...
>public OrganizationType type() {
> return type;
> }
> //...
> }
>
> OrganizationType :
> public enum OrganizationType {
> /** Non-profit organization. */
> NON_PROFIT,
>
> /** Private organization. */
> PRIVATE,
>
> /** Government organization. */
> GOVERNMENT
> }
>
> I have correct deserializing at log:
>
> ... type=PRIVATE ...
>
> Should I make custom deserialization for REST requests? Could I make a
> custom REST method for retrieving some fields of object?
>


Re: DataStreamer. allowOverwrite(false) - Will it slow the writes?

2020-07-10 Thread Ilya Kasnacheev
Hello!

I don't think so. It is supposed to speed it up. Do you have persistence?
With persistence, you would expect a slow-down after an initial spike.

Regards,
-- 
Ilya Kasnacheev


Fri, 10 Jul 2020 at 15:09, krkumar24061...@gmail.com <
krkumar24061...@gmail.com>:

> Hi Guys - If I have my data streamer's allowOverwrite is set to false which
> is default, will this cause my read IOPS go up as it need to check if the
> key already exist? Bcoz the behavior that we see is that the write
> performance goes down significantly after it has inserted few billion rows
> and around the same time READ IOPS goes up. We are just curious if
> allowOverwrite(false) is causing the read IOPS go up?
>
>
> Thanx and Regards,
> KR Kumar
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
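
A minimal sketch of the setting under discussion (hypothetical cache name). allowOverwrite(false) is already the default and is the faster path, since the streamer bypasses the regular cache update machinery; as noted above, with native persistence a slow-down after an initial spike is expected regardless.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class StreamerOverwrite {
    static void load(Ignite ignite) {
        try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("myCache")) {
            // Default behavior, shown explicitly: existing keys are not overwritten
            // and the regular cache update path is bypassed for speed.
            streamer.allowOverwrite(false);
            for (long i = 0; i < 1_000_000; i++)
                streamer.addData(i, new byte[128]);
        }
    }
}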


DataStreamer. allowOverwrite(false) - Will it slow the writes?

2020-07-10 Thread krkumar24061...@gmail.com
Hi Guys - If I have my data streamer's allowOverwrite set to false, which is
the default, will this cause my read IOPS to go up because it needs to check
whether the key already exists? The behavior that we see is that write
performance goes down significantly after a few billion rows have been
inserted, and around the same time read IOPS go up. We are just curious
whether allowOverwrite(false) is causing the read IOPS to go up.


Thanx and Regards,
KR Kumar



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite node top ram limit

2020-07-10 Thread Ilya Kasnacheev
Hello!

I don't think there is one, if we are talking about off-heap size.

Any problems in case of node failure are shared by all persistent clusters
that have more data than RAM.

Regards,
-- 
Ilya Kasnacheev



Wed, 8 Jul 2020 at 18:01, Maxim Volkomorov <2201...@gmail.com>:

> Is there a RAM limit for single node (hardcoded or practical)?
>
> What practical top limit is comfortable in terms of node failures?
>


Re: Request among nodes taking minimum of idleConnectionTimeout configuration in tcpcommunicationSpi

2020-07-10 Thread Ilya Kasnacheev
Hello!

Maybe you have a firewall issue here, such that connections are possible in
one direction but not the other.

We do not recommend geo-distributed clustering with Apache Ignite.

Regards,
-- 
Ilya Kasnacheev


Tue, 7 Jul 2020 at 06:59, trans :

> Hi, can someone please advise on the above?
> Why is the client trying to use a port other than its own for creating the TCP
> client, and failing 30 times?
>
> Thanks.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Destroying a ignite cache when one of the worker nodes is down

2020-07-10 Thread Ilya Kasnacheev
Hello!

As far as I know, there is no such way yet.

Regards,
-- 
Ilya Kasnacheev


Thu, 9 Jul 2020 at 17:22, krkumar24061...@gmail.com <
krkumar24061...@gmail.com>:

> Hi Guys - I have a five node cluster and all the nodes are part of the
> baseline. Ignite native persistence is enabled. If one of the nodes in the
> baseline is down and then we destroy a cache, it removes the cache on all
> the remaining four nodes and cache is completely destroyed. But now when I
> bring up the fifth node back into the cluster, it throws an exception:
>
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
> SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
> marsh=JdkMarshaller
> [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@7d1cfe97],
> reconCnt=10, reconDelay=2000, maxAckTimeout=60, soLinger=5,
> forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null,
> skipAddrsRandomization=false]
> at
>
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
> at
>
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
> at
>
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
> ... 14 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Joining node has
> caches with data which are not presented on cluster, it could mean that
> they
> were already destroyed, to add the node to cluster - remove directories
> with
> the caches[XX]
> at
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1997)
> at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1116)
> at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
> at
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
>
>
>
> Is there any automatic way for ignite to auto delete these stale cache
> folders??
>
> Thanx and Regards,
> KR Kumar
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Remote Filter Execution

2020-07-10 Thread Ilya Kasnacheev
Hello!

This is weird. I recommend checking the logs for exceptions; maybe the filter
cannot run on the two other server nodes.

Regards,
-- 
Ilya Kasnacheev


Fri, 10 Jul 2020 at 10:25, VeenaMithare :

> Hello,
>
> We have a 3 server cluster .  The Caches on this cluster are configured as
> PARTITIONED, with 1 BACKUP.
>
> We also have a client executing a simple continuous query awaiting updates
> on record where NAME=AA.
>
> 1. When this record is updated, I see the remote filter being executed only
> on one node( say : Server1 ) . Should this not be executed on atleast 2
> nodes since the Cache is configured with 1 Backup .
>
> From the documentation :
> https://apacheignite.readme.io/docs/continuous-queries#remote-filter
>
> This filter is executed on primary and backup nodes for a given key and
> evaluates whether an update should be propagated as an event to the query's
> local listener.
>
> 2. If Server1 i.e.  the server executing this remote filter is restarted (
> say for some new deployment or any other reason ), I see that the remote
> filter is not deployed on this again.
>
> And now, changes in any other record on this table ( i.e. records other
> than
> NAME=AA ) are also passed to the client - since there is no remote filter
> to
> filter this record.
>
> Please let me know what I am missing.
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Remote Filter Execution

2020-07-10 Thread VeenaMithare
Hello, 

We have a 3-server cluster. The caches on this cluster are configured as
PARTITIONED, with 1 backup.

We also have a client executing a simple continuous query awaiting updates
on the record where NAME=AA.

1. When this record is updated, I see the remote filter being executed on only
one node (say Server1). Should this not be executed on at least 2
nodes, since the cache is configured with 1 backup?

From the documentation:
https://apacheignite.readme.io/docs/continuous-queries#remote-filter

This filter is executed on primary and backup nodes for a given key and
evaluates whether an update should be propagated as an event to the query's
local listener.

2. If Server1, i.e. the server executing this remote filter, is restarted
(say for a new deployment or any other reason), I see that the remote
filter is not deployed on it again.

And now changes to any other record in this table (i.e. records other than
NAME=AA) are also passed to the client, since there is no remote filter to
filter them out.

Please let me know what I am missing.






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
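
A minimal sketch of the kind of setup described in this thread (hypothetical class and cache names, not the project's actual code): a continuous query whose remote filter factory is serialized to the data nodes, and whose filter's evaluate() is then invoked on the primary and backup nodes of each updated key. The filter class must be on every server node's classpath (or peer class loading must be enabled), which is one thing worth checking when the filter appears to run on only one node.

import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class NameFilterQuery {
    // Runs on the primary and backup nodes of the updated key.
    static class NameFilter implements CacheEntryEventFilter<Long, String> {
        @Override public boolean evaluate(CacheEntryEvent<? extends Long, ? extends String> evt) {
            System.out.println("Entered remote filter on this node");
            return "AA".equals(evt.getValue());
        }
    }

    static QueryCursor<?> listen(Ignite ignite) {
        IgniteCache<Long, String> cache = ignite.cache("configStore"); // hypothetical cache name

        ContinuousQuery<Long, String> qry = new ContinuousQuery<>();
        qry.setRemoteFilterFactory(NameFilter::new);       // the factory's create() runs on each node
        qry.setLocalListener(evts -> evts.forEach(e ->
            System.out.println("Update passed the filter: " + e.getKey() + " -> " + e.getValue())));

        // The cursor must stay open for as long as updates should be delivered.
        return cache.query(qry);
    }
}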