Re: JDBC Connection Pooling

2020-05-01 Thread narges saleh
Hi All,
 If I use the client connector configuration to set the number of threads for
a JDBC connection, and use that connection with multiple insert statements
(with streaming set to true) against multiple different caches, do the inserts
queue up because the connection is shared? Should I create multiple JDBC
connections and spread the statements across them?
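
To be concrete, this is roughly the setup I mean (the address and table names
below are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SharedStreamingConnection {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

        // One thin-driver connection, streaming enabled, used for inserts
        // into two different tables/caches.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
            conn.createStatement().execute("SET STREAMING ON");

            try (PreparedStatement psA = conn.prepareStatement("INSERT INTO TableA (id, val) VALUES (?, ?)");
                 PreparedStatement psB = conn.prepareStatement("INSERT INTO TableB (id, val) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    psA.setInt(1, i);
                    psA.setString(2, "a" + i);
                    psA.executeUpdate();

                    psB.setInt(1, i);
                    psB.setString(2, "b" + i);
                    psB.executeUpdate();
                }
            }

            // Turning streaming off (or closing the connection) flushes the buffered rows.
            conn.createStatement().execute("SET STREAMING OFF");
        }
    }
}
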
thanks.

On Thu, Apr 16, 2020 at 12:33 PM Evgenii Zhuravlev 
wrote:

> As I said, if you use only the DataStreamer, without JDBC, just a plain
> key-value IgniteDataStreamer, then you should have only one instance per
> cache. It will give you the best performance. This one streamer can be
> used from multiple threads.
>
> чт, 16 апр. 2020 г. в 09:54, narges saleh :
>
>> I am sorry for mixing these two up.
>> I am asking: if I were to use BinaryObjectBuilder with a data streamer in
>> place of a JDBC connection, would/should I create a pool of the streamer
>> objects? From your answers, the answer seems to be yes. Thank you.
>>
>> On Thu, Apr 16, 2020 at 9:07 AM Evgenii Zhuravlev <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> You said that you use BinaryObjectBuilder, so I thought that you use the
>>> key-value API and data streamers. I don't really understand how you use
>>> BinaryObjectBuilder with the thin JDBC client.
>>>
>>> >What if I have a persistent connection that sends data continuously?
>>> Should I hold on to the instance of the streamer (for a particular cache), or
>>> create a new one each time a new load of data arrives?
>>> If you're loading data continuously, it makes sense to store the data
>>> streamer instance somewhere and just reuse it, avoiding recreating it
>>> each time.
>>>
>>> >Are you saying I should have the data streamed to the streamer via
>>> multiple connections, across multiple threads?
>>> If you use just a simple IgniteDataStreamer, you can use it from
>>> multiple threads (call addData from multiple threads) to increase the
>>> throughput.
>>> Evgenii
>>>
>>> ср, 15 апр. 2020 г. в 12:07, narges saleh :
>>>
 Hello Evgenii,

 I am not sure what you mean by reusing a data streamer from multiple
 threads. I have data constantly being "streamed" to the streamer via a
 connection. Are you saying I should have the data streamed to the streamer
 via multiple connections, across multiple threads?
 What if I have a persistent connection that sends data continuously?
 Should I hold on to the instance of the streamer (for a particular cache), or
 create a new one each time a new load of data arrives?

 On Wed, Apr 15, 2020 at 1:17 PM Evgenii Zhuravlev <
 e.zhuravlev...@gmail.com> wrote:

> > Should I create a pool of data streamers (a few for each cache)?
> If you use just KV API, it's better to have only one data streamer per
> cache and reuse it from multiple threads - it will give you the best
> performance.
>
> Evgenii
>
> ср, 15 апр. 2020 г. в 04:53, narges saleh :
>
>> Please note that in my case, the streamers are running on the server
>> side (as part of different services).
>>
>> On Wed, Apr 15, 2020 at 6:46 AM narges saleh 
>> wrote:
>>
>>> So, in effect, I'll be having a pool of streamers, right?
>>> Would this still be the case if I am using BinaryObjectBuilder to
>>> build objects to stream the data to a few caches? Should I create a 
>>> pool of
>>> data streamers (a few for each cache)?
>>> I don't want to have to create a new object builder and data
>>> streamer if I am inserting to the same cache over and over.
>>>
>>> On Tue, Apr 14, 2020 at 11:56 AM Evgenii Zhuravlev <
>>> e.zhuravlev...@gmail.com> wrote:
>>>
 For each connection, its own data streamer will be created on the node
 side. I think it makes sense to try pooling for data load, but you
 will need to measure everything, since the pool size depends on a lot of
 things.

 вт, 14 апр. 2020 г. в 07:31, narges saleh :

> Yes, Evgenii.
>
> On Mon, Apr 13, 2020 at 10:06 PM Evgenii Zhuravlev <
> e.zhuravlev...@gmail.com> wrote:
>
>> Hi,
>>
>> Do you use STREAMING MODE for thin JDBC driver?
>>
>> Evgenii
>>
>> пн, 13 апр. 2020 г. в 19:33, narges saleh :
>>
>>> Thanks Alex. I will study the links you provided.
>>>
>>> I read somewhere that a JDBC DataSource is available via Ignite
>>> JDBC (which should provide connection pooling).
>>>
>>> On Mon, Apr 13, 2020 at 12:31 PM akorensh <
>>> alexanderko...@gmail.com> wrote:
>>>
 Hi,
   At this point you need to implement connection pooling
 yourself.
   Use

 https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/ClientConnectorConfiguration.html#setThreadPoolSize-int-
   to specify the number of threads Ignite creates to service connection
 requests.

   Each new connection will be handled by a separate thread inside
 Ignite (maxing out at threadPoolSize, as described above).

   ClientConnectorConfiguration is set inside
>>
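
Putting the two suggestions from this thread together - setThreadPoolSize on
the server's client connector, plus a single IgniteDataStreamer per cache
shared by all producer threads - a rough sketch (the cache name, sizes and
thread counts are only illustrative, and the cache is assumed to already exist):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.ClientConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StreamerSharingSketch {
    public static void main(String[] args) throws Exception {
        // Server side: size the pool that services incoming JDBC/thin-client connections.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientConnectorConfiguration(
                new ClientConnectorConfiguration().setThreadPoolSize(16));

        Ignite ignite = Ignition.start(cfg);

        // One data streamer per cache, shared by several producer threads.
        try (IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer("someCache")) {
            ExecutorService producers = Executors.newFixedThreadPool(4);

            for (int t = 0; t < 4; t++) {
                final int worker = t;
                producers.submit(() -> {
                    for (int i = 0; i < 10_000; i++)
                        streamer.addData(worker * 10_000 + i, "value-" + i); // addData is thread-safe
                });
            }

            producers.shutdown();
            producers.awaitTermination(1, TimeUnit.MINUTES);
            streamer.flush(); // push whatever is still buffered
        }
    }
}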


Re: Random2LruPageEvictionTracker causing hanging in our integration tests

2020-05-01 Thread scottmf
out.multipart-aa
out.multipart-ab
out.multipart-ac
out.multipart-ad
out.multipart-ae
out.multipart-af
out.multipart-ag

Hi Ilya, I turned on debugging for Ignite and dumped the output into a
multipart set of files that I've attached. Let me know if you need any more
info. If needed I can try to reproduce this in a generic setting, but that
will take time.

Since 5MB is the attachment limit, I had to upload the files in 5MB chunks. To
assemble them, put them into a directory and then run 'cat * > file.gz'.

Answers to your questions:

> Do you see any "Too many attempts to choose data page" or "Too many failed
> attempts to evict page" messages in your logs?

See output file

> How large are your data regions

we only use the default data region with default settings - 512MB

> how many caches do they have?

maybe 20ish?

> I would expect that behavior if eviction can't find any page to evict, if
> all data pages are evicted already and only metadata pages remain, ones
> that cannot be evicted.

Could you elaborate on this or point me to any docs?






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Apache ignite evolvable object

2020-05-01 Thread Evgenii Zhuravlev
Hi,

BinaryObjects allow doing this:
https://apacheignite.readme.io/docs/binary-marshaller
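
A rough sketch of the idea (the type, field and cache names here are made up,
and the cache is assumed to exist): a node built against the old model can
read and rewrite a binary object that carries fields it does not know about,
and the extra fields survive.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class EvolvableBinarySketch {
    static void example(Ignite ignite) {
        // "New" producer writes a v2 object with an extra field.
        BinaryObjectBuilder builder = ignite.binary().builder("com.example.Person");
        builder.setField("name", "Alice");
        builder.setField("nickname", "Al"); // field unknown to v1 code
        BinaryObject v2 = builder.build();

        IgniteCache<Integer, BinaryObject> cache =
            ignite.cache("people").<Integer, BinaryObject>withKeepBinary();
        cache.put(1, v2);

        // "Old" consumer stays in binary form: it reads the fields it knows...
        BinaryObject read = cache.get(1);
        String name = read.field("name");

        // ...and when it rewrites the object via toBuilder(), the unknown
        // "nickname" field is carried over rather than lost.
        BinaryObject updated = read.toBuilder().setField("name", name + " B.").build();
        cache.put(1, updated);

        System.out.println("nickname still present: " + updated.hasField("nickname"));
    }
}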

Evgenii

пт, 1 мая 2020 г. в 09:03, Hemambara :

> I am using Apache Ignite 2.8.0 and looking for an option to have an evolvable
> data model similar to Coherence (com.tangosol.io.Evolvable). Do we have any?
> The idea is to save "future" data if the domain model version is backward
> compatible, so that when the same model is transferred to the new version we
> can retrieve fields from the future data without any data loss.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Apache ignite evolvable object

2020-05-01 Thread Hemambara
I am using Apache Ignite 2.8.0 and looking for an option to have an evolvable
data model similar to Coherence (com.tangosol.io.Evolvable). Do we have any?
The idea is to save "future" data if the domain model version is backward
compatible, so that when the same model is transferred to the new version we
can retrieve fields from the future data without any data loss.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Event Listeners when we use DataStreamer

2020-05-01 Thread Ilya Kasnacheev
Hello!

We have a test for this in our code base,
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessorSelfTest#testFlushTimeout
It passes in 2.8.

Do you have a reproducer where you don't see EVT_CACHE_OBJECT_PUT for a cache
populated by a data streamer with allowOverwrite(true)? Can you share it as a
small runnable project?
Are you sure you install your listener correctly?
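
For reference, a minimal sketch of the kind of setup where the event should
fire (the cache name and counts are made up; the important parts are enabling
the event type on the node and setting allowOverwrite(true) on the streamer):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.CacheEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class PutEventSketch {
    public static void main(String[] args) {
        // EVT_CACHE_OBJECT_PUT is not recorded unless explicitly enabled.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIncludeEventTypes(EventType.EVT_CACHE_OBJECT_PUT);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.getOrCreateCache("test");

            IgnitePredicate<CacheEvent> lsnr = evt -> {
                System.out.println("put: " + evt.key());
                return true; // keep listening
            };
            ignite.events().localListen(lsnr, EventType.EVT_CACHE_OBJECT_PUT);

            try (IgniteDataStreamer<Integer, Integer> streamer = ignite.dataStreamer("test")) {
                // With allowOverwrite(true) streamer updates go through the regular
                // cache update path, which is what generates the put events.
                streamer.allowOverwrite(true);

                for (int i = 0; i < 100; i++)
                    streamer.addData(i, i);
            }
        }
    }
}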

Regards,
-- 
Ilya Kasnacheev


пт, 1 мая 2020 г. в 15:06, krkumar24061...@gmail.com <
krkumar24061...@gmail.com>:

> Hi Ilya - I have tried that, but it is not firing the event. But it does
> fire for put and putAll.
>
> Thanx and Regards,
> KR Kumar
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread userx
Hi Pavel,

The exchange finished, taking its time, but during that time the new client
was not able to write to the cache.

So what happened was this:

There were 4 Ignite servers out of the 19 (as you can see in the
consistent IDs in my message above) whose acknowledgement to the
coordinator node was pending, because they were possibly finishing some
atomic updates or transactions. This went on for almost 2 hours. During those 2
hours, clients tried to activate:

if (ignite == null) {
    Ignition.setClientMode(true);
    String fileName = getRelevantFileName();
    ignite = Ignition.start(fileName);
}
ignite.cluster().active(true);

But the activation couldn't happen. For this task we have a timeout of 5
minutes. If activation doesn't happen within that time, the client gives up
until the next time it needs to create a cache.

So when I talk about clients, they are just individual Java processes
running.





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread Pavel Kovalenko
Hello,

I don't fully understand from your message: did the exchange finally
finish, or were you getting this WARN message the whole time?

пт, 1 мая 2020 г. в 12:32, Ilya Kasnacheev :

> Hello!
>
> This description sounds like a typical hanging Partition Map Exchange, but
> you should be able to see that in logs.
> If you don't, you can collect thread dumps from all nodes with jstack and
> check it for any stalling operations (or share with us).
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пт, 1 мая 2020 г. в 11:53, userx :
>
>> Hi Pavel,
>>
>> I am using 2.8 and still getting the same issue. Here is the ecosystem
>>
>> 19 Ignite servers (S1 to S19) running at 16GB of max JVM and in persistent
>> mode.
>>
>> 96 Clients (C1 to C96)
>>
>> There are 19 machines, 1 Ignite server is started on 1 machine. The
>> clients
>> are evenly distributed across machines.
>>
>> C19 tries to create a cache, it gets a timeout exception as i have 5 mins
>> of
>> timeout. When I looked into the coordinator logs, between a span of 5
>> minutes, it gets the messages
>>
>>
>> 2020-04-24 15:37:09,434 WARN [exchange-worker-#45%S1%] {}
>>
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture
>> - Unable to await partitions release latch within timeout. Some nodes have
>> not sent acknowledgement for latch completion. It's possible due to
>> unfinishined atomic updates, transactions or not released explicit locks
>> on
>> that nodes. Please check logs for errors on nodes with ids reported in
>> latch
>> `pendingAcks` collection [latch=ServerLatch [permits=4,
>> pendingAcks=HashSet
>> [84b8416c-fa06-4544-9ce0-e3dfba41038a,
>> 19bd7744-0ced-4123-a35f-ddf0cf9f55c4,
>> 533af8f9-c0f6-44b6-92d4-658f86ffaca0,
>> 1b31cb25-abbc-4864-88a3-5a4df37a0cf4],
>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>> topVer=AffinityTopologyVersion [topVer=174, minorTopVer=1]
>>
>> And the 4 nodes which have not been able to acknowledge latch completion
>> are
>> S14, S7, S18, S4
>>
>> I went to see the logs of S4, it just records the addition of C19 into
>> topology and then C19 leaving it after 5 minutes. The only thing is that
>> in
>> GC I see this consistently "Total time for which application threads were
>> stopped: 0.0006225 seconds, Stopping threads took: 0.887 seconds"
>>
>> I understand that until the time all the atomic updates and transactions
>> are
>> finished Clients are not able to create caches by communicating with
>> Coordinator but is there a way around ?
>>
>> So the question is that is it still prevalent on 2.8 ?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: Event Listeners when we use DataStreamer

2020-05-01 Thread krkumar24061...@gmail.com
Hi Ilya - I have tried that, but it is not firing the event. But it does
fire for put and putAll.

Thanx and Regards,
KR Kumar



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Random2LruPageEvictionTracker causing hanging in our integration tests

2020-05-01 Thread Ilya Kasnacheev
Hello!

Yes, you are right, I happened to miss the relevant portion.

Do you see any "Too many attempts to choose data page" or "Too many failed
attempts to evict page" messages in your logs?

How large are your data regions, how many caches do they have? I would
expect that behavior if eviction can't find any page to evict, if all data
pages are evicted already and only metadata pages remain, ones that cannot
be evicted.
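
For reference, this tracker comes into play when a data region has
RANDOM_2_LRU page eviction configured; a minimal sketch of such a region
(the sizes are just illustrative):

import org.apache.ignite.configuration.DataPageEvictionMode;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class EvictionRegionSketch {
    public static IgniteConfiguration config() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")
            .setInitialSize(256L * 1024 * 1024) // 256 MB
            .setMaxSize(512L * 1024 * 1024)     // 512 MB
            // RANDOM_2_LRU is what puts Random2LruPageEvictionTracker on the
            // code path shown in your stack trace.
            .setPageEvictionMode(DataPageEvictionMode.RANDOM_2_LRU);

        return new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region));
    }
}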

Regards,
-- 
Ilya Kasnacheev


чт, 30 апр. 2020 г. в 22:14, scottmf :

> Hi Ilya, I'm confused. What do you see? I am posting a stack that ends
> with several apache ignite calls.
>
> "Test worker" #22 prio=5 os_prio=31 cpu=299703.41ms elapsed=317.18s 
> tid=0x7ff3cfc8c800 nid=0x7203 runnable  [0x75b38000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.ignite.internal.processors.cache.persistence.evict.Random2LruPageEvictionTracker.evictDataPage(Random2LruPageEvictionTracker.java:152)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager.ensureFreeSpace(IgniteCacheDatabaseSharedManager.java:1086)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.ensureFreeSpace(GridCacheMapEntry.java:4513)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1461)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:745)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.localFinish(GridNearTxLocal.java:3850)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:440)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:390)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$25.apply(GridNearTxLocal.java:4129)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$25.apply(GridNearTxLocal.java:4118)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:399)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:354)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:4118)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commit(GridNearTxLocal.java:4086)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$4.applyx(DataStructuresProcessor.java:587)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$4.applyx(DataStructuresProcessor.java:556)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.retryTopologySafe(DataStructuresProcessor.java:1664)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:556)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.reentrantLock(DataStructuresProcessor.java:1361)
>   at 
> org.apache.ignite.internal.IgniteKernal.reentrantLock(IgniteKernal.java:4136)
>   at jdk.internal.reflect.GeneratedMethodAccessor713.invoke(Unknown 
> Source)
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.5/DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(java.base@11.0.5/Method.java:566)
>   at 
> com.example.symphony.cmf.ignite.IgniteInitializer$1.invoke(IgniteInitializer.java:158)
>   at com.sun.proxy.$Proxy205.reentrantLock(Unknown Source)
>   at 
> com.example.data.store.jdbc.cache.CacheService.getCount(CacheService.java:47)
>   at 
> com.example.data.store.jdbc.cache.CacheService$$FastClassBySpringCGLIB$$7efa9131.invoke()
>   at 
> org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
>   at 
> org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
>   at 
> org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
>   at 
> org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:88)
>   at 
> io.micrometer.core.aop.TimedAspect.processWithTimer(TimedAspect.java:105)
>   at io.micrometer.core.aop.TimedAspect.timedMethod(TimedAspect.java:94)
>   at jdk.internal.reflect.GeneratedMethodAccessor712.invoke(Unknown 
> Source)
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.5/DelegatingMethodAccesso

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread Ilya Kasnacheev
Hello!

This description sounds like a typical hanging Partition Map Exchange, but
you should be able to see that in logs.
If you don't, you can collect thread dumps from all nodes with jstack and
check it for any stalling operations (or share with us).

Regards,
-- 
Ilya Kasnacheev


пт, 1 мая 2020 г. в 11:53, userx :

> Hi Pavel,
>
> I am using 2.8 and still getting the same issue. Here is the ecosystem
>
> 19 Ignite servers (S1 to S19) running at 16GB of max JVM and in persistent
> mode.
>
> 96 Clients (C1 to C96)
>
> There are 19 machines, 1 Ignite server is started on 1 machine. The clients
> are evenly distributed across machines.
>
> C19 tries to create a cache, it gets a timeout exception as i have 5 mins
> of
> timeout. When I looked into the coordinator logs, between a span of 5
> minutes, it gets the messages
>
>
> 2020-04-24 15:37:09,434 WARN [exchange-worker-#45%S1%] {}
>
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture
> - Unable to await partitions release latch within timeout. Some nodes have
> not sent acknowledgement for latch completion. It's possible due to
> unfinishined atomic updates, transactions or not released explicit locks on
> that nodes. Please check logs for errors on nodes with ids reported in
> latch
> `pendingAcks` collection [latch=ServerLatch [permits=4, pendingAcks=HashSet
> [84b8416c-fa06-4544-9ce0-e3dfba41038a,
> 19bd7744-0ced-4123-a35f-ddf0cf9f55c4,
> 533af8f9-c0f6-44b6-92d4-658f86ffaca0,
> 1b31cb25-abbc-4864-88a3-5a4df37a0cf4],
> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
> topVer=AffinityTopologyVersion [topVer=174, minorTopVer=1]
>
> And the 4 nodes which have not been able to acknowledge latch completion
> are
> S14, S7, S18, S4
>
> I went to see the logs of S4, it just records the addition of C19 into
> topology and then C19 leaving it after 5 minutes. The only thing is that in
> GC I see this consistently "Total time for which application threads were
> stopped: 0.0006225 seconds, Stopping threads took: 0.887 seconds"
>
> I understand that until the time all the atomic updates and transactions
> are
> finished Clients are not able to create caches by communicating with
> Coordinator but is there a way around ?
>
> So the question is that is it still prevalent on 2.8 ?
>
>
>
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Re: Backups not being done for SQL caches

2020-05-01 Thread Courtney Robinson
Hi Alexandr,
Thanks, I didn't know the metrics and the topology info were different.
I found the issue: we were not adding the nodes to the baseline topology
due to a bug.
We're using v2.8.0 in the upgrade/migration; the previous version is 2.7.
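
For anyone else who hits this: the fix on our side amounted to making sure new
server nodes actually end up in the baseline. Roughly (a sketch, assuming
baseline auto-adjust is not enabled and the cluster is active):

import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.ClusterNode;

public class BaselineSketch {
    static void addAllServersToBaseline(Ignite ignite) {
        // Take whatever server nodes are currently in the topology...
        Collection<ClusterNode> servers = ignite.cluster().forServers().nodes();

        // ...and make them the baseline, so partitions (and their backups)
        // get assigned across all of them.
        ignite.cluster().setBaselineTopology(servers);
    }
}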

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) 
https://hypi.io


On Thu, Apr 30, 2020 at 11:36 PM Alexandr Shapkin  wrote:

> Hi,
>
>
>
> I believe that you need to find the following message:
>
>
>
> 2020-04-30 16:57:44.8141|INFO|Test|Topology snapshot [ver=53,
> locNode=26887aac, servers=3, clients=0, state=ACTIVE, CPUs=16,
> offheap=8.3GB, heap=15.0GB]
>
> 2020-04-30 16:57:44.8141|INFO|Test|  ^-- Baseline [id=0, size=3, online=3,
> offline=0]
>
>
>
> Metrics don’t tell you the actual topology and baseline snapshot.
>
> A node might be running but not included in the baseline; that might be
> the reason in your case.
>
>
>
> Also, what Ignite version do you use?
>
>
>
> *From: *Courtney Robinson 
> *Sent: *Thursday, April 30, 2020 7:58 PM
> *To: *user@ignite.apache.org
> *Subject: *Re: Backups not being done for SQL caches
>
>
>
> Hi Illya,
>
> Yes we have persistence enabled in this cluster. This is also change from
> our current production deployment where we have our own CacheStore with
> read and write through enabled. In this test cluster Ignite's native
> persistence is being used without any external or custom CacheStore
> implementation.
>
>
>
> From the Ignite logs it says all 3 nodes are present:
>
>
>
> 2020-04-30 16:53:20.468  INFO 9 --- [orker-#23%hypi%]
> o.a.ignite.internal.IgniteKernal%hypi:
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=e0b6889f, name=hypi, uptime=19:15:06.473]
> ^-- H/N/C [hosts=3, nodes=3, CPUs=3]
> ^-- CPU [cur=-100%, avg=-100%, GC=0%]
> ^-- PageMemory [pages=975]
> ^-- Heap [used=781MB, free=92.37%, comm=4912MB]
> ^-- Off-heap [used=3MB, free=99.91%, comm=4296MB]
> ^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
> ^--   metastoreMemPlc region [used=0MB, free=99.95%, comm=0MB]
> ^--   TxLog region [used=0MB, free=100%, comm=100MB]
> ^--   hypi region [used=3MB, free=99.91%, comm=4096MB]
> ^-- Ignite persistence [used=3MB]
> ^--   sysMemPlc region [used=0MB]
> ^--   metastoreMemPlc region [used=0MB]
> ^--   TxLog region [used=0MB]
> ^--   hypi region [used=3MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=6, qSize=0]
>
>
> Regards,
>
> Courtney Robinson
>
> Founder and CEO, Hypi
>
> Tel: ++44 208 123 2413 (GMT+0) 
>
> https://hypi.io
>
>
>
>
>
> On Thu, Apr 30, 2020 at 3:12 PM Ilya Kasnacheev 
> wrote:
>
> Hello!
>
>
>
> Do you have persistence? If so, are you sure that all 3 of your nodes are
> in baseline topology?
>
>
>
> Regards,
>
> --
>
> Ilya Kasnacheev
>
>
>
>
>
> чт, 30 апр. 2020 г. в 16:09, Courtney Robinson  >:
>
> We're continuing migration from using the Java API to purely SQL and have
> encountered a situation on our development cluster where even though ALL
> tables are created with backups=2, as in
>
> template=partitioned,backups=2,affinity_key=instanceId,atomicity=ATOMIC,cache_name=<name here>
>
> In the logs, with 3 nodes in this test environment we have:
>
>
>
> 2020-04-29 22:55:50.083 INFO 9 --- [orker-#40%hypi%]
> o.apache.ignite.internal.exchange.time : Started exchange init
> [topVer=AffinityTopologyVersion [topVer=27, minorTopVer=1], crd=true,
> evt=DISCOVERY_CUSTOM_EVT, evtNode=e0b6889f-219b-4686-ab52-725bfe7848b2,
> customEvt=DynamicCacheChangeBatch
> [id=a81a0e7c171-3f0fbbc0-b996-448c-98f7-119d7e485f04, reqs=ArrayList
> [DynamicCacheChangeRequest [cacheName=hypi_whatsapp_Item, hasCfg=true,
> nodeId=e0b6889f-219b-4686-ab52-725bfe7848b2, clientStartOnly=false,
> stop=false, destroy=false, disabledAfterStartfalse]],
> exchangeActions=ExchangeActions [startCaches=[hypi_whatsapp_Item],
> stopCaches=null, startGrps=[hypi_whatsapp_Item], stopGrps=[],
> resetParts=null, stateChangeRequest=null], startCaches=false],
> allowMerge=false, exchangeFreeSwitch=false]
>
> 2020-04-29 22:55:50.280 INFO 9 --- [orker-#40%hypi%]
> o.a.i.i.p.cache.GridCacheProcessor : Started cache
> [name=hypi_whatsapp_Item, id=1391701259, dataRegionName=hypi,
> mode=PARTITIONED, atomicity=ATOMIC, backups=2, mvcc=false]
>
> 2020-04-29 22:55:50.289 INFO 9 --- [sys-#648%hypi%]
> o.a.i.i.p.a.GridAffinityAssignmentCache : Local node affinity assignment
> distribution is not ideal [cache=hypi_whatsapp_Item,
> expectedPrimary=1024.00, actualPrimary=0, expectedBackups=2048.00,
> actualBackups=0, warningThreshold=50.00%]
>
> 2020-04-29 22:55:50.293 INFO 9 --- [orker-#40%hypi%]
> .c.d.d.p.GridDhtPartitionsExchangeFuture : Finished waiting for partition
> release future [topVer=AffinityTopologyVersion [topVer=27, minorTopVer=1],
> waitTime=0ms, futInfo=NA, mode=DISTRIBUTED

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread userx
Hi Pavel,

I am using 2.8 and still getting the same issue. Here is the ecosystem 

19 Ignite servers (S1 to S19) running at 16GB of max JVM and in persistent
mode.

96 Clients (C1 to C96)

There are 19 machines, 1 Ignite server is started on 1 machine. The clients
are evenly distributed across machines.

C19 tries to create a cache, it gets a timeout exception as i have 5 mins of
timeout. When I looked into the coordinator logs, between a span of 5
minutes, it gets the messages 


2020-04-24 15:37:09,434 WARN [exchange-worker-#45%S1%] {}
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture
- Unable to await partitions release latch within timeout. Some nodes have
not sent acknowledgement for latch completion. It's possible due to
unfinishined atomic updates, transactions or not released explicit locks on
that nodes. Please check logs for errors on nodes with ids reported in latch
`pendingAcks` collection [latch=ServerLatch [permits=4, pendingAcks=HashSet
[84b8416c-fa06-4544-9ce0-e3dfba41038a, 19bd7744-0ced-4123-a35f-ddf0cf9f55c4,
533af8f9-c0f6-44b6-92d4-658f86ffaca0, 1b31cb25-abbc-4864-88a3-5a4df37a0cf4],
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=174, minorTopVer=1]

And the 4 nodes which have not been able to acknowledge latch completion are
S14, S7, S18, S4

I went to see the logs of S4, it just records the addition of C19 into
topology and then C19 leaving it after 5 minutes. The only thing is that in
GC I see this consistently "Total time for which application threads were
stopped: 0.0006225 seconds, Stopping threads took: 0.887 seconds"

I understand that until the time all the atomic updates and transactions are
finished Clients are not able to create caches by communicating with
Coordinator but is there a way around ?

So the question is that is it still prevalent on 2.8 ?









--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/