[jira] [Created] (IGNITE-13304) Improve javadocs for classes related to cache configuration enrichment
Vyacheslav Koptilin created IGNITE-13304:
--------------------------------------------

             Summary: Improve javadocs for classes related to cache configuration enrichment
                 Key: IGNITE-13304
                 URL: https://issues.apache.org/jira/browse/IGNITE-13304
             Project: Ignite
          Issue Type: Improvement
    Affects Versions: 2.8.1
            Reporter: Vyacheslav Koptilin
            Assignee: Vyacheslav Koptilin

In my opinion, some classes related to cache configuration enrichment need to be ...
[jira] [Created] (IGNITE-13303) Update mockito from 1.10.19 to 3.x
Alexey Kuznetsov created IGNITE-13303:
--------------------------------------------

             Summary: Update mockito from 1.10.19 to 3.x
                 Key: IGNITE-13303
                 URL: https://issues.apache.org/jira/browse/IGNITE-13303
             Project: Ignite
          Issue Type: Task
          Components: general
    Affects Versions: 2.9
            Reporter: Alexey Kuznetsov
            Assignee: Alexey Kuznetsov
             Fix For: 2.10

We are using a very old version of Mockito: 1.10.19.

{panel:title=According to Mockito GitHub:}
Still on Mockito 1.x? See what's new in Mockito 2! Mockito 3 does not introduce any breaking API changes, but now requires Java 8 over Java 6 for Mockito 2.
{panel}
Re: [DISCUSSION] Cache warmup
Hello Kirill,

Thanks a lot for driving this activity. If I am not mistaken, this discussion relates to IEP-40.

> I suggest adding a warmup phase after recovery here [1] after [2], before discovery.

This means that the user's thread, which starts Ignite via Ignition.start(), will wait for an additional step - cache warm-up. I think this fact has to be clearly mentioned in our documentation (in the Javadoc at least) because this step can be time-consuming.

> I suggest adding a new interface:

I would change it a bit. First of all, it would be nice to place this interface in a public package and get rid of GridCacheContext, which is an internal class and should not leak into the public API in any case. Perhaps this parameter is not needed at all, or we should add some public abstraction instead of the internal class.

package org.apache.ignite.configuration;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.lang.IgniteFuture;

public interface CacheWarmupper {
    /**
     * Warmup cache.
     *
     * @param cachename Cache name.
     * @return Future for cache warmup.
     * @throws IgniteCheckedException If failed.
     */
    IgniteFuture warmup(String cachename) throws IgniteCheckedException;
}

Thanks,
S.

Mon, 27 Jul 2020 at 15:03, ткаленко кирилл:
> Now, after restarting a node, we have only cold caches: the first requests
> to them will gradually load data from disk, which can slow down the first
> calls to them.
> If a node has more RAM than data on disk, the data can be loaded at start
> ("warmup"), thereby solving the issue of slowdowns during the first calls
> to caches.
>
> I suggest adding a warmup phase after recovery here [1] after [2], before
> discovery.
>
> I suggest adding a new interface:
>
> package org.apache.ignite.internal.processors.cache;
>
> import org.apache.ignite.IgniteCheckedException;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.jetbrains.annotations.Nullable;
>
> /**
>  * Interface for warming up a cache.
>  */
> public interface CacheWarmup {
>     /**
>      * Warmup cache.
>      *
>      * @param cacheCtx Cache context.
>      * @return Future for cache warmup.
>      * @throws IgniteCheckedException If failed.
>      */
>     @Nullable IgniteInternalFuture process(GridCacheContext cacheCtx)
>         throws IgniteCheckedException;
> }
>
> This will allow caches to be warmed up in parallel and asynchronously. The
> warmup phase will end after the IgniteInternalFuture of every cache is done.
>
> I also suggest adding the ability to customize warmup via the methods:
> org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
> org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup
>
> This allows setting a warmup implementation for a specific cache, or a
> default one for all caches if necessary.
>
> I suggest adding an implementation of SequentialWarmup that will use [3].
>
> Questions, suggestions, comments?
>
> [1] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
> [2] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
> [3] - org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload
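For readers following the proposal, here is a minimal sketch of how the suggested configuration hooks might be used, assuming the public CacheWarmupper interface above and the setDefaultCacheWarmup/setCacheWarmup setters proposed in the quoted mail. None of these exist in the current public API, and the no-op warmupper is purely illustrative.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WarmupConfigSketch {
    public static void main(String[] args) {
        // CacheWarmupper is the interface proposed above (org.apache.ignite.configuration.CacheWarmupper).
        // Purely illustrative no-op warmupper: returns no future, i.e. nothing to wait for.
        CacheWarmupper noopWarmup = cachename -> null;

        IgniteConfiguration igniteCfg = new IgniteConfiguration();

        // Proposed (not existing) method: default warmup applied to every cache.
        igniteCfg.setDefaultCacheWarmup(noopWarmup);

        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("hot-cache");

        // Proposed (not existing) method: per-cache override of the warmup implementation.
        cacheCfg.setCacheWarmup(noopWarmup);

        igniteCfg.setCacheConfiguration(cacheCfg);

        // With the proposal, Ignition.start() would not return until recovery and warmup complete.
        Ignite ignite = Ignition.start(igniteCfg);
    }
}
{code}

Whether the warmup hook should receive a cache name (public API) or a GridCacheContext (internal class) is exactly the open question raised in the reply above.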
[jira] [Created] (IGNITE-13302) Java thin client connect/disconnect during topology update may lead to partition divergence in ignite-sys-cache
Stepachev Maksim created IGNITE-13302:
--------------------------------------------

             Summary: Java thin client connect/disconnect during topology update may lead to partition divergence in ignite-sys-cache
                 Key: IGNITE-13302
                 URL: https://issues.apache.org/jira/browse/IGNITE-13302
             Project: Ignite
          Issue Type: Bug
            Reporter: Stepachev Maksim
            Assignee: Stepachev Maksim

Partition inconsistency can be seen in ignite-sys-cache:

{noformat}
[2020-04-23 15:26:31,816][WARN ][sys-#45%gridgain.Sdsb11784Ver20%][root] Partition states validation has failed for group: ignite-sys-cache, msg: Partitions cache sizes are inconsistent for
Part 31: [127.0.0.1:47500=1 127.0.0.1:47501=2 ]
Part 43: [127.0.0.1:47500=3 127.0.0.1:47501=4 ]
Part 44: [127.0.0.1:47500=1 127.0.0.1:47501=2 ]
Part 46: [127.0.0.1:47500=0 127.0.0.1:47501=1 ]
Part 91: [127.0.0.1:47500=1 127.0.0.1:47501=2 ]
{noformat}
[jira] [Created] (IGNITE-13301) IgniteScheduler has to run inside the Ignite Sandbox.
Denis Garus created IGNITE-13301:
--------------------------------------------

             Summary: IgniteScheduler has to run inside the Ignite Sandbox.
                 Key: IGNITE-13301
                 URL: https://issues.apache.org/jira/browse/IGNITE-13301
             Project: Ignite
          Issue Type: Task
          Components: security
            Reporter: Denis Garus
            Assignee: Denis Garus

IgniteScheduler has to run inside the Ignite Sandbox on a remote node. For example:

{code:java}
Ignition.localIgnite().compute().run(() -> {
    IgniteScheduler scheduler = Ignition.localIgnite().scheduler();

    scheduler.runLocal(AbstractSandboxTest::controlAction).get();
});
{code}
[DISCUSSION] Cache warmup
Now, after restarting a node, we have only cold caches: the first requests to them will gradually load data from disk, which can slow down the first calls to them. If a node has more RAM than data on disk, the data can be loaded at start ("warmup"), thereby solving the issue of slowdowns during the first calls to caches.

I suggest adding a warmup phase after recovery here [1] after [2], before discovery.

I suggest adding a new interface:

package org.apache.ignite.internal.processors.cache;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.jetbrains.annotations.Nullable;

/**
 * Interface for warming up a cache.
 */
public interface CacheWarmup {
    /**
     * Warmup cache.
     *
     * @param cacheCtx Cache context.
     * @return Future for cache warmup.
     * @throws IgniteCheckedException If failed.
     */
    @Nullable IgniteInternalFuture process(GridCacheContext cacheCtx) throws IgniteCheckedException;
}

This will allow caches to be warmed up in parallel and asynchronously. The warmup phase will end after the IgniteInternalFuture of every cache is done.

I also suggest adding the ability to customize warmup via the methods:
org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup

This allows setting a warmup implementation for a specific cache, or a default one for all caches if necessary.

I suggest adding an implementation of SequentialWarmup that will use [3].

Questions, suggestions, comments?

[1] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
[2] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
[3] - org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload
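As an illustration of the proposed internal contract, here is a minimal sketch of a CacheWarmup implementation that does nothing. GridFinishedFuture is an existing internal helper for an already-completed future; the NoopCacheWarmup class itself is hypothetical and not part of Ignite.

{code:java}
package org.apache.ignite.internal.processors.cache;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.future.GridFinishedFuture;
import org.jetbrains.annotations.Nullable;

/**
 * No-op warmup: returns an already-completed future, so the warmup phase
 * does not delay startup for this cache.
 */
public class NoopCacheWarmup implements CacheWarmup {
    /** {@inheritDoc} */
    @Override @Nullable public IgniteInternalFuture process(GridCacheContext cacheCtx)
        throws IgniteCheckedException {
        // A real implementation (e.g. the suggested SequentialWarmup) would call
        // CacheDataStore#preload for the cache's partitions and return a future
        // that completes when preloading is done.
        return new GridFinishedFuture<>();
    }
}
{code}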
[jira] [Created] (IGNITE-13300) Ignite sandbox vulnerability allows to execute user code in privileged proxy
Aleksey Plekhanov created IGNITE-13300:
--------------------------------------------

             Summary: Ignite sandbox vulnerability allows to execute user code in privileged proxy
                 Key: IGNITE-13300
                 URL: https://issues.apache.org/jira/browse/IGNITE-13300
             Project: Ignite
          Issue Type: Bug
          Components: security
    Affects Versions: 2.9
            Reporter: Aleksey Plekhanov
            Assignee: Aleksey Plekhanov

Ignite sandbox returns a privileged proxy for Ignite and some other system interfaces. If the user implements one of these interfaces and obtains an instance of the implementing class via a privileged proxy, a privileged proxy for the user class will be returned.

Reproducer:

{code:java}
public void testPrivelegedUserObject() throws Exception {
    grid(CLNT_FORBIDDEN_WRITE_PROP).getOrCreateCache(DEFAULT_CACHE_NAME).put(0, new TestIterator<>());

    runForbiddenOperation(() -> grid(CLNT_FORBIDDEN_WRITE_PROP).compute().run(() -> {
        GridIterator it = (GridIterator)Ignition.localIgnite().cache(DEFAULT_CACHE_NAME).get(0);

        it.iterator();
    }), AccessControlException.class);
}

public static class TestIterator extends GridIterableAdapter {
    public TestIterator() {
        super(Collections.emptyIterator());
    }

    @Override public GridIterator iterator() {
        controlAction();

        return super.iterator();
    }
}
{code}
[jira] [Created] (IGNITE-13299) Flaky GridServiceDeployClusterReadOnlyModeTest and all including tests.
Stanilovsky Evgeny created IGNITE-13299:
--------------------------------------------

             Summary: Flaky GridServiceDeployClusterReadOnlyModeTest and all including tests.
                 Key: IGNITE-13299
                 URL: https://issues.apache.org/jira/browse/IGNITE-13299
             Project: Ignite
          Issue Type: Bug
          Components: managed services
    Affects Versions: 2.8.1
            Reporter: Stanilovsky Evgeny
            Assignee: Stanilovsky Evgeny

Running GridServiceDeployClusterReadOnlyModeTest#testDeployClusterSingletonAllowed until failure catches an assertion around the 200th iteration. The failure is due to the incorrect assumption that, if the service is already deployed, org.apache.ignite.services.Service#execute will have been called before the deployment call returns.
Re: Extended logging for rebalance performance analysis
Discussed in personal correspondence with Stas, decided to improve the message:

Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, rebalanceId=1]

into:

Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB, avgSpeed=5,9 KB/sec, histPartitions=1, histEntries=30, histBytesRcvd=1 KB, fullPartitions=1, fullEntries=30, fullBytesRcvd=3 KB, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, rebalanceId=1]

Where:
partitions=2 - total number of partitions received
entries=60 - total number of entries received
duration=8ms - duration from the first demand message until this message is printed to the log
bytesRcvd=5,9 KB - total number of bytes received, in B/KB/MB/GB
avgSpeed = bytesRcvd / duration, in KB/sec
histPartitions=1 - total number of partitions received by historical mode
histEntries=30 - total number of entries received by historical mode
histBytesRcvd=1 KB - total number of bytes received by historical mode, in B/KB/MB/GB
fullPartitions=1 - total number of partitions received by full mode
fullEntries=30 - total number of entries received by full mode
fullBytesRcvd=3 KB - total number of bytes received by full mode, in B/KB/MB/GB

27.07.2020, 11:50, "ткаленко кирилл":
> Discussed in personal correspondence with Stas, decided to improve the message:
>
> Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1,
> partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB,
> topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3,
> rebalanceId=1]
>
> into:
>
> Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1,
> partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB, avgSpeed=5,9 KB/sec,
> histPartitions=1, histEntries=30, histBytesRcvd=1 KB,
> fullPartitions=1, fullEntries=30, fullBytesRcvd=3 KB,
> topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3,
> rebalanceId=1]
>
> Where:
> partitions=2 - total number of partitions received
> entries=60 - total number of entries received
> duration=8ms - duration from the first demand message until this message is printed to the log
> bytesRcvd=5,9 KB - total number of bytes received, in B/KB/MB/GB
>
> avgSpeed = bytesRcvd / duration, in KB/sec
>
> histPartitions=1 - total number of partitions received by historical mode
> histEntries=30 - total number of entries received by historical mode
> histBytesRcvd=1 KB - total number of bytes received by historical mode, in B/KB/MB/GB
>
> fullPartitions=1 - total number of partitions received by full mode
> fullEntries=30 - total number of entries received by full mode
> fullBytesRcvd=3 KB - total number of bytes received by full mode, in B/KB/MB/GB
>
> 03.07.2020, 17:21, "ткаленко кирилл":
>> Sorry, forgot.
>>
>> [1] - org.apache.ignite.internal.processors.cache.CacheGroupsMetricsRebalanceTest#testCacheGroupRebalance
>>
>> 03.07.2020, 17:20, "ткаленко кирилл":
>>> Hi, Stan!
>>>
>>> I don't understand you yet.
>>>
>>> Now you can use the metrics as it was done in the test [1]. Or can you tell me
>>> where to do this, for example when completing rebalancing for all groups?
>>>
>>> See what is now available and added in the logs:
>>> 1) Which group is rebalanced and which type of rebalance.
>>> Starting rebalance routine [grp0, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, fullPartitions=[4, 7], histPartitions=[], rebalanceId=1]
>>>
>>> 2) Completion of rebalancing from one of the suppliers.
>>> Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, rebalanceId=1]
>>>
>>> 3) Completion of the entire rebalance.
>>> Completed rebalance chain: [rebalanceId=1, partitions=116, entries=400, duration=41ms, bytesRcvd=40,4 KB]
>>>
>>> These messages have a common parameter rebalanceId=1.
>>>
>>> 03.07.2020, 16:48, "Stanislav Lukyanov":
> On 3 Jul 2020, at 09:51, ткаленко кирилл wrote:
>
> To calculate the average value, you can use the existing metrics
> "RebalancingStartTime", "RebalancingLastCancelledTime",
> "RebalancingEndTime", "RebalancingPartitionsLeft",
> "RebalancingReceivedKeys" and "RebalancingReceivedBytes".

You can calculate it, and I believe that this is the first thing anyone would do when reading these logs and metrics. If that's an essential thing, then maybe it should be available out of the box?

> This also works correctly with the historical rebalance.
> Now we can see rebalance type
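For reference, the calculation users have to do by hand today, written out as a small helper. It only restates avgSpeed = bytesRcvd / duration using the metric values named above; the method itself is illustrative and not part of Ignite.

{code:java}
/**
 * Average rebalance speed in KB/sec, computed from the existing metrics
 * RebalancingStartTime, RebalancingEndTime and RebalancingReceivedBytes
 * (timestamps are assumed to be in milliseconds).
 */
static double avgRebalanceSpeedKBps(long startTimeMs, long endTimeMs, long receivedBytes) {
    long durationMs = Math.max(1, endTimeMs - startTimeMs); // Guard against a zero duration.

    // bytes/ms -> KB/sec: multiply by 1000 ms/sec, divide by 1024 bytes/KB.
    return receivedBytes / (double)durationMs * 1000.0 / 1024.0;
}
{code}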
Re: PDS suites fail with exit code 137
Hi Ivan P.,

I configured it for both PDS (Indexing) and PDS 4 (was asked by Nikita Tolstunov). It totally worked, not a single 137 since then. The occasional 130 will be fixed in [1]; it has a different problem behind it.

Now I'm trying to find someone who knows the TC configuration better and will be able to propagate the setting to all suites. Also, I don't have access to the agents, so "jemalloc" is definitely not an option for me specifically.

[1] https://issues.apache.org/jira/browse/IGNITE-13266

Sun, 26 Jul 2020 at 17:36, Ivan Pavlukhin:
> Ivan B.,
>
> I noticed that you were able to configure environment variables for
> PDS (Indexing). Do field experiments show that the suggested approach
> fixes the problem?
>
> Interesting stuff with jemalloc. It might be useful to file a ticket.
>
> 2020-07-23 16:07 GMT+03:00, Ivan Daschinsky:
>>> About "jemalloc" - it's also an option, but it also requires reconfiguring
>>> suites on TC, maybe in a more complicated way. It requires additional
>>> installation, right?
>>> Can we stick to the solution that I already tested or should we update TC
>>> agents? :)
>>
>> Yes, if you want to use jemalloc, you should install it and configure a
>> specific env variable.
>> This is just an option to consider, nothing more. I suppose that your
>> approach may be the best variant right now.
>>
>> Thu, 23 Jul 2020 at 15:28, Ivan Bessonov:
>>>> glibc allocator uses arenas to minimize contention between threads
>>>
>>> I understand it the same way. I did testing by running the Indexing suite
>>> locally and periodically executing "pmap "; it showed that the number of
>>> 64mb arenas grows constantly and never shrinks. By the middle of the suite
>>> the amount of virtual memory was close to 50 Gb and the used physical
>>> memory was at least 6-7 Gb, if I recall it correctly. I have only 8 cores
>>> BTW, so it should be worse on TC. It means that there is enough contention
>>> somewhere in the tests.
>>>
>>> About "jemalloc" - it's also an option, but it also requires reconfiguring
>>> suites on TC, maybe in a more complicated way. It requires additional
>>> installation, right?
>>> Can we stick to the solution that I already tested or should we update TC
>>> agents? :)
>>>
>>> Thu, 23 Jul 2020 at 15:02, Ivan Daschinsky:
>>>> AFAIK, the glibc allocator uses arenas to minimize contention between
>>>> threads when they try to access or free preallocated bits of memory.
>>>> But it seems that we use -XX:+AlwaysPreTouch, so the heap is allocated
>>>> and committed at start time. We allocate memory for durable memory in one
>>>> thread, so I think there will not be much contention between threads for
>>>> native memory pools.
>>>>
>>>> Also, there is another approach -- try to use jemalloc.
>>>> This allocator shows better results than the default glibc malloc in our
>>>> scenarios (memory consumption) [1].
>>>>
>>>> [1] -- http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
>>>>
>>>> Thu, 23 Jul 2020 at 14:19, Ivan Bessonov:
>>>>> Hello Ivan,
>>>>>
>>>>> It feels like the problem is more about newly starting threads rather
>>>>> than the allocation of offheap regions.
>>>>> Plus I'd like to see results soon: your proposal is a major change for
>>>>> Ignite that can't be implemented fast enough.
>>>>>
>>>>> Anyway, I think this makes sense, considering that one day Unsafe will
>>>>> be removed. But I wouldn't think about it right now, maybe as a separate
>>>>> proposal...
>>>>>
>>>>> Thu, 23 Jul 2020 at 13:40, Ivan Daschinsky:
>>>>>> Ivan, I think that we should use mmap/munmap to allocate huge chunks
>>>>>> of memory.
>>>>>>
>>>>>> I've experimented with JNA, invoking mmap/munmap with it, and it works
>>>>>> fine. Maybe we can create a module (similar to direct-io) that uses
>>>>>> mmap/munmap on platforms that support them and falls back to Unsafe if
>>>>>> not?
>>>>>>
>>>>>> Thu, 23 Jul 2020 at 13:31, Ivan Bessonov:
>>>>>>> Hello Igniters,
>>>>>>>
>>>>>>> I'd like to discuss the current issue with "out of memory" failures on
>>>>>>> TeamCity, particularly suites [1] and [2]: they have quite a lot of
>>>>>>> "Exit code 137" failures.
>>>>>>>
>>>>>>> I investigated the "PDS (Indexing)" suite under [3]. There's another
>>>>>>> similar issue as well: [4]. I came to the conclusion that the main
>>>>>>> problem is inside the default memory allocator (malloc).
>>>>>>> Let me explain the way I see it right now:
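For context on the mmap/munmap idea, here is a rough sketch of the kind of JNA binding Ivan describes experimenting with. The class is not part of Ignite, the constants are the Linux x86-64 values, and a real module would need per-platform handling and an Unsafe fallback, as discussed above.

{code:java}
import com.sun.jna.Native;
import com.sun.jna.Pointer;

/** Illustrative JNA binding: allocate big chunks with mmap and return them to the OS with munmap. */
public final class MmapMemoryAllocator {
    /** Linux x86-64 constants; the values differ on other platforms. */
    private static final int PROT_READ = 0x1;
    private static final int PROT_WRITE = 0x2;
    private static final int MAP_PRIVATE = 0x02;
    private static final int MAP_ANONYMOUS = 0x20;

    static {
        Native.register("c"); // Direct-map the native methods below to libc.
    }

    private static native Pointer mmap(Pointer addr, long len, int prot, int flags, int fd, long off);

    private static native int munmap(Pointer addr, long len);

    /** Allocates an anonymous private mapping and returns its address. */
    public static long allocate(long size) {
        Pointer ptr = mmap(null, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (Pointer.nativeValue(ptr) == -1L) // MAP_FAILED
            throw new OutOfMemoryError("mmap failed for size: " + size);

        return Pointer.nativeValue(ptr);
    }

    /** Unmaps the region; unlike free(), the memory goes back to the OS immediately. */
    public static void free(long addr, long size) {
        if (munmap(new Pointer(addr), size) != 0)
            throw new IllegalStateException("munmap failed for address: " + addr);
    }
}
{code}

Unlike glibc's malloc, freed regions here are not kept cached in per-thread arenas, which is the growth pattern described in the thread above.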