[jira] [Created] (IGNITE-13304) Improve javadocs for classes related to cache configuration enrichment

2020-07-27 Thread Vyacheslav Koptilin (Jira)
Vyacheslav Koptilin created IGNITE-13304:


 Summary: Improve javadocs for classes related to cache 
configuration enrichment 
 Key: IGNITE-13304
 URL: https://issues.apache.org/jira/browse/IGNITE-13304
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.8.1
Reporter: Vyacheslav Koptilin
Assignee: Vyacheslav Koptilin


In my opinion, some classes related to cache configuration enrichment need to be better documented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13303) Update mockito from 1.10.19 to 3.x

2020-07-27 Thread Alexey Kuznetsov (Jira)
Alexey Kuznetsov created IGNITE-13303:
-

 Summary: Update mockito from 1.10.19 to 3.x
 Key: IGNITE-13303
 URL: https://issues.apache.org/jira/browse/IGNITE-13303
 Project: Ignite
  Issue Type: Task
  Components: general
Affects Versions: 2.9
Reporter: Alexey Kuznetsov
Assignee: Alexey Kuznetsov
 Fix For: 2.10


We are using a very old version of Mockito: 1.10.19.

{panel:title=According to Mockito GitHub:}
Still on Mockito 1.x? See what's new in Mockito 2!
Mockito 3 does not introduce any breaking API changes, but now requires Java 8 
over Java 6 for Mockito 2.{panel}
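
For reference, basic stubbing code is expected to compile unchanged after the upgrade. A minimal illustrative sketch (not taken from the Ignite test suite):

{code:java}
import java.util.List;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

public class MockitoSmoke {
    @SuppressWarnings("unchecked")
    static void stubAndVerify() {
        // Basic stubbing and verification look the same in Mockito 1.x and 3.x.
        List<String> list = mock(List.class);

        when(list.get(0)).thenReturn("ignite");

        assert "ignite".equals(list.get(0));

        verify(list).get(0);
    }
}
{code}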



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSSION] Cache warmup

2020-07-27 Thread Вячеслав Коптилин
Hello Kirill,

Thanks a lot for driving this activity. If I am not mistaken, this
discussion relates to IEP-40.

> I suggest adding a warmup phase after recovery here [1] after [2], before
discovery.
This means that the user's thread, which starts Ignite via
Ignition.start(), will wait for an additional step - cache warm-up.
I think this fact has to be clearly mentioned in our documentation (at
least in the Javadoc) because this step can be time-consuming.

> I suggest adding a new interface:
I would change it a bit. First of all, it would be nice to place this
interface in a public package and get rid of GridCacheContext,
which is an internal class and should not leak into the public API in any
case.
Perhaps this parameter is not needed at all, or we should add some public
abstraction instead of the internal class.

package org.apache.ignite.configuration;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.lang.IgniteFuture;

public interface CacheWarmupper {
    /**
     * Warmup cache.
     *
     * @param cachename Cache name.
     * @return Future cache warmup.
     * @throws IgniteCheckedException If failed.
     */
    IgniteFuture warmup(String cachename) throws IgniteCheckedException;
}
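
For illustration only, an implementation of such a public interface could be built on the public API alone. A minimal sketch, assuming that scanning entries is enough to pull their pages into page memory (ScanCacheWarmupper is a hypothetical name, CacheWarmupper is the interface proposed above):

package org.apache.ignite.configuration;

import javax.cache.Cache;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.lang.IgniteFuture;

/** Illustrative only: warms a cache up by asynchronously scanning all of its entries. */
public class ScanCacheWarmupper implements CacheWarmupper {
    /** {@inheritDoc} */
    @Override public IgniteFuture warmup(String cachename) throws IgniteCheckedException {
        return Ignition.localIgnite().compute().runAsync(() -> {
            IgniteCache<Object, Object> cache = Ignition.localIgnite().cache(cachename);

            // Iterating over a scan query touches every entry, pulling its pages from disk.
            try (QueryCursor<Cache.Entry<Object, Object>> cur = cache.query(new ScanQuery<Object, Object>())) {
                for (Cache.Entry<Object, Object> e : cur)
                    e.getKey();
            }
        });
    }
}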

Thanks,
S.

Mon, Jul 27, 2020 at 15:03, ткаленко кирилл :

> Now, after restarting a node, we have only cold caches: the first requests
> to them will gradually load data from disk, which can slow down the first
> calls to them.
> If the node has more RAM than data on disk, the data can be loaded at start
> ("warmup"), thereby solving the issue of slowdowns during the first calls to
> caches.
>
> I suggest adding a warmup phase after recovery here [1] after [2], before
> discovery.
>
> I suggest adding a new interface:
>
> package org.apache.ignite.internal.processors.cache;
>
> import org.apache.ignite.IgniteCheckedException;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.jetbrains.annotations.Nullable;
>
> /**
>  * Interface for warming up cache.
>  */
> public interface CacheWarmup {
>     /**
>      * Warmup cache.
>      *
>      * @param cacheCtx Cache context.
>      * @return Future cache warmup.
>      * @throws IgniteCheckedException if failed.
>      */
>     @Nullable IgniteInternalFuture process(GridCacheContext cacheCtx) throws IgniteCheckedException;
> }
>
> This will allow warming up caches in parallel and asynchronously. The warmup
> phase will end once the IgniteInternalFuture of every cache is done.
>
> I also suggest adding the ability to customize warmup via the methods:
> org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
> org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup
>
> This will allow setting a warmup implementation either for a specific cache
> or, if necessary, for all caches by default.
>
> I suggest adding an implementation of SequentialWarmup that will use [3].
>
> Questions, suggestions, comments?
>
> [1] -
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
> [2] -
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
> [3] -
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload
>


[jira] [Created] (IGNITE-13302) Java thin client connect/disconnect during topology update may lead to partition divergence in ignite-sys-cache

2020-07-27 Thread Stepachev Maksim (Jira)
Stepachev Maksim created IGNITE-13302:
-

 Summary: Java thin client connect/disconnect during topology 
update may lead to partition divergence in ignite-sys-cache
 Key: IGNITE-13302
 URL: https://issues.apache.org/jira/browse/IGNITE-13302
 Project: Ignite
  Issue Type: Bug
Reporter: Stepachev Maksim
Assignee: Stepachev Maksim


You can see partition inconsistency in ignite-sys-cache:
{noformat}
[2020-04-23 15:26:31,816][WARN ][sys-#45%gridgain.Sdsb11784Ver20%][root] 
Partition states validation has failed for group: ignite-sys-cache, msg: 
Partitions cache sizes are inconsistent for Part 31: [127.0.0.1:47500=1 
127.0.0.1:47501=2 ] Part 43: [127.0.0.1:47500=3 127.0.0.1:47501=4 ] Part 44: 
[127.0.0.1:47500=1 127.0.0.1:47501=2 ] Part 46: [127.0.0.1:47500=0 
127.0.0.1:47501=1 ] Part 91: [127.0.0.1:47500=1 127.0.0.1:47501=2 ]
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13301) IgniteScheduler has to run inside the Ignite Sandbox.

2020-07-27 Thread Denis Garus (Jira)
Denis Garus created IGNITE-13301:


 Summary: IgniteScheduler has to run inside the Ignite Sandbox.
 Key: IGNITE-13301
 URL: https://issues.apache.org/jira/browse/IGNITE-13301
 Project: Ignite
  Issue Type: Task
  Components: security
Reporter: Denis Garus
Assignee: Denis Garus


IgniteScheduler has to run inside the Ignite Sandbox on a remote node.

For example:

{code:java}
Ignition.localIgnite().compute().run(() -> {
    IgniteScheduler scheduler = Ignition.localIgnite().scheduler();

    scheduler.runLocal(AbstractSandboxTest::controlAction).get();
});
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSSION] Cache warmup

2020-07-27 Thread ткаленко кирилл
Now, after restarting a node, we have only cold caches: the first requests
to them will gradually load data from disk, which can slow down the first
calls to them.
If the node has more RAM than data on disk, the data can be loaded at start
("warmup"), thereby solving the issue of slowdowns during the first calls to caches.

I suggest adding a warmup phase after recovery here [1] after [2], before
discovery.

I suggest adding a new interface:

package org.apache.ignite.internal.processors.cache;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.jetbrains.annotations.Nullable;

/**
 * Interface for warming up cache.
 */
public interface CacheWarmup {
    /**
     * Warmup cache.
     *
     * @param cacheCtx Cache context.
     * @return Future cache warmup.
     * @throws IgniteCheckedException if failed.
     */
    @Nullable IgniteInternalFuture process(GridCacheContext cacheCtx) throws IgniteCheckedException;
}

This will allow warming up caches in parallel and asynchronously. The warmup
phase will end once the IgniteInternalFuture of every cache is done.
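
For illustration, the simplest conforming implementation of the proposed interface could look as follows (NoopCacheWarmup and the use of GridFinishedFuture are assumptions for this sketch, not part of the proposal; a real implementation would start preloading, e.g. via [3], and return its future):

package org.apache.ignite.internal.processors.cache;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.future.GridFinishedFuture;
import org.jetbrains.annotations.Nullable;

/** No-op warmup: completes immediately without preloading anything. */
public class NoopCacheWarmup implements CacheWarmup {
    /** {@inheritDoc} */
    @Override @Nullable public IgniteInternalFuture process(GridCacheContext cacheCtx)
        throws IgniteCheckedException {
        // A real implementation would start preloading the cache data here
        // (for example via the data store preload mentioned in [3]) and return its future.
        return new GridFinishedFuture<>();
    }
}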

I also suggest adding the ability to customize warmup via the methods:
org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup

This will allow setting a warmup implementation either for a specific cache
or, if necessary, for all caches by default.

I suggest adding an implementation of SequentialWarmup that will use [3].
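
To make the configuration part concrete, usage could look roughly like the sketch below. Note that setDefaultCacheWarmup, setCacheWarmup and SequentialWarmup are only proposed in this thread and do not exist in the current API:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WarmupConfigurationExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Proposed API: default warmup applied to every cache.
        cfg.setDefaultCacheWarmup(new SequentialWarmup());

        CacheConfiguration<Integer, String> hotCacheCfg = new CacheConfiguration<>("hot-cache");

        // Proposed API: per-cache override of the warmup implementation.
        hotCacheCfg.setCacheWarmup(new SequentialWarmup());

        cfg.setCacheConfiguration(hotCacheCfg);

        Ignition.start(cfg);
    }
}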

Questions, suggestions, comments?

[1] - 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
[2] - 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
[3] - 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload


[jira] [Created] (IGNITE-13300) Ignite sandbox vulnerability allows to execute user code in privileged proxy

2020-07-27 Thread Aleksey Plekhanov (Jira)
Aleksey Plekhanov created IGNITE-13300:
--

 Summary: Ignite sandbox vulnerability allows to execute user code 
in privileged proxy
 Key: IGNITE-13300
 URL: https://issues.apache.org/jira/browse/IGNITE-13300
 Project: Ignite
  Issue Type: Bug
  Components: security
Affects Versions: 2.9
Reporter: Aleksey Plekhanov
Assignee: Aleksey Plekhanov


The Ignite sandbox returns a privileged proxy for Ignite and some other system
interfaces. If the user implements one of these interfaces and obtains an
instance of the implementing class via a privileged proxy, a privileged proxy
for the user class will be returned.
Reproducer:

{code:java}
public void testPrivelegedUserObject() throws Exception {
    grid(CLNT_FORBIDDEN_WRITE_PROP).getOrCreateCache(DEFAULT_CACHE_NAME).put(0, new TestIterator<>());

    runForbiddenOperation(() -> grid(CLNT_FORBIDDEN_WRITE_PROP).compute().run(() -> {
        GridIterator it = (GridIterator)Ignition.localIgnite().cache(DEFAULT_CACHE_NAME).get(0);

        it.iterator();
    }), AccessControlException.class);
}

public static class TestIterator extends GridIterableAdapter {
    public TestIterator() {
        super(Collections.emptyIterator());
    }

    @Override public GridIterator iterator() {
        controlAction();

        return super.iterator();
    }
}
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13299) Flaky GridServiceDeployClusterReadOnlyModeTest and all tests that include it.

2020-07-27 Thread Stanilovsky Evgeny (Jira)
Stanilovsky Evgeny created IGNITE-13299:
---

 Summary: Flaky GridServiceDeployClusterReadOnlyModeTest and all tests 
that include it.
 Key: IGNITE-13299
 URL: https://issues.apache.org/jira/browse/IGNITE-13299
 Project: Ignite
  Issue Type: Bug
  Components: managed services
Affects Versions: 2.8.1
Reporter: Stanilovsky Evgeny
Assignee: Stanilovsky Evgeny


Running 
GridServiceDeployClusterReadOnlyModeTest#testDeployClusterSingletonAllowed 
repeatedly until failure catches an assertion at around the 200th iteration. 
The failure is due to the incorrect assumption that, if a service is already 
deployed, org.apache.ignite.services.Service#execute will have been called 
before the deployment call returns.
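
To illustrate the race, a hedged sketch (MyService and the comments are illustrative, not part of the actual test):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

public class DeploymentRaceExample {
    /** Hypothetical service used only for this illustration. */
    public static class MyService implements Service {
        @Override public void init(ServiceContext ctx) { /* prepare resources */ }

        /** Invoked asynchronously in a dedicated service thread after deployment. */
        @Override public void execute(ServiceContext ctx) { /* do the work */ }

        @Override public void cancel(ServiceContext ctx) { /* clean up */ }
    }

    public static void deploy(Ignite ignite) {
        // This call may return before execute() has been invoked,
        // which is exactly the assumption that makes the test flaky.
        ignite.services().deployClusterSingleton("my-singleton", new MyService());
    }
}
{code}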



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Extended logging for rebalance performance analysis

2020-07-27 Thread ткаленко кирилл
As discussed in personal correspondence with Stas, we decided to improve the message:
Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, 
partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB,
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, 
rebalanceId=1]

into:
Completed rebalancing [grp=grp0, supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, 
partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB, avgSpeed=5,9 
KB/sec,
histPartitions=1, histEntries=30, histBytesRcvd=1 KB,
fullPartitions=1, fullEntries=30, fullBytesRcvd=3 KB
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, 
rebalanceId=1]

Where:
partitions=2 - total number of partitions received
entries=60 - total number of entries received
duration=8ms - time from the first demand message to the output of this message to the log
bytesRcvd=5,9 KB - total number of bytes received (formatted as B, KB, MB or GB)

avgSpeed = bytesRcvd / duration, in KB/sec

histPartitions=1 - total number of partitions received in historical mode
histEntries=30 - total number of entries received in historical mode
histBytesRcvd=1 KB - total number of bytes received in historical mode (formatted as B, KB, MB or GB)

fullPartitions=1 - total number of partitions received in full mode
fullEntries=30 - total number of entries received in full mode
fullBytesRcvd=3 KB - total number of bytes received in full mode (formatted as B, KB, MB or GB)
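
For clarity, avgSpeed is meant to be derived from the two fields above. A minimal sketch of the assumed calculation (not the actual Ignite code):

/** Assumed formula for avgSpeed: bytes received divided by the duration, in KB/sec. */
static double avgSpeedKBperSec(long bytesRcvd, long durationMs) {
    long safeDurationMs = Math.max(durationMs, 1); // guard against a 0 ms duration

    return (bytesRcvd / 1024.0) / (safeDurationMs / 1000.0);
}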

27.07.2020, 11:50, "ткаленко кирилл" :
> Discussed in personal correspondence with Stas, decided to improve the 
> message:
> Completed rebalancing [grp=grp0, 
> supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1,
>     partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB,
>     topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, 
> rebalanceId=1]
>
> into:
> Completed rebalancing [grp=grp0, 
> supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1,
>     partitions=2, entries=60, duration=8ms, bytesRcvd=5,9 KB, avgSpeed=5,9 
> KB/sec,
>     histPartitions=1, histEntries=30, histBytesRcvd=1 KB,
>     fullPartitions=1, fullEntries=30, fullBytesRcvd=3 KB
>     topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], progress=1/3, 
> rebalanceId=1]
>
> Where:
> partitions=2 - total number of partitions received
> entries=60 - total number of entries received
> duration=8ms - duration from first demand of message to output of message to 
> log
> bytesRcvd=5,9 KB - total number of bytes received in B,KB,MB,GB
>
> avgSpeed= bytesRcvd/duration in KB/sec
>
> histPartitions=1 - total number of partitions received by historical mode
> histEntries=30 - total number of entries received by historical mode
> histBytesRcvd=1 KB - total number of bytes received in B,KB,MB,GB by 
> historical mode
>
> fullPartitions=1 - total number of partitions received by full mode
> fullEntries=30 - total number of entries received by full mode
> fullBytesRcvd=3 KB - total number of bytes received in B,KB,MB,GB by full mode
>
> 03.07.2020, 17:21, "ткаленко кирилл" :
>> Sorry, forget.
>>
>> [1] - 
>> org.apache.ignite.internal.processors.cache.CacheGroupsMetricsRebalanceTest#testCacheGroupRebalance
>>
>> 03.07.2020, 17:20, "ткаленко кирилл" :
>>>  Hi, Stan!
>>>
>>>  I don't understand you yet.
>>>
>>>  Now you can use metrics as it was done in the test [1]. Or can you tell me 
>>> where to do this, for example when completing rebalancing for all groups?
>>>
>>>  See what is now available and added in the logs:
>>>  1)Which group is rebalanced and which type of rebalance.
>>>  Starting rebalance routine [grp0, topVer=AffinityTopologyVersion 
>>> [topVer=4, minorTopVer=0], supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, 
>>> fullPartitions=[4, 7], histPartitions=[], rebalanceId=1]
>>>
>>>  2) Completion of rebalancing from one of the suppliers.
>>>  Completed rebalancing [grp=grp0, 
>>> supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, partitions=2, entries=60, 
>>> duration=8ms, bytesRcvd=5,9 KB, topVer=AffinityTopologyVersion [topVer=4, 
>>> minorTopVer=0], progress=1/3, rebalanceId=1]
>>>
>>>  3) Completion of the entire rebalance.
>>>  Completed rebalance chain: [rebalanceId=1, partitions=116, entries=400, 
>>> duration=41ms, bytesRcvd=40,4 KB]
>>>
>>>  These messages have a common parameter rebalanceId=1.
>>>
>>>  03.07.2020, 16:48, "Stanislav Lukyanov" :
>    On 3 Jul 2020, at 09:51, ткаленко кирилл  wrote:
>
>    To calculate the average value, you can use the existing metrics 
> "RebalancingStartTime", "RebalancingLastCancelledTime", 
> "RebalancingEndTime", "RebalancingPartitionsLeft", 
> "RebalancingReceivedKeys" and "RebalancingReceivedBytes".

   You can calculate it, and I believe that this is the first thing anyone 
 would do when reading these logs and metrics.
   If that's an essential thing then maybe it should be available out of 
 the box?

>    This also works correctly with the historical rebalance.
>    Now we can see rebalance type 

Re: PDS suites fail with exit code 137

2020-07-27 Thread Ivan Bessonov
Hi Ivan P.,

I configured it for both PDS (Indexing) and PDS 4 (as asked by Nikita
Tolstunov). It worked: not a single 137 since then.
The occasional 130 will be fixed in [1]; it has a different problem behind it.

Now I'm trying to find someone who knows the TC configuration better and
will be able to propagate the setting to all suites. Also, I don't have
access to the agents, so "jemalloc" is definitely not an option for me
specifically.

[1] https://issues.apache.org/jira/browse/IGNITE-13266

Sun, Jul 26, 2020 at 17:36, Ivan Pavlukhin :

> Ivan B.,
>
> I noticed that you were able to configure environment variables for
> PDS (Indexing). Do field experiments show that the suggested approach
> fixes the problem?
>
> Interesting stuff with jemalloc. It might be useful to file a ticket.
>
> 2020-07-23 16:07 GMT+03:00, Ivan Daschinsky :
> >>
> >> About "jemalloc" - it's also an option, but it also requires
> >> reconfiguring
> >> suites on
> >> TC, maybe in a more complicated way. It requires additional
> installation,
> >> right?
> >> Can we stick to the solution that I already tested or should we update
> TC
> >> agents? :)
> >
> >
> > Yes, if you want to use jemalloc, you should install it and configure a
> > specific env variable.
> > This is just an option to consider, nothing more. I suppose that your
> > approach is may be the
> > best variant right now.
> >
> >
> Thu, Jul 23, 2020 at 15:28, Ivan Bessonov :
> >
> >> >
> >> > glibc allocator uses arenas for minimize contention between threads
> >>
> >>
> >> I understand it the same way. I did testing with running of Indexing
> >> suite
> >> locally
> >> and periodically executing "pmap ", it showed that the number of
> >> 64mb
> >> arenas grows constantly and never shrinks. By the middle of the suite
> the
> >> amount
> >> of virtual memory was close to 50 Gb and used physical memory was at
> >> least
> >> 6-7 Gb, if I recall it correctly. I have only 8 cores BTW, so it should
> >> be
> >> worse on TC.
> >> It means that there is enough contention somewhere in tests.
> >>
> >> About "jemalloc" - it's also an option, but it also requires
> >> reconfiguring
> >> suites on
> >> TC, maybe in a more complicated way. It requires additional
> installation,
> >> right?
> >> Can we stick to the solution that I already tested or should we update
> TC
> >> agents? :)
> >>
> >> Thu, Jul 23, 2020 at 15:02, Ivan Daschinsky :
> >>
> >> > AFAIK, glibc allocator uses arenas for minimize contention between
> >> threads
> >> > when they trying to access
> >> > or free preallocated bit of memory. But seems that we
> >> > use -XX:+AlwaysPreTouch, so heap is allocated
> >> > and committed at start time. We allocate memory for durable memory in
> >> > one
> >> > thread.
> >> > So I think there will be not so much contention between threads for
> >> native
> >> > memory pools.
> >> >
> >> > Also, there is another approach -- try to use jemalloc.
> >> > This allocator shows better result than default glibc malloc in our
> >> > scenarios. (memory consumption) [1]
> >> >
> >> > [1] --
> >> >
> >> >
> >>
> http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
> >> >
> >> >
> >> >
> >> > Thu, Jul 23, 2020 at 14:19, Ivan Bessonov :
> >> >
> >> > > Hello Ivan,
> >> > >
> >> > > It feels like the problem is more about new starting threads rather
> >> than
> >> > > the
> >> > > allocation of offheap regions. Plus I'd like to see results soon,
> >> > > your
> >> > > proposal is
> >> > > a major change for Ignite that can't be implemented fast enough.
> >> > >
> >> > > Anyway, I think this makes sense, considering that one day Unsafe
> >> > > will
> >> be
> >> > > removed. But I wouldn't think about it right now, maybe as a
> separate
> >> > > proposal...
> >> > >
> >> > >
> >> > >
> >> > > Thu, Jul 23, 2020 at 13:40, Ivan Daschinsky :
> >> > >
> >> > > > Ivan, I think that we should use mmap/munmap to allocate huge
> >> > > > chunks
> >> of
> >> > > > memory.
> >> > > >
> >> > > > I've experimented with JNA and invoke mmap/munmap with it and it
> >> works
> >> > > > fine.
> >> > > > May be we can create module (similar to direct-io) that use
> >> mmap/munap
> >> > on
> >> > > > platforms, that support them
> >> > > > and fallback to Unsafe if not?
> >> > > >
> >> > > > Thu, Jul 23, 2020 at 13:31, Ivan Bessonov :
> >> > > >
> >> > > > > Hello Igniters,
> >> > > > >
> >> > > > > I'd like to discuss the current issue with "out of memory" fails
> >> > > > > on
> >> > > > > TeamCity. Particularly suites [1]
> >> > > > > and [2], they have quite a lot of "Exit code 137" failures.
> >> > > > >
> >> > > > > I investigated the "PDS (Indexing)" suite under [3]. There's
> >> another
> >> > > > > similar issue as well: [4].
> >> > > > > I came to the conclusion that the main problem is inside the
> >> default
> >> > > > memory
> >> > > > > allocator (malloc).
> >> > > > > Let me explain the way I see it right now:
> >> >