Re: Stop sending IGNITE Created e-mails to dev@

2021-04-14 Thread Denis Mekhanikov
Huge +1 to this.

I've already brought up this topic in the past:
http://apache-ignite-developers.2346864.n4.nabble.com/Bots-on-dev-list-td34406.html
I hope some day newcomers won't need to set up their email filters when
they come to the developers list.

Denis

Wed, Apr 14, 2021 at 18:07, Atri Sharma :

> +1 to move issues to the issues list.
>
> For MTCGA, maybe build@?
>
> On Wed, Apr 14, 2021 at 8:35 PM Ilya Kasnacheev  wrote:
> >
> > Hello!
> >
> > We have a discussion on how to ensure the best engagement in the dev@ list, and
> > it seems that Issue Created emails from the IGNITE project consume a lot of
> > screen space; it's hard to spot genuine discussions in
> > https://lists.apache.org/list.html?dev@ignite.apache.org for example.
> >
> > We already have the issues@ mailing list. I propose that we stop sending any
> > JIRA emails to dev@. If anyone wishes to get just the Created emails, they can
> > subscribe to these messages in their JIRA account settings. I imagine most
> > of you already filter these messages out, so you may need to adjust your
> > filters slightly.
> >
> > A distant second is MTCGA messages, which are also autogenerated and not
> > informative for most readers of the channel, since they are at best
> > targeted at a single committer and at worst flaky.
> >
> > Where could we move those? What is your opinion here, on both issues?
> >
> > Regards,
>
> --
> Regards,
>
> Atri
> Apache Concerted
>


Re: IGNITE-13399 Fix access right issues in computation of system metrics

2021-04-09 Thread Denis Mekhanikov
Mirza,

Thanks for the review!

Can somebody help with a merge?

Denis

Thu, Apr 1, 2021 at 13:07, Mirza Aliev :

> Hi Denis!
>
> Thank you for the PR!
>
> I've left a comment.
>
> Best regards,
> Mirza Aliev
>
> Thu, Apr 1, 2021 at 11:48, Denis Mekhanikov :
>
> > Hi everyone!
> >
> > I've prepared a PR for the following issue:
> > https://issues.apache.org/jira/browse/IGNITE-13399
> > Currently, on Java 8, the CpuLoad metric is reported as -1 when
> > running in embedded mode.
> >
> > Could anyone take a look?
> > Thanks!
> >
> > Denis
> >
>


[jira] [Created] (IGNITE-14505) Print information about striped pool in metrics for local node

2021-04-08 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-14505:
-

 Summary: Print information about striped pool in metrics for local 
node
 Key: IGNITE-14505
 URL: https://issues.apache.org/jira/browse/IGNITE-14505
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov


Currently, only information about the public and system thread pools is printed in 
the metrics for a local node. It would be good to have the same information about 
the striped pool as well.
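
An illustrative way to check which pool-related metrics a node currently exposes is to list the matching MBeans over JMX from the same JVM as an embedded node. This is only a sketch, not part of the ticket; the exact MBean names and groups vary between Ignite versions, so it simply filters registered names:
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ListPoolMBeans {
    public static void main(String[] args) throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        // Print every registered MBean whose name mentions a pool.
        // Run in the same JVM as an embedded Ignite node to see its beans.
        for (ObjectName name : srv.queryNames(null, null)) {
            if (name.getCanonicalName().toLowerCase().contains("pool"))
                System.out.println(name.getCanonicalName());
        }
    }
}
{code}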



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


IGNITE-13399 Fix access right issues in computation of system metrics

2021-04-01 Thread Denis Mekhanikov
Hi everyone!

I've prepared a PR for the following issue:
https://issues.apache.org/jira/browse/IGNITE-13399
Currently, on Java 8, the CpuLoad metric is reported as -1 when
running in embedded mode.

Could anyone take a look?
Thanks!

Denis


[jira] [Created] (IGNITE-14349) Add query attributes to QueryHistoryTracker#qryHist and QueryHistory

2021-03-19 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-14349:
-

 Summary: Add query attributes to QueryHistoryTracker#qryHist and 
QueryHistory
 Key: IGNITE-14349
 URL: https://issues.apache.org/jira/browse/IGNITE-14349
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Denis Mekhanikov


We want to see two additional attributes in the query history: enforceJoinOrder 
and lazy. Someone should add them to QueryHistoryTracker#qryHist and map them to 
QueryHistory.
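
For context, the two flags are the ones that can already be set on a SQL query; a minimal sketch (the "person" cache and Person table are made up for illustration):
{code:java}
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class QueryFlagsExample {
    // Runs a query with the two flags proposed to be recorded in the query history.
    static List<List<?>> run(Ignite ignite) {
        SqlFieldsQuery qry = new SqlFieldsQuery("select name from Person where id = ?")
            .setArgs(42)
            .setEnforceJoinOrder(true) // proposed query-history attribute
            .setLazy(true);            // proposed query-history attribute

        return ignite.cache("person").query(qry).getAll();
    }
}
{code}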



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13886) Change units of cache-related histograms to milliseconds

2020-12-22 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-13886:
-

 Summary: Change units of cache-related histograms to milliseconds
 Key: IGNITE-13886
 URL: https://issues.apache.org/jira/browse/IGNITE-13886
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov


Ignite has different metrics that have the "histogram" type:
 * tx.nodeSystemTimeHistogram
 * tx.nodeUserTimeHistogram
 * pme.DurationHistogram
 * pme.CacheOperationsBlockedDurationHistogram
 * cache..GetTime
 * cache..PutTime
 * cache..RemoveTime
 * cache..CommitTime
 * cache..RollbackTime

The first four have buckets corresponding to the amount of time the operation took 
in milliseconds.

Cache-related histograms are measured in nanoseconds, while it would be enough 
to use milliseconds there as well.

It's hard to distinguish between 1 and 10 nanoseconds visually.

The following set of buckets should be used:
 * 1
 * 10
 * 100
 * 250
 * 1000

The values are provided in milliseconds.
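
To make the bucket semantics concrete, a small sketch (not part of the ticket) of how a measured latency maps onto the proposed millisecond bounds:
{code:java}
public class HistogramBuckets {
    // Proposed upper bounds of the cache-operation histogram buckets, in milliseconds.
    private static final long[] BOUNDS_MS = {1, 10, 100, 250, 1000};

    // Returns the index of the first bucket whose bound is >= the measured value;
    // anything above the last bound falls into an implicit overflow bucket.
    static int bucketIndex(long valueMs) {
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (valueMs <= BOUNDS_MS[i])
                return i;
        }

        return BOUNDS_MS.length;
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(7));    // 1 -> the "10 ms" bucket
        System.out.println(bucketIndex(3000)); // 5 -> the overflow bucket
    }
}
{code}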



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13642) Nodes fail when they meet objects of unknown type in metastorage

2020-10-29 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-13642:
-

 Summary: Nodes fail when they meet objects of unknown type in 
metastorage
 Key: IGNITE-13642
 URL: https://issues.apache.org/jira/browse/IGNITE-13642
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.9
Reporter: Denis Mekhanikov


When a node sees an object of a class that is missing on this node's classpath, 
it fails with the following exception:
{noformat}
[16:46:47,134][SEVERE][disco-notifier-worker-#41][] Critical system error 
detected. Will be handled accordingly to configured handler 
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
o.a.i.IgniteCheckedException: Failed to find class with given class loader for 
unmarshalling (make sure same versions of all classes are available on all 
nodes or enable peer-class-loading) 
[clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, 
cls=example.ClientNode$BamboozleClass]]]
class org.apache.ignite.IgniteCheckedException: Failed to find class with given 
class loader for unmarshalling (make sure same versions of all classes are 
available on all nodes or enable peer-class-loading) 
[clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, 
cls=example.ClientNode$BamboozleClass]
at 
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:128)
at 
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:138)
at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageUtil.unmarshal(DistributedMetaStorageUtil.java:61)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.completeWrite(DistributedMetaStorageImpl.java:1161)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onUpdateMessage(DistributedMetaStorageImpl.java:1089)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:650)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:521)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2718)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2756)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: example.ClientNode$BamboozleClass
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:9061)
at 
org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.resolveClass(JdkMarshallerObjectInputStream.java:58)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1925)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
at 
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:123)
... 11 more{noformat}
The result is that one node can write an object of some custom class to the 
metastorage and make all other nodes fail.

The following reproducer can be used:
{code:java}
public class ClientNode {
public static void main(String[] args) throws IgniteCheckedException {
IgniteConfiguration igniteCfg = nodeConfiguration().setClientMode(true);

IgniteKernal ignite = (IgniteKernal) Ignition.start(igniteCfg);
DistributedMetaStorage metaStorage = 
ignite.context().distributedMetastorage();

metaStorage.write("hey", new BamboozleClass());
}

private static IgniteConfiguration nodeConfiguration() {
IgniteConfiguration igniteCfg = new IgniteConfiguration();

TcpDiscoverySpi discover

[jira] [Created] (IGNITE-13631) Improve names and descriptions of data storage metrics

2020-10-27 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-13631:
-

 Summary: Improve names and descriptions of data storage metrics
 Key: IGNITE-13631
 URL: https://issues.apache.org/jira/browse/IGNITE-13631
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov


Data storage metrics have unclear descriptions. They need to be improved.
||Metric||Description||Comment||
|*WalLoggingRate*|Average number of WAL records per second written during the last time interval.|The "time interval" part is unclear. Which time interval?|
|*WalFsyncTimeDuration*|Total duration of fsync.|Why not just *WalFsyncDuration*? The description could be more verbose.|
|*WalFsyncTimeNum*|Total count of fsync.|Why not just *WalFsyncNum*? The description could be more verbose.|
|*WalBuffPollSpinsRate*|WAL buffer poll spins number over the last time interval.|Over which time interval?|
|*LastCheckpointMarkDuration*|Duration of the checkpoint lock wait in milliseconds.|The description doesn't match the name.|
|*CheckpointTotalTime*|Total duration of checkpoint.|Is it the duration of the last checkpoint or of all checkpoints from the beginning?|
|*StorageSize*|Storage space allocated, in bytes.|It's unclear which storage this is about. Is disk included, or is it just about memory?|
|*WalTotalSize*|Total size in bytes for storage wal files.|WAL should be capitalized, and the grammar needs fixing.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSSION] User-facing API for managing Maintenance Mode

2020-10-05 Thread Denis Mekhanikov
Sergey,

Thanks for such a detailed description!

I find the first option more attractive. I see the maintenance mode as a
special state of a node that a user can turn on and off. If you want to
perform defragmentation, you need to turn that mode on. If you try to do it
in a normal mode, you get an error and a suggestion to turn MM on. Certain
commands will have a dependency on this mode.
It's like "active" / "inactive" / "read-only" cluster states, but for
nodes. You need to have an active cluster to perform cache puts. Similarly
you'll need to have a node in a maintenance mode to perform PDS recovery.

The approach with a "maintenance" command introduces the limitation that
the control utility will have to know about every command that requires
maintenance. There is a chance that this command will become bloated with
options. It will also be problematic for plugins to introduce new commands
requiring the maintenance mode.

Denis

Tue, Sep 29, 2020 at 18:03, Sergey Chugunov :

> Hello Ignite dev community,
>
> As internal implementation of Maintenance Mode [1] is getting closer to
> finish I want to discuss one more thing: user-facing API (I will use
> control utility for examples) for managing it.
>
> What should be managed?
> When a node enters MM, it may start some automatic actions (like
> defragmentation) or wait for a user to intervene and resolve the issue
> (like in case of pds corruption).
>
> So for manually triggered operations like pds cleanup after corruption we
> should provide the user with a way to actually trigger the operation.
> And for long-running automatic operations like defragmentation actions like
> status and cancel are reasonable to implement.
>
> At the same time Maintenance Mode is a supporting feature; it doesn't bring
> any value by itself but enables implementation of other features.
> Thus putting it at the center of the API and building all commands around the main
> "maintenance" command may not be right.
>
> There are two alternatives - "*Big features deserve their own commands*"
> and "*Everything should be unified*". Consider them.
>
> Big features deserve their own commands
> Here for each big feature we implement its own command. Defragmentation is
> a big separate feature so why shouldn't it have its own commands to request
> or cancel it?
>
> Examples
> *control.sh defragmentation request-for-node --nodeId 
> [--caches ]* - defragmentation will be started on the
> particular node after its restart.
> *control.sh defragmentation status* - prints information about status
> of on-going defragmentation.
> *control.sh defragmentation cancel* - cancels on-going defragmentation.
>
> Another command - "maintenance" - will be used for more generic purposes.
>
> Examples
> *control.sh maintenance list-records* - prints information about each
> maintenance record (id and name of the record, parameters, description,
> current status).
> *control.sh maintenance record-actions --id * - prints
> information about user-triggered actions available for this record (e.g.
> for pds corruption record it may be "clean-corrupted-files")
> *control.sh maintenance execute-action --id  --action-name
> * - triggers execution of particular action and prints
> results.
>
> *Pros:*
>
>1. Big features like defragmentation get their own commands and more
>freedom in implementing them.
>2. It is emphasized that maintenance mode is just a supporting thing and
>not a first-class feature (it is not at the center of API).
>
> *Cons:*
>
>1. Duplication of functionality. The same functions may be available via
>general maintenance command and a separate command of the feature.
>2. Information about a feature may be split into two commands. One piece
>of information is available in the "feature" command, another in the
>"maintenance" command.
>
>
> Everything should be unified
> We can go another way and gather all features that rely on MM under one
> unified command.
>
> API for node that is already in MM looks complete and logical, very
> intuitive:
> *control.sh maintenance list-records* - output all records that have to
> be resolved to finish maintenance.
> *control.sh maintenance record-actions --id * - all actions
> available for the record.
> *control.sh maintenance execute-action --id  --action-name
> * - executes action of the given name (like general actions
> "status" or "delete" and more specific action "clean-corrupted-files" for
> corrupted pds situation).
>
> But API to request node to enter maintenance mode becomes more vague.
> *control.sh maintenance available-operations* - prints all operations
> available to request (for instance, defragmentation).
> control.sh maintenance request-operation --id  --params
>  - requests given operation to start on next node
> restart.
> Here we have to distinguish operations that are requested automatically
> (like pds corruption) and not show them to the user.
>
> *Pros:*
>

Re: Too many messages in log in case of exceptions in user computations.

2020-09-25 Thread Denis Mekhanikov
Nikolay,

First the error is printed on the map node, then on the reduce node, and
then the exception is thrown from the method that triggered the execution.
The compute grid seems to use the "print an error message and delegate the
exception to the caller" approach, which doesn't make much sense.
If you don't know what to do with an error and delegate it to the caller,
what's the point in printing it to log?
This is the kind of information that should only be printed when debug
logging is enabled.

I created a ticket for this issue:
https://issues.apache.org/jira/browse/IGNITE-13487

Denis

Mon, Aug 10, 2020 at 18:45, Nikolay Izhikov :

> Hello, Vasiliy.
>
> These messages are shown on different nodes, aren't they?
>
> > On Aug 10, 2020, at 18:34, Vasiliy Sisko 
> > wrote:
> >
> > In case of errors in user computations, the Ignite Compute Grid produces a
> > lot of errors in the log.
> > In the worst case it produces the following messages:
> >1. Failed to execute job: …
> >2. Failed to reduce job results: …
> >3. Failed to execute task: …
> >
> > There is a suggestion to decrease the log level for the first and second messages
> > to DEBUG, because the third message will still be shown.
> >
> > --
> > Vasiliy Sisko
>
>


[jira] [Created] (IGNITE-13487) Decrease logging level for exceptions thrown from compute engine

2020-09-25 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-13487:
-

 Summary: Decrease logging level for exceptions thrown from compute 
engine
 Key: IGNITE-13487
 URL: https://issues.apache.org/jira/browse/IGNITE-13487
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov


When a compute job fails during execution, it leads to two error messages 
printed on different nodes:

1. {{Failed to execute job}} on the map node.
 2. {{Failed to obtain remote job result policy for result from 
ComputeTask.result(..) method (will fail the whole task)}} on the reduce node.

Also an exception is thrown from the {{execute()}} method that triggered this 
task.

It seems that none of these errors should actually be shown to users. This 
information should be printed only if debug logging is enabled for the 
corresponding packages.

The issue can be reproduced by running the following example:
{code:java}
public class ComputeException {
    public static void main(String[] args) {
        new ComputeException().run();
    }

    void run() {
        IgniteConfiguration igniteCfg = Ignition.loadSpringBean("config/ignite.xml", "ignite.cfg");
        igniteCfg.setClientMode(true);

        try (Ignite ignite = Ignition.start(igniteCfg)) {
            ignite.compute(ignite.cluster().forServers()).execute(new ErroneousTask(), null);
        }
    }

    public static class ErroneousTask extends ComputeTaskAdapter<Object, Object> {
        @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> list,
            @Nullable Object o) throws IgniteException {
            LinkedHashMap<ComputeJob, ClusterNode> map = new LinkedHashMap<>();

            for (ClusterNode node : list)
                map.put(new ErroneousJob(), node);

            return map;
        }

        @Override public @Nullable Object reduce(List<ComputeJobResult> list) throws IgniteException {
            return null;
        }
    }

    public static class ErroneousJob extends ComputeJobAdapter {
        @Override public Object execute() throws IgniteException {
            throw new IgniteException("I failed. Sorry :(");
        }
    }
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13399) CpuLoad metric reports -1 under Java 11 in embedded mode

2020-09-03 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-13399:
-

 Summary: CpuLoad metric reports -1 under Java 11 in embedded mode
 Key: IGNITE-13399
 URL: https://issues.apache.org/jira/browse/IGNITE-13399
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Mekhanikov


When running a node in embedded mode under Java 11, the CpuLoad metric reports 
-1. The process needs to be started with the following option: 

{{--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED}}

We need to get rid of this requirement to run Java with additional flags to get 
proper values for the CpuLoad metric.

Some investigation was done under the following issue: 
https://issues.apache.org/jira/browse/IGNITE-13306
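
For reference, the standard JDK bean that reports process CPU load uses a negative value as the "not available" result; how Ignite obtains the value internally may differ, so the following is only an illustration of where a -1 can come from:
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuLoadProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            double load = ((com.sun.management.OperatingSystemMXBean)os).getProcessCpuLoad();

            // A negative value is the documented "not available" result.
            System.out.println("Process CPU load: " + load);
        }
        else
            System.out.println("Extended OperatingSystemMXBean is not available");
    }
}
{code}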



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13398) NPE in IgniteServiceProcessor when destroying a cache

2020-09-03 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-13398:
-

 Summary: NPE in IgniteServiceProcessor when destroying a cache 
 Key: IGNITE-13398
 URL: https://issues.apache.org/jira/browse/IGNITE-13398
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Mekhanikov
 Attachments: Main.java

Try running the attached reproducer: [^Main.java]. The following exception is 
printed to the logs:
{noformat}
Sep 03, 2020 12:13:58 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to notify direct custom event listener: DynamicCacheChangeBatch 
[id=c1d6e335471-6bafb375-9d3e-487a-974d-35927ae02c04, reqs=ArrayList 
[DynamicCacheChangeRequest [cacheName=foo, hasCfg=false, 
nodeId=5e41fda8-e749-432c-9832-7b1c6ee3d0c8, clientStartOnly=false, stop=true, 
destroy=false, disabledAfterStartfalse]], exchangeActions=ExchangeActions 
[startCaches=null, stopCaches=[foo], startGrps=[], stopGrps=[foo, 
destroy=true], resetParts=null, stateChangeRequest=null], startCaches=false]
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.service.IgniteServiceProcessor.lambda$processDynamicCacheChangeRequest$6(IgniteServiceProcessor.java:1694)
at java.util.Collection.removeIf(Collection.java:414)
at 
org.apache.ignite.internal.processors.service.IgniteServiceProcessor.processDynamicCacheChangeRequest(IgniteServiceProcessor.java:1691)
at 
org.apache.ignite.internal.processors.service.IgniteServiceProcessor.access$200(IgniteServiceProcessor.java:108)
at 
org.apache.ignite.internal.processors.service.IgniteServiceProcessor$3.onCustomEvent(IgniteServiceProcessor.java:232)
at 
org.apache.ignite.internal.processors.service.IgniteServiceProcessor$3.onCustomEvent(IgniteServiceProcessor.java:229)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:665)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:528)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2608)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2646)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Distinguishing node-local metrics from cluster-wide

2020-08-05 Thread Denis Mekhanikov
Hi Igniters!

My team and I are building a monitoring system on top of the new metrics
framework described in the following IEP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
So far it's going well, but we'd like to improve the way metrics are
exported from Ignite.

There are different kinds of metrics that you can access through this
framework. Some of them are local for a node, like used heap, or CPU load.
It makes sense to send them independently from every node to the
centralized storage. Let's assume that we attach nodeID to metric names, so
that we can distinguish between metrics coming from different nodes.
It makes sense to work with local metrics using some kind of patterns on
metric names. For example, if I want to draw a chart for CPU load on every
node, I can use a pattern similar to the following one: sys.CpuLoad.*
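
To illustrate that naming scheme, here is a tiny sketch (not an actual exporter) of attaching the local node ID to a metric name before pushing it to a store; pushToStorage is a hypothetical sink standing in for Graphite, Prometheus, etc.:

import java.util.UUID;
import org.apache.ignite.Ignite;

public class NodeScopedNames {
    // Builds a per-node metric name, e.g. "sys.CpuLoad.<nodeId>".
    static String nodeScoped(Ignite ignite, String metricName) {
        UUID nodeId = ignite.cluster().localNode().id();
        return metricName + "." + nodeId;
    }

    static void export(Ignite ignite, String metricName, double value) {
        pushToStorage(nodeScoped(ignite, metricName), value);
    }

    // Hypothetical sink standing in for the real metrics storage.
    static void pushToStorage(String name, double value) {
        System.out.println(name + " = " + value);
    }
}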

There are also metrics that have the same value no matter
which node they are taken from. For example, cache size, progress of
rebalance or topology version are global things that don't depend on the
node. If I take any of the metrics matching the pattern pme.Duration.*, I
will get what I need.

I wonder, what is the recommended approach to global metrics? I know that
there are tools like Prometheus and Graphite that allow similar
manipulations with metric names. Is it supposed that global and local
metrics are differentiated on the side of monitoring tools using functions
like any(pme.Duration.*) ? It seems that Graphite is lacking one, for
example.
Maybe it makes sense to introduce a property for metrics that will let the
exporters distinguish between them and not parameterize the names with node
ID?

What do you think?

Denis


Re: [DISCUSSION] Cache warmup

2020-08-04 Thread Denis Mekhanikov
Kirill,

When I discussed this functionality with Ignite users, I heard the
following thoughts about warming up:

   - Node restarts affect performance of queries. The main reason for that
   is that the pages that were loaded into memory before the restart are on
   disk after the restart. It takes time to reach the same distribution of
   data between memory and disk. Until that point the performance is usually
   degraded. No simple rule like "load everything" helps here if only a part
   of data fits in memory.
   - It would be nice to have a way to give preferences to indices when
    doing a warmup. Usually indices are used more often than data pages, so
    loading indices first would bring more benefit.

The first point can be addressed by implementing the policy that would
restore the memory state that was observed before the restart. I don't see
how it can be implemented using the suggested interface.
The second one requires direct work with data pages, but not with a cache
context, so it's also impossible to implement.

When loading of all cache data is required, it can be done by running a
local scan query. It will iterate through all data pages and result in
their allocation in memory.
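
To make the scan-query trick concrete, a minimal sketch (cache name and types are illustrative) that touches every locally owned entry and thereby pulls its data pages into memory:

import javax.cache.Cache;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class ScanWarmup {
    static void warmup(IgniteCache<Object, Object> cache) {
        ScanQuery<Object, Object> qry = new ScanQuery<>();
        qry.setLocal(true); // only iterate the data owned by this node

        try (QueryCursor<Cache.Entry<Object, Object>> cur = cache.query(qry)) {
            for (Cache.Entry<Object, Object> ignored : cur) {
                // Iterating is enough: every visited entry has its pages loaded from disk.
            }
        }
    }
}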

So, I don't really see a scenario when the suggested API will help. Do you
have a suitable use-case that will be covered?

Denis

Tue, Aug 4, 2020 at 13:42, Kirill Tkalenko :

> Hi, Denis!
>
> Previously, I answered Slava about the implementation that I have in mind:
> it will be possible to add your own warm-up strategy implementations, which
> can be implemented in different ways.
>
> At the moment, I suggest implementing one "Load all" strategy, which will
> be effective if the persistent storage is smaller than RAM.
>
>
> 28.07.2020, 19:46, "Denis Mekhanikov" :
> > Kirill,
> >
> > That will be a great feature! Other popular databases already have it
> (e.g.
> > Postgres: https://www.postgresql.org/docs/11/pgprewarm.html), so it's
> good
> > that we're also going to have it in Ignite.
> >
> > What implementation of CacheWarmup interface do you have in mind? Will
> > there be some preconfigured implementation, and will users be able to
> > implement it themselves?
> >
> > Do you think it should be cache-based? I would say that a
> DataRegion-based
> > warm-up would come more naturally. Page IDs that are loaded into the data
> > region can be dumped periodically to disk and recovered on restarts. This
> > is more or less how it works in Postgres.
> > I'm afraid that if we make it cache-based, the implementation won't be
> that
> > obvious. We already have an API for warmup that appeared to be pretty
> much
> > impossible to apply in a useful way:
> >
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#preloadPartition-int-
> > Let's make sure that our new tool for warming up is actually useful.
> >
> > Denis
> >
> > Tue, Jul 28, 2020 at 09:17, Zhenya Stanilovsky
>  >> :
> >
> >>  Looks like we need an additional function for static caches, for
> >>  example warmup(List cconf); it would be helpful for
> >>  Spring too.
> >>
> >>  >
> >>  >--- Forwarded message ---
> >>  >From: "Вячеслав Коптилин" < slava.kopti...@gmail.com >
> >>  >To: dev@ignite.apache.org
> >>  >Cc:
> >>  >Subject: Re: [DISCUSSION] Cache warmup
> >>  >Date: Mon, 27 Jul 2020 16:47:48 +0300
> >>  >
> >>  >Hello Kirill,
> >>  >
> >>  >Thanks a lot for driving this activity. If I am not mistaken, this
> >>  >discussion relates to IEP-40.
> >>  >
> >>  >> I suggest adding a warmup phase after recovery here [1] after [2],
> >>  before
> >>  >discovery.
> >>  >This means that the user's thread, which starts Ignite via
> >>  >Ignition.start(), will wait for an additional step - cache warm-up.
> >>  >I think this fact has to be clearly mentioned in our documentation (at
> >>  >Javadoc at least) because this step can be time-consuming.
> >>  >
> >>  >> I suggest adding a new interface:
> >>  >I would change it a bit. First of all, it would be nice to place this
> >>  >interface in a public package and get rid of using GridCacheContext,
> >>  >which is an internal class and should not leak to the public API in any
> >>  >case.
> >>  >Perhaps, this parameter is not needed at all or we should add some
> public
> >>  >abstraction instead of internal class.
> >>  >

Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-07-29 Thread Denis Mekhanikov
Guys,

Is there a chance to squeeze the fix for the following issue in:
https://issues.apache.org/jira/browse/IGNITE-13306
?
The issue makes the CPU load metric show -1 on Java 11. This is quite an
important metric, and this bug makes it harder to configure its monitoring.
Mirza, who's currently working on this issue, says that he'll be able to
finish working on it today or tomorrow.

What do you think?

Denis

Fri, Jul 24, 2020 at 11:04, Alex Plehanov :

> Guys,
>
> I've cherry-picked IGNITE-12438 (One-way client-server connections) and
> IGNITE-13038 (Web console removing) to 2.9.
>
> Since there are no objections I will move IGNITE-13006 (Spring libs upgrade
> to 5.2 version), IGNITE-12489 (Error during purges by expiration) and
> IGNITE-12553 (public Java metrics API) to the next release.
>
> I will cherry-pick ticket IGNITE-11942 (IGFS and Hadoop removing) after it
> is reviewed and merged to master.
>
> What about IGNITE-12911 [1] (B+Tree Corrupted exception when using a key
> extracted from a BinaryObject value object --- and SQL enabled)? Anton
> Kalashnikov, Ilya Kasnacheev, can you please clarify the ticket status?
>
> [1]: https://issues.apache.org/jira/browse/IGNITE-12911
>
> Thu, Jul 23, 2020 at 12:08, Alexey Kuznetsov :
>
> > Alex, Denis
> >
> > The issue with moving the Web Console to a separate repository is merged to
> > master.
> > See:  https://issues.apache.org/jira/browse/IGNITE-13038
> >
> > Please, consider to cherry-pick to ignite-2.9
> >
> > ---
> > Alexey Kuznetsov
> >
> >
>


Re: [DISCUSSION] Cache warmup

2020-07-28 Thread Denis Mekhanikov
Kirill,

That will be a great feature! Other popular databases already have it (e.g.
Postgres: https://www.postgresql.org/docs/11/pgprewarm.html), so it's good
that we're also going to have it in Ignite.

What implementation of CacheWarmup interface do you have in mind? Will
there be some preconfigured implementation, and will users be able to
implement it themselves?

Do you think it should be cache-based? I would say that a DataRegion-based
warm-up would come more naturally. Page IDs that are loaded into the data
region can be dumped periodically to disk and recovered on restarts. This
is more or less how it works in Postgres.
I'm afraid that if we make it cache-based, the implementation won't be that
obvious. We already have an API for warmup that appeared to be pretty much
impossible to apply in a useful way:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#preloadPartition-int-
Let's make sure that our new tool for warming up is actually useful.
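
For comparison, this is roughly how the existing preloadPartition API has to be driven today (cache name illustrative); the caller has to loop over partitions and decide on ordering and parallelism itself, which is part of why it's hard to apply:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;

public class PreloadAllPartitions {
    static void preload(Ignite ignite, String cacheName) {
        IgniteCache<?, ?> cache = ignite.cache(cacheName);
        int parts = ignite.affinity(cacheName).partitions();

        // Sequentially preloads every partition of the cache into page memory.
        for (int p = 0; p < parts; p++)
            cache.preloadPartition(p);
    }
}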

Denis

Tue, Jul 28, 2020 at 09:17, Zhenya Stanilovsky :

>
> Looks like we need an additional function for static caches, for
> example warmup(List cconf); it would be helpful for
> Spring too.
>
> >
> >--- Forwarded message ---
> >From: "Вячеслав Коптилин" < slava.kopti...@gmail.com >
> >To:  dev@ignite.apache.org
> >Cc:
> >Subject: Re: [DISCUSSION] Cache warmup
> >Date: Mon, 27 Jul 2020 16:47:48 +0300
> >
> >Hello Kirill,
> >
> >Thanks a lot for driving this activity. If I am not mistaken, this
> >discussion relates to IEP-40.
> >
> >> I suggest adding a warmup phase after recovery here [1] after [2],
> before
> >discovery.
> >This means that the user's thread, which starts Ignite via
> >Ignition.start(), will wait for an additional step - cache warm-up.
> >I think this fact has to be clearly mentioned in our documentation (at
> >Javadoc at least) because this step can be time-consuming.
> >
> >> I suggest adding a new interface:
> >I would change it a bit. First of all, it would be nice to place this
> >interface in a public package and get rid of using GridCacheContext,
> >which is an internal class and it should not leak to the public API in any
> >case.
> >Perhaps, this parameter is not needed at all or we should add some public
> >abstraction instead of internal class.
> >
> >package org.apache.ignite.configuration;
> >
> >import org.apache.ignite.IgniteCheckedException;
> >import org.apache.ignite.lang.IgniteFuture;
> >
> >public interface CacheWarmupper {
> >  /**
> >   * Warmup cache.
> >   *
> >   * @param cachename Cache name.
> >   * @return Future cache warmup.
> >   * @throws IgniteCheckedException If failed.
> >   */
> >  IgniteFuture warmup(String cachename) throws
> >IgniteCheckedException;
> >}
> >
> >Thanks,
> >S.
> >
> >Mon, Jul 27, 2020 at 15:03, Kirill Tkalenko < tkalkir...@yandex.ru >:
> >
> >> Now, after restarting a node, we have only cold caches: the first
> >> requests to them will gradually load data from disk, which can slow down
> >> the first calls to them.
> >> If a node has more RAM than data on disk, then the data can be loaded at start
> >> ("warmup"), thereby solving the issue of slowdowns during the first calls to
> >> caches.
> >>
> >> I suggest adding a warmup phase after recovery here [1], after [2], before
> >> discovery.
> >>
> >> I suggest adding a new interface:
> >>
> >> package org.apache.ignite.internal.processors.cache;
> >>
> >> import org.apache.ignite.IgniteCheckedException;
> >> import org.apache.ignite.internal.IgniteInternalFuture;
> >> import org.jetbrains.annotations.Nullable;
> >>
> >> /**
> >> * Interface for warming up cache.
> >> */
> >> public interface CacheWarmup {
> >> /**
> >> * Warmup cache.
> >> *
> >> * @param cacheCtx Cache context.
> >> * @return Future cache warmup.
> >> * @throws IgniteCheckedException if failed.
> >> */
> >> @Nullable IgniteInternalFuture process(GridCacheContext cacheCtx)
> >> throws IgniteCheckedException;
> >> }
> >>
> >> This will allow warming up caches in parallel and asynchronously. The warmup
> >> phase will end after the IgniteInternalFutures for all caches are done.
> >>
> >> Also adding the ability to customize via methods:
> >>
> org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
> >> org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup
> >>
> >> This will allow setting the cache warm-up implementation both for a
> >> specific cache and, if necessary, for all caches.
> >>
> >> I suggest adding an implementation of SequentialWarmup that will use
> [3].
> >>
> >> Questions, suggestions, comments?
> >>
> >> [1] -
> >>
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
> >> [2] -
> >>
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
> >> [3] -
> >>
> 

Re: Listening cluster activation events by default

2020-07-21 Thread Denis Mekhanikov
Alex,

I think it makes sense to enable distribution of these events by default,
since they won't introduce any performance impact, and they may be pretty
useful for the application lifecycle management.
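
For reference, a sketch of what consuming these events requires today: the types have to be included in the configuration explicitly, and then any node can listen to them locally.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;

public class ActivationEvents {
    public static void main(String[] args) {
        // Today these event types must be enabled explicitly; the proposal is to
        // distribute them to all nodes by default.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIncludeEventTypes(EventType.EVT_CLUSTER_ACTIVATED, EventType.EVT_CLUSTER_DEACTIVATED);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.events().localListen(evt -> {
                System.out.println("Cluster state event: " + evt.name());
                return true; // keep listening
            }, EventType.EVT_CLUSTER_ACTIVATED, EventType.EVT_CLUSTER_DEACTIVATED);

            // A real node would keep running here; this sketch stops immediately.
        }
    }
}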

The following events also make sense to be enabled by default:

   - EVT_BASELINE_CHANGED
   - EVT_CLUSTER_STATE_CHANGED

Events that won't make any impact but don't have great practical use either:

   - EVT_BASELINE_AUTO_ADJUST_ENABLED_CHANGED
   - EVT_BASELINE_AUTO_ADJUST_AWAITING_TIME_CHANGED

Denis

Tue, Jul 21, 2020 at 11:39, Alex Kozhenkov :

> Igniters,
>
> There are 2 events in Ignite (EVT_CLUSTER_ACTIVATED
> and EVT_CLUSTER_DEACTIVATED) that are only listened to by the coordinator.
> To listen to them by other nodes, they must be included in
> IgniteConfiguration.
>
> There are also discovery events that are listened to by all nodes.
>
> Both activation and discovery events are rare, system, and cluster-wide, so
> I suggest enabling activation events by default on all nodes.
>


Re: Moving binary metadata to PDS storage folder

2020-05-18 Thread Denis Mekhanikov
Users shouldn't care about binary_meta and marshaller directories. Those
are internal details. Having to go through the documentation on those
things to figure out whether you need to run each of the provided scripts
or not will lead to a terrible user experience.
If we add the binary_marshaller_move.sh script for this specific case, it
will stay with us forever just because migration from Ignite 2.8 to 2.9
required running this thing.

Denis


Re: Extensions for control.sh

2020-05-14 Thread Denis Mekhanikov

Anton,

Do you mean that external plugins should be able to configure the 
connection that is used to communicate with a cluster? Could you give an 
example of what kind of plugin would benefit from it?


If there are some connection-specific properties that can change the way 
control.sh communicates with a cluster, then it makes sense to 
donate such configuration to Ignite. Or am I missing something?


Denis

On 14.05.2020 11:43, Anton Vinogradov wrote:

Denis,

In addition to extending the features list it's also important to find some
way to allow customization of control.sh connection configuration/code.
For example, it may be useful to set some attributes to binary rest client.

On Thu, May 14, 2020 at 2:09 AM Denis Magda  wrote:


Perfect idea to use this the tool for configuration and addition of
extensions!

-
Denis


On Wed, May 13, 2020 at 11:43 AM Denis Mekhanikov 
wrote:


Hi everyone!

Control.sh is a command-line management tool that you can use to manage
your grid and check its vital parameters like topology version or
availability of baseline nodes. It has a good set of commands which are
suitable for working with vanilla Ignite.

There is also a way to extend functionality of Ignite by implementing a
3rd-party plugin or a module. Any plugin or external module should have
some kind of API to manage and monitor its activity.
If a command-line management command needs to be added, then the only
way to achieve that is to provide an additional script, separate from
control.sh. If you use multiple such plugins, then the set of required
tools may grow and lead to confusion about which script should be used to
configure which extension. Instead of doing that, it would be convenient
for users to have the ability to use the same script, but with an extended
set of options. It should make the lives of 3rd-party vendors easier.

Currently many integrations and community-supported modules are being
moved outside of the core product:


https://cwiki.apache.org/confluence/display/IGNITE/IEP-36%3A+Modularization

I think it makes sense to provide a possibility to configure extensions
using control.sh, since their number will grow over time, and some of
them will require some runtime configuration.

What do you think?

Denis




Re: Moving binary metadata to PDS storage folder

2020-05-13 Thread Denis Mekhanikov

Maxim,

This way we'll introduce a migration procedure between versions, which 
we currently don't have. Different Ignite versions are compatible in terms of 
persistence storage. If we add a migration script, we need to decide 
whether we need to run it every time the version is upgraded, or 
only when some specific versions are affected.


I suggest having a procedure that will look for metadata in the work 
directory, and if it finds it there, then the node will use it. 
Otherwise the persistence directory is used.
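
A sketch of that lookup order (directory names are illustrative, not the exact ones Ignite uses): prefer metadata already present in the legacy work-directory location, and fall back to the new location under the persistence directory otherwise.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BinaryMetaLocation {
    // Picks the directory to read binary metadata from, following the
    // "use the legacy location if it is already populated" rule.
    static Path resolveBinaryMetaDir(Path workDir, Path persistenceDir) {
        Path legacy = workDir.resolve("binary_meta");

        if (Files.isDirectory(legacy))
            return legacy;

        return persistenceDir.resolve("db").resolve("binary_meta");
    }

    public static void main(String[] args) {
        System.out.println(resolveBinaryMetaDir(Paths.get("work"), Paths.get("persistence")));
    }
}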


Denis

On 13.05.2020 21:40, Maxim Muzafarov wrote:

Folks,

I think it's important to discuss the following question regarding this thread:
Should we consider moving the migration procedure from the java
production code to migration scripts?

 From my understanding, keeping all such things in java production
source code has some disadvantages:
1. It executes only once at the migration stage.
2. It affects the complexity of the source code and code maintenance.
3. Node crash cases must be covered during the migration procedure.
4. It affects the production usage e.g. the process doesn't have the
right access to the old directory (migration already completed) and
will fail the node start.


The right behavior from my point should be:
1. Change the default path of binary/marshaller directory to the new one.
2. Provide migration scripts for users.

WDYT?

On Wed, 13 May 2020 at 21:10, Denis Mekhanikov  wrote:

Sounds great!

It happens pretty frequently that users migrate to a new version of
Ignite and preserve persistence files only without caring too much about
the work folder. But it turns out that the work folder actually has
some important stuff.
This improvement should help with this issue.

What about in-memory mode? As far as I know, we write binary metadata
to disk even when no persistence is configured. Do you plan to address
it in any way?

Denis

On 12.05.2020 15:56, Sergey Antonov wrote:

Hello Semyon,

This is a good idea!

Tue, May 12, 2020 at 15:53, Vyacheslav Koptilin :


Hello Semyon,

This is a good and long-awaited improvement! Thank you for your efforts!

Thanks,
S.

Tue, May 12, 2020 at 15:11, Semyon Danilov :


Hello!

I would like to propose moving /binary_meta and /marshaller folders to
the PDS folder.

Motivation: data directly related to the persistence is stored outside
the persistence dir, which can lead to various issues and also is not very
convenient to use. In particular, with k8s, a deployment disk that is
attached to a container cannot be accessed from other containers or
outside of k8s. In case support needs to drop everything except the persistence
data, there will be no way to recover, because binary metadata is required to
process PDS files.

I created an issue (https://issues.apache.org/jira/browse/IGNITE-12994)
and a pull request (https://github.com/apache/ignite/pull/7792) that fixes the
issue.

In that PR I made the following:


* store binary meta and marshaller data inside db/ folder
* if binary meta or marshaller data are found in "legacy" locations --
safely move them to new locations during the node startup


Kind regards,

Semyon Danilov.



Extensions for control.sh

2020-05-13 Thread Denis Mekhanikov

Hi everyone!

Control.sh is a command-line management tool that you can use to manage 
your grid and check its vital parameters like topology version or 
availability of baseline nodes. It has a good set of commands which are 
suitable for working with vanilla Ignite.


There is also a way to extend functionality of Ignite by implementing a 
3rd-party plugin or a module. Any plugin or external module should have 
some kind of API to manage and monitor its activity.
If a command-line management command needs to be added, then the only 
way to achieve that is to provide an additional script, separate from 
control.sh. If you use multiple such plugins, then the set of required 
tools may grow and lead to confusion, which script should be used to 
configure which extension. Instead of doing that it would be convenient 
for users to have ability to use the same script, but with an extended 
set of options. It should make lifes of 3rd-party vendors easier.


Currently many integrations and community-supported modules are being 
moved outside of the core product: 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-36%3A+Modularization
I think it makes sense to provide a possibility to configure extensions 
using control.sh, since their number will grow over time, and some of 
them will require some runtime configuration.


What do you think?

Denis



Re: Moving binary metadata to PDS storage folder

2020-05-13 Thread Denis Mekhanikov

Sounds great!

It happens pretty frequently that users migrate to a new version of 
Ignite and preserve persistence files only without caring too much about 
the work folder. But it turns out that the work folder actually has 
some important stuff.

This improvement should help with this issue.

What about in-memory mode? As far as I know, we write binary metadata 
to disk even when no persistence is configured. Do you plan to address 
it in any way?


Denis

On 12.05.2020 15:56, Sergey Antonov wrote:

Hello Semyon,

This is a good idea!

Tue, May 12, 2020 at 15:53, Vyacheslav Koptilin :


Hello Semyon,

This is a good and long-awaited improvement! Thank you for your efforts!

Thanks,
S.

Tue, May 12, 2020 at 15:11, Semyon Danilov :


Hello!

I would like to propose moving /binary_meta and /marshaller folders to
the PDS folder.

Motivation: data directly related to the persistence is stored outside
the persistence dir, which can lead to various issues and also is not very
convenient to use. In particular, with k8s, a deployment disk that is
attached to a container cannot be accessed from other containers or
outside of k8s. In case support needs to drop everything except the persistence
data, there will be no way to recover, because binary metadata is required to
process PDS files.

I created an issue (https://issues.apache.org/jira/browse/IGNITE-12994)
and a pull request (https://github.com/apache/ignite/pull/7792) that fixes the
issue.

In that PR I made the following:


* store binary meta and marshaller data inside db/ folder
* if binary meta or marshaller data are found in "legacy" locations --
safely move them to new locations during the node startup


Kind regards,

Semyon Danilov.





Re: Server Node comes down with : (err) Failed to notify listener: GridDhtTxPrepareFuture Error

2020-03-26 Thread Denis Mekhanikov
Thanks for the report!

The issue here is that a remote filter for a continuous query is loaded
using peer class loading, and other classes that this remote filter depends
on can be lazily loaded during its work.
Loading every dependency class involves going to the node where the
originating class was loaded from, and asking that node to send missing
classes over the network.
The issues begin when this node is not in the cluster anymore, and the
continuous query wasn't undeployed yet.
A server sends a request for a class to a node that is not available, but
wasn't kicked out of the topology yet, since a failure detection timeout
hasn't elapsed yet.
It leads to the NoClassDefFoundError that you observe in the logs.

The biggest issue here is that this exception triggers a failure handler
that makes the whole node go down.
I would expect that only one request would fail, but not the whole node.

As a temporary solution you can stop relying on peer class loading for
continuous queries and provide the code of remote filters to the classpath
of server nodes.
This way no lazy class loading will be performed over the network since
they will all be available locally.
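
A sketch of that setup (classes and names are illustrative): the filter class is packaged into a jar placed on every server node's classpath and referenced through a factory, so none of its dependencies have to travel over the network.

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class CqWithoutPeerClassLoading {
    // Deployed on the server classpath together with everything it depends on.
    public static class AcceptAllFilter implements CacheEntryEventFilter<Integer, String> {
        @Override public boolean evaluate(CacheEntryEvent<? extends Integer, ? extends String> evt) {
            return true;
        }
    }

    static QueryCursor<Cache.Entry<Integer, String>> start(IgniteCache<Integer, String> cache) {
        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        qry.setRemoteFilterFactory(FactoryBuilder.factoryOf(AcceptAllFilter.class));
        qry.setLocalListener(evts -> evts.forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue())));

        return cache.query(qry); // closing the returned cursor stops the query
    }
}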

Denis

Fri, Mar 13, 2020 at 20:39, VeenaMithare :

> Raised this jira :
> https://issues.apache.org/jira/browse/IGNITE-12784
>
> Observed in 2.7.6. Unable to easily test in 2.8.0 because of other issues.
> One of them being -
>
> http://apache-ignite-users.70518.x6.nabble.com/2-8-0-JDBC-Thin-Client-Unable-to-load-the-tables-via-DBeaver-td31681.html
> Please note this happens
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


[jira] [Created] (IGNITE-12794) Scan query fails with an assertion error: Unexpected row key

2020-03-17 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12794:
-

 Summary: Scan query fails with an assertion error: Unexpected row 
key
 Key: IGNITE-12794
 URL: https://issues.apache.org/jira/browse/IGNITE-12794
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.8
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov
 Attachments: ScanQueryExample.java

Scan query fails with an exception:
{noformat}
Exception in thread "main" java.lang.AssertionError: Unexpected row key
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:548)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:512)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.advance(GridCacheQueryManager.java:3045)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.onHasNext(GridCacheQueryManager.java:2997)
at 
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
at 
org.apache.ignite.internal.processors.cache.QueryCursorImpl.getAll(QueryCursorImpl.java:127)
at scan.ScanQueryExample.main(ScanQueryExample.java:31)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)"
{noformat}
The issue is reproduced when performing concurrent scan queries and updates. A 
reproducer is attached. You will need to enable asserts in order to reproduce 
this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12753) Cache SSL contexts in SslContextFactory

2020-03-05 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12753:
-

 Summary: Cache SSL contexts in SslContextFactory
 Key: IGNITE-12753
 URL: https://issues.apache.org/jira/browse/IGNITE-12753
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov


When SSL is enabled in a cluster, SslContextFactory#createSslContext is called 
every time connections between nodes are created. It involves accessing the 
key store on disk. It may slow down creation of new communication connections 
and block striped pool threads if disks are slow.

SSL contexts are stateless and can be shared between threads, so it's safe to 
create an SSL context once and use the same instance every time.
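
The caching idea itself is a small amount of code; an illustrative sketch (not the actual SslContextFactory implementation) that builds the context once and reuses the same instance afterwards:
{code:java}
import java.util.concurrent.atomic.AtomicReference;
import javax.net.ssl.SSLContext;

public class CachingSslContextHolder {
    private final AtomicReference<SSLContext> cached = new AtomicReference<>();

    // Builds the context on first use only; subsequent calls reuse the same
    // instance, so key material is read from disk once instead of per connection.
    SSLContext sslContext() throws Exception {
        SSLContext ctx = cached.get();

        if (ctx != null)
            return ctx;

        SSLContext created = createSslContext();

        return cached.compareAndSet(null, created) ? created : cached.get();
    }

    // Stand-in for the expensive creation path that loads key stores from disk.
    private SSLContext createSslContext() throws Exception {
        return SSLContext.getDefault();
    }
}
{code}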



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Reference of local service.

2020-03-02 Thread Denis Mekhanikov
Vyacheslav,

You can't make service interfaces extend
*org.apache.ignite.services.Service*. Currently it works perfectly if
*org.apache.ignite.services.Service* and a user-defined interface are
independent. This is actually the case in our current examples:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/servicegrid/SimpleMapService.java
I mentioned the *Serializable* interface just as an example of an interface
that can be present, but it's not the one that is going to be called by a
user.

What I'm trying to say is that there is no way to tell whether the service
is going to be used through a proxy only, or whether usage of a local instance
is also possible.

Vladimir,

I don't like the idea that enabling or disabling metrics will change
the behaviour of the component you collect the metrics for. Such behaviour
is far from obvious.

Nikolay,

I agree that such an approach is valid and makes total sense. But making the
*IgniteServices#serviceProxy()* method always return a proxy instead of a
local instance will change the public contract. The javadoc currently says
the following:

> If service is available locally, then local instance is returned,
> otherwise, a remote proxy is dynamically created and provided for the
> specified service.


I propose introducing a new method that will always return a service proxy
regardless of local availability, and deprecating *serviceProxy()* and
*service()* methods. What do you think?

Denis

Mon, Mar 2, 2020 at 16:08, Nikolay Izhikov :

> Hello, Vladimir.
>
> > What if we just provide an option to disable service metrics at all?
>
> I don't think we should create an explicit property for service metrics.
> We will implement the way to disable any metrics in the scope of
> IGNITE-11927 [1].
>
> > Usage of a proxy instead of service instances can lead to performance
> > degradation for local instances, which is another argument against such
> change.
>
> As far as I know, many and many modern frameworks use a proxy approach.
> Just to name one - Spring framework works with the proxy.
>
> We should measure the impact on the performance that brings proxy+metric
> and after it make the decision on local service metrics implementation.
> Vladimir, can you, as a contributor of this task make this measurement?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-11927
>
> Mon, Mar 2, 2020 at 12:56, Vladimir Steshin :
>
> > Denis, Vyacheslav, hi.
> >
> > What if we just provide an option to disable service metrics at all? It
> > would keep direct references for local services. Also, we can make
> service
> > metrics disabled by default to keep current code working. A warning of
> > local service issues will be set with the option.
> >
> > Mon, Mar 2, 2020 at 11:26, Vyacheslav Daradur :
> >
> > > >> Moreover, I don't see a way of implementing such a check. Are you
> > going
> > > to look just for any interface? What about Serializable? Will it do?
> > >
> > > The check should look for the interface which implements
> > > "org.apache.ignite.services.Service", it covers the requirement to be
> > > Serializable.
> > >
> > > >> For now though the best thing we can do is to calculate remote
> > > invocations only, since all of them go through a proxy.
> > >
> > > Let's introduce a system property to manage local services monitoring:
> > > - local services monitoring will be disabled by default - to avoid any
> > > backward compatibility issues;
> > > - local services monitoring can be enabled runtime with a known
> > limitation
> > > for new services for example;
> > > Moreover, if we introduce such a feature flag to ServiceConfiguration -
> > > the new feature can be enabled per service separately.
> > >
> > > What do you think?
> > >
> > >
> > >
> > > On Mon, Mar 2, 2020 at 12:33 AM Denis Mekhanikov <
> dmekhani...@gmail.com>
> > > wrote:
> > >
> > >> Vladimir, Slava,
> > >>
> > >> In general, I like the idea of abstracting the service deployment from
> > >> its usage, but there are some backward-compatibility considerations
> that
> > >> won't let us do so.
> > >>
> > >> Or we can declare usage of services without interfaces incorrect
> > >>
> > >>
> > >> I don't think we can introduce a requirement for all services to have
> an
> > >> interface, unfortunately. Such change can potentially break existing
> > code,
> > >> since such requirement doesn't exist currently.
> >

Re: Reference of local service.

2020-03-01 Thread Denis Mekhanikov
Vladimir, Slava,

In general, I like the idea of abstracting the service deployment from its
usage, but there are some backward-compatibility considerations that won't
let us do so.

Or we can declare usage of services without interfaces incorrect


I don't think we can introduce a requirement for all services to have an
interface, unfortunately. Such change can potentially break existing code,
since such requirement doesn't exist currently.
Moreover, I don't see a way of implementing such a check. Are you going to
look just for any interface? What about Serializable? Will it do?

Usage of a proxy instead of service instances can lead to performance
degradation for local instances, which is another argument against such
change.

I think, it will make sense to make all service invocations work through a
proxy in Ignite 3.
For now, though, the best thing we can do is to count remote invocations
only, since all of them go through a proxy.
Another option is to provide a simple way for users to account for the service
invocations themselves.

What do you guys think?

Denis


Tue, Feb 25, 2020 at 16:50, Vyacheslav Daradur :

> It is not a change of the public API from my point of view.
>
> Also, there is a check that allows getting a proxy only for an interface, not
> an implementation.
>
> Denis, what do you think?
>
>
> Tue, Feb 25, 2020 at 16:28, Vladimir Steshin :
>
>> Vyacheslav, this is exactly what I found. I'm doing [1] (metrics for
>> services) and realized I have to wrap local calls by a proxy. Is it a
>> change of public API and should come with major release only? Or we can
>> declare usage of services without interfaces incorrect?
>> [1] https://issues.apache.org/jira/browse/IGNITE-12464
>>
>> вт, 25 февр. 2020 г. в 16:17, Vyacheslav Daradur :
>>
>>> {IgniteServices#service(String name)} returns direct reference in the
>>> current implementation.
>>>
>>> So, class casting should work for your example:
>>> ((MyServiceImpl)ignite.services().service(“myService”)).bar();
>>>
>>> It is safer to use an interface instead of an implementation, there is
>>> no guarantee that in future releases direct link will be returned, a
>>> service instance might be wrapped for monitoring for example.
>>>
>>>
>>> On Tue, Feb 25, 2020 at 4:09 PM Vladimir Steshin 
>>> wrote:
>>>
 Vyacheslav, Hi.

 I see. But can we consider 'locally deployed service' is a proxy too,
 not direct reference? What if I need to wrap it? This would be local
 service working via proxy or null.

 вт, 25 февр. 2020 г. в 16:03, Vyacheslav Daradur :

> Hi, Vladimir
>
> The answer is in API docs: "Gets *locally deployed service* with
> specified name." [1]
>
> That means {IgniteServices#service(String name)} returns only locally
> deployed instance or null.
>
> {IgniteServices#serviceProxy(…)} returns proxy to call instances
> across the cluster. Might be used for load-balancing.
>
> [1]
> https://github.com/apache/ignite/blob/56975c266e7019f307bb9da42333a6db4e47365e/modules/core/src/main/java/org/apache/ignite/IgniteServices.java#L569
>
> On Tue, Feb 25, 2020 at 3:51 PM Vladimir Steshin 
> wrote:
>
>> Hello, Igniters.
>>
>> Previous e-mail was with wrong topic 'daradu...@gmail.com' :)
>>
>> I got a question what exactly IgniteServices#service(String name) is
>> supposed to return: reference to the object or a proxy for some reason 
>> like
>> IgniteServices#serviceProxy(…)? Vyacheslav D., can you tell me your 
>> opinion?
>>
>> public interface MyService {
>>
>>public void foo();
>>
>> }
>>
>> public class MyServiceImpl implements Service, MyService {
>>
>>@Override public void foo(){ … }
>>
>>public void bar(){ … };
>>
>> }
>>
>>
>> // Is it required to support
>>
>> MyServiceImpl srvc = ignite.services().service(“myService”);
>>
>> srvc.foo();
>>
>> srvc.bar();
>>
>>
>>
>> // Or is the only correct way:
>>
>> MyService srvc = ignite.services().service(“myService”);
>>
>> srvc.foo();
>>
>>
>
> --
> Best Regards, Vyacheslav D.
>

>>>
>>> --
>>> Best Regards, Vyacheslav D.
>>>
>> --
> Best Regards, Vyacheslav D.
>


[jira] [Created] (IGNITE-12702) Print warning when a cache value contains @AffinityKeyMapped annotation

2020-02-19 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12702:
-

 Summary: Print warning when a cache value contains 
@AffinityKeyMapped annotation
 Key: IGNITE-12702
 URL: https://issues.apache.org/jira/browse/IGNITE-12702
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Reporter: Denis Mekhanikov


Consider the following code snippet:
{code:java}
public class WrongAffinityExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("config/ignite.xml");

        IgniteCache<EmployeeKey, EmployeeValue> cache =
            ignite.getOrCreateCache("employees");

        EmployeeKey key = new EmployeeKey(1);
        EmployeeValue value = new EmployeeValue(1, "Denis");
        cache.put(key, value);
    }

    public static class EmployeeKey {
        private int id;

        public EmployeeKey(int id) {
            this.id = id;
        }
    }

    public static class EmployeeValue {
        @AffinityKeyMapped
        int departmentId;
        String name;

        public EmployeeValue(int departmentId, String name) {
            this.departmentId = departmentId;
            this.name = name;
        }
    }
}
{code}
Note that {{EmployeeValue}} contains an {{@AffinityKeyMapped}} annotation, 
which doesn't have any effect, since it's specified in a value, and not in a 
key.

Such a mistake is simple to make and pretty hard to track down.
This configuration should trigger a warning message printed in the log to let the 
user know that the affinity key configuration is not applied.
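For reference, here is a sketch of how the example key could be changed so that the 
annotation takes effect: the annotated field must live in the key class (the extra 
constructor argument is added just for this illustration).
{code:java}
public static class EmployeeKey {
    private int id;

    @AffinityKeyMapped
    private int departmentId; // collocates employees of one department

    public EmployeeKey(int id, int departmentId) {
        this.id = id;
        this.departmentId = departmentId;
    }
}
{code}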



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12480) Add BinaryFieldExtractionSelfTest to the Binary Objects test suite

2019-12-20 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12480:
-

 Summary: Add BinaryFieldExtractionSelfTest to the Binary Objects 
test suite
 Key: IGNITE-12480
 URL: https://issues.apache.org/jira/browse/IGNITE-12480
 Project: Ignite
  Issue Type: Test
  Components: binary
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov
 Fix For: 2.8


BinaryFieldExtractionSelfTest is not run on TeamCity because it's not included 
in any test suite.
It should be added to IgniteBinaryObjectsTestSuite.
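A sketch of the change, assuming the suite is a plain JUnit 4 suite (the existing 
class list is abbreviated):
{code:java}
@RunWith(Suite.class)
@Suite.SuiteClasses({
    // ... existing binary tests ...
    BinaryFieldExtractionSelfTest.class
})
public class IgniteBinaryObjectsTestSuite {
}
{code}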



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12479) All binary types are registered twice

2019-12-20 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12479:
-

 Summary: All binary types are registered twice
 Key: IGNITE-12479
 URL: https://issues.apache.org/jira/browse/IGNITE-12479
 Project: Ignite
  Issue Type: Bug
  Components: binary
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov
 Fix For: 2.8


When a POJO is put into a cache, its binary type is registered twice during 
marshalling.

Example:
{code:java}
public class MetadataRegistrationExample {
public static void main(String[] args) {
Ignite ignite = Ignition.start("config/ignite.xml");
Person p = new Person("Denis");
ignite.getOrCreateCache("cache").put(1, p);
}

static class Person {
private String name;
public Person(String name) {
this.name = name;
}
}
}
{code}
 

Here is the generated debug log from the package
{noformat}
[23:31:14,020][DEBUG][main][CacheObjectBinaryProcessorImpl] Requesting metadata 
update [typeId=-1210012928, 
typeName=binary.NestedObjectMarshallingExample$Person, changedSchemas=[], 
holder=null, fut=MetadataUpdateResultFuture [key=SyncKey [typeId=-1210012928, 
ver=0]]]
[23:31:14,023][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Received MetadataUpdateProposedListener [typeId=-1210012928, 
typeName=binary.NestedObjectMarshallingExample$Person, pendingVer=0, 
acceptedVer=0, schemasCnt=0]
[23:31:14,024][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Versions are stamped on coordinator [typeId=-1210012928, changedSchemas=[], 
pendingVer=1, acceptedVer=0]
[23:31:14,024][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Updated metadata on originating node: [typeId=-1210012928, pendingVer=1, 
acceptedVer=0]
[23:31:14,025][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Received MetadataUpdateAcceptedMessage MetadataUpdateAcceptedMessage 
[id=599e0a86c61-183a790b-7038-4dd5-b99d-89f1483e3635, typeId=-1210012928, 
acceptedVer=1, duplicated=false]
[23:31:14,025][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Completing future MetadataUpdateResultFuture [key=SyncKey [typeId=-1210012928, 
ver=1]] for [typeId=-1210012928, pendingVer=1, acceptedVer=1]
[23:31:14,026][DEBUG][main][CacheObjectBinaryProcessorImpl] Completed metadata 
update [typeId=-1210012928, 
typeName=binary.NestedObjectMarshallingExample$Person, waitTime=4ms, 
fut=MetadataUpdateResultFuture [key=SyncKey [typeId=-1210012928, ver=1]], 
tx=null]
[23:31:14,027][DEBUG][main][CacheObjectBinaryProcessorImpl] Requesting metadata 
update [typeId=-1210012928, 
typeName=binary.NestedObjectMarshallingExample$Person, 
changedSchemas=[1975878747], holder=[typeId=-1210012928, pendingVer=1, 
acceptedVer=1], fut=MetadataUpdateResultFuture [key=SyncKey 
[typeId=-1210012928, ver=0]]]
[23:31:14,027][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Received MetadataUpdateProposedListener [typeId=-1210012928, 
typeName=binary.NestedObjectMarshallingExample$Person, pendingVer=0, 
acceptedVer=0, schemasCnt=1]
[23:31:14,028][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Versions are stamped on coordinator [typeId=-1210012928, 
changedSchemas=[1975878747], pendingVer=2, acceptedVer=1]
[23:31:14,028][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Updated metadata on originating node: [typeId=-1210012928, pendingVer=2, 
acceptedVer=1]
[23:31:14,028][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Received MetadataUpdateAcceptedMessage MetadataUpdateAcceptedMessage 
[id=d99e0a86c61-183a790b-7038-4dd5-b99d-89f1483e3635, typeId=-1210012928, 
acceptedVer=2, duplicated=false]
[23:31:14,028][DEBUG][disco-notifier-worker-#41][CacheObjectBinaryProcessorImpl]
 Completing future MetadataUpdateResultFuture [key=SyncKey [typeId=-1210012928, 
ver=2]] for [typeId=-1210012928, pendingVer=2, acceptedVer=2]
[23:31:14,029][DEBUG][main][CacheObjectBinaryProcessorImpl] Completed metadata 
update [typeId=-1210012928, 
typeName=binary.NestedObjectMarshallingExample$Person, waitTime=1ms, 
fut=MetadataUpdateResultFuture [key=SyncKey [typeId=-1210012928, ver=2]], 
tx=null]
{noformat}

You can see that the type is registered twice. First it's registered without any 
fields, and only the second time is it registered properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Service grid webinar

2019-11-05 Thread Denis Mekhanikov
Hi Igniters!

I’ve been working on the Service Grid functionality in Apache Ignite for a 
while, and at some point I've decided to make a webinar with a high-level 
overview of this part of the project.

If you want to learn more about services, look at some use-cases, or just ask a 
few questions of somebody who takes part in the development, please feel free to 
join the presentation on November 6th, at 10 AM PST.

You can sign up by the following link: 
https://www.gridgain.com/resources/webinars/best-practices-microservices-architecture-apache-ignite

Denis


Re: Gracefully shutting down the data grid

2019-10-08 Thread Denis Mekhanikov
Shiva,

What version of Ignite do you use and do you have security configured in the 
cluster?

There was a bug in Ignite before version 2.7 that has similar symptoms: 
https://issues.apache.org/jira/browse/IGNITE-7624
It’s fixed under the following ticket: 
https://issues.apache.org/jira/browse/IGNITE-9535

Try updating to the latest version of Ignite and see if the issue is resolved 
there.

If this is not your case, then please collect thread dumps from all nodes and 
share them in this thread. Logs will also be useful.
Please don’t add them to the message body, use an attachment.

Denis
On 30 Sep 2019, 17:49 +0300, Shiva Kumar , wrote:
> Hi all,
>
> I am trying to deactivate a cluster which is being connected with few clients 
> over JDBC.
> As part of these clients connections, it inserts some records to many tables 
> and runs some long-running queries.
> At this time I am trying to deactivate the cluster [basically trying to take 
> data backup, so before this, I need to de-activate the cluster] But 
> de-activation is hanging and control.sh not returning the control and hangs 
> infinitely.
> when I check the current cluster state with rest API calls it sometime it 
> returns saying cluster is inactive.
> After some time I am trying to activate the cluster but it returns this error:
>
> [root@ignite-test]# curl 
> "http://ignite-service-shiv.ignite.svc.cluster.local:8080/ignite?cmd=activate=ignite=ignite;
>   | jq
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  
> Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   207  100   207    0     0   2411      0 --:--:-- --:--:-- --:--:--  2406
> {
>   "successStatus": 0,
>   "sessionToken": "654F094484E24232AA74F35AC5E83481",
>   "error": "Failed to activate, because another state change operation is 
> currently in progress: deactivate\nsuppressed: \n",
>   "response": null
> }
>
>
> This means that my earlier de-activation has not succeeded properly.
> Is there any other way to de-activate the cluster or to terminate the 
> existing client connections or to terminate the running queries.
> I tried "kill -k -ar" from visor shell but it restarts few nodes and it ended 
> up with some exception related to page corruption.
> Note: My Ignite deployment is on Kubernetes
>
> Any help is appreciated.
>
> regards,
> shiva
>
>


[jira] [Created] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12265:
-

 Summary: JavaDoc doesn't have documentation for the 
org.apache.ignite.client package
 Key: IGNITE-12265
 URL: https://issues.apache.org/jira/browse/IGNITE-12265
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Mekhanikov


JavaDoc published on the website doesn't have documentation for the 
{{org.apache.ignite.client}} package. Link to the website: 
[https://ignite.apache.org/releases/2.7.6/javadoc/]

A missing {{package-info.java}} file or an exclusion in the 
{{maven-javadoc-plugin}} configuration in the root {{pom.xml}} may be the reason.
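If the missing {{package-info.java}} turns out to be the cause, a minimal sketch of 
the file (the description text is illustrative):
{code:java}
/**
 * Contains the public API of the Ignite thin client.
 */
package org.apache.ignite.client;
{code}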



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12237) Forbid thin client connections dynamically

2019-09-27 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12237:
-

 Summary: Forbid thin client connections dynamically
 Key: IGNITE-12237
 URL: https://issues.apache.org/jira/browse/IGNITE-12237
 Project: Ignite
  Issue Type: Improvement
  Components: thin client
Reporter: Denis Mekhanikov


Sometimes it's useful to forbid thin client connections to nodes for some 
period of time. During this time the cluster may be performing some activation 
steps needed for the application to work correctly.

It would be good to have an API call for opening and closing thin client 
connections.
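A purely hypothetical sketch of what such an API could look like (these methods do 
not exist in Ignite and are shown only to illustrate the proposal):
{code:java}
// Hypothetical API, for illustration only.
ignite.cluster().disallowThinClientConnections();

// ... preload data, run activation logic ...

ignite.cluster().allowThinClientConnections();
{code}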

This feature was requested in the following StackOverflow question: 
https://stackoverflow.com/questions/58106297/how-to-block-java-thin-client-request-till-preloading-of-data-in-ignite-cluster



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: nodes are restarting when i try to drop a table created with persistence enabled

2019-09-25 Thread Denis Mekhanikov
I think, the issue is that Ignite can't recover from
IgniteOutOfMemory, even by removing data.
Shiva, did IgniteOutOfMemory occur for the first time when you did the
DROP TABLE, or before that?

Denis

ср, 25 сент. 2019 г. в 02:30, Denis Magda :
>
> Shiva,
>
> Does this issue still exist? Ignite Dev how do we debug this sort of thing?
>
> -
> Denis
>
>
> On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar  wrote:
>>
>> Hi dmagda,
>>
>> I am trying to drop the table which has around 10 million records and I am 
>> seeing "Out of memory in data region" error messages in Ignite logs and 
>> ignite node [Ignite pod on kubernetes] is restarting.
>> I have configured 3GB for default data region, 7GB for JVM and total 15GB 
>> for Ignite container and enabled native persistence.
>> Earlier I was in an impression that restart was caused by 
>> "SYSTEM_WORKER_BLOCKED" errors but now I am realized that  
>> "SYSTEM_WORKER_BLOCKED" is added to ignore failure list and the actual cause 
>> is " CRITICAL_ERROR " due to  "Out of memory in data region"
>>
>> This is the error messages in logs:
>>
>> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted 
>> immediately due to the failure: [failureCtx=FailureContext 
>> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: 
>> Failed to find a page for eviction [segmentCapacity=971652, loaded=381157, 
>> maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3, 
>> failedToPrepare=381155]
>> Out of memory in data region [name=Default_Region, initSize=500.0 MiB, 
>> maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
>>   ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>>   ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>>   ^-- Enable eviction or expiration policies]]
>>
>> Could you please help me on why drop table operation causing  "Out of memory 
>> in data region"? and how I can avoid it?
>>
>> We have a use case where application inserts records to many tables in 
>> Ignite simultaneously for some time period and other applications run a 
>> query on that time period data and update the dashboard. we need to delete 
>> the records inserted in the previous time period before inserting new 
>> records.
>>
>> even during delete from table operation, I have seen:
>>
>> "Critical system error detected. Will be handled accordingly to configured 
>> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
>> super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], 
>> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
>> o.a.i.IgniteException: Checkpoint read lock acquisition has been timed 
>> out.]] class org.apache.ignite.IgniteException: Checkpoint read lock 
>> acquisition has been timed out.|
>>
>>
>>
>> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda  wrote:
>>>
>>> Hi Shiva,
>>>
>>> That was designed to prevent global cluster performance degradation or 
>>> other outages. Have you tried to apply my recommendation of turning of the 
>>> failure handler for this system threads?
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar  
>>> wrote:

 HI Denis,

 is there any specific reason for the blocking of critical thread, like CPU
 is full or Heap is full ?
 We are again and again hitting this issue.
 is there any other way to drop tables/cache ?
 This looks like a critical issue.

 regards,
 shiva



 --
 Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Nabble message wrapping

2019-09-17 Thread Denis Mekhanikov
I've created an INFRA ticket: https://issues.apache.org/jira/browse/INFRA-19042

Denis
On 12 Sep 2019, 00:46 +0300, Denis Magda , wrote:
> Thanks for sharing details.
>
> 2. Change the ezmlm configuration to match the one of the users list. We
> > need to find a person who has access to the Apache infra in order to do this
>
>
> We can open a ticket for ASF INFRA. Could you please do that and see what
> they say?
>
> -
> Denis
>
>
> On Wed, Sep 11, 2019 at 1:36 AM Denis Mekhanikov 
> wrote:
>
> > Denis,
> >
> > I did some investigation on this and started a Nabble support thread:
> > http://support.nabble.com/Line-wrapping-in-quotes-td7604136.html
> > It turned out, that Nabble has a little to do with it. It’s the mail list
> > itself.
> >
> > When you send a plain text message to the dev list either from Nabble or
> > in any other way, it’s getting wrapped to be 80 characters wide.
> > HTML messages don’t have this issue. Nabble uses plain text by default.
> > User list doesn’t wrap quotes though, even if you use plain text format.
> > You can see that my reply in the following thread is wrapped, but not the
> > quote:
> > http://apache-ignite-users.70518.x6.nabble.com/Ignition-Start-Timeout-if-connection-is-unsuccessful-td29289.html
> >
> > I see two possibilities of solving this issue:
> >
> > 1. Make Nabble always send messages in HTML format somehow. The issue will
> > still occur for other mail clients though.
> > 2. Change the ezmlm configuration to match the one of the users list. We
> > need to find a person who has access to the Apache infra in order to do
> > this.
> >


Re: Nabble message wrapping

2019-09-11 Thread Denis Mekhanikov
Denis,

I did some investigation on this and started a Nabble support thread: 
http://support.nabble.com/Line-wrapping-in-quotes-td7604136.html
It turned out that Nabble has little to do with it. It’s the mailing list 
itself.

When you send a plain text message to the dev list either from Nabble or in any 
other way, it’s getting wrapped to be 80 characters wide.
HTML messages don’t have this issue. Nabble uses plain text by default.
User list doesn’t wrap quotes though, even if you use plain text format. You 
can see that my reply in the following thread is wrapped, but not the quote: 
http://apache-ignite-users.70518.x6.nabble.com/Ignition-Start-Timeout-if-connection-is-unsuccessful-td29289.html

I see two possibilities of solving this issue:

1. Make Nabble always send messages in HTML format somehow. The issue will 
still occur for other mail clients though.
2. Change the ezmlm configuration to match the one of the users list. We need 
to find a person who has access to the Apache infra in order to do this.


Denis
On 11 Sep 2019, 00:53 +0300, Denis Magda , wrote:
> Denis, could you set up the dev list the way needed?
>
> -
> Denis
>
>
> On Thu, Aug 29, 2019 at 1:18 PM Denis Magda  wrote:
>
> > Denis,
> >
> > I granted your account the admin access. Please do all the required
> > changes and let me know once done.
> >
> > -
> > Denis
> >
> >
> > On Thu, Aug 29, 2019 at 2:00 AM Denis Mekhanikov 
> > wrote:
> >
> > > Guys,
> > >
> > > Any update? Who has a Nabble admin account for the developers list?
> > >
> > > Denis
> > > On 27 Aug 2019, 14:55 +0300, Dmitriy Pavlov , wrote:
> > > > Hi Denis,
> > > >
> > > > AFAIK, nabble forums are service, which resides outside of ASF infra.
> > > They
> > > > have their separate support/feedback form.
> > > >
> > > > Maybe some PMC members could have some credentials for this service. If
> > > so,
> > > > please share login info in SVN/private/credentials.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > >
> >


Re: The ASF Slack

2019-09-10 Thread Denis Mekhanikov
Guys, please disregard my previous message.

What I was missing is the following link: https://s.apache.org/slack-invite

Denis
On 10 Sep 2019, 12:52 +0300, Denis Mekhanikov , wrote:
> Anton,
>
> You need to have an apache email to register in the ASF Slack. Is it
> supposed to be like this? I don't think it's fair to limit the circle
> of communication this way.
> Non-blocking PME discussion is going to happen there, for example, and
> I won't be able to attend it, since I don't have an Apache email. But
> I'd really like to hear the discussion.
> Is there a way to relief this limitation?
>
> Denis
>
> пн, 26 авг. 2019 г. в 21:58, Denis Magda :
> >
> > Anton,
> >
> > Thanks for starting the conversation. I think that we need to explain to
> > our community how and why we came to this proposal. I've started another
> > discussion with details, sorry for not doing it earlier:
> > http://apache-ignite-developers.2346864.n4.nabble.com/Making-Ignite-Collaboration-100-Open-and-Transparent-td43244.html
> >
> > --
> > Denis
> >
> >
> >
> > On Mon, Aug 26, 2019 at 7:43 AM Andrey Gura  wrote:
> >
> > > > But the slack is not indexing and not accessible for all people in the
> > > Internet
> > >
> > > +1
> > >
> > > On Mon, Aug 26, 2019 at 4:56 PM Alexey Zinoviev 
> > > wrote:
> > > >
> > > > I think that channels for separate IEP-s is too much.
> > > > But the slack is not indexing and not accessible for all people in the
> > > > Internet. Maybe it's wrong idea to discuss something there instead of
> > > > dev-list?
> > > >
> > > > пн, 26 авг. 2019 г. в 16:27, Dmitriy Pavlov :
> > > >
> > > > > Hi,
> > > > >
> > > > > If we use the example of Apache Beam, usually these channels relate to
> > > a
> > > > > module or language, e.g. beam-python, beam-ml, etc.
> > > > >
> > > > > I also support the idea of ASF slack usage. Moreover, we can ask Infra
> > > to
> > > > > install automatic translation of messages to English. It may help
> > > > > non-native speakers to communicate.
> > > > >
> > > > > Let's just remember the motto:
> > > > > If it didn't happen on the list - it didn't happen.
> > > > >
> > > > > Sincerely,
> > > > > Dmitriy Pavlov
> > > > >
> > > > > пн, 26 авг. 2019 г. в 16:22, Anton Vinogradov :
> > > > >
> > > > > > Nikolay,
> > > > > >
> > > > > > Let's try ;)
> > > > > >
> > > > > > On Mon, Aug 26, 2019 at 4:15 PM Nikolay Izhikov  > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Anton,
> > > > > > >
> > > > > > > Can we create channels for ticket, IEP discussions in ASF slack?
> > > > > > >
> > > > > > >
> > > > > > > В Пн, 26/08/2019 в 16:09 +0300, Anton Vinogradov пишет:
> > > > > > > > Igniters,
> > > > > > > > I'd like to propose you to register at the ASF Slack [1]
> > > (committers
> > > > > > > seems
> > > > > > > > to be already registered) and join the Ignite channel [2].
> > > > > > > > This should simplify communication between the contributors.
> > > > > > > >
> > > > > > > > P.s. I'm not saying we have to replace devlist with the Slack,
> > > but
> > > > > > Slack
> > > > > > > is
> > > > > > > > a more suitable place for short or private communications.
> > > > > > > >
> > > > > > > > [1] https://the-asf.slack.com
> > > > > > > > [2] https://the-asf.slack.com/messages/C7E04VCPK
> > > > > > >
> > > > > >
> > > > >
> > >


Re: The ASF Slack

2019-09-10 Thread Denis Mekhanikov
Anton,

You need to have an Apache email to register in the ASF Slack. Is it
supposed to be like this? I don't think it's fair to limit the circle
of communication this way.
Non-blocking PME discussion is going to happen there, for example, and
I won't be able to attend it, since I don't have an Apache email. But
I'd really like to hear the discussion.
Is there a way to relieve this limitation?

Denis

пн, 26 авг. 2019 г. в 21:58, Denis Magda :
>
> Anton,
>
> Thanks for starting the conversation. I think that we need to explain to
> our community how and why we came to this proposal. I've started another
> discussion with details, sorry for not doing it earlier:
> http://apache-ignite-developers.2346864.n4.nabble.com/Making-Ignite-Collaboration-100-Open-and-Transparent-td43244.html
>
> --
> Denis
>
>
>
> On Mon, Aug 26, 2019 at 7:43 AM Andrey Gura  wrote:
>
> > > But the slack is not indexing and not accessible for all people in the
> > Internet
> >
> > +1
> >
> > On Mon, Aug 26, 2019 at 4:56 PM Alexey Zinoviev 
> > wrote:
> > >
> > > I think that channels for separate IEP-s is too much.
> > > But the slack is not indexing and not accessible for all people in the
> > > Internet. Maybe it's wrong idea to discuss something there instead of
> > > dev-list?
> > >
> > > пн, 26 авг. 2019 г. в 16:27, Dmitriy Pavlov :
> > >
> > > > Hi,
> > > >
> > > > If we use the example of Apache Beam, usually these channels relate to
> > a
> > > > module or language, e.g. beam-python,  beam-ml, etc.
> > > >
> > > > I also support the idea of ASF slack usage. Moreover, we can ask Infra
> > to
> > > > install automatic translation of messages to English. It may help
> > > > non-native speakers to communicate.
> > > >
> > > > Let's just remember the motto:
> > > > If it didn't happen on the list - it didn't happen.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > пн, 26 авг. 2019 г. в 16:22, Anton Vinogradov :
> > > >
> > > > > Nikolay,
> > > > >
> > > > > Let's try ;)
> > > > >
> > > > > On Mon, Aug 26, 2019 at 4:15 PM Nikolay Izhikov  > >
> > > > > wrote:
> > > > >
> > > > > > Anton,
> > > > > >
> > > > > > Can we create channels for ticket, IEP discussions in ASF slack?
> > > > > >
> > > > > >
> > > > > > В Пн, 26/08/2019 в 16:09 +0300, Anton Vinogradov пишет:
> > > > > > > Igniters,
> > > > > > > I'd like to propose you to register at the ASF Slack [1]
> > (committers
> > > > > > seems
> > > > > > > to be already registered) and join the Ignite channel [2].
> > > > > > > This should simplify communication between the contributors.
> > > > > > >
> > > > > > > P.s. I'm not saying we have to replace devlist with the Slack,
> > but
> > > > > Slack
> > > > > > is
> > > > > > > a more suitable place for short or private communications.
> > > > > > >
> > > > > > > [1] https://the-asf.slack.com
> > > > > > > [2] https://the-asf.slack.com/messages/C7E04VCPK
> > > > > >
> > > > >
> > > >
> >


Re: Do I have to use --illegal-access=permit for Java thin client and JDBC with JDK 9/10/11.

2019-09-05 Thread Denis Mekhanikov
Alex,

Thanks for providing the details.
Ignite 2.7.0 doesn’t officially support Java 11+, so what you wrote seems 
valid. So, currently we don’t have an example that fails with the Ignite thin 
client version 2.7.5 and Java 11+.
Please let us know if you find a counterexample.

Denis
On 3 Sep 2019, 19:00 +0300, Alex Plehanov , wrote:
> Denis, there is almost nothing to share: thin client connects to the
> server, creates a cache, do some puts, gets and queries. I run this test by
> IDE specifying different JVM options, JDK versions and dependency Ignite
> versions (it's not scripted anyhow)
>
> pom:
> <project xmlns="http://maven.apache.org/POM/4.0.0"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
> http://maven.apache.org/xsd/maven-4.0.0.xsd">
> <modelVersion>4.0.0</modelVersion>
>
> <artifactId>thin-client-test</artifactId>
> <groupId>org.apache.ignite</groupId>
> <version>1.0-SNAPSHOT</version>
>
> <dependencies>
> <dependency>
> <groupId>org.apache.ignite</groupId>
> <artifactId>ignite-core</artifactId>
> <version>2.7.0</version>
> </dependency>
> </dependencies>
> </project>
>
> java:
> public static void main(String[] args) throws Exception {
> IgniteClient igniteClient = Ignition.startClient(new
> ClientConfiguration().setAddresses("127.0.0.1:10800"));
> ClientCache cache =
> igniteClient.getOrCreateCache("test.cache");
> cache.put(1, "value1");
> cache.put(2, "value2");
> cache.get(1);
> cache.query(new SqlFieldsQuery("SELECT * FROM
> IGNITE.NODES")).getAll();
> }
>
> вт, 3 сент. 2019 г. в 17:26, Denis Mekhanikov :
>
> > Alex,
> >
> > Could you share the project you’re checking? A GitHub repository would be
> > nice.
> >
> > Denis
> > On 3 Sep 2019, 17:10 +0300, Alex Plehanov ,
> > wrote:
> > > Dmitrii,
> > >
> > > What version of Ignite you are using?
> > >
> > > I've rechecked Java thin client recently (forgot to share results here),
> > in
> > > my tests:
> > > Client Version 2.7.0
> > > OracleJDK 11: Client won't start unless
> > > "--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED" option is
> > specified.
> > > OpenJDK 12: Client can't start at all
> > >
> > > Client Version 2.7.5
> > > OracleJDK 11: Client starts without any additional options
> > > OpenJDK 12: Client starts without any additional options
> > >
> > > "--add-opens=java.base/java.nio=ALL-UNNAMED" suppress warning messages on
> > > all versions.
> > >
> > >
> > > вт, 3 сент. 2019 г. в 16:35, Dmitrii Sherstobitov <
> > dnsherstobi...@gmail.com
> > > > :
> > >
> > > > Hi!
> > > >
> > > > I've made some simple tests using Apache Ignite documentation for JDBC
> > and
> > > > Java Thin client with using of following API functions:
> > > >
> > > > JDBC: executeQuery, execute, preparedStatement
> > > > Java Thin: cache get, put, create
> > > >
> > > > None of these API requires additional options for JVM. However, some
> > > > options are optional and used to suppress warning messages.
> > > >
> > > > Tested with Open JDK 9.0.4, 10.0.2, 11.0.2, 12.0.2 on Ubuntu and Mac
> > OS.
> > > >
> > > >
> > > > Best regards, Dmitry Sherstobitov
> > > > On 26 Aug 2019, 16:22 +0300, Alex Plehanov ,
> > > > wrote:
> > > > >
> > > > > Dmitry,
> > > > >
> > > > > As I said before, thin client uses BinaryHeapOutputStream, which uses
> > > > > Unsafe, so "--illegal-access=deny" has an effect.
> > > > > With "--illegal-access=deny" thin client will not start unless you
> > > > specify
> > > > > "--add-opens=java.base/java.nio=ALL-UNNAMED"
> > > >
> >


Re: Do I have to use --illegal-access=permit for Java thin client and JDBC with JDK 9/10/11.

2019-09-03 Thread Denis Mekhanikov
Alex,

Could you share the project you’re checking? A GitHub repository would be nice.

Denis
On 3 Sep 2019, 17:10 +0300, Alex Plehanov , wrote:
> Dmitrii,
>
> What version of Ignite you are using?
>
> I've rechecked Java thin client recently (forgot to share results here), in
> my tests:
> Client Version 2.7.0
> OracleJDK 11: Client won't start unless
> "--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED" option is specified.
> OpenJDK 12: Client can't start at all
>
> Client Version 2.7.5
> OracleJDK 11: Client starts without any additional options
> OpenJDK 12: Client starts without any additional options
>
> "--add-opens=java.base/java.nio=ALL-UNNAMED" suppress warning messages on
> all versions.
>
>
> вт, 3 сент. 2019 г. в 16:35, Dmitrii Sherstobitov  > :
>
> > Hi!
> >
> > I've made some simple tests using Apache Ignite documentation for JDBC and
> > Java Thin client with using of following API functions:
> >
> > JDBC: executeQuery, execute, preparedStatement
> > Java Thin: cache get, put, create
> >
> > None of these API requires additional options for JVM. However, some
> > options are optional and used to suppress warning messages.
> >
> > Tested with Open JDK 9.0.4, 10.0.2, 11.0.2, 12.0.2 on Ubuntu and Mac OS.
> >
> >
> > Best regards, Dmitry Sherstobitov
> > On 26 Aug 2019, 16:22 +0300, Alex Plehanov ,
> > wrote:
> > >
> > > Dmitry,
> > >
> > > As I said before, thin client uses BinaryHeapOutputStream, which uses
> > > Unsafe, so "--illegal-access=deny" has an effect.
> > > With "--illegal-access=deny" thin client will not start unless you
> > specify
> > > "--add-opens=java.base/java.nio=ALL-UNNAMED"
> >


Re: Nabble message wrapping

2019-08-29 Thread Denis Mekhanikov
Guys,

Any update? Who has a Nabble admin account for the developers list?

Denis
On 27 Aug 2019, 14:55 +0300, Dmitriy Pavlov , wrote:
> Hi Denis,
>
> AFAIK, nabble forums are service, which resides outside of ASF infra. They
> have their separate support/feedback form.
>
> Maybe some PMC members could have some credentials for this service. If so,
> please share login info in SVN/private/credentials.
>
> Sincerely,
> Dmitriy Pavlov


Nabble message wrapping

2019-08-27 Thread Denis Mekhanikov
Hi!

The Nabble forum formats emails in a way that makes quoted messages unreadable.

For example, take a look at the latest messages in the following thread: 
http://apache-ignite-developers.2346864.n4.nabble.com/Apache-Ignite-2-7-6-Time-Scope-and-Release-manager-td42944.html

The beginning of each message looks fine, but by the end it turns into 
something like

>> > > > > > > > > > > > > > > If nobody minds, I will create branch 2.7.6
>> based
>> > > on
>> > > > >
>> > > > > 2.7.5
>> > > > > > > and
>> > > > > > > > > > set up
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > the TC Bot during the weekend.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Sincerely,
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Dmitriy Pavlov
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > --
>> > > > > > > > > > > > > Best regards,
>> > > > > > > > > > > > > Ivan Pavlukhin
>> > > > > > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Zhenya Stanilovsky
>> > > > > > > >
>> > >
>> >
>>

Email clients suffer from such formatting. Some start working extremely slowly, 
some just apply psychedelic colouring.
Can we do something about it? It seems to me that line wrapping is causing this.

The Nabble forum for the users list doesn’t seem to suffer from this issue: 
http://apache-ignite-users.70518.x6.nabble.com/
For example, the following thread is long, but it’s formatted just fine: 
http://apache-ignite-users.70518.x6.nabble.com/Access-a-cache-loaded-by-DataStreamer-with-SQL-td27180.html
Can we configure the dev list forum in the same way?

Denis


Re: Asynchronous registration of binary metadata

2019-08-23 Thread Denis Mekhanikov
Sergey,

Yes, your understanding is similar to mine.

I created a JIRA ticket for this change: 
https://issues.apache.org/jira/browse/IGNITE-12099

Denis
On 23 Aug 2019, 14:27 +0300, Sergey Chugunov , wrote:
> Alexei, If my understanding is correct (Denis please correct me if I'm
> wrong) we'll indeed delay only reqs that touch "dirty" metadata (metadata
> with unfinished write to disk).
>
> I don't expect significant performance impact here because for now we don't
> allow other threads to use "dirty" metadata anyway and declare it "clean"
> only when it is fully written to disk.
>
> As far as I can see the only source of performance degradation here would
> be additional handing-off "write metadata tasks" between discovery thread
> and "writer" thread. But this should be minor comparing to IO operations.
>
> On Fri, Aug 23, 2019 at 2:02 PM Alexei Scherbakov <
> alexey.scherbak...@gmail.com> wrote:
>
> > Do I understand correctly what only affected requests with "dirty" metadata
> > will be delayed, but not all ?
> > Doesn't this check hurt performance? Otherwise ALL requests will be blocked
> > until some unrelated metadata is written which is highly undesirable.
> >
> > Otherwise looks good if performance will not be affected by implementation.
> >
> >
> > чт, 22 авг. 2019 г. в 15:18, Denis Mekhanikov :
> >
> > > Alexey,
> > >
> > > Making only one node write metadata to disk synchronously is a possible
> > > and easy to implement solution, but it still has a few drawbacks:
> > >
> > > • Discovery will still be blocked on one node. This is better than
> > > blocking all nodes one by one, but disk write may take indefinite time,
> > so
> > > discovery may still be affected.
> > > • There is an unlikely but at the same time an unpleasant case:
> > > 1. A coordinator writes metadata synchronously to disk and finalizes
> > > the metadata registration. Other nodes do it asynchronously, so actual
> > > fsync to a disk may be delayed.
> > > 2. A transaction is committed.
> > > 3. The cluster is shut down before all nodes finish their fsync of
> > > metadata.
> > > 4. Nodes are started again one by one.
> > > 5. Before the previous coordinator is started again, a read operation
> > > tries to read the data, that uses the metadata that wasn’t fsynced
> > anywhere
> > > except the coordinator, which is still not started.
> > > 6. Error about unknown metadata is generated.
> > >
> > > In the scheme, that Sergey and me proposed, this situation isn’t
> > possible,
> > > since the data won’t be written to disk until fsync is finished. Every
> > > mapped node will wait on a future until metadata is written to disk
> > before
> > > performing any cache changes.
> > > What do you think about such fix?
> > >
> > > Denis
> > > On 22 Aug 2019, 12:44 +0300, Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com>, wrote:
> > > > Denis Mekhanikov,
> > > >
> > > > I think at least one node (coordinator for example) still should write
> > > > metadata synchronously to protect from a scenario:
> > > >
> > > > tx creating new metadata is commited <- all nodes in grid are failed
> > > > (powered off) <- async writing to disk is completed
> > > >
> > > > where <- means "happens before"
> > > >
> > > > All other nodes could write asynchronously, by using separate thread or
> > > not
> > > > doing fsync( same effect)
> > > >
> > > >
> > > >
> > > > ср, 21 авг. 2019 г. в 19:48, Denis Mekhanikov :
> > > >
> > > > > Alexey,
> > > > >
> > > > > I’m not suggesting to duplicate anything.
> > > > > My point is that the proper fix will be implemented in a relatively
> > > > > distant future. Why not improve the existing mechanism now instead of
> > > > > waiting for the proper fix?
> > > > > If we don’t agree on doing this fix in master, I can do it in a fork
> > > and
> > > > > use it in my setup. So please let me know if you see any other
> > > drawbacks in
> > > > > the proposed solution.
> > > > >
> > > > > Denis
> > > > >
> > > > > > On 21 Aug 2019, at 15:53, Alexei Scherbakov <
> > > > > al

[jira] [Created] (IGNITE-12099) Don't write metadata to disk in discovery thread

2019-08-23 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12099:
-

 Summary: Don't write metadata to disk in discovery thread
 Key: IGNITE-12099
 URL: https://issues.apache.org/jira/browse/IGNITE-12099
 Project: Ignite
  Issue Type: Improvement
  Components: binary
Reporter: Denis Mekhanikov


When persistence is enabled, binary metadata is written to disk upon 
registration. Currently it happens in the discovery thread, which makes 
processing of related messages very slow.

A different thread should be used to write metadata to disk. Binary type 
registration will be considered finished before information about it is 
written to disk on all nodes.

The implementation should guarantee consistency in cases of cluster restarts.

Dev list discussion: 
http://apache-ignite-developers.2346864.n4.nabble.com/Asynchronous-registration-of-binary-metadata-td43021.html



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Do I have to use --illegal-access=permit for Java thin client and JDBC with JDK 9/10/11.

2019-08-22 Thread Denis Mekhanikov
Denis,

I didn’t find any usages of JDK internals in the implementation of the thin 
clients.
It would be nice to verify in tests that thin clients can work without these 
flags.

Do our Java 9/10/11 tests include thin client testing? If so, do these tests 
include these flags?

Denis
On 15 Aug 2019, 11:09 +0300, Denis Magda , wrote:
> Denis,
>
> Does it mean we don't need to pass any flags from this list [1] at all for
> the JDBC and thin clients?
>
> [1]
> https://apacheignite.readme.io/docs/getting-started#section-running-ignite-with-java-9-10-11
>
> -
> Denis
>
>
> On Wed, Aug 14, 2019 at 5:56 PM Denis Mekhanikov 
> wrote:
>
> > Hi!
> >
> > There are two JDK internal things that are used by Ignite: Unsafe and
> > sun.nio.ch package.
> > None of these things are used by thin clients. So, it’s fine to use thin
> > clients without additional flags.
> >
> > Denis
> >
> > > On 13 Aug 2019, at 23:01, Shane Duan  wrote:
> > >
> > > Hi Igniter,
> > >
> > > I understand that --illegal-access=permit is required for JDK 9/10/11 on
> > > Ignite server. But do I have to include this JVM parameter for Ignite
> > Java
> > > thin client and JDBC client? I tried some simple test without it and it
> > > seems working fine...
> > >
> > >
> > > Thanks,
> > > Shane
> >
> >


Re: Asynchronous registration of binary metadata

2019-08-22 Thread Denis Mekhanikov
Alexey,

Making only one node write metadata to disk synchronously is a possible and 
easy to implement solution, but it still has a few drawbacks:

• Discovery will still be blocked on one node. This is better than blocking all 
nodes one by one, but disk write may take indefinite time, so discovery may 
still be affected.
• There is an unlikely but unpleasant case:
1. A coordinator writes metadata synchronously to disk and finalizes the 
metadata registration. Other nodes do it asynchronously, so actual fsync to a 
disk may be delayed.
2. A transaction is committed.
3. The cluster is shut down before all nodes finish their fsync of metadata.
4. Nodes are started again one by one.
5. Before the previous coordinator is started again, a read operation tries 
to read the data, that uses the metadata that wasn’t fsynced anywhere except 
the coordinator, which is still not started.
6. Error about unknown metadata is generated.

In the scheme that Sergey and I proposed, this situation isn’t possible, 
since the data won’t be written to disk until fsync is finished. Every mapped 
node will wait on a future until metadata is written to disk before performing 
any cache changes.
What do you think about such a fix?

Denis
On 22 Aug 2019, 12:44 +0300, Alexei Scherbakov , 
wrote:
> Denis Mekhanikov,
>
> I think at least one node (coordinator for example) still should write
> metadata synchronously to protect from a scenario:
>
> tx creating new metadata is commited <- all nodes in grid are failed
> (powered off) <- async writing to disk is completed
>
> where <- means "happens before"
>
> All other nodes could write asynchronously, by using separate thread or not
> doing fsync( same effect)
>
>
>
> ср, 21 авг. 2019 г. в 19:48, Denis Mekhanikov :
>
> > Alexey,
> >
> > I’m not suggesting to duplicate anything.
> > My point is that the proper fix will be implemented in a relatively
> > distant future. Why not improve the existing mechanism now instead of
> > waiting for the proper fix?
> > If we don’t agree on doing this fix in master, I can do it in a fork and
> > use it in my setup. So please let me know if you see any other drawbacks in
> > the proposed solution.
> >
> > Denis
> >
> > > On 21 Aug 2019, at 15:53, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> > >
> > > Denis Mekhanikov,
> > >
> > > If we are still talking about "proper" solution the metastore (I've meant
> > > of course distributed one) is the way to go.
> > >
> > > It has a contract to store cluster wide metadata in most efficient way
> > and
> > > can have any optimization for concurrent writing inside.
> > >
> > > I'm against creating some duplicating mechanism as you suggested. We do
> > not
> > > need another copy/paste code.
> > >
> > > Another possibility is to carry metadata along with appropriate request
> > if
> > > it's not found locally but this is a rather big modification.
> > >
> > >
> > >
> > > вт, 20 авг. 2019 г. в 17:26, Denis Mekhanikov :
> > >
> > > > Eduard,
> > > >
> > > > Usages will wait for the metadata to be registered and written to disk.
> > No
> > > > races should occur with such flow.
> > > > Or do you have some specific case on your mind?
> > > >
> > > > I agree, that using a distributed meta storage would be nice here.
> > > > But this way we will kind of move to the previous scheme with a
> > replicated
> > > > system cache, where metadata was stored before.
> > > > Will scheme with the metastorage be different in any way? Won’t we
> > decide
> > > > to move back to discovery messages again after a while?
> > > >
> > > > Denis
> > > >
> > > >
> > > > > On 20 Aug 2019, at 15:13, Eduard Shangareev <
> > eduard.shangar...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Denis,
> > > > > How would we deal with races between registration and metadata usages
> > > > with
> > > > > such fast-fix?
> > > > >
> > > > > I believe, that we need to move it to distributed metastorage, and
> > await
> > > > > registration completeness if we can't find it (wait for work in
> > > > progress).
> > > > > Discovery shouldn't wait for anything here.
> > > > >
> > > > > On Tue, Aug 20, 2019 at 11:55 AM Denis Mekhani

Re: Asynchronous registration of binary metadata

2019-08-21 Thread Denis Mekhanikov
Alexey,

I’m not suggesting to duplicate anything.
My point is that the proper fix will be implemented in a relatively distant 
future. Why not improve the existing mechanism now instead of waiting for the 
proper fix?
If we don’t agree on doing this fix in master, I can do it in a fork and use it 
in my setup. So please let me know if you see any other drawbacks in the 
proposed solution.

Denis

> On 21 Aug 2019, at 15:53, Alexei Scherbakov  
> wrote:
> 
> Denis Mekhanikov,
> 
> If we are still talking about "proper" solution the metastore (I've meant
> of course distributed one) is the way to go.
> 
> It has a contract to store cluster wide metadata in most efficient way and
> can have any optimization for concurrent writing inside.
> 
> I'm against creating some duplicating mechanism as you suggested. We do not
> need another copy/paste code.
> 
> Another possibility is to carry metadata along with appropriate request if
> it's not found locally but this is a rather big modification.
> 
> 
> 
> вт, 20 авг. 2019 г. в 17:26, Denis Mekhanikov :
> 
>> Eduard,
>> 
>> Usages will wait for the metadata to be registered and written to disk. No
>> races should occur with such flow.
>> Or do you have some specific case on your mind?
>> 
>> I agree, that using a distributed meta storage would be nice here.
>> But this way we will kind of move to the previous scheme with a replicated
>> system cache, where metadata was stored before.
>> Will scheme with the metastorage be different in any way? Won’t we decide
>> to move back to discovery messages again after a while?
>> 
>> Denis
>> 
>> 
>>> On 20 Aug 2019, at 15:13, Eduard Shangareev 
>> wrote:
>>> 
>>> Denis,
>>> How would we deal with races between registration and metadata usages
>> with
>>> such fast-fix?
>>> 
>>> I believe, that we need to move it to distributed metastorage, and await
>>> registration completeness if we can't find it (wait for work in
>> progress).
>>> Discovery shouldn't wait for anything here.
>>> 
>>> On Tue, Aug 20, 2019 at 11:55 AM Denis Mekhanikov >> 
>>> wrote:
>>> 
>>>> Sergey,
>>>> 
>>>> Currently metadata is written to disk sequentially on every node. Only
>> one
>>>> node at a time is able to write metadata to its storage.
>>>> Slowness accumulates when you add more nodes. A delay required to write
>>>> one piece of metadata may be not that big, but if you multiply it by say
>>>> 200, then it becomes noticeable.
>>>> But If we move the writing out from discovery threads, then nodes will
>> be
>>>> doing it in parallel.
>>>> 
>>>> I think, it’s better to block some threads from a striped pool for a
>>>> little while rather than blocking discovery for the same period, but
>>>> multiplied by a number of nodes.
>>>> 
>>>> What do you think?
>>>> 
>>>> Denis
>>>> 
>>>>> On 15 Aug 2019, at 10:26, Sergey Chugunov 
>>>> wrote:
>>>>> 
>>>>> Denis,
>>>>> 
>>>>> Thanks for bringing this issue up, decision to write binary metadata
>> from
>>>>> discovery thread was really a tough decision to make.
>>>>> I don't think that moving metadata to metastorage is a silver bullet
>> here
>>>>> as this approach also has its drawbacks and is not an easy change.
>>>>> 
>>>>> In addition to workarounds suggested by Alexei we have two choices to
>>>>> offload write operation from discovery thread:
>>>>> 
>>>>> 1. Your scheme with a separate writer thread and futures completed
>> when
>>>>> write operation is finished.
>>>>> 2. PME-like protocol with obvious complications like failover and
>>>>> asynchronous wait for replies over communication layer.
>>>>> 
>>>>> Your suggestion looks easier from code complexity perspective but in my
>>>>> view it increases chances to get into starvation. Now if some node
>> faces
>>>>> really long delays during write op it is gonna be kicked out of
>> topology
>>>> by
>>>>> discovery protocol. In your case it is possible that more and more
>>>> threads
>>>>> from other pools may stuck waiting on the operation future, it is also
>>>> not
>>>>> good.
>>>>> 
>>>>

Re: Asynchronous registration of binary metadata

2019-08-20 Thread Denis Mekhanikov
Eduard,

Usages will wait for the metadata to be registered and written to disk. No 
races should occur with such a flow.
Or do you have some specific case on your mind?

I agree that using a distributed meta storage would be nice here. 
But this way we will kind of move back to the previous scheme with a replicated 
system cache, where metadata was stored before.
Will the scheme with the metastorage be different in any way? Won’t we decide to 
move back to discovery messages again after a while?

Denis


> On 20 Aug 2019, at 15:13, Eduard Shangareev  
> wrote:
> 
> Denis,
> How would we deal with races between registration and metadata usages with
> such fast-fix?
> 
> I believe, that we need to move it to distributed metastorage, and await
> registration completeness if we can't find it (wait for work in progress).
> Discovery shouldn't wait for anything here.
> 
> On Tue, Aug 20, 2019 at 11:55 AM Denis Mekhanikov 
> wrote:
> 
>> Sergey,
>> 
>> Currently metadata is written to disk sequentially on every node. Only one
>> node at a time is able to write metadata to its storage.
>> Slowness accumulates when you add more nodes. A delay required to write
>> one piece of metadata may be not that big, but if you multiply it by say
>> 200, then it becomes noticeable.
>> But If we move the writing out from discovery threads, then nodes will be
>> doing it in parallel.
>> 
>> I think, it’s better to block some threads from a striped pool for a
>> little while rather than blocking discovery for the same period, but
>> multiplied by a number of nodes.
>> 
>> What do you think?
>> 
>> Denis
>> 
>>> On 15 Aug 2019, at 10:26, Sergey Chugunov 
>> wrote:
>>> 
>>> Denis,
>>> 
>>> Thanks for bringing this issue up, decision to write binary metadata from
>>> discovery thread was really a tough decision to make.
>>> I don't think that moving metadata to metastorage is a silver bullet here
>>> as this approach also has its drawbacks and is not an easy change.
>>> 
>>> In addition to workarounds suggested by Alexei we have two choices to
>>> offload write operation from discovery thread:
>>> 
>>>  1. Your scheme with a separate writer thread and futures completed when
>>>  write operation is finished.
>>>  2. PME-like protocol with obvious complications like failover and
>>>  asynchronous wait for replies over communication layer.
>>> 
>>> Your suggestion looks easier from code complexity perspective but in my
>>> view it increases chances to get into starvation. Now if some node faces
>>> really long delays during write op it is gonna be kicked out of topology
>> by
>>> discovery protocol. In your case it is possible that more and more
>> threads
>>> from other pools may stuck waiting on the operation future, it is also
>> not
>>> good.
>>> 
>>> What do you think?
>>> 
>>> I also think that if we want to approach this issue systematically, we
>> need
>>> to do a deep analysis of metastorage option as well and to finally choose
>>> which road we wanna go.
>>> 
>>> Thanks!
>>> 
>>> On Thu, Aug 15, 2019 at 9:28 AM Zhenya Stanilovsky
>>>  wrote:
>>> 
>>>> 
>>>>> 
>>>>>> 1. Yes, only on OS failures. In such case data will be received from
>>>> alive
>>>>>> nodes later.
>>>> What behavior would be in case of one node ? I suppose someone can
>> obtain
>>>> cache data without unmarshalling schema, what in this case would be with
>>>> grid operability?
>>>> 
>>>>> 
>>>>>> 2. Yes, for walmode=FSYNC writes to metastore will be slow. But such
>>>> mode
>>>>>> should not be used if you have more than two nodes in grid because it
>>>> has
>>>>>> huge impact on performance.
>>>> Is wal mode affects metadata store ?
>>>> 
>>>>> 
>>>>>> 
>>>>>> ср, 14 авг. 2019 г. в 14:29, Denis Mekhanikov < dmekhani...@gmail.com
>>>>> :
>>>>>> 
>>>>>>> Folks,
>>>>>>> 
>>>>>>> Thanks for showing interest in this issue!
>>>>>>> 
>>>>>>> Alexey,
>>>>>>> 
>>>>>>>> I think removing fsync could help to mitigate performance issues
>> with
>>>>>>> current implementation
>>>>

Re: Asynchronous registration of binary metadata

2019-08-20 Thread Denis Mekhanikov
Sergey,

Currently metadata is written to disk sequentially on every node. Only one node 
at a time is able to write metadata to its storage.
Slowness accumulates when you add more nodes. A delay required to write one 
piece of metadata may be not that big, but if you multiply it by say 200, then 
it becomes noticeable.
But If we move the writing out from discovery threads, then nodes will be doing 
it in parallel.

I think, it’s better to block some threads from a striped pool for a little 
while rather than blocking discovery for the same period, but multiplied by a 
number of nodes.

What do you think?

Denis

> On 15 Aug 2019, at 10:26, Sergey Chugunov  wrote:
> 
> Denis,
> 
> Thanks for bringing this issue up, decision to write binary metadata from
> discovery thread was really a tough decision to make.
> I don't think that moving metadata to metastorage is a silver bullet here
> as this approach also has its drawbacks and is not an easy change.
> 
> In addition to workarounds suggested by Alexei we have two choices to
> offload write operation from discovery thread:
> 
>   1. Your scheme with a separate writer thread and futures completed when
>   write operation is finished.
>   2. PME-like protocol with obvious complications like failover and
>   asynchronous wait for replies over communication layer.
> 
> Your suggestion looks easier from code complexity perspective but in my
> view it increases chances to get into starvation. Now if some node faces
> really long delays during write op it is gonna be kicked out of topology by
> discovery protocol. In your case it is possible that more and more threads
> from other pools may stuck waiting on the operation future, it is also not
> good.
> 
> What do you think?
> 
> I also think that if we want to approach this issue systematically, we need
> to do a deep analysis of metastorage option as well and to finally choose
> which road we wanna go.
> 
> Thanks!
> 
> On Thu, Aug 15, 2019 at 9:28 AM Zhenya Stanilovsky
>  wrote:
> 
>> 
>>> 
>>>> 1. Yes, only on OS failures. In such case data will be received from
>> alive
>>>> nodes later.
>> What behavior would be in case of one node ? I suppose someone can obtain
>> cache data without unmarshalling schema, what in this case would be with
>> grid operability?
>> 
>>> 
>>>> 2. Yes, for walmode=FSYNC writes to metastore will be slow. But such
>> mode
>>>> should not be used if you have more than two nodes in grid because it
>> has
>>>> huge impact on performance.
>> Is wal mode affects metadata store ?
>> 
>>> 
>>>> 
>>>> ср, 14 авг. 2019 г. в 14:29, Denis Mekhanikov < dmekhani...@gmail.com
>>> :
>>>> 
>>>>> Folks,
>>>>> 
>>>>> Thanks for showing interest in this issue!
>>>>> 
>>>>> Alexey,
>>>>> 
>>>>>> I think removing fsync could help to mitigate performance issues with
>>>>> current implementation
>>>>> 
>>>>> Is my understanding correct, that if we remove fsync, then discovery
>> won’t
>>>>> be blocked, and data will be flushed to disk in background, and loss of
>>>>> information will be possible only on OS failure? It sounds like an
>>>>> acceptable workaround to me.
>>>>> 
>>>>> Will moving metadata to metastore actually resolve this issue? Please
>>>>> correct me if I’m wrong, but we will still need to write the
>> information to
>>>>> WAL before releasing the discovery thread. If WAL mode is FSYNC, then
>> the
>>>>> issue will still be there. Or is it planned to abandon the
>> discovery-based
>>>>> protocol at all?
>>>>> 
>>>>> Evgeniy, Ivan,
>>>>> 
>>>>> In my particular case the data wasn’t too big. It was a slow
>> virtualised
>>>>> disk with encryption, that made operations slow. Given that there are
>> 200
>>>>> nodes in a cluster, where every node writes slowly, and this process is
>>>>> sequential, one piece of metadata is registered extremely slowly.
>>>>> 
>>>>> Ivan, answering to your other questions:
>>>>> 
>>>>>> 2. Do we need a persistent metadata for in-memory caches? Or is it so
>>>>> accidentally?
>>>>> 
>>>>> It should be checked, if it’s safe to stop writing marshaller mappings
>> to
>>>>> disk without loosing any guarantees.
>>>>&

Re: Asynchronous registration of binary metadata

2019-08-14 Thread Denis Mekhanikov
Alexey, 

I still don’t understand completely if by using metastore we are going to stop 
using discovery for metadata registration, or not. Could you clarify that point?
Is it going to be a distributed metastore or a local one?

Are there any relevant JIRA tickets for this change?

Denis

> On 14 Aug 2019, at 19:37, Alexei Scherbakov  
> wrote:
> 
> Denis Mekhanikov,
> 
> 1. Yes, only on OS failures. In such case data will be received from alive
> nodes later.
> 2. Yes, for walmode=FSYNC writes to metastore will be slow. But such mode
> should not be used if you have more than two nodes in grid because it has
> huge impact on performance.
> 
> ср, 14 авг. 2019 г. в 14:29, Denis Mekhanikov :
> 
>> Folks,
>> 
>> Thanks for showing interest in this issue!
>> 
>> Alexey,
>> 
>>> I think removing fsync could help to mitigate performance issues with
>> current implementation
>> 
>> Is my understanding correct, that if we remove fsync, then discovery won’t
>> be blocked, and data will be flushed to disk in background, and loss of
>> information will be possible only on OS failure? It sounds like an
>> acceptable workaround to me.
>> 
>> Will moving metadata to metastore actually resolve this issue? Please
>> correct me if I’m wrong, but we will still need to write the information to
>> WAL before releasing the discovery thread. If WAL mode is FSYNC, then the
>> issue will still be there. Or is it planned to abandon the discovery-based
>> protocol at all?
>> 
>> Evgeniy, Ivan,
>> 
>> In my particular case the data wasn’t too big. It was a slow virtualised
>> disk with encryption, that made operations slow. Given that there are 200
>> nodes in a cluster, where every node writes slowly, and this process is
>> sequential, one piece of metadata is registered extremely slowly.
>> 
>> Ivan, answering to your other questions:
>> 
>>> 2. Do we need a persistent metadata for in-memory caches? Or is it so
>> accidentally?
>> 
>> It should be checked, if it’s safe to stop writing marshaller mappings to
>> disk without loosing any guarantees.
>> But anyway, I would like to have a property, that would control this. If
>> metadata registration is slow, then initial cluster warmup may take a
>> while. So, if we preserve metadata on disk, then we will need to warm it up
>> only once, and further restarts won’t be affected.
>> 
>>> Do we really need a fast fix here?
>> 
>> I would like a fix, that could be implemented now, since the activity with
>> moving metadata to metastore doesn’t sound like a quick one. Having a
>> temporary solution would be nice.
>> 
>> Denis
>> 
>>> On 14 Aug 2019, at 11:53, Павлухин Иван  wrote:
>>> 
>>> Denis,
>>> 
>>> Several clarifying questions:
>>> 1. Do you have an idea why metadata registration takes so long? So
>>> poor disks? So many data to write? A contention with disk writes by
>>> other subsystems?
>>> 2. Do we need a persistent metadata for in-memory caches? Or is it so
>>> accidentally?
>>> 
>>> Generally, I think that it is possible to move metadata saving
>>> operations out of discovery thread without loosing required
>>> consistency/integrity.
>>> 
>>> As Alex mentioned using metastore looks like a better solution. Do we
>>> really need a fast fix here? (Are we talking about fast fix?)
>>> 
>>> ср, 14 авг. 2019 г. в 11:45, Zhenya Stanilovsky
>> :
>>>> 
>>>> Alexey, but in this case customer need to be informed, that whole (for
>> example 1 node) cluster crash (power off) could lead to partial data
>> unavailability.
>>>> And may be further index corruption.
>>>> 1. Why your meta takes a substantial size? may be context leaking ?
>>>> 2. Could meta be compressed ?
>>>> 
>>>> 
>>>>> Среда, 14 августа 2019, 11:22 +03:00 от Alexei Scherbakov <
>> alexey.scherbak...@gmail.com>:
>>>>> 
>>>>> Denis Mekhanikov,
>>>>> 
>>>>> Currently metadata are fsync'ed on write. This might be the case of
>>>>> slow-downs in case of metadata burst writes.
>>>>> I think removing fsync could help to mitigate performance issues with
>>>>> current implementation until proper solution will be implemented:
>> moving
>>>>> metadata to metastore.
>>>>> 
>>>>> 
>>>>> вт, 13 авг. 2019 г. в 17:09, Denis Mekhanikov < dmekhani.

Re: Do I have to use --illegal-access=permit for Java thin client and JDBC with JDK 9/10/11.

2019-08-14 Thread Denis Mekhanikov
Hi!

There are two JDK-internal things that are used by Ignite: Unsafe and the 
sun.nio.ch package.
Neither of them is used by thin clients, so it’s fine to use thin clients 
without additional flags.
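
For example, a plain thin-client snippet like the one below (addresses and cache name 
are arbitrary; just a minimal sketch) runs on JDK 9/10/11 without any extra JVM arguments:

import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientCheck {
    public static void main(String[] args) {
        // Assumes a server node is already listening on the default thin-client port.
        ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("test-cache");

            cache.put(1, "works without --illegal-access=permit");

            System.out.println(cache.get(1));
        }
    }
}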

Denis

> On 13 Aug 2019, at 23:01, Shane Duan  wrote:
> 
> Hi Igniter,
> 
> I understand that --illegal-access=permit is required for JDK 9/10/11 on
> Ignite server. But do I have to  include this JVM parameter for Ignite Java
> thin client and JDBC client? I tried some simple test without it and it
> seems working fine...
> 
> 
> Thanks,
> Shane



Re: Asynchronous registration of binary metadata

2019-08-14 Thread Denis Mekhanikov
Folks, 

Thanks for showing interest in this issue!

Alexey,

> I think removing fsync could help to mitigate performance issues with current 
> implementation

Is my understanding correct, that if we remove fsync, then discovery won’t be 
blocked, and data will be flushed to disk in background, and loss of 
information will be possible only on OS failure? It sounds like an acceptable 
workaround to me.

Will moving metadata to metastore actually resolve this issue? Please correct 
me if I’m wrong, but we will still need to write the information to WAL before 
releasing the discovery thread. If WAL mode is FSYNC, then the issue will still 
be there. Or is it planned to abandon the discovery-based protocol at all?

Evgeniy, Ivan,

In my particular case the data wasn’t too big. It was a slow virtualised disk 
with encryption, that made operations slow. Given that there are 200 nodes in a 
cluster, where every node writes slowly, and this process is sequential, one 
piece of metadata is registered extremely slowly.

Ivan, answering to your other questions:

> 2. Do we need a persistent metadata for in-memory caches? Or is it so 
> accidentally?

It should be checked whether it’s safe to stop writing marshaller mappings to disk 
without losing any guarantees.
But anyway, I would like to have a property, that would control this. If 
metadata registration is slow, then initial cluster warmup may take a while. 
So, if we preserve metadata on disk, then we will need to warm it up only once, 
and further restarts won’t be affected.

> Do we really need a fast fix here? 

I would like a fix, that could be implemented now, since the activity with 
moving metadata to metastore doesn’t sound like a quick one. Having a temporary 
solution would be nice.

Denis

> On 14 Aug 2019, at 11:53, Павлухин Иван  wrote:
> 
> Denis,
> 
> Several clarifying questions:
> 1. Do you have an idea why metadata registration takes so long? So
> poor disks? So many data to write? A contention with disk writes by
> other subsystems?
> 2. Do we need a persistent metadata for in-memory caches? Or is it so
> accidentally?
> 
> Generally, I think that it is possible to move metadata saving
> operations out of discovery thread without loosing required
> consistency/integrity.
> 
> As Alex mentioned using metastore looks like a better solution. Do we
> really need a fast fix here? (Are we talking about fast fix?)
> 
> ср, 14 авг. 2019 г. в 11:45, Zhenya Stanilovsky :
>> 
>> Alexey, but in this case customer need to be informed, that whole (for 
>> example 1 node) cluster crash (power off) could lead to partial data 
>> unavailability.
>> And may be further index corruption.
>> 1. Why your meta takes a substantial size? may be context leaking ?
>> 2. Could meta be compressed ?
>> 
>> 
>>> Среда, 14 августа 2019, 11:22 +03:00 от Alexei Scherbakov 
>>> :
>>> 
>>> Denis Mekhanikov,
>>> 
>>> Currently metadata are fsync'ed on write. This might be the case of
>>> slow-downs in case of metadata burst writes.
>>> I think removing fsync could help to mitigate performance issues with
>>> current implementation until proper solution will be implemented: moving
>>> metadata to metastore.
>>> 
>>> 
>>> вт, 13 авг. 2019 г. в 17:09, Denis Mekhanikov < dmekhani...@gmail.com >:
>>> 
>>>> I would also like to mention, that marshaller mappings are written to disk
>>>> even if persistence is disabled.
>>>> So, this issue affects purely in-memory clusters as well.
>>>> 
>>>> Denis
>>>> 
>>>>> On 13 Aug 2019, at 17:06, Denis Mekhanikov < dmekhani...@gmail.com >
>>>> wrote:
>>>>> 
>>>>> Hi!
>>>>> 
>>>>> When persistence is enabled, binary metadata is written to disk upon
>>>> registration. Currently it happens in the discovery thread, which makes
>>>> processing of related messages very slow.
>>>>> There are cases, when a lot of nodes and slow disks can make every
>>>> binary type be registered for several minutes. Plus it blocks processing of
>>>> other messages.
>>>>> 
>>>>> I propose starting a separate thread that will be responsible for
>>>> writing binary metadata to disk. So, binary type registration will be
>>>> considered finished before information about it will is written to disks on
>>>> all nodes.
>>>>> 
>>>>> The main concern here is data consistency in cases when a node
>>>> acknowledges type registration and then fails before writing the metadata
>>>> to di

Re: Asynchronous registration of binary metadata

2019-08-13 Thread Denis Mekhanikov
I would also like to mention, that marshaller mappings are written to disk even 
if persistence is disabled.
So, this issue affects purely in-memory clusters as well.

Denis

> On 13 Aug 2019, at 17:06, Denis Mekhanikov  wrote:
> 
> Hi!
> 
> When persistence is enabled, binary metadata is written to disk upon 
> registration. Currently it happens in the discovery thread, which makes 
> processing of related messages very slow.
> There are cases, when a lot of nodes and slow disks can make every binary 
> type be registered for several minutes. Plus it blocks processing of other 
> messages.
> 
> I propose starting a separate thread that will be responsible for writing 
> binary metadata to disk. So, binary type registration will be considered 
> finished before information about it will is written to disks on all nodes.
> 
> The main concern here is data consistency in cases when a node acknowledges 
> type registration and then fails before writing the metadata to disk.
> I see two parts of this issue:
> Nodes will have different metadata after restarting.
> If we write some data into a persisted cache and shut down nodes faster than 
> a new binary type is written to disk, then after a restart we won’t have a 
> binary type to work with.
> 
> The first case is similar to a situation, when one node fails, and after that 
> a new type is registered in the cluster. This issue is resolved by the 
> discovery data exchange. All nodes receive information about all binary types 
> in the initial discovery messages sent by other nodes. So, once you restart a 
> node, it will receive information, that it failed to finish writing to disk, 
> from other nodes.
> If all nodes shut down before finishing writing the metadata to disk, then 
> after a restart the type will be considered unregistered, so another 
> registration will be required.
> 
> The second case is a bit more complicated. But it can be resolved by making 
> the discovery threads on every node create a future, that will be completed 
> when writing to disk is finished. So, every node will have such future, that 
> will reflect the current state of persisting the metadata to disk.
> After that, if some operation needs this binary type, it will need to wait on 
> that future until flushing to disk is finished.
> This way discovery threads won’t be blocked, but other threads, that actually 
> need this type, will be.
> 
> Please let me know what you think about that.
> 
> Denis



Asynchronous registration of binary metadata

2019-08-13 Thread Denis Mekhanikov
Hi!

When persistence is enabled, binary metadata is written to disk upon 
registration. Currently it happens in the discovery thread, which makes 
processing of related messages very slow.
There are cases when a combination of many nodes and slow disks makes registration 
of every binary type take several minutes. Plus, it blocks processing of other messages.

I propose starting a separate thread that will be responsible for writing 
binary metadata to disk. So, binary type registration will be considered 
finished before information about it is written to disks on all nodes.

The main concern here is data consistency in cases when a node acknowledges 
type registration and then fails before writing the metadata to disk.
I see two parts of this issue:
1. Nodes will have different metadata after restarting.
2. If we write some data into a persisted cache and shut down nodes faster than a 
new binary type is written to disk, then after a restart we won’t have a binary 
type to work with.

The first case is similar to a situation, when one node fails, and after that a 
new type is registered in the cluster. This issue is resolved by the discovery 
data exchange. All nodes receive information about all binary types in the 
initial discovery messages sent by other nodes. So, once you restart a node, it 
will receive information, that it failed to finish writing to disk, from other 
nodes.
If all nodes shut down before finishing writing the metadata to disk, then 
after a restart the type will be considered unregistered, so another 
registration will be required.

The second case is a bit more complicated. But it can be resolved by making the 
discovery threads on every node create a future, that will be completed when 
writing to disk is finished. So, every node will have such a future, reflecting 
the current state of persisting the metadata to disk.
After that, if some operation needs this binary type, it will need to wait on 
that future until flushing to disk is finished.
This way discovery threads won’t be blocked, but other threads, that actually 
need this type, will be.

Please let me know what you think about that.

Denis

[jira] [Created] (IGNITE-11914) Failures to deserialize discovery data should be handled by a failure handler

2019-06-13 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11914:
-

 Summary: Failures to deserialize discovery data should be handled 
by a failure handler
 Key: IGNITE-11914
 URL: https://issues.apache.org/jira/browse/IGNITE-11914
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.5
Reporter: Denis Mekhanikov


When a node, during join, receives a discovery data packet that it cannot 
deserialize, the error is only printed to the log and not handled in any way. This 
leads to swallowing potentially important failures.

For example, a failure to deserialize a continuous query remote filter should 
be propagated to a failure handler, but it doesn't happen. Test is attached.
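
For reference, the failure handler meant here is the one configured on the node, e.g. 
(a minimal sketch; any {{FailureHandler}} implementation could be used):

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlerConfig {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            // Critical failures reported to this handler stop or halt the node.
            // The problem described above is that unmarshalling errors never reach it.
            .setFailureHandler(new StopNodeOrHaltFailureHandler());

        Ignition.start(cfg);
    }
}
{code}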

Error message:

{noformat}
Failed to unmarshal discovery data for component: 0
class org.apache.ignite.IgniteCheckedException: Failed to find class with given 
class loader for unmarshalling (make sure same versions of all classes are 
available on all nodes or enable peer-class-loading) 
[clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, 
cls=org.apache.ignite.tests.p2p.CacheDeploymentEntryEventFilterFactory]
at 
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:146)
at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:93)
at 
org.apache.ignite.internal.util.IgniteUtils.unmarshalZip(IgniteUtils.java:10068)
at 
org.apache.ignite.spi.discovery.tcp.internal.DiscoveryDataPacket.unmarshalData(DiscoveryDataPacket.java:292)
at 
org.apache.ignite.spi.discovery.tcp.internal.DiscoveryDataPacket.unmarshalGridData(DiscoveryDataPacket.java:154)
at 
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.onExchange(TcpDiscoverySpi.java:2065)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4882)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2964)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2696)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7527)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2818)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7458)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:61)
Caused by: java.lang.ClassNotFoundException: 
org.apache.ignite.tests.p2p.CacheDeploymentEntryEventFilterFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8672)
at 
org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.resolveClass(JdkMarshallerObjectInputStream.java:59)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1863)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2037)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandlerV2.readExternal(CacheContinuousQueryHandlerV2.java:179)
at 
java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:2113)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2062)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at java.util.HashMap.readObject(HashMap.java:1409)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158

[jira] [Created] (IGNITE-11907) Registration of continuous query should fail if nodes don't have remote filter class

2019-06-10 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11907:
-

 Summary: Registration of continuous query should fail if nodes 
don't have remote filter class
 Key: IGNITE-11907
 URL: https://issues.apache.org/jira/browse/IGNITE-11907
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Denis Mekhanikov
 Attachments: ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java

If one of the data nodes doesn't have the remote filter class, then registration of 
the continuous query should fail with an exception. Currently the nodes fail instead.

Reproducer is attached: 
[^ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11854) Serialization of arrays of primitives in python thin client is not optimal

2019-05-16 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11854:
-

 Summary: Serialization of arrays of primitives in python thin 
client is not optimal
 Key: IGNITE-11854
 URL: https://issues.apache.org/jira/browse/IGNITE-11854
 Project: Ignite
  Issue Type: Bug
  Components: thin client
Affects Versions: 2.7
Reporter: Denis Mekhanikov


The following code hangs indefinitely inside the invocation of {{my_cache.put()}}:
{code:java}
from pyignite import Client

arr_len = 3_000_000

content = bytearray(arr_len)

for i in range(arr_len):
content[i] = i % 256

client = Client()
client.connect('127.0.0.1', 10800)
my_cache = client.get_or_create_cache('my cache')
my_cache.put("key_bin", content){code}
The value is only 3 MB in size, yet the implementation of serialization of 
primitive arrays seems to be quadratic in the length of the array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11792) Web console agent throws NullPointerException if node endpoint is incorrect

2019-04-22 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11792:
-

 Summary: Web console agent throws NullPointerException if node 
endpoint is incorrect
 Key: IGNITE-11792
 URL: https://issues.apache.org/jira/browse/IGNITE-11792
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Denis Mekhanikov


Starting web agent using the following command: 
{code:bash}
./ignite-web-agent.sh -n localhost:8080 -s https://console.gridgain.com/
{code}

Note that {{localhost:8080}} is specified without the {{http://}} part. This 
leads to the following exception:
{noformat}
[ERROR][pool-1-thread-1][ClusterListener] WatchTask failed
java.lang.NullPointerException
at 
org.apache.ignite.console.agent.rest.RestExecutor.sendRequest(RestExecutor.java:185)
at 
org.apache.ignite.console.agent.rest.RestExecutor.sendRequest(RestExecutor.java:237)
at 
org.apache.ignite.console.agent.handlers.ClusterListener$WatchTask.restCommand(ClusterListener.java:421)
at 
org.apache.ignite.console.agent.handlers.ClusterListener$WatchTask.topology(ClusterListener.java:457)
at 
org.apache.ignite.console.agent.handlers.ClusterListener$WatchTask.run(ClusterListener.java:506)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}

The {{localhost:8080}} format should either be supported or a reasonable error 
message should be printed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: New Committer: Vyacheslav Daradur

2019-04-12 Thread Denis Mekhanikov
Well done Slava!

It was great working with you on the service grid redesign.
Looking forward to seeing new commits from you!

Denis

чт, 11 апр. 2019 г. в 18:27, Denis Magda :

> Well deserved, Vyacheslav! Thanks for hardening Service Grid pushing it to
> a completely next level!
>
> -
> Denis
>
>
> On Thu, Apr 11, 2019 at 7:00 AM Dmitriy Pavlov  wrote:
>
> > Dear Ignite Developers,
> >
> > The Project Management Committee (PMC) for Apache Ignite has invited
> > Vyacheslav Daradur to become a committer and we are pleased to announce
> > that he has accepted. Apache Ignite PMC appreciates Vyacheslav’s
> > contribution to service grid redesign (is was collaborative efforts. BTW,
> > thanks to everyone involved), compatibility test framework, contribution
> to
> > community development, and to abbreviation plugin.
> >
> > Being a committer enables easier contribution to the project since there
> is
> > no need to go via the patch submission process. This should enable better
> > productivity.
> >
> > Please join me in welcoming Vyacheslav, and congratulating him on the new
> > role in the Apache Ignite Community.
> >
> > Best Regards,
> > Dmitriy Pavlov
> > on behalf of the Apache Ignite PMC
> >
>


[jira] [Created] (IGNITE-11628) Document the possibility to use JAR files in UriDeploymentSpi

2019-03-26 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11628:
-

 Summary: Document the possibility to use JAR files in 
UriDeploymentSpi
 Key: IGNITE-11628
 URL: https://issues.apache.org/jira/browse/IGNITE-11628
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Denis Mekhanikov
Assignee: Artem Budnikov
 Fix For: 2.8


{{UriDeploymentSpi}} gained the ability to handle regular JAR files along with 
GARs in https://issues.apache.org/jira/browse/IGNITE-11380
This capability should be reflected in the documentation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: UriDeploymentSpi and GAR files

2019-03-25 Thread Denis Mekhanikov
Folks,

I prepared a patch for the second ticket:
https://github.com/apache/ignite/pull/6177
Ilya is concerned that if you had some JAR files lying next to your GARs
in a repository referred to via UriDeploymentSpi, then these
JARs will now be loaded as well. So, this is a behaviour change.
I don't think this is really a problem. I don't see a simple solution
to this that wouldn't require an API change, and a complex change would be
overkill here.
Loading what's located in the repository is pretty natural, so you
shouldn't be surprised, when JARs start loading after an Ignite version
upgrade.
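
For reference, pointing the SPI at such a repository looks roughly like this (a
minimal sketch; the directory path is arbitrary):

import java.util.Arrays;

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.deployment.uri.UriDeploymentSpi;

public class UriDeploymentConfig {
    public static void main(String[] args) {
        UriDeploymentSpi deploymentSpi = new UriDeploymentSpi();

        // The directory is scanned for deployment units; with the patch both GAR
        // and plain JAR files found here get deployed.
        deploymentSpi.setUriList(Arrays.asList("file:///opt/ignite/deployment"));

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDeploymentSpi(deploymentSpi);

        Ignition.start(cfg);
    }
}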

What do you think?

Denis

чт, 21 февр. 2019 г. в 17:48, Denis Mekhanikov :

> I created the following tickets:
>
> https://issues.apache.org/jira/browse/IGNITE-11379 – drop support of GARs
> https://issues.apache.org/jira/browse/IGNITE-11380 – support JARs
> https://issues.apache.org/jira/browse/IGNITE-11381 – document ignite.xml
> file format.
>
> Denis
>
> ср, 20 февр. 2019 г. в 12:30, Nikolay Izhikov :
>
>> Hello, Denis.
>>
>> > This XML may contain task descriptors, but I couldn't find any
>> documentation on this format.
>> > This information can be provided in simple JAR files with the same file
>> structure.
>>
>> I support you proposal. Let's:
>>
>> 1. Support jar files instead of gar.
>> 2. Write down documentation about XML config format.
>> 3. Provide some examples.
>>
>> Can you crate a tickets for it?
>>
>>
>> ср, 20 февр. 2019 г. в 11:49, Denis Mekhanikov :
>>
>> > Denis,
>> >
>> > This XML may contain task descriptors, but I couldn't find any
>> > documentation on this format.
>> > Also it may contain a userVersion [1] parameter, which can be used to
>> force
>> > tasks redeployment in some cases.
>> >
>> > This information can be provided in simple JAR files with the same file
>> > structure.
>> > There is no need to confuse people and require their packages to have a
>> GAR
>> > extension.
>> >
>> > Also if you don't specify the task descriptors, then all tasks in the
>> file
>> > will be registered.
>> > So, I doubt, that anybody will bother specifying the descriptors. XML is
>> > not very user-friendly.
>> > This piece of configuration doesn't seem necessary to me.
>> >
>> > [1]
>> >
>> >
>> https://apacheignite.readme.io/docs/deployment-modes#section-un-deployment-and-user-versions
>> >
>> > Denis
>> >
>> > ср, 20 февр. 2019 г. в 01:35, Denis Magda :
>> >
>> > > Denis,
>> > >
>> > > What was the purpose of having XML and other files within the GARs?
>> Guess
>> > > it was somehow versioning related - you might have several tasks of
>> the
>> > > same class but different versions running in a cluster.
>> > >
>> > > -
>> > > Denis
>> > >
>> > >
>> > > On Tue, Feb 19, 2019 at 8:40 AM Ilya Kasnacheev <
>> > ilya.kasnach...@gmail.com
>> > > >
>> > > wrote:
>> > >
>> > > > Hello!
>> > > >
>> > > > Yes, I think we should accept plain JARs if anybody needs this at
>> all.
>> > > > Might still keep meta info support for compatibility.
>> > > >
>> > > > Regards,
>> > > > --
>> > > > Ilya Kasnacheev
>> > > >
>> > > >
>> > > > вт, 19 февр. 2019 г. в 19:38, Denis Mekhanikov <
>> dmekhani...@gmail.com
>> > >:
>> > > >
>> > > > > Hi!
>> > > > >
>> > > > > There is a feature in Ignite called DeploymentSpi [1], that allows
>> > > adding
>> > > > > and changing implementation of compute tasks without nodes'
>> downtime.
>> > > > > The only usable implementation right now is UriDeploymentSpi [2],
>> > which
>> > > > > lets you provide classes of compute tasks packaged as an archive
>> of a
>> > > > > special form. And this special form is the worst part.
>> > > > > GAR file is just like a JAR, but with some additional meta info.
>> It
>> > may
>> > > > > contain an XML with description of tasks, a checksum and also
>> > > > dependencies.
>> > > > >
>> > > > > We barely have any tools to build these files, and they can be
>> > replaced
>> > > > > with simple uber-JARs.
>> > > > > The only tool we have right now is IgniteDeploymentGarAntTask,
>> which
>> > is
>> > > > not
>> > > > > documented anywhere, and it's supposed to be used from a
>> > long-forgotten
>> > > > > Apache Ant build system.
>> > > > >
>> > > > > I don't think we need this file format. How about we deprecate and
>> > > remove
>> > > > > it and make UriDeploymentSpi support plain JARs?
>> > > > >
>> > > > > [1] https://apacheignite.readme.io/docs/deployment-spi
>> > > > > [2]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/deployment/uri/UriDeploymentSpi.html
>> > > > >
>> > > > > Denis
>> > > > >
>> > > >
>> > >
>> >
>>
>


[jira] [Created] (IGNITE-11575) Make UriDeploymentSpi ignore archives with untrusted signature

2019-03-19 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11575:
-

 Summary: Make UriDeploymentSpi ignore archives with untrusted 
signature
 Key: IGNITE-11575
 URL: https://issues.apache.org/jira/browse/IGNITE-11575
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov


{{UriDeploymentSpi}} checks whether a loaded JAR/GAR file has a correct 
signature, but there is no way to specify the expected public key. So, it's 
possible to perform a "man-in-the-middle" attack by tampering with an archive while 
it is being transferred from a remote storage to an Ignite node.
It's even possible to simply strip the signature, and a completely unsigned file 
will be processed without errors.

There should be a way to specify an expected public key, that should be used 
while signing archives.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11543) TransactionOptimisticException on topology change when readThrough is enabled

2019-03-14 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11543:
-

 Summary: TransactionOptimisticException on topology change when 
readThrough is enabled
 Key: IGNITE-11543
 URL: https://issues.apache.org/jira/browse/IGNITE-11543
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.7
Reporter: Denis Mekhanikov
 Attachments: 
CacheOptimisticTransactionTopologyChangeReadThroughTest.java

When topology changes during an optimistic serializable transaction on a 
replicated cache with {{readThrough}} enabled, 
{{TransactionOptimisticException}} may appear.

Cache configuration:
{noformat}
cacheMode: REPLICATED
atomicityMode: TRANSACTIONAL
readThrough: true
{noformat}

Reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11531) Merge concurrent registrations of the same binary type

2019-03-12 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11531:
-

 Summary: Merge concurrent registrations of the same binary type
 Key: IGNITE-11531
 URL: https://issues.apache.org/jira/browse/IGNITE-11531
 Project: Ignite
  Issue Type: Improvement
  Components: binary
Reporter: Denis Mekhanikov


When a binary type is registered multiple times simultaneously, a lot of 
type versions with the same schema are generated. This leads to long binary type 
registration, especially on big topologies.

The following code sample demonstrates the problem:
{code:java}
public class LongRegistration {
public static void main(String[] args) throws InterruptedException {
Ignite ignite = Ignition.start(igniteConfig());

int threadsNum = 50;

ExecutorService exec = Executors.newFixedThreadPool(threadsNum);

CyclicBarrier barrier = new CyclicBarrier(threadsNum);

long startTime = System.currentTimeMillis();

// register(ignite);

for (int i = 0; i < threadsNum; i++)
exec.submit(new TypeRegistrator(ignite, barrier));

exec.shutdown();
exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);

System.out.println("Total registration time: " + 
(System.currentTimeMillis() - startTime));
}

private static IgniteConfiguration igniteConfig() {
IgniteConfiguration igniteCfg = new IgniteConfiguration();

TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47500..47509"));

TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
discoverySpi.setLocalAddress("127.0.0.1");
discoverySpi.setLocalPort(47500);
discoverySpi.setIpFinder(ipFinder);

igniteCfg.setDiscoverySpi(discoverySpi);

return igniteCfg;
}

private static void register(Ignite ignite) {
long startTime = System.currentTimeMillis();

IgniteBinary binary = ignite.binary();

BinaryObjectBuilder builder = binary.builder("TestType");

builder.setField("intField", 1);

builder.build();

System.out.println("Registration time: " + (System.currentTimeMillis() 
- startTime));
}

private static class TypeRegistrator implements Runnable {
private Ignite ignite;
private CyclicBarrier cyclicBarrier;

TypeRegistrator(Ignite ignite, CyclicBarrier cyclicBarrier) {
this.ignite = ignite;
this.cyclicBarrier = cyclicBarrier;
}

@Override public void run() {
try {
cyclicBarrier.await();

register(ignite);
} catch (InterruptedException | BrokenBarrierException e) {
e.printStackTrace();
}
}
}
}
{code}

This code sample leads to registration of 50 versions of the same type. The 
effect is more noticeable if a cluster contains a lot of nodes.

If you uncomment the call to the {{register()}} method, then overall registration 
becomes 10 times faster on a topology of 5 nodes.

Registration of matching types should be merged to avoid long processing of 
such cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11520) SQL schema is overwritten by static query entity configuration

2019-03-11 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11520:
-

 Summary: SQL schema is overwritten by static query entity 
configuration
 Key: IGNITE-11520
 URL: https://issues.apache.org/jira/browse/IGNITE-11520
 Project: Ignite
  Issue Type: Bug
  Components: sql
Affects Versions: 2.7, 2.4
Reporter: Denis Mekhanikov
 Fix For: 2.8


Steps to reproduce:
1. Start and restart a node with persistence enabled and the following cache 
configuration:
{code}
<!-- XML cache configuration lost in the archive: a CacheConfiguration for cache "cache"
     with a statically configured Person query entity -->
{code}

2. Execute the following DDL statement:
{code}
ALTER TABLE "cache".Person ADD COLUMN lastName varchar;
{code}

3. Restart the node.

After the restart the Person table contains only two columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11490) System data region metrics are disabled regardless of metricsEnabled flag

2019-03-06 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11490:
-

 Summary: System data region metrics are disabled regardless of 
metricsEnabled flag
 Key: IGNITE-11490
 URL: https://issues.apache.org/jira/browse/IGNITE-11490
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Denis Mekhanikov


System data region metrics are disabled regardless of the value of the 
`DataStorageConfiguration.metricsEnabled` flag. Memory metrics can only be 
enabled explicitly at runtime.

Expected behaviour: `metricsEnabled` flag shouldn't be ignored for the system 
data region.
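
For reference, the flag in question is the one set on the storage configuration (a minimal sketch):

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StorageMetricsConfig {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Expected to enable memory metrics, including those of the system data region;
        // currently the system region ignores this flag.
        storageCfg.setMetricsEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}
{code}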



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11381) Document XML task config format for UriDeploymentSpi

2019-02-21 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11381:
-

 Summary: Document XML task config format for UriDeploymentSpi
 Key: IGNITE-11381
 URL: https://issues.apache.org/jira/browse/IGNITE-11381
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov


{{UriDeploymentSpi}} lets archives with deployed classes contain {{ignite.xml}} 
file, describing tasks in the the archive. Format of this file should be 
documented and examples should be provided.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: UriDeploymentSpi and GAR files

2019-02-21 Thread Denis Mekhanikov
I created the following tickets:

https://issues.apache.org/jira/browse/IGNITE-11379 – drop support of GARs
https://issues.apache.org/jira/browse/IGNITE-11380 – support JARs
https://issues.apache.org/jira/browse/IGNITE-11381 – document ignite.xml
file format.

Denis

ср, 20 февр. 2019 г. в 12:30, Nikolay Izhikov :

> Hello, Denis.
>
> > This XML may contain task descriptors, but I couldn't find any
> documentation on this format.
> > This information can be provided in simple JAR files with the same file
> structure.
>
> I support you proposal. Let's:
>
> 1. Support jar files instead of gar.
> 2. Write down documentation about XML config format.
> 3. Provide some examples.
>
> Can you crate a tickets for it?
>
>
> ср, 20 февр. 2019 г. в 11:49, Denis Mekhanikov :
>
> > Denis,
> >
> > This XML may contain task descriptors, but I couldn't find any
> > documentation on this format.
> > Also it may contain a userVersion [1] parameter, which can be used to
> force
> > tasks redeployment in some cases.
> >
> > This information can be provided in simple JAR files with the same file
> > structure.
> > There is no need to confuse people and require their packages to have a
> GAR
> > extension.
> >
> > Also if you don't specify the task descriptors, then all tasks in the
> file
> > will be registered.
> > So, I doubt, that anybody will bother specifying the descriptors. XML is
> > not very user-friendly.
> > This piece of configuration doesn't seem necessary to me.
> >
> > [1]
> >
> >
> https://apacheignite.readme.io/docs/deployment-modes#section-un-deployment-and-user-versions
> >
> > Denis
> >
> > ср, 20 февр. 2019 г. в 01:35, Denis Magda :
> >
> > > Denis,
> > >
> > > What was the purpose of having XML and other files within the GARs?
> Guess
> > > it was somehow versioning related - you might have several tasks of the
> > > same class but different versions running in a cluster.
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Tue, Feb 19, 2019 at 8:40 AM Ilya Kasnacheev <
> > ilya.kasnach...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hello!
> > > >
> > > > Yes, I think we should accept plain JARs if anybody needs this at
> all.
> > > > Might still keep meta info support for compatibility.
> > > >
> > > > Regards,
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > >
> > > > вт, 19 февр. 2019 г. в 19:38, Denis Mekhanikov <
> dmekhani...@gmail.com
> > >:
> > > >
> > > > > Hi!
> > > > >
> > > > > There is a feature in Ignite called DeploymentSpi [1], that allows
> > > adding
> > > > > and changing implementation of compute tasks without nodes'
> downtime.
> > > > > The only usable implementation right now is UriDeploymentSpi [2],
> > which
> > > > > lets you provide classes of compute tasks packaged as an archive
> of a
> > > > > special form. And this special form is the worst part.
> > > > > GAR file is just like a JAR, but with some additional meta info. It
> > may
> > > > > contain an XML with description of tasks, a checksum and also
> > > > dependencies.
> > > > >
> > > > > We barely have any tools to build these files, and they can be
> > replaced
> > > > > with simple uber-JARs.
> > > > > The only tool we have right now is IgniteDeploymentGarAntTask,
> which
> > is
> > > > not
> > > > > documented anywhere, and it's supposed to be used from a
> > long-forgotten
> > > > > Apache Ant build system.
> > > > >
> > > > > I don't think we need this file format. How about we deprecate and
> > > remove
> > > > > it and make UriDeploymentSpi support plain JARs?
> > > > >
> > > > > [1] https://apacheignite.readme.io/docs/deployment-spi
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/deployment/uri/UriDeploymentSpi.html
> > > > >
> > > > > Denis
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (IGNITE-11379) Drop support of GAR files

2019-02-21 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11379:
-

 Summary: Drop support of GAR files
 Key: IGNITE-11379
 URL: https://issues.apache.org/jira/browse/IGNITE-11379
 Project: Ignite
  Issue Type: Task
Reporter: Denis Mekhanikov


GAR file format doesn't seem to be actually needed in Ignite. There are 
virtually no tools for their assembly, and simple JARs with the same structure 
could be used instead.

Dev list discussion: 
http://apache-ignite-developers.2346864.n4.nabble.com/UriDeploymentSpi-and-GAR-files-td40869.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11380) Make UriDeploymentSpi support JAR files

2019-02-21 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11380:
-

 Summary: Make UriDeploymentSpi support JAR files
 Key: IGNITE-11380
 URL: https://issues.apache.org/jira/browse/IGNITE-11380
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Mekhanikov
 Fix For: 2.8


{{UriDeploymentSpi}} doesn't support JAR files. Only GAR files can be used 
currently.

It would be good to add possibility to provide classes to the {{UriDeployment}} 
packaged as plain JARs.

Dev list discussion: 
http://apache-ignite-developers.2346864.n4.nabble.com/UriDeploymentSpi-and-GAR-files-td40869.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11371) Cache get operation with readThrough returns null if remove is performed concurrently

2019-02-20 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11371:
-

 Summary: Cache get operation with readThrough returns null if 
remove is performed concurrently
 Key: IGNITE-11371
 URL: https://issues.apache.org/jira/browse/IGNITE-11371
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Mekhanikov
 Attachments: IgniteInvalidationNullRunner.java

Consider a situation when you have a cache with {{CacheStore}} and 
{{readThrough}} configured.

One may expect that the {{IgniteCache#get(...)}} operation will never return 
{{null}} for keys that are present in the underlying {{CacheStore}}. But 
actually it's possible to get {{null}} if a remove operation is called on 
the same key while {{CacheStore#load}} is running.

Reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: UriDeploymentSpi and GAR files

2019-02-20 Thread Denis Mekhanikov
Denis,

This XML may contain task descriptors, but I couldn't find any
documentation on this format.
Also it may contain a userVersion [1] parameter, which can be used to force
tasks redeployment in some cases.

This information can be provided in simple JAR files with the same file
structure.
There is no need to confuse people and require their packages to have a GAR
extension.

Also if you don't specify the task descriptors, then all tasks in the file
will be registered.
So, I doubt, that anybody will bother specifying the descriptors. XML is
not very user-friendly.
This piece of configuration doesn't seem necessary to me.

[1]
https://apacheignite.readme.io/docs/deployment-modes#section-un-deployment-and-user-versions

Denis

ср, 20 февр. 2019 г. в 01:35, Denis Magda :

> Denis,
>
> What was the purpose of having XML and other files within the GARs? Guess
> it was somehow versioning related - you might have several tasks of the
> same class but different versions running in a cluster.
>
> -
> Denis
>
>
> On Tue, Feb 19, 2019 at 8:40 AM Ilya Kasnacheev  >
> wrote:
>
> > Hello!
> >
> > Yes, I think we should accept plain JARs if anybody needs this at all.
> > Might still keep meta info support for compatibility.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > вт, 19 февр. 2019 г. в 19:38, Denis Mekhanikov :
> >
> > > Hi!
> > >
> > > There is a feature in Ignite called DeploymentSpi [1], that allows
> adding
> > > and changing implementation of compute tasks without nodes' downtime.
> > > The only usable implementation right now is UriDeploymentSpi [2], which
> > > lets you provide classes of compute tasks packaged as an archive of a
> > > special form. And this special form is the worst part.
> > > GAR file is just like a JAR, but with some additional meta info. It may
> > > contain an XML with description of tasks, a checksum and also
> > dependencies.
> > >
> > > We barely have any tools to build these files, and they can be replaced
> > > with simple uber-JARs.
> > > The only tool we have right now is IgniteDeploymentGarAntTask, which is
> > not
> > > documented anywhere, and it's supposed to be used from a long-forgotten
> > > Apache Ant build system.
> > >
> > > I don't think we need this file format. How about we deprecate and
> remove
> > > it and make UriDeploymentSpi support plain JARs?
> > >
> > > [1] https://apacheignite.readme.io/docs/deployment-spi
> > > [2]
> > >
> > >
> >
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/deployment/uri/UriDeploymentSpi.html
> > >
> > > Denis
> > >
> >
>


UriDeploymentSpi and GAR files

2019-02-19 Thread Denis Mekhanikov
Hi!

There is a feature in Ignite called DeploymentSpi [1] that allows adding
and changing implementations of compute tasks without node downtime.
The only usable implementation right now is UriDeploymentSpi [2], which
lets you provide classes of compute tasks packaged as an archive of a
special form. And this special form is the worst part.
A GAR file is just like a JAR, but with some additional meta info. It may
contain an XML with a description of tasks, a checksum and also dependencies.

We barely have any tools to build these files, and they can be replaced
with simple uber-JARs.
The only tool we have right now is IgniteDeploymentGarAntTask, which is not
documented anywhere, and it's supposed to be used from a long-forgotten
Apache Ant build system.

I don't think we need this file format. How about we deprecate and remove
it and make UriDeploymentSpi support plain JARs?

[1] https://apacheignite.readme.io/docs/deployment-spi
[2]
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/deployment/uri/UriDeploymentSpi.html

Denis


IgniteServices.serviceProxy and local services

2019-02-06 Thread Denis Mekhanikov
Folks,

Currently the IgniteServices.serviceProxy(...) [1] method is designed to return
a locally deployed service instance if one is available. Remote services are
considered only if the current node doesn't have the needed service locally.
This behaviour breaks the load-balancing feature of services. Let's say we
have ten nodes and a node singleton service, which is deployed on all of
these ten nodes. And we have an endpoint on one of the nodes that provides
the API of this service to external users. If we arrange things this way,
all service method invocations will be routed to the local node, which will
do all the work, while the other nine sit idle.
If the "local-first" optimization weren't applied, then work would be
evenly balanced between nodes.

For those who want a local service instance we have an
IgniteServices.service(...) [2] method. So, you can check it first, and if
you get null from it, then get a proxy for a remote instance.
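
In code, that fallback looks roughly like this (just a sketch, imports omitted; it assumes
an Ignite instance and a MyService interface defined by the application):

IgniteServices services = ignite.services();

// Prefer the locally deployed instance, if there is one.
MyService svc = services.service("myService");

// Otherwise fall back to a (non-sticky) proxy over a remote instance.
if (svc == null)
    svc = services.serviceProxy("myService", MyService.class, false);

svc.doSomething();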

Such a change would alter the public contract, though. So, we need another
method for service proxy acquisition. Something like
*serviceProxy(String name, Class svcItf, boolean sticky, boolean
localFirst)*

The contract of the existing method can be changed in Ignite 3.0

What do you think?

Denis

[1]
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteServices.html#serviceProxy-java.lang.String-java.lang.Class-boolean-
[2]
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteServices.html#service-java.lang.String-


Re: Services hot redeployment

2019-02-05 Thread Denis Mekhanikov
Vyacheslav,

I think, we can use IgniteConfiguration#deploymentSpi for tasks and
services.
Or we can add an analogous property.

Nik,

> 1. Is it possible to change the list of deployed resources in runtime via
built-in DeploymentSPI implementations?
> Can I add or remove jar to(from) class-path without node(cluster) restart?
Yes, this is the reason why the DeploymentSpi exists. But currently only
compute grid can use it.

> 2. Can we update service dependencies via DeploymentSPI? commons-lang,
log4j or any other common library?
Ideally such libraries should be loaded via app class loader at node
startup. Otherwise the same libraries will be loaded multiple times. It
will lead to a lot of memory leaks.
I think, we can support loading of dependencies, but discourage users from
doing it. The proper way should be described in the documentation, and
warnings could be generated, if too many classes get loaded via
DeploymentSpi.

> 3. User has to execute explicit Ignite API calls(undeploy(), deploy()) to
renew service implementation. Is it correct?
> I think we should develop some watcher, that will be watch for a resource
change and redeploy services automatically.
Correct. I don't like the idea of redeploying services automatically. I think
redeployment should be triggered explicitly. Non-obvious actions should be
avoided.

4. Such feature would for sure improve usability of the service grid. But
it requires much more time and work to implement.
I think it's better not to expand the scope too much. Otherwise
development will take another 6 months.
This is a great idea, and we will keep it in mind though.

5. Yep, we need extensive documentation on the service deployment
procedure.
This feature may not be perfectly clear to users, so we need some how-tos.

Denis

вт, 5 февр. 2019 г. в 08:19, Nikolay Izhikov :

> Hello, Denis.
>
> Thank you for this discussion.
> I have a few notes:
>
> 1. Is it possible to change the list of deployed resources in runtime via
> built-in DeploymentSPI implementations?
> Can I add or remove jar to(from) class-path without node(cluster) restart?
>
> 2. Can we update service dependencies via DeploymentSPI? commons-lang,
> log4j or any other common library?
>
> 3. User has to execute explicit Ignite API calls(undeploy(), deploy()) to
> renew service implementation. Is it correct?
> I think we should develop some watcher, that will be watch for a resource
> change and redeploy services automatically.
>
> 4. DeploymentSPI is *node-wide* configuration. This means we change
> classpath for all services with this SPI.
> I think this is a huge limitation of the SPI.
> We should provide an ability to configure service-wide classpath to our
> users as quickly as possible.
> It is a common feature in modern service, task executor engines.
>
> I think the perfect world scenario would be following:
>
> 1. Start a client node or connect to a cluster with thin client.
>
> 2. Configure service classpath with some new Ignite API.
> The only requirement for classes - they should be available
> locally(on client node or thin client host).
>
> 3. User deploy the service with some Ignite API.
>
> 4. After depoyment completes successfully client node can be
> stopped.
> All required resource to run a service should be safely stored in
> cluster and deployed to any new node.
>
> 5. I think we should develop examples for a DeploymentSPI usage.
> As far as I can see, there is no such examples in our codebase for now.
> Is it correct? If so, I will create a ticket to create such examples.
>
> В Вт, 05/02/2019 в 01:08 +0300, Vyacheslav Daradur пишет:
> > Denis, thank you for driving of Service Grid's development!
> >
> > Sounds like a good plan. Does it mean that a user will have to
> > register a classloader for service's class explicitly in case of using
> > the feature?
> >
> > On Mon, Feb 4, 2019 at 4:38 PM Denis Mekhanikov 
> wrote:
> > >
> > > Igniters,
> > >
> > > I'd like to start a dedicated thread for discussion of the design of
> > > services hot redeployment. The previous service design discussion can
> be
> > > found in the following thread: [1]
> > >
> > > Currently adding a new service or implementation change of an existing
> one
> > > requires restarting the hosting nodes. Service instances are
> deserialized
> > > using an application class loader, so the service class should be
> present
> > > on the classpath of the node. The only way to change the set of
> available
> > > classes is to restart the node. Potentially the whole cluster restart
> can
> > > be required. This is a major drawback in the current design. This

Re: Services hot redeployment

2019-02-05 Thread Denis Mekhanikov
In general, I think, we should work on the improvements in the following
order:

1. Cluster availability while services are being updated. This should be solved
by using the DeploymentSpi.
2. Service availability while they are being updated. This one will
probably require introduction of new API methods like
redeploy(ServiceConfiguration).
3. Service versioning and packaging.

I'd like to focus on the first point. Service grid will become much more
usable and mature once we implement it.

Denis

вт, 5 февр. 2019 г. в 14:06, Denis Mekhanikov :

> Vyacheslav,
>
> I think, we can use IgniteConfiguration#deploymentSpi for tasks and
> services.
> Or we can add an analogous property.
>
> Nik,
>
> > 1. Is it possible to change the list of deployed resources in runtime
> via built-in DeploymentSPI implementations?
> > Can I add or remove jar to(from) class-path without node(cluster)
> restart?
> Yes, this is the reason why the DeploymentSpi exists. But currently only
> compute grid can use it.
>
> > 2. Can we update service dependencies via DeploymentSPI? commons-lang,
> log4j or any other common library?
> Ideally such libraries should be loaded via app class loader at node
> startup. Otherwise the same libraries will be loaded multiple times. It
> will lead to a lot of memory leaks.
> I think, we can support loading of dependencies, but discourage users from
> doing it. The proper way should be described in the documentation, and
> warnings could be generated, if too many classes get loaded via
> DeploymentSpi.
>
> > 3. User has to execute explicit Ignite API calls(undeploy(), deploy())
> to renew service implementation. Is it correct?
> > I think we should develop some watcher, that will be watch for a
> resource change and redeploy services automatically.
> Correct. I don't like the idea to redeploy services automatically. I
> think, redeployment should be triggered explicitly. Non-obvious actions
> should be avoided.
>
> 4. Such feature would for sure improve usability of the service grid. But
> it requires much more time and work to implement.
> I think, it's better not to expand the scope too much. Otherwise
> development will take another 6 moths.
> This is a great idea, and we will keep it in mind though.
>
> 5. Yep, we need extensive documentation on the service deployment
> procedure.
> This feature may not be perfectly clear to users, so we need some how-tos.
>
> Denis
>
> вт, 5 февр. 2019 г. в 08:19, Nikolay Izhikov :
>
>> Hello, Denis.
>>
>> Thank you for this discussion.
>> I have a few notes:
>>
>> 1. Is it possible to change the list of deployed resources in runtime via
>> built-in DeploymentSPI implementations?
>> Can I add or remove jar to(from) class-path without node(cluster) restart?
>>
>> 2. Can we update service dependencies via DeploymentSPI? commons-lang,
>> log4j or any other common library?
>>
>> 3. User has to execute explicit Ignite API calls(undeploy(), deploy()) to
>> renew service implementation. Is it correct?
>> I think we should develop some watcher that will watch for a resource
>> change and redeploy services automatically.
>>
>> 4. DeploymentSPI is *node-wide* configuration. This means we change
>> classpath for all services with this SPI.
>> I think this is a huge limitation of the SPI.
>> We should provide an ability to configure service-wide classpath to our
>> users as quickly as possible.
>> It is a common feature in modern service, task executor engines.
>>
>> I think the perfect world scenario would be following:
>>
>> 1. Start a client node or connect to a cluster with thin client.
>>
>> 2. Configure service classpath with some new Ignite API.
>> The only requirement for classes - they should be available
>> locally(on client node or thin client host).
>>
>> 3. User deploy the service with some Ignite API.
>>
>> 4. After deployment completes successfully, the client node can be
>> stopped.
>> All resources required to run a service should be safely stored in the
>> cluster and deployed to any new node.
>>
>> 5. I think we should develop examples for a DeploymentSPI usage.
>> As far as I can see, there are no such examples in our codebase for now.
>> Is it correct? If so, I will create a ticket to create such examples.
>>
>> В Вт, 05/02/2019 в 01:08 +0300, Vyacheslav Daradur пишет:
>> > Denis, thank you for driving of Service Grid's development!
>> >
>> > Sounds like a good plan. Does it mean that a user will have to
>> > register a classloader for service's class explicitl

Services hot redeployment

2019-02-04 Thread Denis Mekhanikov
Igniters,

I'd like to start a dedicated thread for discussion of the design of
services hot redeployment. The previous service design discussion can be
found in the following thread: [1]

Currently adding a new service or implementation change of an existing one
requires restarting the hosting nodes. Service instances are deserialized
using an application class loader, so the service class should be present
on the classpath of the node. The only way to change the set of available
classes is to restart the node. Potentially the whole cluster restart can
be required. This is a major drawback in the current design. This problem
should be addressed first.

At the same time, this problem can be resolved by relatively simple code
changes. We need to change the way services are deserialized and use a
mechanism that allows dynamic class changes. The Deployment SPI [2] seems
suitable for this. We can apply the same approach that is used for
tasks, so services will become dynamically modifiable.

With this approach the user will still need to perform a cancel-deploy routine
for the changed service. But even with that, the usability improvement will
be huge. We'll think about improving service availability after the first
part is finished.
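
To make the intended usage concrete, here is a minimal sketch of the
cancel-deploy routine mentioned above, based on the existing IgniteServices
API. The helper name and configuration values are illustrative; the actual
class-loading change is out of scope of this snippet.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteServices;
import org.apache.ignite.services.ServiceConfiguration;

public class ServiceRedeploySketch {
    /** Redeploys a service by cancelling the old instances and deploying the new configuration. */
    static void redeploy(Ignite ignite, ServiceConfiguration cfg) {
        IgniteServices svcs = ignite.services();

        // Undeploy the currently running instances of the service...
        svcs.cancel(cfg.getName());

        // ...and deploy it again; with the proposed change the implementation class
        // would be resolved through the DeploymentSpi instead of the app class loader.
        svcs.deploy(cfg);
    }
}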

Thoughts?

[1]
http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
[2] https://apacheignite.readme.io/docs/deployment-spi#deploymentspi

Denis


Re: High priority TCP discovery messages

2019-01-30 Thread Denis Mekhanikov
Yakov,

> You can put hard limit and process enqued MetricsUpdate message
> if last one of the kind was processed more than metricsUpdFreq millisecs
ago.
 Makes sense. I'll try implementing it.

> I would suggest we allow queue overflow for 1 min, but if situation does
not go to normal then node
> should fire a special event and then kill itself.
Let's start with a warning in the log and see how it correlates with
network/GC problems.
I'd like to make sure we don't kill innocent nodes.

Anton,

> Maybe, better case it to have special "discovery like" channel (with ring
or analog) for metrics like messages
I don't think that creating another data channel is reasonable. It will
require additional network connections and more complex configuration.
But splitting pings and metrics into different message types, as it was
before, and moving metrics distribution to the communication layer
makes sense to me. Some kind of gossip protocol could be used for it.

> Anyway, Why are fighting with duplicates inside the queue instead of
> fighting with new message initial creation while previous not yet
processed
> on the cluster?

A situation when multiple metrics update messages exist in the cluster is
normal.
The node availability check is based on the fact that a node receives fresh
metrics once in metricsUpdateFreq ms.
If you make the coordinator wait for a previous metrics update message to be
delivered before issuing a new one,
then this frequency will depend on the number of nodes in the cluster,
since the time of one round-trip will differ on different topologies.

Alex,

I haven't checked it yet. Theoretically, nodes will fail a bit more often
when their discovery worker queues are flooded with messages.
This change definitely requires extensive testing.

I think you can make metrics update messages have a regular priority
separately from fixing the issue that I described.

Denis

вт, 29 янв. 2019 г. в 20:44, Alexey Goncharuk :

> Folks,
>
> Did we already check that omitting hearbeat priority does not break
> discovery? I am currently working on another issue with discovery and
> skipping hearbeat priority would help a lot in my case.
>
> --AG
>
> пт, 11 янв. 2019 г. в 23:21, Yakov Zhdanov :
>
> > > How big the message worker's queue may grow until it becomes a problem?
> >
> > Denis, you never know. Imagine node may be flooded with messages because
> of
> > the increased timeouts and network problems. I remember some cases with
> > hundreds of messages in queue on large topologies. Please, no O(n)
> > approaches =)
> >
> > > So, we may never come to a point, when an actual
> > TcpDiscoveryMetricsUpdateMessage is processed.
> >
> > Good catch! You can put hard limit and process enqued MetricsUpdate
> message
> > if last one of the kind was processed more than metricsUpdFreq millisecs
> > ago.
> >
> > Denis, also note - initial problem is message queue growth. When we
> choose
> > to skip messages it means that node cannot process certain messages and
> > most probably experiencing problems. We need to think of killing such
> > nodes. I would suggest we allow queue overflow for 1 min, but if
> situation
> > does not go to normal then node should fire a special event and then kill
> > itself. Thoughts?
> >
> > --Yakov
> >
>


[jira] [Created] (IGNITE-11062) Calculating Compute Usage section contains confusing numbers

2019-01-24 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-11062:
-

 Summary: Calculating Compute Usage section contains confusing 
numbers
 Key: IGNITE-11062
 URL: https://issues.apache.org/jira/browse/IGNITE-11062
 Project: Ignite
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7
Reporter: Denis Mekhanikov
Assignee: Prachi Garg


[Calculating Compute 
Usage|https://apacheignite.readme.io/docs/capacity-planning#section-calculating-compute-usage]
 section on the capacity planning page contains pretty confusing results. One 
may get the impression that SQL queries are faster than the cache API and that 
SQL should be used everywhere possible. This is pretty far from the truth, so 
it's better to rework this section.

Plus the provided link doesn't contain the mentioned results. It's better to 
specify a link to the following page and get benchmark results from it: 
[https://www.gridgain.com/resources/benchmarks/gridgain-benchmarks-results]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10959) Memory leaks in continuous query handlers

2019-01-16 Thread Denis Mekhanikov (JIRA)
Denis Mekhanikov created IGNITE-10959:
-

 Summary: Memory leaks in continuous query handlers
 Key: IGNITE-10959
 URL: https://issues.apache.org/jira/browse/IGNITE-10959
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Denis Mekhanikov
Assignee: Denis Mekhanikov
 Fix For: 2.8
 Attachments: CacheContinuousQueryMemoryUsageTest.java

Continuous query handlers don't clear internal data structures after cache 
events are processed.

A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: High priority TCP discovery messages

2019-01-11 Thread Denis Mekhanikov
I like the idea of making all messages be processed with equal priority.
It will make nodes with overgrown discovery message queues die more often,
though. But maybe this is how it's supposed to work.

Denis

пт, 11 янв. 2019 г. в 16:26, Denis Mekhanikov :

> Yakov,
>
> Sounds good. But there is a flaw in the procedure, that you described.
> If we have a TcpDiscoveryMetricsUpdateMessage in a queue, and a newer one
> arrives, then we will consider the existing one obsolete and won't process
> it. The newest metrics update message will be moved to the queue's tail,
> thus delaying the moment, when it will be processed.
> So, we may never come to a point, when an actual
> TcpDiscoveryMetricsUpdateMessage is processed.
> But if we replace an existing message with a newer one, and save the old
> position in a queue, then this approach will work. It will require a more
> complex synchronization, though, so it will still lead to some overhead.
>
> How big the message worker's queue may grow until it becomes a problem? If
> it's 20 elements max, then linear time check is not that bad.
> BTW, RingMessageWorker#addMessage method checks for duplicating messages
> in the queue for some message types, including all custom discovery
> messages, which is done in linear time.
>
> Denis
>
> пт, 11 янв. 2019 г. в 00:54, Yakov Zhdanov :
>
>> Denis, what if we remove priority difference for messages and always add
>> new to the end of the queue?
>>
>> As far as traversing the queue - I don't like O(n) approaches =). So, with
>> adding all messages to the end of the queue (removing prio difference) I
>> would suggest that we save latest 1st lap message and latest 2nd lap
>> message and process metrics message in message worker thread in queue
>> order
>> if they are latest and skip the otherwise.
>>
>> Does this make sense?
>>
>> --Yakov
>>
>


Re: High priority TCP discovery messages

2019-01-11 Thread Denis Mekhanikov
Yakov,

Sounds good. But there is a flaw in the procedure, that you described.
If we have a TcpDiscoveryMetricsUpdateMessage in a queue, and a newer one
arrives, then we will consider the existing one obsolete and won't process
it. The newest metrics update message will be moved to the queue's tail,
thus delaying the moment, when it will be processed.
So, we may never come to a point, when an actual
TcpDiscoveryMetricsUpdateMessage is processed.
But if we replace an existing message with a newer one, and save the old
position in a queue, then this approach will work. It will require a more
complex synchronization, though, so it will still lead to some overhead.

How big the message worker's queue may grow until it becomes a problem? If
it's 20 elements max, then linear time check is not that bad.
BTW, RingMessageWorker#addMessage method checks for duplicating messages in
the queue for some message types, including all custom discovery messages,
which is done in linear time.

Denis

пт, 11 янв. 2019 г. в 00:54, Yakov Zhdanov :

> Denis, what if we remove priority difference for messages and always add
> new to the end of the queue?
>
> As far as traversing the queue - I don't like O(n) approaches =). So, with
> adding all messages to the end of the queue (removing prio difference) I
> would suggest that we save latest 1st lap message and latest 2nd lap
> message and process metrics message in message worker thread in queue order
> if they are latest and skip the otherwise.
>
> Does this make sense?
>
> --Yakov
>


Re: Time to remove automated messages from the devlist?

2019-01-10 Thread Denis Mekhanikov
+1

I think a separate list for JIRA notifications is needed,
as they are more important than other ones.
notificati...@ignite.apache.org may still aggregate all automatically
generated messages from all sources.

So, I'm for no longer sending JIRA messages to the dev list, and sending them
to the notifications and issues lists instead.

Denis

чт, 10 янв. 2019 г. в 18:14, Павлухин Иван :

> +1 for moving JIRA notifications out of dev-list. No strict opinion
> which list should be a destination for them, I am fine with both
> options.
>
> By the way Community Resources page [1] refers to 2 another lists
> iss...@ignite.apache.org and comm...@ignite.apache.org (but
> notificati...@ignite.apache.org is not listed there). Does anyone know
> why these lists are needed? Does anyone use any of them?
>
> [1] https://ignite.apache.org/community/resources.html
>
> чт, 10 янв. 2019 г. в 17:57, Alexey Kuznetsov :
> >
> > +1 for  j...@ignite.apache.org
> >
> > On Thu, Jan 10, 2019 at 6:55 PM Dmitriy Pavlov 
> wrote:
> >
> > > Hi Igniters,
> > >
> > > After removal of GitHub Comments from the list I have (a very
> subjective)
> > > feeling, that there became more human-human interaction, which is
> > > definitely more important that opportunity to control new JIRA tickets
> > > using the list.
> > >
> > > I suggest coming back to the idea of moving JIRA to a separate list.
> Please
> > > share your vision on this topic. Should it be j...@ignite.apache.org
> or we
> > > should reuse notificati...@ignite.apache.org
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > ср, 21 нояб. 2018 г. в 15:25, Dmitriy Pavlov :
> > >
> > > > Please start a vote according to
> > > > https://www.apache.org/foundation/voting.html
> > > > Anyone can start a vote, you don't need to be a PMC.
> > > >
> > > > I don't feel it is a very important issue to remove notifications
> from
> > > the
> > > > list, as it can be easily filtered out using mail setup. But if
> someone
> > > > feels it is really disturbing, please go ahead. I'm ok with GitHub
> > > > redirection, but I will not drive this topic.
> > > >
> > > > ср, 21 нояб. 2018 г. в 11:40, Павлухин Иван :
> > > >
> > > >> Dmitriy, let's proceed with it.
> > > >> вт, 20 нояб. 2018 г. в 23:20, Dmitriy Pavlov :
> > > >> >
> > > >> > One more thing I want to emphasize here. We can't just remove
> > > messages,
> > > >> it
> > > >> > _must_ be sent to some list, which is why we need some additional
> > > list,
> > > >> > e.g. notifications@ for this.
> > > >> >
> > > >> > So only one option to proceed here is to run a formal vote on list
> > > >> creation
> > > >> > and redirection of github/gitbox messages to a new list.
> > > >> >
> > > >> > пн, 19 нояб. 2018 г. в 18:23, Dmitriy Pavlov  >:
> > > >> >
> > > >> > > Denis, we need because contributors do not announce their
> > > >> > > intent/designs/etc manually. It is the best way ever? No, of
> course.
> > > >> > >
> > > >> > > We have consensus on PR removal, so let's do it and see results.
> > > >> > >
> > > >> > > пн, 19 нояб. 2018 г. в 18:11, Denis Mekhanikov <
> > > dmekhani...@gmail.com
> > > >> >:
> > > >> > >
> > > >> > >> Dmitriy,
> > > >> > >>
> > > >> > >> If a person wants to track all new tickets, then he may go to
> JIRA,
> > > >> create
> > > >> > >> a filter for Ignite tickets
> > > >> > >> and subscribe to it. JIRA has a pretty flexible configuration
> of
> > > >> filters
> > > >> > >> and subscriptions, so you can
> > > >> > >> specify exactly what issues you are interested in, and how
> often
> > > you
> > > >> want
> > > >> > >> to receive these emails.
> > > >> > >> This is much more convenient and more flexible than filtering
> > > emails
> > > >> from
> > > >> > >> a
> > > >> > >> bot.
> > > >> > >>
> > > >> > >> So,

High priority TCP discovery messages

2019-01-10 Thread Denis Mekhanikov
A bit of background:
When TcpDiscoverySpi is used, TcpDiscoveryMetricsUpdateMessage is sent by the
coordinator once in metricsUpdateFrequency, which is 2 seconds by default.
It serves as a ping message, which ensures that the ring is connected and
all nodes are alive. These messages have a high priority, i.e., they are
put into the head of the RingMessageWorker's queue instead of its tail.

Now consider a situation when a single link between two nodes in the ring
works slowly. It may receive and deliver all messages, but with a certain
delay. This situation is possible when the network is unstable or one of the
nodes experiences a lot of short GC pauses.
It leads to a growing message queue on the node that stands before the
slow link. The worst part is that if high-priority messages are generated
faster than they are processed, then other messages won't even get a chance
to be processed. Thus, no nodes will be kicked out of the cluster, but no
useful progress will happen. Partition map exchange may hang for this
reason, and the cause won't be obvious from the logs.

JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-10808
I made a draft of the fix for this problem:
https://github.com/apache/ignite/pull/5771
The PR also contains a test that reproduces this situation.

The patch makes sure that only a single TcpDiscoveryMetricsUpdateMessage
of each kind is stored in the queue, i.e., if a newer message comes to the
node, then the old one is discarded. It also ensures that regular messages
still get processed: if the last processed message had a high priority, then
a new high-priority message won't be put at the head of the queue, but at the
tail instead.
The fix addresses the described problem, but increases the utilization of
the discovery threads a bit.
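
For readers who don't want to open the PR, here is a simplified, hypothetical
sketch of the queueing rule; it is not the actual RingMessageWorker code. At
most one pending metrics update message is kept, and a high-priority message
goes to the head of the queue only if the previously processed message was a
regular one.

import java.util.ArrayDeque;
import java.util.Deque;

class DiscoveryQueueSketch {
    private final Deque<Object> queue = new ArrayDeque<>();

    /** At most one metrics update message is kept; a newer one replaces it. */
    private Object pendingMetricsUpdate;

    private boolean lastProcessedHighPriority;

    synchronized void add(Object msg, boolean highPriority, boolean metricsUpdate) {
        if (metricsUpdate) {
            pendingMetricsUpdate = msg; // Discard the obsolete metrics update, if any.

            return;
        }

        if (highPriority && !lastProcessedHighPriority)
            queue.addFirst(msg);
        else
            queue.addLast(msg); // Regular messages get a chance to be processed.
    }

    synchronized Object poll() {
        // Alternate between the pending metrics update and regular messages,
        // so neither kind can starve the other.
        if (pendingMetricsUpdate != null && !lastProcessedHighPriority) {
            Object res = pendingMetricsUpdate;

            pendingMetricsUpdate = null;
            lastProcessedHighPriority = true;

            return res;
        }

        lastProcessedHighPriority = false;

        return queue.poll();
    }
}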

What do you think of this fix? Do you have better ideas on how to improve the
heartbeat mechanism to avoid such situations?

Denis


Re: Service grid redesign

2018-12-19 Thread Denis Mekhanikov
Guys,

I finished my code review. The pull request looks good to me.

Does anybody else want to look at the changes?
There are a few points that we didn't reach an agreement on,
though they don't affect the behaviour in any way:

   - *Class naming. * See the discussion above.
   - *Unnecessary task object cleaning. *
   IMO, ServicesDeploymentTask#clear() method doesn't do anything useful,
   and it should be removed.
   By the moment this method is called, the task object has already been removed
   from all collections anyway, so it's ready for garbage collection.
   Removing data from it doesn't help anybody.
   - *Unnecessary tests. *ServiceInfoSelfTest and
   ServicesDeploymentProcessIdSelfTest look excessive to me.
   I don't see any point in testing an interface implementation that only
   saves some objects and returns them from certain methods.
   - Interface for events with servicesDeploymentActions() method.
   Take a look at the discussion:
   
https://github.com/apache/ignite/pull/4434/files/30e69d9a53ce6ea16c4e9d15354e94360caa719d#r239442342

Also, the solution with *DiscoveryCustomEvent#nullifyingCustomMsgLock* looks
clumsy to me.
The problem with nullifying the *DiscoveryCustomEvent#customMsg* field can
be solved by making *ServiceDiscoveryListener* a high-priority listener.

Or the *DiscoveryCustomEvent#customMessage()* method could be marked
synchronized, and the *GridEventStorageManager#notifyListeners(..)* method
could synchronize on the event object.
But this solution is equivalent; it's just a matter of taste.

If anybody wants to look at the code of the PR, please consider these
points as well.

Denis

ср, 19 дек. 2018 г. в 17:37, Nikolay Izhikov :

> Denis,
>
> I don't think that differences with your and my naming is huge :)
> And, it's definetely a matter of taste.
>
> If there is no any other issues with PR let's rename and move on! :)
>
> ср, 19 дек. 2018 г. в 17:32, Vyacheslav Daradur :
>
> > > We have IgniteServiceProcessor and GridServiceProcessor with singular
> > "Service"
> >
> > Maybe we should rename new 'IgniteServiceProcessor' to
> > 'IgniteServicesProcessor'?
> >
> > > And ServiceSingleDeploymentsResults name doesn't make sense to me.
> > > "Single deployments" doesn't sound right.
> >
> > 'Single' means 'single node', maybe we should use one of the following:
> > - 'ServicesSingleNodeDeploymentsResults'
> > - 'ServicesNodeDeploymentsResults'
> > - 'ServicesInstanceDeploymentsResults'
> >
> > On Wed, Dec 19, 2018 at 4:26 PM Denis Mekhanikov 
> > wrote:
> > >
> > > Slava,
> > > I think, it's better to replace word "Change" with "Request".
> > >
> > > Nik,
> > > We have IgniteServiceProcessor and GridServiceProcessor with singular
> > > "Service",
> > > ServicesDeploymentManager and ServicesDeploymentTask with plural
> > "Services"
> > > for some reason.
> > > So, you need to remember, where Service and where Services is used.
> > > I think, we should unify these names.
> > > And ServiceSingleDeploymentsResults name doesn't make sense to me.
> > > "Single deployments" doesn't sound right.
> > >
> > > ServicesFullDeploymentsMessage is derived
> > > from GridDhtPartitionsFullMessage.
> > > It doesn't really reflect its function. This message is supposed to
> mark
> > > the point in time, when deployment is finished.
> > >
> > > Denis
> > >
> > >
> > > пт, 14 дек. 2018 г. в 11:30, Vyacheslav Daradur :
> > >
> > > > >*1. Testing of the cache-based implementation of the service grid.*
> > > > > I think, we should make a test suite, that will test the old
> > > > implementation
> > > > > until we remove it from the project.
> > > >
> > > > Agree. This is exactly what should be done as the first step once
> > > > phase 1 will be merged.
> > > > I think all tests in the package:
> > > > "org.apache.ignite.internal.processors.service" should be moved to
> > > > separate test-suite and new build-plan should be added on TC and
> > > > included in RunAll.
> > > >
> > > > > *2. DynamicServiceChangeRequest.*
> > > > > I think this class should be split into two.
> > > >
> > > > Personally, I agree, but I have faced opposition at the design step.
> > > > I changed to the following structure:
> > > >
> > > > abstract class ServiceAbstractChange implements Serializable {
> > > > protected 

Re: Service grid redesign

2018-12-19 Thread Denis Mekhanikov
Slava,
I think, it's better to replace word "Change" with "Request".

Nik,
We have IgniteServiceProcessor and GridServiceProcessor with singular
"Service",
ServicesDeploymentManager and ServicesDeploymentTask with plural "Services"
for some reason.
So, you need to remember, where Service and where Services is used.
I think, we should unify these names.
And ServiceSingleDeploymentsResults name doesn't make sense to me.
"Single deployments" doesn't sound right.

ServicesFullDeploymentsMessage is derived
from GridDhtPartitionsFullMessage.
It doesn't really reflect its function. This message is supposed to mark
the point in time, when deployment is finished.

Denis


пт, 14 дек. 2018 г. в 11:30, Vyacheslav Daradur :

> >*1. Testing of the cache-based implementation of the service grid.*
> > I think, we should make a test suite, that will test the old
> implementation
> > until we remove it from the project.
>
> Agree. This is exactly what should be done as the first step once
> phase 1 will be merged.
> I think all tests in the package:
> "org.apache.ignite.internal.processors.service" should be moved to
> separate test-suite and new build-plan should be added on TC and
> included in RunAll.
>
> > *2. DynamicServiceChangeRequest.*
> > I think this class should be split into two.
>
> Personally, I agree, but I have faced opposition at the design step.
> I changed to the following structure:
>
> abstract class ServiceAbstractChange implements Serializable {
> protected final IgniteUuid srvcId;
> }
>
> class ServiceDeploymentChange extends ServiceAbstractChange {
> ServiceConfiguration cfg;
> }
>
> class ServiceUndeploymentChange extends ServiceAbstractChange { }
>
> I hope that further reviewers will agree with us.
>
> > *3. Naming.*
>
> About "Services" -> "Service" and "Deployments" -> "Deployment"
> Personally, I agree with Nikolay, because it's more descriptive since
> manages several services, not single.
> But, I understand Denis's point of view, we have a lot of classes with
> "Service" prefix in naming and "Services" looks a bit alien.
>
> > *DynamicServicesChangeRequestBatchMessage -> DynamicServiceChangeRequest*
> Prefix "Dynamic" has no sense anymore since we reworked message
> structure as in p.2. so "ServiceChangeBatchRequest" will be better
> name.
>
> > *ServicesSingleDeploymentsMessage -> ServiceDeploymentResponse*
> It's not a response and is not sent to the sender. This message is
> sent to the coordinator and contains *single node* deployments.
>
> > *ServicesFullDeploymentsMessage -> ServiceDeploymentFinishMessage*
> This should be named similar way as the previous one, but the message
> contains deployments of *full set of nodes*.
>
>
> On Fri, Dec 14, 2018 at 10:58 AM Nikolay Izhikov 
> wrote:
> >
> > Hello, Denis.
> >
> > Great news.
> >
> > > *1. Testing of the cache-based implementation of the service grid.*
> > > I think, we should make a test suite, that will test the old
> implementation> until we> remove it from the project.
> >
> > Aggree. Let's do it.
> >
> > > *2. DynamicServiceChangeRequest.*
> > > I think this class should be split into two.
> >
> > Agree. Lets's do it.
> >
> > > *ServicesDeploymentManager*, *ServicesDeploymentTask *and all other
> classes> with Services word in them.
> > > I think, they would look better if we use a singular word *Service
> *instead.
> > > Same for *Deployments*.
> >
> > Personally, I want that names as clearly as possible reflects class
> content for reader.
> > If we deploy *several* services then it has to be Service*S*.
> >
> > Same for deployment - if this message will initiate single deployment
> process then it should use deployment.
> > otherwise - deployments.
> >
> > So my opinion - it's better to keep current naming.
> >
> > В Чт, 13/12/2018 в 19:36 +0300, Denis Mekhanikov пишет:
> > > Guys,
> > >
> > > I've been looking through the PR by Vyacheslav for past few weeks.
> > > Slava, great job! You've done an impressive amount of work.
> > >
> > > I posted my comments to the PR and had a few calls with Slava.
> > > I am close to finishing my review.
> > > There are some points, that I'd like to settle in this discussion to
> avoid
> > > controversy.
> > >
> > > *1. Testing of the cache-based implementation of the service grid.*
> > > I think, we should make a test s

Re: Continuous queries and duplicates

2018-12-14 Thread Denis Mekhanikov
Guys, FYI:

Partition counters are already a part of the public API.
The following method reveals this information:
CacheQueryEntryEvent#getPartitionUpdateCounter()
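
A small illustration of that method (cache name and types are made up): the
counter can be read from events delivered to a continuous query's local
listener by unwrapping them to CacheQueryEntryEvent.

import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.CacheQueryEntryEvent;
import org.apache.ignite.cache.query.ContinuousQuery;

public class PartitionCounterListenerSketch {
    static void listen(IgniteCache<Integer, String> cache) {
        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Integer, ? extends String> evt : events) {
                long cntr = evt.unwrap(CacheQueryEntryEvent.class).getPartitionUpdateCounter();

                System.out.println("key=" + evt.getKey() + ", partition update counter=" + cntr);
            }
        });

        // The returned cursor should be kept and closed when the listener is no longer needed.
        cache.query(qry);
    }
}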

I also think that this kind of information shouldn't be accessible to the user,
but I don't see how to prevent the duplication problem with it either.

Denis

чт, 13 дек. 2018 г. в 23:40, Vladimir Ozerov :

> [1]
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html
>
> On Thu, Dec 13, 2018 at 11:38 PM Vladimir Ozerov 
> wrote:
>
> > Denis,
> >
> > Not really. They are used to ensure that ordering of notifications is
> > consistent with ordering of updates, so that when a key K is updated to
> V1,
> > then V2, then V3, you never observe V1 -> V3 -> V2. It also solves
> > duplicate notification problem in case of node failures, when the same
> > update is delivered twice.
> >
> > However, partition counters are unable to solve duplicates problem in
> > general. Essentially, the question is how to get consistent view on some
> > data plus all notifications which happened afterwards. There are only two
> > ways to achieve this - either lock entries during initial query, or take
> a
> > kind of consistent data snapshot. The former was never implemented in
> > Ignite - our Scan and SQL queries do not user locking. The latter is
> > achievable in theory with MVCC. I raised that question earlier [1] (see
> > p.2), and we came to conclusion that it might be a good feature for the
> > product. It is not implemented that way for MVCC now, but most probably
> is
> > not extraordinary difficult to implement.
> >
> > Vladimir.
> >
> > [1]
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998
> >
> > On Thu, Dec 13, 2018 at 11:17 PM Denis Magda  wrote:
> >
> >> Vladimir,
> >>
> >> The partition counter is supposed to be used internally to solve the
> >> duplication issue. Does it sound like a right approach then?
> >>
> >> What would be an approach for SQL queries? Not sure the partition
> counter
> >> is applicable.
> >>
> >> --
> >> Denis
> >>
> >> On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov 
> >> wrote:
> >>
> >> > Partition counter is internal implemenattion detail, which has no
> >> sensible
> >> > meaning to end users. It should not be exposed through public API.
> >> >
> >> > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda 
> wrote:
> >> >
> >> > > Hello Piotr,
> >> > >
> >> > > That's a known problem and I thought a JIRA ticket already exists.
> >> > However,
> >> > > failed to locate it. The ticket for the improvement should be
> created
> >> as
> >> > a
> >> > > result of this conversation.
> >> > >
> >> > > Speaking of an initial query type, I would differentiate from
> >> ScanQueries
> >> > > and SqlQueries. For the former, it sounds reasonable to apply the
> >> > > partitionCounter logic. As for the latter, Vladimir Ozerov will it
> be
> >> > > addressed as part of MVCC/Transactional SQL activities?
> >> > >
> >> > > Btw, Piotr what's your initial query type?
> >> > >
> >> > > --
> >> > > Denis
> >> > >
> >> > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <
> >> piotr.roman...@gmail.com
> >> > >
> >> > > wrote:
> >> > >
> >> > > > Hi, as suggested by Ilya here:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
> >> > > > I'm resending it to the developers list.
> >> > > >
> >> > > > From that thread we know that there might be duplicates between
> >> initial
> >> > > > query results and listener entries received as part of continuous
> >> > query.
> >> > > > That means that users need to manually dedupe data.
> >> > > >
> >> > > > In my opinion the manual deduplication in some use cases may lead
> to
> >> > > > possible memory problems on the client side. In order to remove
> >> > > duplicated
> >> > > > notifications which we are receiving in the local listener, we
> need
> >> to
> >> > > keep
> >> > > > all initial query results in memory (or at least their unique
> ids).
> >> > > > Unfortunately, there is no way (is there?) to find a point in time
> >> when
> >> > > we
> >> > > > can be sure that no dups will arrive anymore. That would mean that
> >> we
> >> > > need
> >> > > > to keep that data indefinitely and use it every time a new
> >> notification
> >> > > > arrives. In case of multiple continuous queries run from a single
> >> JVM,
> >> > > this
> >> > > > might eventually become a memory or performance problem. I can see
> >> the
> >> > > > following possible improvements to Ignite:
> >> > > >
> >> > > > 1. The deduplication between initial query and incoming
> notification
> >> > > could
> >> > > > be done fully in Ignite. As far as I know there is already the
> >> > > > updateCounter and partition id for all 

Re: Service grid redesign

2018-12-13 Thread Denis Mekhanikov
Guys,

I've been looking through the PR by Vyacheslav for past few weeks.
Slava, great job! You've done an impressive amount of work.

I posted my comments to the PR and had a few calls with Slava.
I am close to finishing my review.
There are some points, that I'd like to settle in this discussion to avoid
controversy.

*1. Testing of the cache-based implementation of the service grid.*
I think, we should make a test suite, that will test the old implementation
until we
remove it from the project.

*2. DynamicServiceChangeRequest.*
I think this class should be split into two.
I don't see any point in having a single class with a *flags* field that
shows what action it actually represents.
Usage of *deploy(), markDeploy(...), undeploy(), markUndeploy(...)* looks
wrong.
Why not have a separate message type for each action instead?

*3. Naming.*
I suggest renaming the following classes:
*ServicesDeploymentManager*, *ServicesDeploymentTask *and all other classes
with Services word in them.
I think, they would look better if we use a singular word *Service *instead.
Same for *Deployments*.
I propose the following class names:

*ServicesDeploymentManager -> ServiceDeploymentManager*
*ServicesDeploymentActions -> ServiceDeploymentActions*
*ServicesDeploymentTask -> ServiceDeploymentTask*
*ServicesCommonDiscoveryData -> ServiceCommonDiscoveryData*
*ServicesJoinNodeDiscoveryData -> ServiceJoiningNodeDiscoveryData*

*DynamicServicesChangeRequestBatchMessage -> DynamicServiceChangeRequest*
*ServicesSingleDeploymentsMessage -> ServiceDeploymentResponse*
*ServicesFullDeploymentsMessage -> ServiceDeploymentFinishMessage*

*ServiceSingleDeploymentsResults -> ServiceSingleDeploymentResult*
*ServiceFullDeploymentsResults -> ServiceFullDeploymentResult*

Let's do this as the final step of the code review to avoid repeated
renaming.

Denis

чт, 6 дек. 2018 г. в 15:21, Denis Mekhanikov :

> Alexey,
>
> I don't see any problem in letting services work on a deactivated cluster.
> All that services need is discovery messages and compute tasks.
> Both of these features are available at all times.
>
> But it should be configurable. Services may need caches for their work,
> so it's better to undeploy such services on cluster deactivation.
> We may introduce a new property in ServiceConfiguration.
>
> I think, this topic deserves a separate discussion.
> Could you start another thread?
>
> Denis
>
> чт, 6 дек. 2018 г. в 13:27, Alexey Kuznetsov :
>
>> Hi,   Vyacheslav!
>>
>> I'm thinking about to use Services API to implement Web Agent as a cluster
>> singleton service.
>> It will improve Web Console UX, because it will not needed to start
>> separate java program.
>> Just start cluster with Web agent enabled on cluster configuration.
>>
>> But in order to do this, I need that services should:
>>   1) Work when cluster NOT ACTIVE.
>>   2) Auto restart with cluster (when cluster was restarted).
>>
>> Could we support mentioned features on "Service Grid redesign - phase 2" ?
>>
>> Please let me know.
>>
>> --
>> Alexey Kuznetsov
>>
>


Re: Use of marshaller at node startup routine (need advice)

2018-12-12 Thread Denis Mekhanikov
Slava,

Interface *Service *extends *Serializable.* So, all services are supposed
to be serializable by the JdkMarshaller.
Usage of *BinaryMarshaller* or *OptimizedMarshaller* makes sense only from a
performance point of view.
But I don't think that we should try too hard to minimize the performance
impact of service serialization,
since it doesn't happen too often.

There are some tests, like *IgniteServiceConfigVariationsFullApiTest*, that
check that services which are not
serializable can be successfully deployed. I think these tests should be
removed.
It's reasonable to require serializability, since all *Services* are marked
as *Serializable*, as I already mentioned.

Slava and I discussed the possibility of choosing a marshaller depending on
the node state.
If a node is already connected to the cluster, then it could use the binary
marshaller,
otherwise the JDK marshaller could be used.
I think, if we decide to do so, it will complicate the logic and confuse
users.
This problem exists only for static services. They are not different from
dynamic ones,
except for the way of configuration and the moment of deployment.
I don't see why different constraints should be applied to them.

So, I'm for using the *JdkMarshaller* regardless of the service type or a
node state.
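
For reference, a hedged sketch of the serialization step under discussion,
assuming the Marshaller interface's marshal/unmarshal signatures: it converts
a (Serializable) service instance to bytes so that it can be placed into the
joining node's discovery data. Error handling is omitted, and the class name
is made up.

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.marshaller.jdk.JdkMarshaller;
import org.apache.ignite.services.Service;

public class StaticServiceMarshallingSketch {
    /** Works for any service, since Service extends Serializable by contract. */
    static byte[] toBytes(Service svc) throws IgniteCheckedException {
        return new JdkMarshaller().marshal(svc);
    }

    static Service fromBytes(byte[] bytes, ClassLoader clsLdr) throws IgniteCheckedException {
        return new JdkMarshaller().unmarshal(bytes, clsLdr);
    }
}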

Denis

пт, 7 дек. 2018 г. в 15:57, Vyacheslav Daradur :

> Igniters, I need your advice about the following problem:
>
> It is necessary to serialize an object (just convert an object to
> bytes array) for including it in joining node data (DiscoveryDataBag)
> *at node startup routine*.
>
> The marshalling hangs If we use 'BinaryMarshaller' or
> 'OptimizedMarshaler' because class can't be registered in
> MarshallerContextImpl#registerClassName -> transport.proposeMapping on
> account of the request which can't be sent through discovery-spi at
> the moment.
>
> Also, 'JdkMarshaller' can't be used because it imposes limits on
> objects that should implement 'Serializable' interface. But this
> restriction is unacceptable for the case.
>
> As a workaround solution, an external library, like KRYO, can be used.
>
> What tools also available in the project to solve this problem?
>
> --
> Best Regards, Vyacheslav D.
>


Re: [Result][VOTE] Creation dedicated list for github notifiacations

2018-12-11 Thread Denis Mekhanikov
Great news!

Is it possible to move all GitHub messages from the archive of the
developers list
to the newly-created one?

Denis

вт, 11 дек. 2018 г. в 10:05, Dmitriy Pavlov :

> Hi Igniters,
>
> Infra changed notifications, so now GitHub emails are being sent to
> notifications@
>
> Web UI: https://lists.apache.org/list.html?notificati...@ignite.apache.org
>
> You may subscribe to the list notifications-subscr...@ignite.apache.org
>
> Sincerely,
> Dmitriy Pavlov
>
> ср, 5 дек. 2018 г. в 12:57, Eduard Shangareev  >:
>
> > https://issues.apache.org/jira/browse/INFRA-17351
> > A ticket was created.
> >
> > On Fri, Nov 30, 2018 at 12:04 AM Denis Magda  wrote:
> >
> > > A request has been submitted.
> > >
> > > --
> > > Denis
> > >
> > > On Thu, Nov 29, 2018 at 11:45 AM Dmitriy Pavlov 
> > > wrote:
> > >
> > > > Denis, could you please create a new list for Apache Ignite, e.g.
> > > > notificati...@ignite.apache.org  ?
> > > >
> > > > Only PMC Chair can create a list https://infra.apache.org/
> > > >
> > > > Create list feature has been restricted to ASF members and PMC chairs
> > > only.
> > > > Thank you in advance.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > чт, 29 нояб. 2018 г. в 21:44, Eduard Shangareev <
> > > > eduard.shangar...@gmail.com>:
> > > >
> > > >> Igniters,
> > > >> The result is successful.
> > > >>
> > > >> No "-1".
> > > >> 11 "+1".
> > > >> 2 "0".
> > > >>
> > > >> Vote thread:
> > > >>
> > > >>
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Creation-dedicated-list-for-github-notifiacations-td38485.html
> > > >>
> > > >
> > >
> >
>


Re: Service grid redesign

2018-12-06 Thread Denis Mekhanikov
Alexey,

I don't see any problem in letting services work on a deactivated cluster.
All that services need is discovery messages and compute tasks.
Both of these features are available at all times.

But it should be configurable. Services may need caches for their work,
so it's better to undeploy such services on cluster deactivation.
We may introduce a new property in ServiceConfiguration.

I think, this topic deserves a separate discussion.
Could you start another thread?

Denis

чт, 6 дек. 2018 г. в 13:27, Alexey Kuznetsov :

> Hi,   Vyacheslav!
>
> I'm thinking about to use Services API to implement Web Agent as a cluster
> singleton service.
> It will improve Web Console UX, because it will not needed to start
> separate java program.
> Just start cluster with Web agent enabled on cluster configuration.
>
> But in order to do this, I need that services should:
>   1) Work when cluster NOT ACTIVE.
>   2) Auto restart with cluster (when cluster was restarted).
>
> Could we support mentioned features on "Service Grid redesign - phase 2" ?
>
> Please let me know.
>
> --
> Alexey Kuznetsov
>


Re: Set 'TcpDiscoveryVmIpFinder' as default IP finder for tests instead of 'TcpDiscoveryMulticastIpFinder'

2018-12-05 Thread Denis Mekhanikov
Andrey,

Multi-JVM tests may also use a static IP finder, but it should use some
specific port range instead of the shared one.
Something like 127.0.0.1:48500..48509 would do.
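
Roughly, the test configuration could look like the sketch below (the port
range is the one from the sentence above; the class name is illustrative):

import java.util.Collections;

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class StaticIpFinderTestConfigSketch {
    static IgniteConfiguration configure(IgniteConfiguration cfg) {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

        // Explicit port range, so multi-JVM test nodes find each other without multicast.
        ipFinder.setAddresses(Collections.singletonList("127.0.0.1:48500..48509"));

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();

        discoSpi.setIpFinder(ipFinder);
        // Discovery ports of the test nodes should fall into the same range.
        discoSpi.setLocalPort(48500);

        return cfg.setDiscoverySpi(discoSpi);
    }
}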

Denis

ср, 5 дек. 2018 г. в 18:34, Vyacheslav Daradur :

> I filled a task [1].
>
> >> Slava, do you think Platforms tests can be fixed as well or one more
> ticket
> should be created?
>
> I'll try to fix them within one ticket, it should be investigated a bit
> deeper.
>
> I'll inform about the task's progress in this thread later.
>
> Thanks!
>
> [1] https://issues.apache.org/jira/browse/IGNITE-10555
> On Wed, Dec 5, 2018 at 6:28 PM Andrey Mashenkov
>  wrote:
> >
> > Slava,
> > +1 for your proposal.
> > Is there any ticket for this?
> >
> > Denis,
> > I've just read in nabble thread you suggest to allow multicast finder for
> > multiJVM tests
> > and I'd think we shouldn't use multicast in test at all (excepts
> multicast
> > Ip finder self tests of course),
> > but e.g. add an assertion to force user to create ipfinder properly.
> >
> >
> > Also, we have a ticket for similar issue in 'examples' module.
> > Seems, there are some issues with Platforms module integration.
> > Slava, do you think Platforms tests can be fixed as well or one more
> ticket
> > should be created?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-6826
> >
> > On Wed, Dec 5, 2018 at 5:55 PM Denis Mekhanikov 
> > wrote:
> >
> > > Slava,
> > >
> > > These are exactly my thoughts, so I fully support you here.
> > > I already wrote about it:
> > >
> > >
> http://apache-ignite-developers.2346864.n4.nabble.com/IP-finder-in-tests-td33322.html
> > > But I kind of abandoned this activity. Feel free to take over it.
> > >
> > > Denis
> > >
> > > ср, 5 дек. 2018 г. в 17:22, Vladimir Ozerov :
> > >
> > > > Huge +1
> > > >
> > > > On Wed, Dec 5, 2018 at 5:09 PM Vyacheslav Daradur <
> daradu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I've found that the project's test framework uses
> > > > > 'TcpDiscoveryMulticastIpFinder' as default IP finder for tests and
> > > > > there are a lot of tests written by Ignite's experts that override
> it
> > > > > to 'TcpDiscoveryVmIpFinder'.
> > > > >
> > > > > Most of our tests starting Ignite nodes in the same JVM, that
> allows
> > > > > us using shared 'TcpDiscoveryVmIpFinder'.
> > > > >
> > > > > I think that using of 'TcpDiscoveryMulticastIpFinder' may be useful
> > > > > only in platforms tests, BTW multi-JVM tests use the tuned
> > > > > 'TcpDiscoveryVmIpFinder'.
> > > > >
> > > > > I see the following main advantages of using
> 'TcpDiscoveryVmIpFinder':
> > > > > * reducing possible conflicts in the development environment, when
> > > > > nodes from different clusters may find each other;
> > > > > * speedup of nodes initial discovery, especially on Windows;
> > > > > * avoiding of overwriting 'getConfiguration' and copypasta only to
> set
> > > > > up static IP finder in tests;
> > > > >
> > > > > So, I'd suggest changing the default IP finder in tests to
> > > > > 'TcpDiscoveryVmIpFinder' as the first step and remove related
> > > > > boilerplate as the second step.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > --
> > > > > Best Regards, Vyacheslav D.
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
>
>
>
> --
> Best Regards, Vyacheslav D.
>


Re: Set 'TcpDiscoveryVmIpFinder' as default IP finder for tests instead of 'TcpDiscoveryMulticastIpFinder'

2018-12-05 Thread Denis Mekhanikov
Slava,

These are exactly my thoughts, so I fully support you here.
I already wrote about it:
http://apache-ignite-developers.2346864.n4.nabble.com/IP-finder-in-tests-td33322.html
But I kind of abandoned this activity. Feel free to take over it.

Denis

ср, 5 дек. 2018 г. в 17:22, Vladimir Ozerov :

> Huge +1
>
> On Wed, Dec 5, 2018 at 5:09 PM Vyacheslav Daradur 
> wrote:
>
> > Igniters,
> >
> > I've found that the project's test framework uses
> > 'TcpDiscoveryMulticastIpFinder' as default IP finder for tests and
> > there are a lot of tests written by Ignite's experts that override it
> > to 'TcpDiscoveryVmIpFinder'.
> >
> > Most of our tests starting Ignite nodes in the same JVM, that allows
> > us using shared 'TcpDiscoveryVmIpFinder'.
> >
> > I think that using of 'TcpDiscoveryMulticastIpFinder' may be useful
> > only in platforms tests, BTW multi-JVM tests use the tuned
> > 'TcpDiscoveryVmIpFinder'.
> >
> > I see the following main advantages of using 'TcpDiscoveryVmIpFinder':
> > * reducing possible conflicts in the development environment, when
> > nodes from different clusters may find each other;
> > * speedup of nodes initial discovery, especially on Windows;
> > * avoiding of overwriting 'getConfiguration' and copypasta only to set
> > up static IP finder in tests;
> >
> > So, I'd suggest changing the default IP finder in tests to
> > 'TcpDiscoveryVmIpFinder' as the first step and remove related
> > boilerplate as the second step.
> >
> > What do you think?
> >
> > --
> > Best Regards, Vyacheslav D.
> >
>


Re: [DISCUSSION] Performance issue with cluster-wide cache metrics distribution

2018-12-04 Thread Denis Mekhanikov
Alex,

Did you measure the impact of metrics collection? What is the overhead you
are trying to avoid?

Just to make it clear, MetricUpdateMessage-s are used as heartbeats.
So they are sent anyway, even if no metrics are distributed between nodes.

Denis

вт, 4 дек. 2018 г. в 12:46, Alex Plehanov :

> Hi Igniters,
>
> In the current implementation, cache metrics are collected on each node and
> sent across the whole cluster with discovery message
> (TcpDiscoveryMetricsUpdateMessage) with configured frequency
> (MetricsUpdateFrequency, 2 seconds by default) even if no one requested
> them.
> If there are a lot of caches and a lot of nodes in the cluster, metrics
> update message (which contain each metric for each cache on each node) can
> reach a critical size.
>
> Also frequently collecting all cache metrics have a negative performance
> impact (some of them just get values from AtomicLong, but some of them need
> an iteration over all cache partitions).
> The only way now to disable cache metrics collecting and sending with
> discovery message is to disable statistics for each cache. But this also
> makes impossible to request some of cache metrics locally (for the current
> node only). Requesting a limited set of cache metrics on the current node
> doesn't have such performance impact as the frequent collecting of all
> cache metrics, but sometimes it's enough for diagnostic purposes.
>
> As a workaround I have filled and implemented ticket [1], which introduces
> new system property to disable cache metrics sending with
> TcpDiscoveryMetricsUpdateMessage (in case this property is set, the message
> will contain only node metrics). But system property is not good for a
> permanent solution. Perhaps it's better to move such property to public API
> (to IgniteConfiguration for example).
>
> Also maybe we should change cache metrics distributing strategy? For
> example, collect metrics by request via communication SPI or subscribe to a
> limited set of cache/metrics, etc.
>
> Thoughts?
>
> [1]: https://issues.apache.org/jira/browse/IGNITE-10172
>


Re: [VOTE] Creation dedicated list for github notifiacations

2018-11-27 Thread Denis Mekhanikov
+1
I'm for making the dev list readable without filters of any kind.

On Tue, Nov 27, 2018, 15:14 Maxim Muzafarov wrote:

> +1
>
> Let's have a look at how it will be.
>
> On Tue, 27 Nov 2018 at 14:48 Seliverstov Igor 
> wrote:
>
> > +1
> >
> > вт, 27 нояб. 2018 г. в 14:45, Юрий :
> >
> > > +1
> > >
> > > вт, 27 нояб. 2018 г. в 11:22, Andrey Mashenkov <
> > andrey.mashen...@gmail.com
> > > >:
> > >
> > > > +1
> > > >
> > > > On Tue, Nov 27, 2018 at 10:12 AM Sergey Chugunov <
> > > > sergey.chugu...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Plus this dedicated list should be properly documented in wiki,
> > > > mentioning
> > > > > it in How to Contribute [1] or in Make Teamcity Green Again [2]
> would
> > > be
> > > > a
> > > > > good idea.
> > > > >
> > > > > [1]
> > > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Make+Teamcity+Green+Again
> > > > >
> > > > > On Tue, Nov 27, 2018 at 9:51 AM Павлухин Иван  >
> > > > wrote:
> > > > >
> > > > > > +1
> > > > > > вт, 27 нояб. 2018 г. в 09:22, Dmitrii Ryabov <
> > somefire...@gmail.com
> > > >:
> > > > > > >
> > > > > > > 0
> > > > > > > вт, 27 нояб. 2018 г. в 02:33, Alexey Kuznetsov <
> > > > akuznet...@apache.org
> > > > > >:
> > > > > > > >
> > > > > > > > +1
> > > > > > > > Do not forget notification from GitBox too!
> > > > > > > >
> > > > > > > > On Tue, Nov 27, 2018 at 2:20 AM Zhenya
> > >  > > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1, already make it by filers.
> > > > > > > > >
> > > > > > > > > > This was discussed already [1].
> > > > > > > > > >
> > > > > > > > > > So, I want to complete this discussion with moving
> outside
> > > > > dev-list
> > > > > > > > > > GitHub-notification to dedicated list.
> > > > > > > > > >
> > > > > > > > > > Please start voting.
> > > > > > > > > >
> > > > > > > > > > +1 - to accept this change.
> > > > > > > > > > 0 - you don't care.
> > > > > > > > > > -1 - to decline this change.
> > > > > > > > > >
> > > > > > > > > > This vote will go for 72 hours.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Time-to-remove-automated-messages-from-the-devlist-td37484i20.html
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Alexey Kuznetsov
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Ivan Pavlukhin
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrey V. Mashenkov
> > > >
> > >
> > >
> > > --
> > > Живи с улыбкой! :D
> > >
> >
> --
> --
> Maxim Muzafarov
>


Re: How to deprecate unused Ignite's system property properly? (IGNITE-7441 Drop IGNITE_SERVICES_COMPATIBILITY_MODE system property)

2018-11-22 Thread Denis Mekhanikov
Vyacheslav,

You are right. This property is not used anywhere, so you can safely remove
it.
I don't think there is any need for deprecation. You can just go ahead and
drop it,
since it doesn't have any effect.

Denis

чт, 22 нояб. 2018 г. в 15:47, Vyacheslav Daradur :

> Hi, Igniters!
>
> Here is Jira issue [1] to drop one of Ignite's system property
> "IGNITE_SERVICES_COMPATIBILITY_MODE" because it is not used.
>
> I looked through git history and related Jira issues, the common
> conclusions:
> - the property was introduced in Ignite 1.7 within the task [2] to use
> LazyServiceConfiguration with premarshaled services instance;
> - Ignite 2.* was released which is incompatible with 1.*;
> - since Ignite 2.3 the property is completely ignored after introduced
> batch deployment mode [3];
>
> Looks like we can just remove the property without introducing any
> compatibility issues.
> Also, related node attribute 'ATTR_SERVICES_COMPATIBILITY_MODE' can be
> safely removed.
>
> So, my question is: can I just remove following properties or I should
> deprecate them and remove all usages?
> IgniteSystemProperties#IGNITE_SERVICES_COMPATIBILITY_MODE
> IgniteNodeAttributes#ATTR_SERVICES_COMPATIBILITY_MODE
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7441
> [2] https://issues.apache.org/jira/browse/IGNITE-3056
> [3] https://issues.apache.org/jira/browse/IGNITE-5145
>
> --
> Best Regards, Vyacheslav D.
>


Re: proposed realization KILL QUERY command

2018-11-22 Thread Denis Mekhanikov
Actually, the option with separate parameters was mentioned in another thread:
http://apache-ignite-developers.2346864.n4.nabble.com/proposed-design-for-thin-client-SQL-management-and-monitoring-view-running-queries-and-kill-it-tp37713p38056.html

Denis

чт, 22 нояб. 2018 г. в 08:51, Vladimir Ozerov :

> Denis,
>
> Problems with separate parameters are explained above.
>
> чт, 22 нояб. 2018 г. в 3:23, Denis Magda :
>
> > Vladimir,
> >
> > All of the alternatives are reminiscent of mathematical operations. Don't
> > look like a SQL command. What if we use a SQL approach introducing named
> > parameters:
> >
> > KILL QUERY query_id=10 [AND node_id=5]
> >
> > --
> > Denis
> >
> > On Wed, Nov 21, 2018 at 4:11 AM Vladimir Ozerov 
> > wrote:
> >
> > > Denis,
> > >
> > > Space is bad candidate because it is a whitespace. Without whitespaces
> we
> > > can have syntax without quotes at all. Any non-whitespace delimiter
> will
> > > work, though:
> > >
> > > KILL QUERY 45.1
> > > KILL QUERY 45-1
> > > KILL QUERY 45:1
> > >
> > > On Wed, Nov 21, 2018 at 3:06 PM Юрий 
> > wrote:
> > >
> > > > Denis,
> > > >
> > > > Let's consider parameter of KILL QUERY just a string with some query
> > id,
> > > > without any meaning for user. User just need to get the id and pass
> as
> > > > parameter to KILL QUERY command.
> > > >
> > > > Even if query is distributed it have single query id from user
> > > perspective
> > > > and will killed on all nodes. User just need to known one global
> query
> > > id.
> > > >
> > > > How it can works.
> > > > 1)SELECT * from running_queries
> > > > result is
> > > >  query_id | node_id
> > > >   | sql   | schema_name | connection_id | duration
> > > > 123.33 | e0a69cb8-a1a8-45f6-b84d-ead367a0   | SELECT ...  |
> ...
> > > >   |   22 | 23456
> > > > 333.31 | aaa6acb8-a4a5-42f6-f842-ead111b00020 | UPDATE...  |
> > ...
> > > >   |  321| 346
> > > > 2) KILL QUERY '123.33'
> > > >
> > > > So, user need select query_id from running_queries view and use it
> for
> > > KILL
> > > > QUERY command.
> > > >
> > > > I hope it became clearer.
> > > >
> > > >
> > > >
> > > > ср, 21 нояб. 2018 г. в 02:11, Denis Magda :
> > > >
> > > > > Folks,
> > > > >
> > > > > The decimal syntax is really odd - KILL QUERY
> > > > > '[node_order].[query_counter]'
> > > > >
> > > > > Confusing, let's use a space to separate parameters.
> > > > >
> > > > > Also, what if I want to halt a specific query with certain ID?
> Don't
> > > know
> > > > > the node number, just know that the query is distributed and runs
> > > across
> > > > > several machines. Sounds like the syntax still should consider
> > > > > [node_order/id] as an optional parameter.
> > > > >
> > > > > Probably, if you explain to me how an end user will use this
> command
> > > from
> > > > > the very beginning (how do I look for a query id and node id, etc)
> > then
> > > > the
> > > > > things get clearer.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Tue, Nov 20, 2018 at 1:03 AM Юрий 
> > > > wrote:
> > > > >
> > > > > > Hi Vladimir,
> > > > > >
> > > > > > Thanks for your suggestion to use MANAGEMENT_POOL for processing
> > > > > > cancellation requests.
> > > > > >
> > > > > > About your questions.
> > > > > > 1) I'm going to implements SQL view to provide list of running
> > > queries.
> > > > > The
> > > > > > SQL VIEW has been a little bit discussed earlier. Proposed name
> is
> > > > > > *running_queries* with following columns: query_id, node_id, sql,
> > > > > > schema_name, connection_id, duration. Currently most of the
> > > information
> > > > > can
> > > > > > be  retrieved through cache API, however it doesn't matter, any
> > case
> > > we
> > > > > > need to expose SQL VIEW. Seem's you are right - the part should
> be
> > > > > > implemented firstly.
> > > > > > 2) Fully agree that we need to support all kind of SQL queries
> > > > > > (SLECT/DML/DDL, transactional, non transnational, local,
> > > distributed).
> > > > I
> > > > > > definitely sure that it will possible for all of above, however
> I'm
> > > not
> > > > > > sure about DDL - need to investigate it deeper. Also need to
> > > understand
> > > > > > that canceled DML operation can lead to partially updated data
> for
> > > non
> > > > > > transational caches.
> > > > > >
> > > > > >
> > > > > >
> > > > > > пн, 19 нояб. 2018 г. в 19:17, Vladimir Ozerov <
> > voze...@gridgain.com
> > > >:
> > > > > >
> > > > > > > Hi Yuriy,
> > > > > > >
> > > > > > > I think we can use MANAGEMENT_POOL for this. It is already used
> > for
> > > > > some
> > > > > > > internal Ignite tasks, and it appears to be a good candidate to
> > > > process
> > > > > > > cancel requests.
> > > > > > >
> > > > > > > But there are several things which are not clear enough for me
> at
> > > the
> > > > > > > moment:
> > > > > > > 1) How is the user going to get the list of running queries in the
> > > first
> > 

Re: New API for changing configuration of persistent caches

2018-11-22 Thread Denis Mekhanikov
Guys,

I like the idea with the configuration builder more.
We could limit the set of properties by providing only the modifiable ones in
the builder interface.
Otherwise, you only find out at runtime whether the setting you tried to
modify is actually changeable.
And if we decide to make another property modifiable, we will just add
more methods to the builder interface,
so users won't need to remember which properties in which versions are
available for change.
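
To make this concrete, a builder restricted to the mutable subset could look
roughly like the sketch below. This is only an illustration: the interface
and method names are hypothetical, and the properties shown are just examples
of what we might decide to mark as modifiable.

    // Hypothetical builder: only the properties we agree to support are exposed,
    // so an attempt to change anything else fails at compile time, not at runtime.
    public interface CacheConfigurationChanger {
        CacheConfigurationChanger rebalanceDelay(long delayMs);
        CacheConfigurationChanger rebalanceThrottle(long throttleMs);
        CacheConfigurationChanger queryParallelism(int parallelism);

        // Applies the accumulated changes cluster-wide; may require a cache restart.
        void apply();
    }

    // Possible usage (again, purely illustrative):
    // ignite.cache("myCache").changeConfiguration()
    //     .rebalanceDelay(10_000)
    //     .queryParallelism(4)
    //     .apply();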

We should also think about the possibility of changing the data region of a
cache. This is a valid use case: you realize that the data in your cache
occupies too much space, so you want it to become persistent.
Allowing data region configuration to be changed on nodes at runtime would be
ideal,
but it's outside of the scope of the proposed change.
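
For reference, with the current API persistence is controlled entirely by
which data region a cache is bound to at startup, along these lines (a
standard static configuration; the class name is just for the example, and
the missing piece is a way to move an already existing cache to such a
region without recreating it):

    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class PersistentRegionExample {
        public static IgniteConfiguration config() {
            // Define a data region with persistence enabled.
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                .setDataRegionConfigurations(
                    new DataRegionConfiguration()
                        .setName("persistent-region")
                        .setPersistenceEnabled(true));

            return new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg)
                // The cache references the region in its own (currently static) configuration.
                .setCacheConfiguration(
                    new CacheConfiguration<Integer, String>("myCache")
                        .setDataRegionName("persistent-region"));
        }
    }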

Denis

Thu, 22 Nov 2018 at 11:30, Eduard Shangareev :

> I don't see how your variant handles user-defined objects (factories,
> affinity functions, interceptors, etc.). Could you describe it?
>
> On Thu, Nov 22, 2018 at 10:47 AM Vladimir Ozerov 
> wrote:
>
> > My variant of the API avoids cache configuration.
> >
> > One more thing to note - as we found out, control.sh cannot dump XML
> > configuration. Currently it returns only a subset of properties. And in
> > the general case it is impossible to convert CacheConfiguration to Spring
> > XML, because Spring XML is not a serialization protocol. So an API based
> > on CacheConfiguration doesn't seem to work for control.sh either.
> >
> > Thu, 22 Nov 2018 at 10:05, Eduard Shangareev <
> > eduard.shangar...@gmail.com
> > >:
> >
> > > Vovan,
> > >
> > > We can't avoid an API that takes a cache configuration.
> > > Almost all of the ~70 properties could be changed, and some of them are
> > > object instances or may be of a user-defined class.
> > > Could you come up with an alternative for a user-defined affinity
> > > function?
> > >
> > > Also, the race would still exist in other scenarios.
> > >
> > >
> > >
> > > On Thu, Nov 22, 2018 at 8:50 AM Vladimir Ozerov 
> > > wrote:
> > >
> > > > Ed,
> > > >
> > > > We may have an API similar to "cache" and "getOrCreateCache", or we
> > > > may not. It is up to us to decide. Similarity on its own is a weak
> > > > argument. Functionality and simplicity - this is what matters.
> > > >
> > > > The approach with cache configuration has three major issues:
> > > > 1) It exposes properties which the user will not be able to change, so
> > > > the typical user actions would be: try to change a property, fail
> > > > because it is unsupported, go read the documentation. The approach
> > > > with a separate POJO is intuitive and self-documenting.
> > > > 2) It has a race condition between config read and config apply, so
> > > > the user does not know what exactly he changes, unless you change the
> > > > API to something like "restartCaches(Tuple...)", which the user will
> > > > need to call in a loop.
> > > > 3) It is not suitable for non-Java platforms, which is a showstopper -
> > > > all APIs should be available from all platforms unless it is proven to
> > > > be impossible to implement.
> > > >
> > > > Vladimir.
> > > >
> > > > Thu, 22 Nov 2018 at 1:06, Eduard Shangareev <
> > > > eduard.shangar...@gmail.com
> > > > >:
> > > >
> > > > > Vovan,
> > > > >
> > > > > Would you argue that we should have a similar API in Java, such as
> > > > > Ignite.cache(CacheConfiguration) or
> > > > > Ignite.getOrCreateCache(CacheConfiguration)?
> > > > >
> > > > > With the proposed solution, every other API call would ultimately
> > > > > rely on it.
> > > > >
> > > > > I am interested in having such a feature, not in arguing about API
> > > > > alternatives.
> > > > >
> > > > > We definitely should have the ability to change it via control.sh
> > > > > and the Java API. Everything else is optional from my point of view
> > > > > (at least at the current stage).
> > > > >
> > > > > Moreover, your arguments are more about the format of
> > > > > CacheConfiguration, which cannot be defined in other languages and
> > > > > clients. So maybe we should start a discussion about how to change
> > > > > it in 3.0?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Nov 21, 2018 at 7:45 PM Vladimir Ozerov <
> > voze...@gridgain.com>
> > > > > wrote:
> > > > >
> > > > > > Ed,
> > > > > >
> > > > > > Why do we want to operate on CacheConfiguration so desperately?
> > > > > > Your example raises even more questions:
> > > > > > 1) What to do with thin clients?
> > > > > > 2) What to do with the aforementioned race conditions, when the
> > > > > > cache could be changed concurrently?
> > > > > > 3) Why is such a trivial operation, from the user's perspective,
> > > > > > only supported from control.sh and not from the rest of the API
> > > > > > (even Java client nodes will be affected - remember our plans to
> > > > > > remove the requirement to have cache classes on client nodes,
> > > > > > which is yet to be implemented)?
> > > > > >
> > > > > > Compare it to the alternative API:
> > > > > >
> > > > > > 1) Native 
