Re[2]: Partition map exchange metrics

2019-07-23 Thread Zhenya Stanilovsky
+1 to Anton's decisions.


>Wednesday, July 24, 2019, 8:44 +03:00 from Anton Vinogradov :
>
>Folks,
>
>It looks like we're trying to implement "extended debug" instead of
>"monitoring".
>A real admin should not care which phase of PME is in progress and so on.
>The metrics of interest are
>- total blocked time (will be used for real SLA calculation)
>- are we blocked right now (shows we have an SLA degradation right now)
>The duration of the current blocking period can easily be derived by any
>modern monitoring tool through regular checks: the first "true" marks the
>period start, and precision is determined by the check frequency.
>Anyway, I'm ok with having the current metric presented as a long, where the
>long is a duration; I see no real need for it, but ok :)
>
>All the other features you mentioned are useful for improving the code or
>the deployment and can (should) be taken from the logs at the analysis
>phase.
>
>On Tue, Jul 23, 2019 at 7:22 PM Ivan Rakov < ivan.glu...@gmail.com > wrote:
>
>> Folks, let me step in.
>>
>> Nikita, thanks for your suggestions!
>>
>> > 1. initialVersion. Topology version that initiates the exchange.
>> > 2. initTime. Time PME was started.
>> > 3. initEvent. Event that triggered PME.
>> > 4. partitionReleaseTime. Time when a node has finished waiting for all
>> > updates and transactions on the previous topology.
>> > 5. sendSingleMessageTime. Time when a node sent a single message.
>> > 6. recieveFullMessageTime. Time when a node received a full message.
>> > 7. finishTime. Time PME was ended.
>> >
>> > When a new PME starts, all these metrics are reset.
>> Every metric from Nikita's list looks useful and simple to implement.
>> I think it would be better to change the format of metrics 4, 5, 6 and
>> 7 a bit: we can keep only the difference between the time of the previous
>> event and the time of the corresponding event. Such metrics would be easier
>> to perceive: they answer specific questions such as "how much time did
>> partition release take?" or "how much time did waiting for the end of the
>> distributed phase take?".
>> Also, if the results of 4, 5, 6, 7 are exported to a monitoring system,
>> graphs will show how the durations of the different stages change from one
>> PME to another.
>>
>> > When a PME causes no blocking, it's a good PME and I see no reason to have
>> > monitoring related to it
>> Agree with Anton here. These metrics should be measured only for a true
>> distributed exchange. Saving results for client leave/join PMEs would
>> just complicate monitoring.
>>
>> > I agree with the total blocking duration metric, but
>> > I still don't understand why the instant value indicating that operations
>> > are blocked should be a boolean.
>> > The duration since blocking started looks more appropriate and useful.
>> > It gives more information while the semantics stay the same.
>> Totally agree with Pavel here. Both the "accumulated block time" and
>> "current PME block time" metrics are useful. Growth of the accumulated
>> metric over a specific period of time (easy to check on a monitoring
>> system graph) will show how long business operations were blocked in
>> total, and a non-zero current metric will show that we are experiencing
>> issues right now. A boolean metric "are we blocked right now" is not
>> needed, as it can obviously be inferred from "current PME block time".
>>
>> Best Regards,
>> Ivan Rakov
>>
>> On 23.07.2019 16:02, Pavel Kovalenko wrote:
>> > Nikita,
>> >
>> > I agree with total blocking duration metric but
>> > I still don't understand why instant value indicating that operations are
>> > blocked should be boolean.
>> > Duration time since blocking has started looks more appropriate and
>> useful.
>> > It gives more information while semantic is left the same.
>> >
>> >
>> >
>> > Tue, Jul 23, 2019 at 11:42, Nikita Amelchev < nsamelc...@gmail.com >:
>> >
>> >> Folks,
>> >>
>> >> All previous suggestions have some disadvantages. It can be several
>> >> exchanges between two metric updates and fast exchange can rewrite
>> >> previous long exchange.
>> >>
>> >> We can introduce a metric of total blocking duration that will
>> >> accumulate at the end of the exchange. So, users will get actual
>> >> information about how long operations were blocked. Cluster metric
>> >> will be a maximum of local nodes metrics. And we need a boolean metric
>> >> that will indicate realtime status. It needs because of duration
>> >> metric updates at the end of the exchange.
>> >>
>> >> So I propose to change the current metric that not released to the
>> >> totalCacheOperationsBlockingDuration metric and to add the
>> >> isCacheOperationsBlocked metric.
>> >>
>> >> WDYT?
>> >>
>> >> Mon, Jul 22, 2019 at 09:27, Anton Vinogradov < a...@apache.org >:
>> >>> Nikolay,
>> >>>
>> >>> Still see no reason to replace boolean with long.
>> >>>
>> >>> On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov < nizhi...@apache.org >
>> >> wrote:
>>  Anton.
>> 
>>  1. Value exported based on SPI settings, not in the moment it changed.
>> 
>>  2. Clock synchronisatio

Re: Partition map exchange metrics

2019-07-23 Thread Anton Vinogradov
Folks,

It looks like we're trying to implement "extended debug" instead of
"monitoring".
A real admin should not care which phase of PME is in progress and so on.
The metrics of interest are
- total blocked time (will be used for real SLA calculation)
- are we blocked right now (shows we have an SLA degradation right now)
The duration of the current blocking period can easily be derived by any
modern monitoring tool through regular checks: the first "true" marks the
period start, and precision is determined by the check frequency.
Anyway, I'm ok with having the current metric presented as a long, where the
long is a duration; I see no real need for it, but ok :)

All the other features you mentioned are useful for improving the code or
the deployment and can (should) be taken from the logs at the analysis
phase.
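
Below is a minimal, hedged sketch of the monitoring-side derivation described above: a generic poller (not any specific Ignite or monitoring-tool API; the class and method names are illustrative) reads a boolean "are we blocked" metric at a fixed frequency and derives both the accumulated and the current blocking duration from it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Polls a boolean "blocked" flag and derives blocking durations from it. */
public class BlockingPoller {
    private final AtomicLong totalBlockedMs = new AtomicLong();
    private volatile long currentBlockStart = -1; // -1 means "not blocked"

    /** @param isBlocked source of the boolean metric, e.g. read via JMX. */
    public void start(BooleanSupplier isBlocked, long periodMs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> {
            boolean blocked = isBlocked.getAsBoolean();
            long now = System.currentTimeMillis();
            if (blocked && currentBlockStart == -1)
                currentBlockStart = now;                    // period start detected
            else if (!blocked && currentBlockStart != -1) {
                totalBlockedMs.addAndGet(now - currentBlockStart);
                currentBlockStart = -1;                     // period end detected
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Accumulated blocked time; precision is limited by the polling period. */
    public long totalBlockedMs() {
        return totalBlockedMs.get();
    }

    /** Duration of the blocking period in progress, 0 if none. */
    public long currentBlockedMs() {
        long start = currentBlockStart;
        return start == -1 ? 0 : System.currentTimeMillis() - start;
    }
}
```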

On Tue, Jul 23, 2019 at 7:22 PM Ivan Rakov  wrote:

> Folks, let me step in.
>
> Nikita, thanks for your suggestions!
>
> > 1. initialVersion. Topology version that initiates the exchange.
> > 2. initTime. Time PME was started.
> > 3. initEvent. Event that triggered PME.
> > 4. partitionReleaseTime. Time when a node has finished waiting for all
> > updates and translations on a previous topology.
> > 5. sendSingleMessageTime. Time when a node sent a single message.
> > 6. recieveFullMessageTime. Time when a node received a full message.
> > 7. finishTime. Time PME was ended.
> >
> > When new PME started all these metrics resets.
> Every metric from Nikita's list looks useful and simple to implement.
> I think that it would be better to change format of metrics 4, 5, 6 and
> 7 a bit: we can keep only difference between time of previous event and
> time of corresponding event. Such metrics would be easier to perceive:
> they answer to specific questions "how much time did partition release
> take?" or "how much time did awaiting of distributed phase end take?".
> Also, if results of 4, 5, 6, 7 will be exported to monitoring system,
> graphs will show how different stages times change from one PME to another.
>
> > When PME cause no blocking, it's a good PME and I see no reason to have
> > monitoring related to it
> Agree with Anton here. These metrics should be measured only for true
> distributed exchange. Saving results for client leave/join PMEs will
> just complicate monitoring.
>
> > I agree with total blocking duration metric but
> > I still don't understand why instant value indicating that operations are
> > blocked should be boolean.
> > Duration time since blocking has started looks more appropriate and
> useful.
> > It gives more information while semantic is left the same.
> Totally agree with Pavel here. Both "accumulated block time" and
> "current PME block time" metrics are useful. Growth of accumulated
> metric for specific period of time (should be easy to check via
> monitoring system graph) will show for how much business operations were
> blocked in total, and non-zero current metric will show that we are
> experiencing issues right now. Boolean metric "are we blocked right now"
> is not needed as it's obviously can be inferred from "current PME block
> time".
>
> Best Regards,
> Ivan Rakov
>
> On 23.07.2019 16:02, Pavel Kovalenko wrote:
> > Nikita,
> >
> > I agree with total blocking duration metric but
> > I still don't understand why instant value indicating that operations are
> > blocked should be boolean.
> > Duration time since blocking has started looks more appropriate and
> useful.
> > It gives more information while semantic is left the same.
> >
> >
> >
> > вт, 23 июл. 2019 г. в 11:42, Nikita Amelchev :
> >
> >> Folks,
> >>
> >> All previous suggestions have some disadvantages. It can be several
> >> exchanges between two metric updates and fast exchange can rewrite
> >> previous long exchange.
> >>
> >> We can introduce a metric of total blocking duration that will
> >> accumulate at the end of the exchange. So, users will get actual
> >> information about how long operations were blocked. Cluster metric
> >> will be a maximum of local nodes metrics. And we need a boolean metric
> >> that will indicate realtime status. It needs because of duration
> >> metric updates at the end of the exchange.
> >>
> >> So I propose to change the current metric that not released to the
> >> totalCacheOperationsBlockingDuration metric and to add the
> >> isCacheOperationsBlocked metric.
> >>
> >> WDYT?
> >>
> >> пн, 22 июл. 2019 г. в 09:27, Anton Vinogradov :
> >>> Nikolay,
> >>>
> >>> Still see no reason to replace boolean with long.
> >>>
> >>> On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov 
> >> wrote:
>  Anton.
> 
>  1. Value exported based on SPI settings, not in the moment it changed.
> 
>  2. Clock synchronisation - if we export start time, we should also
> >> export
>  node local timestamp.
> 
>  пн, 22 июля 2019 г., 8:33 Anton Vinogradov :
> 
> > Folks,
> >
> > What's the reason for duration counting?
> > AFAIU, it's a monitoring system feature to count the durati

Re: Threadpools and .WithExecute() for C# clients

2019-07-23 Thread Denis Magda
Looping in the dev list.

Pavel, Igor and other C# maintainers, this looks like a valuable extension
of our C# APIs. Shouldn't this be a quick addition to Ignite?

-
Denis


On Mon, Jul 22, 2019 at 3:22 PM Raymond Wilson 
wrote:

> Alexandr,
>
> If .WithExecute is not planned to be made available in the C# client, what
> is the plan to support custom thread pools from the C# side of things?
>
> Thanks,
> Raymond.
>
>
> On Thu, Jul 18, 2019 at 9:28 AM Raymond Wilson 
> wrote:
>
>> The source of inbound requests into Server A is from client applications.
>>
>> Server B is really a cluster of servers that are performing clustered
>> transformations and computations across a data set.
>>
>> I originally used IComputeJob and similar functions which work very well
>> but have the restriction that they return the entire result set from a
>> Server B node in a single response. These result sets can be large (100's
>> of megabytes and larger), which makes life pretty hard for Server A if it
>> has to field multiple incoming responses of this size. So, these types of
>> requests progressively send responses back (using Ignite messaging) to
>> Server A using the Ignite messaging fabric. As Server A receives each part
>> of the overall response it processes it according to the business rules
>> relevant to the request.
>>
>> The cluster config and numbers of nodes are not really material to this.
>>
>> Raymond.
>>
>> On Thu, Jul 18, 2019 at 12:26 AM Alexandr Shapkin 
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> Can you share a more detailed use case, please?
>>>
>>>
>>>
>>> Right now it's not clear why you need a messaging fabric.

>>> If you are interested in progress tracking, then you could try the
>>> Cache API or a ContinuousQuery, for example.
>>>
>>>
>>>
>>> What are the sources of inbound requests? Are they client requests?
>>>
>>>
>>>
>>> What is your cluster config? How many nodes do you have for your
>>> distributed computations?
>>>
>>>
>>>
>>> *From: *Raymond Wilson 
>>> *Sent: *Wednesday, July 17, 2019 1:49 PM
>>> *To: *user 
>>> *Subject: *Re: Threadpools and .WithExecute() for C# clients
>>>
>>>
>>>
>>> Hi Alexandr,
>>>
>>>
>>>
>>> To summarise from the original thread, say I have server A that accepts
>>> requests. It contacts server B in order to help processing those requests.
>>> Server B sends in-progress results to server A using the Ignite messaging
>>> fabric. If the thread pool in server A is saturated with inbound requests,
>>> then there are no available threads to service the messaging fabric traffic
>>> from server B to server A resulting in a deadlock condition.
>>>
>>>
>>>
>>> In the original discussion it was suggested that creating a custom thread
>>> pool to handle the Server B to Server A traffic would resolve this.
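
For reference, here is a minimal Java-side sketch of the custom executor approach being discussed (the ExecutorConfiguration / IgniteCompute.withExecutor API whose .NET counterpart is tracked by the IGNITE-6566 ticket referenced below); the pool name and size are assumptions for illustration only.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.ExecutorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CustomExecutorExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            // Dedicated pool so the Server B -> Server A responses do not
            // compete with requests in the public pool (size is illustrative).
            .setExecutorConfiguration(new ExecutorConfiguration("serverBResponses").setSize(8));

        try (Ignite ignite = Ignition.start(cfg)) {
            // Closures routed through the named pool instead of the public one.
            ignite.compute().withExecutor("serverBResponses")
                .run(() -> System.out.println("Handled in the custom pool"));
        }
    }
}
```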
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Raymond.
>>>
>>>
>>>
>>> On Wed, Jul 17, 2019 at 9:48 PM Alexandr Shapkin 
>>> wrote:
>>>
>>> Hi, Raymond!
>>>
>>>
>>>
>>> As far as I can see, there are no plans for porting the custom executor
>>> configuration to the .NET client right now [1].
>>>
>>>
>>>
>>> Please, remind, why do you need a separate pool instead of a default
>>> PublicPool?
>>>
>>>
>>>
>>> [1] - https://issues.apache.org/jira/browse/IGNITE-6566
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *Raymond Wilson 
>>> *Sent: *Wednesday, July 17, 2019 10:58 AM
>>> *To: *user 
>>> *Subject: *Threadpools and .WithExecute() for C# clients
>>>
>>>
>>>
>>> Some time ago I ran into an issue with thread pool exhaustion and
>>> deadlocking in AI 2.2.
>>>
>>>
>>>
>>> This is the original thread:
>>> http://apache-ignite-users.70518.x6.nabble.com/Possible-dead-lock-when-number-of-jobs-exceeds-thread-pool-td17262.html
>>>
>>>
>>>
>>>
>>> At the time .WithExecutor() was not implemented in the C# client so
>>> there was little option but to expand the size of the public thread pool
>>> sufficiently to prevent the deadlocking.
>>>
>>>
>>>
>>> We have been revisiting this issue and see that .WithExecutor() is not
>>> supported in the AI 2.7.5 client.
>>>
>>>
>>>
>>> Can this be supported in the C# client, or is there a workaround in the
>>> .NET environment that does not require this capability?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Raymond.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>


[jira] [Created] (IGNITE-12008) thick client has all system threads busy indefinitely

2019-07-23 Thread Mahesh Renduchintala (JIRA)
Mahesh Renduchintala created IGNITE-12008:
-

 Summary: thick client has all system threads busy indefinitely
 Key: IGNITE-12008
 URL: https://issues.apache.org/jira/browse/IGNITE-12008
 Project: Ignite
  Issue Type: Bug
  Components: clients
Affects Versions: 2.7
Reporter: Mahesh Renduchintala
 Attachments: config.zip

Please refer to this thread. All logs are attached to it.

[http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-thick-client-has-all-system-threads-busy-indefinitely-td28880.html]

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [TC] Move "Queries (Binary Objects Simple Mapper)" job to nightly

2019-07-23 Thread Павлухин Иван
Also there are more similar candidates:
* "Binary Objects (Simple Mapper Basic)" [1] corresponds to "Basic 1" [2]
* "Binary Objects (Simple Mapper Cache Full API)" [3] -- "Cache (Full API)" [4]
* "Binary Objects (Simple Mapper Compute Grid)" [5] -- "Compute (Grid)" [6]

I think they could be moved to nightly as well. Also, renaming sounds
like a good idea because the current names are misleading.

[1] 
https://ci.ignite.apache.org/admin/editBuildParams.html?id=buildType:IgniteTests24Java8_BinaryObjectsSimpleMapperBasic
[2] 
https://ci.ignite.apache.org/admin/editBuildParams.html?id=buildType:IgniteTests24Java8_Basic1
[3] 
https://ci.ignite.apache.org/admin/editBuildParams.html?id=buildType:IgniteTests24Java8_BinaryObjectsSimpleMapperCacheFullApi
[4] 
https://ci.ignite.apache.org/admin/editBuildParams.html?id=buildType:IgniteTests24Java8_CacheFullApi
[5] 
https://ci.ignite.apache.org/admin/editBuildParams.html?id=buildType:IgniteTests24Java8_BinaryObjectsSimpleMapperComputeGrid
[6] 
https://ci.ignite.apache.org/admin/editBuildParams.html?id=buildType:IgniteTests24Java8_ComputeGrid
Mon, Jul 22, 2019 at 14:57, Dmitriy Pavlov :
>
> +1 for moving from RunAll to RunAllNightly
>
> Mon, Jul 22, 2019 at 12:21, Павлухин Иван :
>
> > Igniters,
> >
> > As you know, the Ignite RunAll on TC takes significant resources. I noticed
> > that the build job "Queries (Binary Objects Simple Mapper)" [1] actually
> > duplicates the "Queries 1" [2] job, running the same test set with the simple
> > name mapper for binary objects. I suppose that we can exclude it from the
> > daily RunAll and move it to nightly.
> >
> > What do you think?
> >
> > [1]
> > https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_BinaryObjectsSimpleMapperQueries?branch=%3Cdefault%3E&buildTypeTab=overview
> > [2]
> > https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Queries1?branch=%3Cdefault%3E&buildTypeTab=overview
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >



-- 
Best regards,
Ivan Pavlukhin


Re: Partition map exchange metrics

2019-07-23 Thread Ivan Rakov

Folks, let me step in.

Nikita, thanks for your suggestions!


1. initialVersion. Topology version that initiates the exchange.
2. initTime. Time PME was started.
3. initEvent. Event that triggered PME.
4. partitionReleaseTime. Time when a node has finished waiting for all
updates and transactions on the previous topology.
5. sendSingleMessageTime. Time when a node sent a single message.
6. recieveFullMessageTime. Time when a node received a full message.
7. finishTime. Time PME was ended.

When a new PME starts, all these metrics are reset.

Every metric from Nikita's list looks useful and simple to implement.
I think it would be better to change the format of metrics 4, 5, 6 and
7 a bit: we can keep only the difference between the time of the previous
event and the time of the corresponding event. Such metrics would be easier
to perceive: they answer specific questions such as "how much time did
partition release take?" or "how much time did waiting for the end of the
distributed phase take?".
Also, if the results of 4, 5, 6, 7 are exported to a monitoring system,
graphs will show how the durations of the different stages change from one
PME to another.
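
A minimal sketch of the delta-based representation suggested above: each stage records only the time elapsed since the previously recorded event. The class and stage names are illustrative, not an existing Ignite API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Records per-stage durations as deltas from the previous recorded event. */
public class PmeStageTimer {
    private final Map<String, Long> stageDurationsMs = new LinkedHashMap<>();
    private long lastEventNanos = System.nanoTime();

    /** Call once per stage, e.g. "partitionRelease", "sendSingleMessage", "receiveFullMessage", "finish". */
    public synchronized void onStageFinished(String stage) {
        long now = System.nanoTime();
        stageDurationsMs.put(stage, (now - lastEventNanos) / 1_000_000);
        lastEventNanos = now;
    }

    /** Snapshot suitable for export: answers "how much time did each stage take?". */
    public synchronized Map<String, Long> snapshot() {
        return new LinkedHashMap<>(stageDurationsMs);
    }
}
```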



When a PME causes no blocking, it's a good PME and I see no reason to have
monitoring related to it
Agree with Anton here. These metrics should be measured only for a true
distributed exchange. Saving results for client leave/join PMEs would
just complicate monitoring.



I agree with the total blocking duration metric, but
I still don't understand why the instant value indicating that operations
are blocked should be a boolean.
The duration since blocking started looks more appropriate and useful.
It gives more information while the semantics stay the same.
Totally agree with Pavel here. Both the "accumulated block time" and
"current PME block time" metrics are useful. Growth of the accumulated
metric over a specific period of time (easy to check on a monitoring
system graph) will show how long business operations were blocked in
total, and a non-zero current metric will show that we are experiencing
issues right now. A boolean metric "are we blocked right now" is not
needed, as it can obviously be inferred from "current PME block time".


Best Regards,
Ivan Rakov

On 23.07.2019 16:02, Pavel Kovalenko wrote:

Nikita,

I agree with total blocking duration metric but
I still don't understand why instant value indicating that operations are
blocked should be boolean.
Duration time since blocking has started looks more appropriate and useful.
It gives more information while semantic is left the same.



Tue, Jul 23, 2019 at 11:42, Nikita Amelchev :


Folks,

All previous suggestions have some disadvantages. It can be several
exchanges between two metric updates and fast exchange can rewrite
previous long exchange.

We can introduce a metric of total blocking duration that will
accumulate at the end of the exchange. So, users will get actual
information about how long operations were blocked. Cluster metric
will be a maximum of local nodes metrics. And we need a boolean metric
that will indicate realtime status. It needs because of duration
metric updates at the end of the exchange.

So I propose to change the current metric that not released to the
totalCacheOperationsBlockingDuration metric and to add the
isCacheOperationsBlocked metric.

WDYT?

Mon, Jul 22, 2019 at 09:27, Anton Vinogradov :

Nikolay,

Still see no reason to replace boolean with long.

On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov 

wrote:

Anton.

1. Value exported based on SPI settings, not in the moment it changed.

2. Clock synchronisation - if we export the start time, we should also
export the node's local timestamp.
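
As a sketch of the consumer-side computation this enables (illustrative names, no specific Ignite API assumed): pairing the blocking-start timestamp with the exporting node's own current timestamp lets the monitoring backend compute the elapsed time without synchronising clocks across nodes.

```java
/** Illustrative exported sample: both values come from the same node's clock. */
public class BlockingStartSample {
    public final long blockStartTsMs; // when operations became blocked (node clock), 0 if not blocked
    public final long nodeNowTsMs;    // node's local time at export (same clock)

    public BlockingStartSample(long blockStartTsMs, long nodeNowTsMs) {
        this.blockStartTsMs = blockStartTsMs;
        this.nodeNowTsMs = nodeNowTsMs;
    }

    /** Elapsed blocking time at export time, independent of clock skew between nodes. */
    public long elapsedMs() {
        return blockStartTsMs == 0 ? 0 : nodeNowTsMs - blockStartTsMs;
    }
}
```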

Mon, July 22, 2019, 8:33 Anton Vinogradov :


Folks,

What's the reason for duration counting?
AFAIU, it's a monitoring system feature to count the durations.
Since the monitoring system checks metrics periodically, it will know the
duration by its own log.

On Fri, Jul 19, 2019 at 7:32 PM Pavel Kovalenko 
wrote:


Nikita,

Yes, I mean duration not timestamp. For the metric name, I suggest
"cacheOperationsBlockingDuration", I think it cleaner represents

what

is

blocked during PME.
We can also combine both timestamp

"cacheOperationsBlockingStartTs" and

duration to have better correlation when cache operations were

blocked

and

how much time it's taken.
For instant view (like in JMX bean) a calculated value as you

mentioned

can be used.
For metrics are exported to some backend (IEP-35) a counter can be

used.

The counter is incremented by blocking time after blocking has

ended.

Fri, Jul 19, 2019 at 19:10, Nikita Amelchev 
:

Pavel,

The main purpose of this metric is

how much time we wait for resuming cache operations

Seems I misunderstood you. Do you mean timestamp or duration here?

What do you think if we change the boolean value of metric to a

long

value that represents time in milliseconds when operations were

blocked?

This time can be calculated as (currentTime -
timeSinceOperationsBlocked) in case of timestamp.

[jira] [Created] (IGNITE-12007) Latest "apacheignite/web-console-backend" docker image is broken

2019-07-23 Thread Igor Belyakov (JIRA)
Igor Belyakov created IGNITE-12007:
--

 Summary: Latest "apacheignite/web-console-backend" docker image is 
broken
 Key: IGNITE-12007
 URL: https://issues.apache.org/jira/browse/IGNITE-12007
 Project: Ignite
  Issue Type: Bug
  Components: UI
Affects Versions: 2.7
Reporter: Igor Belyakov


It's not possible to run a docker container using the latest version of the
"apacheignite/web-console-backend" image.

The following error happens on start:
{code:java}
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2019-07-23T14_24_40_353Z-debug.log
npm ERR! path /opt/web-console/package.json
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! syscall open
npm ERR! enoent ENOENT: no such file or directory, open 
'/opt/web-console/package.json'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent{code}
How to reproduce:

Run the container using docker-compose as described here:
[https://hub.docker.com/r/apacheignite/web-console-backend]

 

It seems it was broken by the following commit:

[https://github.com/apache/ignite/commit/4c295f8f468ddfce458948c17c13b1748b13e918#diff-ec0d595d738c4207e08ce210624e902aR22]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: StackOverflow question about encryption

2019-07-23 Thread Nikolay Izhikov
Hello, Dmitriy.

You are absolutely right.
I have already posted the same answer on SO.



On Tue, 23/07/2019 at 13:36 +0300, Dmitriy Pavlov wrote:
> Thank you, Stephen. I've checked this question. For now, it seems to me
> that keys are different on 2 different nodes. Why it can happen, it is not
> so clear for me.
> 
> TDE verifies digests of master keys of nodes joining to cluster. These
> digests should be equal. Equal keys give the same digests. The error
> message says digests differ, so it may be one node does not have key and
> have different master key value.
> 
> May be Nikolay has some more ideas about reasons.
> 
> Tue, Jul 23, 2019 at 13:14, Stephen Darlington <
> stephen.darling...@gridgain.com>:
> 
> > Assume he means this one:
> > https://stackoverflow.com/questions/57124826/apache-ignite-transparent-data-encryption-master-key-digest-differs-node-join
> > 
> > > On 23 Jul 2019, at 11:02, Dmitriy Pavlov  wrote:
> > > 
> > > Hi Vladimir, could you please share link to the question?
> > > 
> > > вт, 23 июл. 2019 г. в 12:47, Vladimir Pligin :
> > > 
> > > > Hi igniters,
> > > > 
> > > > I can see a question on SO
> > > > http://apache-ignite-developers.2346864.n4.nabble.com. The question is
> > > > related to encryption.
> > > > Nikolay Izhikov, as far as know you're the author of the feature and
> > > > unfortunately I don't know anyone else who is familiar with it. Could
> > 
> > you
> > > > please answer? Thanks a lot in advance.
> > > > 
> > > > 
> > > > 
> > > > --
> > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > 
> > 
> > 
> > 




Re: Partition map exchange metrics

2019-07-23 Thread Pavel Kovalenko
Nikita,

I agree with the total blocking duration metric, but
I still don't understand why the instant value indicating that operations are
blocked should be a boolean.
The duration since blocking started looks more appropriate and useful.
It gives more information while the semantics stay the same.
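
A tiny sketch of this point (hypothetical class and method names): a single long "current blocking duration" metric subsumes the boolean one, since the boolean view can be derived from it trivially.

```java
/** Sketch: one long metric carries both the duration and the "blocked" flag. */
public final class CurrentBlockingDurationMetric {
    private volatile long blockStartMs; // 0 when nothing is blocked

    /** Duration since blocking started, 0 if cache operations are not blocked. */
    public long currentBlockingDurationMs() {
        long start = blockStartMs;
        return start == 0 ? 0 : System.currentTimeMillis() - start;
    }

    /** The boolean view is trivially derived from the long value. */
    public boolean operationsBlocked() {
        return blockStartMs != 0;
    }
}
```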



Tue, Jul 23, 2019 at 11:42, Nikita Amelchev :

> Folks,
>
> All previous suggestions have some disadvantages. It can be several
> exchanges between two metric updates and fast exchange can rewrite
> previous long exchange.
>
> We can introduce a metric of total blocking duration that will
> accumulate at the end of the exchange. So, users will get actual
> information about how long operations were blocked. Cluster metric
> will be a maximum of local nodes metrics. And we need a boolean metric
> that will indicate realtime status. It needs because of duration
> metric updates at the end of the exchange.
>
> So I propose to change the current metric that not released to the
> totalCacheOperationsBlockingDuration metric and to add the
> isCacheOperationsBlocked metric.
>
> WDYT?
>
> Mon, Jul 22, 2019 at 09:27, Anton Vinogradov :
> >
> > Nikolay,
> >
> > Still see no reason to replace boolean with long.
> >
> > On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov 
> wrote:
> >
> > > Anton.
> > >
> > > 1. Value exported based on SPI settings, not in the moment it changed.
> > >
> > > 2. Clock synchronisation - if we export start time, we should also
> export
> > > node local timestamp.
> > >
> > > пн, 22 июля 2019 г., 8:33 Anton Vinogradov :
> > >
> > > > Folks,
> > > >
> > > > What's the reason for duration counting?
> > > > AFAIU, it's a monitoring system feature to count the durations.
> > > > Since the monitoring system checks metrics periodically, it will know the
> > > > duration by its own log.
> > > >
> > > > On Fri, Jul 19, 2019 at 7:32 PM Pavel Kovalenko 
> > > > wrote:
> > > >
> > > > > Nikita,
> > > > >
> > > > > Yes, I mean duration not timestamp. For the metric name, I suggest
> > > > > "cacheOperationsBlockingDuration", I think it cleaner represents
> what
> > > is
> > > > > blocked during PME.
> > > > > We can also combine both timestamp
> "cacheOperationsBlockingStartTs" and
> > > > > duration to have better correlation when cache operations were
> blocked
> > > > and
> > > > > how much time it's taken.
> > > > > For instant view (like in JMX bean) a calculated value as you
> mentioned
> > > > > can be used.
> > > > > For metrics are exported to some backend (IEP-35) a counter can be
> > > used.
> > > > > The counter is incremented by blocking time after blocking has
> ended.
> > > > >
> > > > > пт, 19 июл. 2019 г. в 19:10, Nikita Amelchev  >:
> > > > >
> > > > >> Pavel,
> > > > >>
> > > > >> The main purpose of this metric is
> > > > >> >> how much time we wait for resuming cache operations
> > > > >>
> > > > >> Seems I misunderstood you. Do you mean timestamp or duration here?
> > > > >> >> What do you think if we change the boolean value of metric to a
> > > long
> > > > >> value that represents time in milliseconds when operations were
> > > blocked?
> > > > >>
> > > > >> This time can be calculated as (currentTime -
> > > > >> timeSinceOperationsBlocked) in case of timestamp.
> > > > >>
> > > > >> Duration will be more understandable. It'll be something like
> > > > >> getCurrentBlockingPmeDuration. But I haven't come up with a better
> > > > >> name yet.
> > > > >>
> > > > >> пт, 19 июл. 2019 г. в 18:30, Pavel Kovalenko  >:
> > > > >> >
> > > > >> > Nikita,
> > > > >> >
> > > > >> > I think getCurrentPmeDuration doesn't show useful information.
> The
> > > > main
> > > > >> PME side effect for end-users is blocking cache operations. Not
> all
> > > PME
> > > > >> time blocks it.
> > > > >> > What information gives to an end-user timestamp of
> > > > >> "timeSinceOperationsBlocked"? For what analysis it can be used and
> > > how?
> > > > >> >
> > > > >> > пт, 19 июл. 2019 г. в 17:48, Nikita Amelchev <
> nsamelc...@gmail.com
> > > >:
> > > > >> >>
> > > > >> >> Hi Pavel,
> > > > >> >>
> > > > >> >> This time already can be obtained from the
> getCurrentPmeDuration
> > > and
> > > > >> >> new isOperationsBlockedByPme metrics.
> > > > >> >>
> > > > >> >> As an alternative solution, I can rework recently added
> > > > >> >> getCurrentPmeDuration metric (not released yet). Seems for
> users it
> > > > >> >> useless in case of non-blocking PME.
> > > > >> >> Lets name it timeSinceOperationsBlocked. It'll be timestamp
> when
> > > > >> >> blocking started (minimal value of cluster nodes) and 0 if
> blocking
> > > > >> >> ends (there is no running PME).
> > > > >> >>
> > > > >> >> WDYT?
> > > > >> >>
> > > > >> >> пт, 19 июл. 2019 г. в 15:56, Pavel Kovalenko <
> jokse...@gmail.com>:
> > > > >> >> >
> > > > >> >> > Hi Nikita,
> > > > >> >> >
> > > > >> >> > Thank you for working on this. What do you think if we
> change the
> > > > >> boolean
> > > > >> >> > value of metric to a long value that represents time in
> 

Re: StackOverflow question about encryption

2019-07-23 Thread Dmitriy Pavlov
Thank you, Stephen. I've checked this question. For now, it seems to me
that the keys differ between the two nodes. Why that can happen is not
so clear to me.

TDE verifies the digests of the master keys of nodes joining the cluster.
These digests should be equal, and equal keys give the same digests. The
error message says the digests differ, so it may be that one node does not
have the key at all or has a different master key value.

Maybe Nikolay has some more ideas about the reasons.
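
For illustration only (this is not Ignite's actual TDE code, and SHA-256 is an assumed algorithm), a sketch of the kind of digest comparison described above: each node derives a digest from its master key, and a joining node is rejected when its digest differs from the cluster's.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MasterKeyDigestCheck {
    /** Digest of the locally configured master key. */
    static byte[] digestOf(byte[] masterKey) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(masterKey);
    }

    /** A joining node passes the check only if its digest matches the cluster's. */
    static boolean canJoin(byte[] clusterDigest, byte[] joiningNodeKey) throws NoSuchAlgorithmException {
        return MessageDigest.isEqual(clusterDigest, digestOf(joiningNodeKey));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] keyA = "master-key-from-keystore-A".getBytes(StandardCharsets.UTF_8);
        byte[] keyB = "master-key-from-keystore-B".getBytes(StandardCharsets.UTF_8);
        // Different keys -> different digests -> the join is rejected, matching the reported error.
        System.out.println(canJoin(digestOf(keyA), keyB)); // false
    }
}
```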

Tue, Jul 23, 2019 at 13:14, Stephen Darlington <
stephen.darling...@gridgain.com>:

> Assume he means this one:
> https://stackoverflow.com/questions/57124826/apache-ignite-transparent-data-encryption-master-key-digest-differs-node-join
>
> > On 23 Jul 2019, at 11:02, Dmitriy Pavlov  wrote:
> >
> > Hi Vladimir, could you please share link to the question?
> >
> > вт, 23 июл. 2019 г. в 12:47, Vladimir Pligin :
> >
> >> Hi igniters,
> >>
> >> I can see a question on SO
> >> http://apache-ignite-developers.2346864.n4.nabble.com. The question is
> >> related to encryption.
> >> Nikolay Izhikov, as far as know you're the author of the feature and
> >> unfortunately I don't know anyone else who is familiar with it. Could
> you
> >> please answer? Thanks a lot in advance.
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >>
>
>
>


[jira] [Created] (IGNITE-12006) Threads may be parked for indefinite time during throttling after spurious wakeups

2019-07-23 Thread Sergey Antonov (JIRA)
Sergey Antonov created IGNITE-12006:
---

 Summary: Threads may be parked for indefinite time during 
throttling after spurious wakeups
 Key: IGNITE-12006
 URL: https://issues.apache.org/jira/browse/IGNITE-12006
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Antonov
Assignee: Sergey Antonov


In the log we see the following behavior:

{noformat}
2019-07-04 06:29:03.649[WARN 
][sys-#328%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#328%NODE%xyzGridNodeName% for timeout(ms)=16335
2019-07-04 06:29:03.649[WARN 
][sys-#326%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#326%NODE%xyzGridNodeName% for timeout(ms)=13438
2019-07-04 06:29:03.649[WARN 
][sys-#277%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#277%NODE%xyzGridNodeName% for timeout(ms)=11609
2019-07-04 06:29:03.649[WARN 
][sys-#331%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#331%NODE%xyzGridNodeName% for timeout(ms)=18009
2019-07-04 06:29:03.649[WARN 
][sys-#321%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#321%NODE%xyzGridNodeName% for timeout(ms)=15557
2019-07-04 06:29:03.650[WARN 
][sys-#307%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#307%NODE%xyzGridNodeName% for timeout(ms)=27938
2019-07-04 06:29:03.649[WARN 
][sys-#316%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#316%NODE%xyzGridNodeName% for timeout(ms)=12189
2019-07-04 06:29:03.649[WARN 
][sys-#311%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#311%NODE%xyzGridNodeName% for timeout(ms)=11056
2019-07-04 06:29:03.650[WARN 
][sys-#295%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#295%NODE%xyzGridNodeName% for timeout(ms)=20848
2019-07-04 06:29:03.649[WARN 
][sys-#290%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#290%NODE%xyzGridNodeName% for timeout(ms)=14816
2019-07-04 06:29:03.649[WARN 
][sys-#332%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#332%NODE%xyzGridNodeName% for timeout(ms)=14110
2019-07-04 06:29:03.649[WARN 
][sys-#298%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#298%NODE%xyzGridNodeName% for timeout(ms)=10028
2019-07-04 06:29:03.650[WARN 
][sys-#304%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#304%NODE%xyzGridNodeName% for timeout(ms)=19855
2019-07-04 06:29:03.650[WARN 
][sys-#331%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#331%NODE%xyzGridNodeName% for timeout(ms)=41277
2019-07-04 06:29:03.650[WARN 
][sys-#291%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#291%NODE%xyzGridNodeName% for timeout(ms)=17151
2019-07-04 06:29:03.650[WARN 
][sys-#308%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#308%NODE%xyzGridNodeName% for timeout(ms)=39312
2019-07-04 06:29:03.650[WARN 
][sys-#322%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#322%NODE%xyzGridNodeName% for timeout(ms)=43341
2019-07-04 06:29:03.650[WARN 
][sys-#306%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#306%NODE%xyzGridNodeName% for timeout(ms)=21890
2019-07-04 06:29:03.650[WARN 
][sys-#315%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#315%NODE%xyzGridNodeName% for timeout(ms)=18909
2019-07-04 06:29:03.650[WARN 
][sys-#321%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#321%NODE%xyzGridNodeName% for timeout(ms)=74129
2019-07-04 06:29:03.650[WARN 
][sys-#305%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#305%NODE%xyzGridNodeName% for timeout(ms)=26608
2019-07-04 06:29:03.650[WARN 
][sys-#309%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#309%NODE%xyzGridNodeName% for timeout(ms)=77835
2019-07-04 06:29:03.650[WARN 
][sys-#291%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#291%NODE%xyzGridNodeName% for timeout(ms)=90104
2019-07-04 06:29:03.650[WARN 
][sys-#325%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#325%NODE%xyzGridNodeName% for timeout(ms)=85813
2019-07-04 06:29:03.650[WARN 
][sys-#314%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#314%NODE%xyzGridNodeName% for timeout(ms)=81727
2019-07-04 06:29:03.650[WARN 
][sys-#338%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] Parking 
thread=sys-#338%NODE%xyzGridNodeName% for timeout(ms)=99340
2019-07-04 06:29:03.650[WARN 
][sys-#332%NODE%xyzGridNodeName%][o.a.i.i.p.c.
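
For context, a hedged sketch (not the actual PageMemoryImpl code) of the general difference between parking against a fixed deadline and re-parking for a freshly computed, possibly growing timeout after every wakeup; only the former bounds the total parking time when spurious wakeups occur.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class ThrottleParking {
    /** Safe pattern: re-park only for the time remaining until the original deadline. */
    static void parkUntilDeadline(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0)
            LockSupport.parkNanos(remaining); // a spurious wakeup just loops with a smaller 'remaining'
    }

    /** Problematic pattern: every wakeup (spurious or not) starts a full, possibly larger timeout. */
    static void parkAndRecompute(ThrottlePolicy policy) {
        while (policy.shouldThrottle())
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(policy.nextTimeoutMs()));
    }

    interface ThrottlePolicy {
        boolean shouldThrottle();
        long nextTimeoutMs(); // e.g. an exponentially growing throttle time
    }
}
{code}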

Re: [DISCUSSION] Ignite 3.0 and to be removed list

2019-07-23 Thread Alexey Zinoviev
I have a few ideas; maybe somebody will support me:
1. Exclude Spatial Indexes from the "APIs for removal" list (I don't know the
internal issues, but I like this kind of index).
2. Exclude Storm, Flume and Flink from "Integrations for Discontinuation",
because I'm ready to try to support them (or dive into this question). I think
it is not so much work to support them, or they could move to a separate
module like BigDataTools Integrations.
3. Annotation-based configuration of SQL - we should be careful with that;
I suppose it's a useful feature.
4. Ignite Messaging should be combined with the Kafka and other MQ
integrations into one module for messaging support.

What do you think, guys?

Mon, Jul 22, 2019 at 22:51, Denis Magda :

> Igniters,
>
> I did the first run through the wishlist and selected integrations and APIs
> for discontinuation. My suggestion would be to use IEP-36 (Modularization)
> page for the final list that we'll send to the user list for feedback:
>
>- Integrations for discontinuation:
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-36%3A+Modularization#IEP-36:Modularization-IntegrationsforDiscontinuation
>- APIs for removal:
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-36%3A+Modularization#IEP-36:Modularization-APIsforRemoval
>
> Please check those lists and let us know if you have any arguments against
> discontinuation/removal of X. Also, if you believe that something listed in
> the wishlist should be added to the EIP then let's discuss that.
> Personally, I see the whishlist as a page with ideas while the IEP a final
> plan for action.
>
>
> -
> Denis
>
>
> On Mon, Jul 22, 2019 at 12:05 AM Vyacheslav Daradur 
> wrote:
>
> > I think all agreed items should be marked @Deprecated in the code
> > base, so we will be able to remove them transparently for the
> > end-users.
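
For instance (a trivial sketch, not tied to any specific Ignite interface), an item agreed for removal could be marked like this so that users get compile-time warnings before 3.0:

```java
public interface LegacyApi {
    /**
     * Scheduled for removal in Apache Ignite 3.0; see the IEP-36 "APIs for Removal" list.
     *
     * @deprecated Use the replacement API instead.
     */
    @Deprecated
    void legacyOperation();
}
```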
> >
> > On Mon, Jul 22, 2019 at 9:32 AM Павлухин Иван 
> wrote:
> > >
> > > Alex,
> > >
> > > I already added a couple of items to wishlist [1].
> > >
> > > Yes, I agree that the process should be iterative. But I am confused
> > > on what stage we are in a current interation? I suppose that Denis is
> > > going to present a list of removal candidates which we as developers
> > > agreed on. And should not we have that list already available
> > > somewhere as a document? Now I see an infromation scattered in this
> > > thread and the wishlist [1]. And it is not easy to me to realize where
> > > we are now.
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist
> > >
> > > чт, 18 июл. 2019 г. в 18:14, Alexey Goncharuk <
> > alexey.goncha...@gmail.com>:
> > > >
> > > > Ivan,
> > > >
> > > > The list is not final, we can still discuss and add more points to be
> > > > cleaned in 3.0. The more clear and understandable the API will be,
> the
> > > > better. This thread was intended to draft the removal scope for 3.0
> > and to
> > > > understand which portions will be definitely removed.
> > > >
> > > >
> > > > ср, 17 июл. 2019 г. в 15:26, Павлухин Иван :
> > > >
> > > > > Also, I did not quite get the point about JSR107 (JCache). From
> time
> > > > > to time I see on user-list threads where Ignite is used along with
> > > > > Spring annotation-based cache integration. I suppose it requires
> > > > > JCache interfaces. What is crucially wrong with supporting it?
> > > > >
> > > > > ср, 17 июл. 2019 г. в 15:19, Павлухин Иван :
> > > > > >
> > > > > > Folks,
> > > > > >
> > > > > > Sorry if I am repeating something. I checked a page [1] and have
> > not
> > > > > > found several items.
> > > > > > 1. I thought that there was an agreement of dropping OLD service
> > grid,
> > > > > > was not it?
> > > > > > 2. Also IndexingSpi seems to me as a candidate for removal.
> > > > > >
> > > > > > Should I add those items to the page? Or is there another page
> > > > > > containing items to be removed that we agreed on?
> > > > > >
> > > > > > [1]
> > > > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist
> > > > > >
> > > > > > ср, 17 июл. 2019 г. в 02:00, Denis Magda :
> > > > > > >
> > > > > > > Alex, Igniters, sorry for a delay. Got swamped with other
> duties.
> > > > > > >
> > > > > > > Does it wait till the next week? I'll make sure to dedicate
> some
> > time
> > > > > for
> > > > > > > that. Or if we'd like to run faster then I'll appreciate if
> > someone
> > > > > else
> > > > > > > steps in and prepares a list this week. I'll help to review and
> > > > > solidify it.
> > > > > > >
> > > > > > > -
> > > > > > > Denis
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jul 16, 2019 at 7:58 AM Alexey Goncharuk <
> > > > > alexey.goncha...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Denis,
> > > > > > > >
> > > > > > > > Are we ready to present the list to the user list?
> > > > > > > >
> > > > > > > > вт, 2 июл. 2019 г. в 00:27, Denis Magda :
> > > > > > > >
> > > > > > > > > I wouldn't kick off dozens of voting discussions. Instead,
>

Re: StackOverflow question about encryption

2019-07-23 Thread Stephen Darlington
Assume he means this one: 
https://stackoverflow.com/questions/57124826/apache-ignite-transparent-data-encryption-master-key-digest-differs-node-join

> On 23 Jul 2019, at 11:02, Dmitriy Pavlov  wrote:
> 
> Hi Vladimir, could you please share link to the question?
> 
> Tue, Jul 23, 2019 at 12:47, Vladimir Pligin :
> 
>> Hi igniters,
>> 
>> I can see a question on SO
>> http://apache-ignite-developers.2346864.n4.nabble.com. The question is
>> related to encryption.
>> Nikolay Izhikov, as far as know you're the author of the feature and
>> unfortunately I don't know anyone else who is familiar with it. Could you
>> please answer? Thanks a lot in advance.
>> 
>> 
>> 
>> --
>> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> 




Re: StackOverflow question about encryption

2019-07-23 Thread Dmitriy Pavlov
Hi Vladimir, could you please share link to the question?

Tue, Jul 23, 2019 at 12:47, Vladimir Pligin :

> Hi igniters,
>
> I can see a question on SO
> http://apache-ignite-developers.2346864.n4.nabble.com. The question is
> related to encryption.
> Nikolay Izhikov, as far as know you're the author of the feature and
> unfortunately I don't know anyone else who is familiar with it. Could you
> please answer? Thanks a lot in advance.
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


StackOverflow question about encryption

2019-07-23 Thread Vladimir Pligin
Hi igniters,

I can see a question on SO
http://apache-ignite-developers.2346864.n4.nabble.com. The question is
related to encryption.
Nikolay Izhikov, as far as I know you're the author of the feature and
unfortunately I don't know anyone else who is familiar with it. Could you
please answer? Thanks a lot in advance.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Partition map exchange metrics

2019-07-23 Thread Nikita Amelchev
Folks,

All previous suggestions have some disadvantages: there can be several
exchanges between two metric updates, and a fast exchange can overwrite
a previous long one.

We can introduce a total blocking duration metric that accumulates at
the end of the exchange. This way users will get accurate information
about how long operations were blocked. The cluster-wide metric will be
the maximum of the local node metrics. We also need a boolean metric
that indicates the realtime status, because the duration metric is only
updated at the end of the exchange.

So I propose to rename the current (not yet released) metric to
totalCacheOperationsBlockingDuration and to add the
isCacheOperationsBlocked metric.

WDYT?
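
A minimal sketch of the proposed pair of metrics (plain Java fields for illustration; class and method names are assumptions, and a real implementation would go through Ignite's metrics framework):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustration of the proposed totalCacheOperationsBlockingDuration / isCacheOperationsBlocked pair. */
public class PmeBlockingMetrics {
    private final AtomicLong totalCacheOperationsBlockingDuration = new AtomicLong(); // ms, accumulated
    private volatile boolean cacheOperationsBlocked;                                  // realtime status
    private volatile long blockStartMs;

    /** Called when the exchange starts blocking cache operations. */
    public void onBlockingStarted() {
        blockStartMs = System.currentTimeMillis();
        cacheOperationsBlocked = true;
    }

    /** Called at the end of the exchange: the accumulator is updated only here. */
    public void onBlockingFinished() {
        cacheOperationsBlocked = false;
        totalCacheOperationsBlockingDuration.addAndGet(System.currentTimeMillis() - blockStartMs);
    }

    public boolean isCacheOperationsBlocked() {
        return cacheOperationsBlocked;
    }

    public long totalCacheOperationsBlockingDuration() {
        return totalCacheOperationsBlockingDuration.get();
    }
}
```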

Mon, Jul 22, 2019 at 09:27, Anton Vinogradov :
>
> Nikolay,
>
> Still see no reason to replace boolean with long.
>
> On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov  wrote:
>
> > Anton.
> >
> > 1. Value exported based on SPI settings, not in the moment it changed.
> >
> > 2. Clock synchronisation - if we export start time, we should also export
> > node local timestamp.
> >
> > пн, 22 июля 2019 г., 8:33 Anton Vinogradov :
> >
> > > Folks,
> > >
> > > What's the reason for duration counting?
> > > AFAIU, it's a monitoring system feature to count the durations.
> > > Since the monitoring system checks metrics periodically, it will know the
> > > duration by its own log.
> > >
> > > On Fri, Jul 19, 2019 at 7:32 PM Pavel Kovalenko 
> > > wrote:
> > >
> > > > Nikita,
> > > >
> > > > Yes, I mean duration not timestamp. For the metric name, I suggest
> > > > "cacheOperationsBlockingDuration", I think it cleaner represents what
> > is
> > > > blocked during PME.
> > > > We can also combine both timestamp "cacheOperationsBlockingStartTs" and
> > > > duration to have better correlation when cache operations were blocked
> > > and
> > > > how much time it's taken.
> > > > For instant view (like in JMX bean) a calculated value as you mentioned
> > > > can be used.
> > > > For metrics are exported to some backend (IEP-35) a counter can be
> > used.
> > > > The counter is incremented by blocking time after blocking has ended.
> > > >
> > > > пт, 19 июл. 2019 г. в 19:10, Nikita Amelchev :
> > > >
> > > >> Pavel,
> > > >>
> > > >> The main purpose of this metric is
> > > >> >> how much time we wait for resuming cache operations
> > > >>
> > > >> Seems I misunderstood you. Do you mean timestamp or duration here?
> > > >> >> What do you think if we change the boolean value of metric to a
> > long
> > > >> value that represents time in milliseconds when operations were
> > blocked?
> > > >>
> > > >> This time can be calculated as (currentTime -
> > > >> timeSinceOperationsBlocked) in case of timestamp.
> > > >>
> > > >> Duration will be more understandable. It'll be something like
> > > >> getCurrentBlockingPmeDuration. But I haven't come up with a better
> > > >> name yet.
> > > >>
> > > >> пт, 19 июл. 2019 г. в 18:30, Pavel Kovalenko :
> > > >> >
> > > >> > Nikita,
> > > >> >
> > > >> > I think getCurrentPmeDuration doesn't show useful information. The
> > > main
> > > >> PME side effect for end-users is blocking cache operations. Not all
> > PME
> > > >> time blocks it.
> > > >> > What information gives to an end-user timestamp of
> > > >> "timeSinceOperationsBlocked"? For what analysis it can be used and
> > how?
> > > >> >
> > > >> > пт, 19 июл. 2019 г. в 17:48, Nikita Amelchev  > >:
> > > >> >>
> > > >> >> Hi Pavel,
> > > >> >>
> > > >> >> This time already can be obtained from the getCurrentPmeDuration
> > and
> > > >> >> new isOperationsBlockedByPme metrics.
> > > >> >>
> > > >> >> As an alternative solution, I can rework recently added
> > > >> >> getCurrentPmeDuration metric (not released yet). Seems for users it
> > > >> >> useless in case of non-blocking PME.
> > > >> >> Lets name it timeSinceOperationsBlocked. It'll be timestamp when
> > > >> >> blocking started (minimal value of cluster nodes) and 0 if blocking
> > > >> >> ends (there is no running PME).
> > > >> >>
> > > >> >> WDYT?
> > > >> >>
> > > >> >> пт, 19 июл. 2019 г. в 15:56, Pavel Kovalenko :
> > > >> >> >
> > > >> >> > Hi Nikita,
> > > >> >> >
> > > >> >> > Thank you for working on this. What do you think if we change the
> > > >> boolean
> > > >> >> > value of metric to a long value that represents time in
> > > milliseconds
> > > >> when
> > > >> >> > operations were blocked?
> > > >> >> > Since we have not only JMX and now metrics are periodically
> > > exported
> > > >> to
> > > >> >> > some backend it can give a more clear picture of how much time we
> > > >> wait for
> > > >> >> > resuming cache operations instead of instant boolean indicator.
> > > >> >> >
> > > >> >> > пт, 19 июл. 2019 г. в 14:41, Nikita Amelchev <
> > nsamelc...@gmail.com
> > > >:
> > > >> >> >
> > > >> >> > > Anton, Nikolay,
> > > >> >> > >
> > > >> >> > > Thanks for the support.
> > > >> >> > >
> > > >> >> > > For now, we have the getCurrentPmeDuration() metric that does
> > not