Re: Hostname Dns Check on Connection Ignite server

2024-03-28 Thread Vladimir Steshin

    Hi, Yunus!

    There is a fix for a similar case:
https://issues.apache.org/jira/browse/IGNITE-21630


If the hostname is already a plain IP address, it shouldn't be resolved any more.
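For illustration, a minimal Java sketch of the idea (a hypothetical helper, not the actual IGNITE-21630 patch):

import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: skip DNS when the configured host is already a literal IP.
final class HostResolution {
    static boolean isLiteralIp(String host) {
        // Rough literal check: IPv6 literals contain ':', IPv4 is dotted digits.
        return host.indexOf(':') >= 0 || host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    static String resolve(String cfgHost) {
        if (isLiteralIp(cfgHost))
            return cfgHost; // No DNS round-trip for plain IPs.

        try {
            return InetAddress.getByName(cfgHost).getHostAddress();
        }
        catch (UnknownHostException e) {
            return cfgHost; // Fall back to the configured value.
        }
    }
}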

On 28.03.2024 16:48, Yunus Emre Tekin wrote:

Hello,

We are using Apache Ignite in our projects. We realized that while a client
is connected to a server, the server makes a DNS check for the client's
hostname. There is no option to disable this. I think it would be
good if we added such a feature.

Have a nice day



[DISCUSSION] IEP-114: Custom Metrics

2024-01-17 Thread Vladimir Steshin
Hi, Igniters! I'd like to propose a new feature: Custom Metrics. It
would allow a user to register their own measurement and monitoring values.
Short IEP description:
1. Custom metrics make it convenient to gather different application
metrics through the same client/platform/API, or to cover values that
the existing metrics don't provide.
2. Custom metrics are just the New Metric System; we only expose the APIs.
3. Custom metrics can be registered, for example, with services, tasks,
computations.
4. Names of user-defined metrics are required to start with "custom.".
5. Custom metrics aren't stored.

IEP-114 [1] has more details. Please share your thoughts. Thanks!
[1] 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-114+Custom+metrics
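For illustration, a minimal sketch of registering a custom metric. The names below (metrics(), customRegistry(), register()) follow the IEP draft and are assumptions, not the final interface:

import java.util.concurrent.atomic.LongAdder;
import org.apache.ignite.Ignite;

// Sketch only: method names follow the IEP-114 proposal and may differ from the final API.
class OrderMetrics {
    private final LongAdder processed = new LongAdder();

    OrderMetrics(Ignite ignite) {
        // Per the IEP, custom registry names must start with "custom.".
        ignite.metrics()
            .customRegistry("custom.orders")
            .register("processed", processed::longValue, "Processed orders count");
    }

    void onOrderProcessed() {
        processed.increment();
    }
}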

Re: New Apache Ignite PMC member: Nikita Amelchev

2023-11-22 Thread Vladimir Steshin

    Wow! Congrats!

On 22.11.2023 13:17, Pavel Pereslegin wrote:

Congratulations, Nikita!

On Wed, 22 Nov 2023 at 13:11, Aleksandr Pakhomov wrote:

Congratulations!



On 21 Nov 2023, at 18:18, Dmitriy Pavlov  wrote:

Hello Igniters,

The Project Management Committee (PMC) for Apache Ignite
has invited Nikita Amelchev to become a member of PMC and we are pleased
to announce that he has accepted.

We appreciate his constant efforts in improving Apache Ignite code, as well
as efforts in preparing 2 major releases.

A PMC member helps manage and guide the direction of the project.

Congratulations on your new role! Keep the pace!

Best Regards
Dmitriy Pavlov
on behalf of Apache Ignite PMC


Re: [DISCUSSION] Service awareness for thin client.

2023-10-26 Thread Vladimir Steshin

 Sure:

IEP:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-112+Thin+Client+Service+Awareness


Ticket: https://issues.apache.org/jira/browse/IGNITE-20656

On 26.10.2023 18:15, Pavel Tupitsyn wrote:

Vladimir, could you please share the new IEP link here, and also link it to
the ticket?

On Thu, Oct 26, 2023 at 1:30 PM Vladimir Steshin  wrote:


  Roman, hi.

Done. Thanks!

On 26.10.2023 13:25, Roman Puchkovskiy wrote:

Hi Vladimir. Sorry to intervene, but we have a clash with IEP numbers,
there is already IEP-110 in Ignite 3, it was created on August 1:


https://cwiki.apache.org/confluence/display/IGNITE/IEP-110%3A+Schema+synchronization%3A+basic+schema+changes

Is it possible to pick another number, while your IEP is fresh?

On Thu, 26 Oct 2023 at 14:05, Vladimir Steshin wrote:

   All right. Pavel, thank you.

IEP:


https://cwiki.apache.org/confluence/display/IGNITE/IEP-110+Thin+Client+Service+Awareness

Ticket: https://issues.apache.org/jira/browse/IGNITE-20656

On 25.10.2023 11:04, Pavel Tupitsyn wrote:

Looks good to me

On Tue, Oct 24, 2023 at 1:50 PM Vladimir Steshin wrote:

We've privately discussed with Mikhail Petrov and Alexey Plekhanov.
To us, #2 seems OK with the exception that a dedicated request would be
better for transferring the service topology. And it should be processed
by the client instead of every service proxy.

So, the suggested solution is:
1) Bring a new feature to the thin client protocol.
2) Require the partition awareness flag enabled.
3) Obtain service topology with a dedicated request by the client and
provide it to the service proxies.
4) Initiate the topology update with: first service invocation, cluster
topology change, some timeout (only if the service is invoked).

Cons:
 - Some delay in obtaining the topology. Invocation redirects are
still possible when the service migrates.
 - No sign of service cancel/deploy on the client side. We have to
update by a timeout too.
 - The topology is probably kept by the client while it exists, even if it
is not in use any more.

If the suggestion looks reasonable, I'm ready to implement it and create an IEP.

On 17.10.2023 18:28, Vladimir Steshin wrote:

    They barely can guarantee it. If the client misses the service instance node,
the request is just redirected. But I'm talking about the most reliable way
to keep the actual service topology. If we watch the cluster topology change
event, we have to take into account cases like:

- The client requests a service and gets its topology.

- The service is canceled and redeployed to other nodes. No cluster
topology change, no sign of it on the client side.

- The client continues requesting the service and misses the instance node
forever or often.

If we provide, for example, a version or hash of the client's topology
in every service call request, we always get the actual service topology
just by comparing on the server side, independently of why and when the
service redeploys. Isn't that simple and safe?

On 17.10.2023 15:52, Pavel Tupitsyn wrote:

None of the described approaches provides a 100% guarantee of hitting the
primary node in all conditions.
And it is fine to miss a few requests. I don't see a reason to increase
complexity trying to optimize a rare use case.

On Tue, Oct 17, 2023 at 2:49 PM  wrote:


What if the topology change event precedes the service redeployment and
service mapping change? There is a possibility for the client to save a new
topology version before the services are actually redeployed. If we rely on
the actual change of the service mapping (redeployment), there is no such
problem.

On 17.10.2023 13:44, Pavel Tupitsyn  wrote:

I think if it's good enough for cache partition awareness, then it's good
enough for services. Topology changes are not that frequent.

On Tue, Oct 17, 2023 at 12:22 PM  wrote:


Hi, Pavel.

1. Correct.
2. Yes, the client watches the ClientFlag.AFFINITY_TOPOLOGY_CHANGED flag and
sends an additional ClientOperation.CLUSTER_GROUP_GET_NODE_ENDPOINTS to get
the new cluster topology. Thus, the topology updates with some delay. We
could watch this event somehow in the service proxy. But a direct service
topology version in the call responses should work faster if the service is
being requested. Or do you think this is not significant?


On 17.10.2023 11:13, Pavel Tupitsyn wrote:

Hi Vladimir,

1. A topology of a deployed service can change only when the cluster
topology changes.
2. We already have a topology change flag in every server response.

Therefore, the client can request the topology once per service, and
refresh it when cluster topology changes, right?


On Mon, Oct 16, 2023 at 8:17 PM Vladimir Steshin <vlads...@gmail.com> wrote:

Hi Igniters! I propose to add the /service awareness feature to the thin
client/. I remember a couple of users asked for it. Looks nice to have
and simple to implement. Similar to the partition awareness.
Reason:
A service can be deployed only on one or a few nodes. Currently, the thin
client chooses one or a random node to invoke a service.

Re: [DISCUSSION] Service awareness for thin client.

2023-10-26 Thread Vladimir Steshin

    Roman, hi.

Done. Thanks!

On 26.10.2023 13:25, Roman Puchkovskiy wrote:

Hi Vladimir. Sorry to intervene, but we have a clash with IEP numbers,
there is already IEP-110 in Ignite 3, it was created on August 1:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-110%3A+Schema+synchronization%3A+basic+schema+changes

Is it possible to pick another number, while your IEP is fresh?

On Thu, 26 Oct 2023 at 14:05, Vladimir Steshin wrote:

  All right. Pavel, thank you.

IEP:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-110+Thin+Client+Service+Awareness

Ticket: https://issues.apache.org/jira/browse/IGNITE-20656

On 25.10.2023 11:04, Pavel Tupitsyn wrote:

Looks good to me

On Tue, Oct 24, 2023 at 1:50 PM Vladimir Steshin  wrote:


   We've privately discussed with Mikhail Petrov and Alexey Plekhanov.
To us, #2 seems OK with the exception that a dedicated request would be
better for transferring the service topology. And it should be processed
by the client instead of every service proxy.

So, the suggested solution is:
1) Bring a new feature to the thin client protocol.
2) Require the partition awareness flag enabled.
3) Obtain service topology with a dedicated request by the client and
provide it to the service proxies.
4) Initiate the topology update with: first service invocation, cluster
topology change, some timeout (only if service is invoked).

Cons:
- Some delay in obtaining the topology. Invocation redirects are
still possible when the service migrates.
- No sign of service cancel/deploy on the client side. We have to
update by a timeout too.
- The topology is probably kept by the client while it exists, even if it is
not in use any more.

If the suggestion looks reasonable, I'm ready to implement it and create an IEP.

On 17.10.2023 18:28, Vladimir Steshin wrote:

  They barely can guarantee it. If the client misses the service instance node,
the request is just redirected. But I'm talking about the most reliable way
to keep the actual service topology. If we watch the cluster topology change
event, we have to take into account cases like:

- The client requests a service and gets its topology.

- The service is canceled and redeployed to other nodes. No cluster
topology change, no sign of it on the client side.

- The client continues requesting the service and misses the instance node
forever or often.

If we provide, for example, a version or hash of the client's topology
in every service call request, we always get the actual service topology
just by comparing on the server side, independently of why and when the
service redeploys. Isn't that simple and safe?

On 17.10.2023 15:52, Pavel Tupitsyn wrote:

None of the described approaches provides a 100% guarantee of hitting the
primary node in all conditions.
And it is fine to miss a few requests. I don't see a reason to increase
complexity trying to optimize a rare use case.

On Tue, Oct 17, 2023 at 2:49 PM  wrote:


What if the topology change event precedes the service redeployment and
service mapping change? There is a possibility for the client to save a new
topology version before the services are actually redeployed. If we rely on
the actual change of the service mapping (redeployment), there is no such
problem.

On 17.10.2023 13:44, Pavel Tupitsyn  wrote:

I think if it's good enough for cache partition awareness, then it's good
enough for services. Topology changes are not that frequent.

On Tue, Oct 17, 2023 at 12:22 PM  wrote:


Hi, Pavel.

1. Correct.
2. Yes, the client watches the ClientFlag.AFFINITY_TOPOLOGY_CHANGED flag and
sends an additional ClientOperation.CLUSTER_GROUP_GET_NODE_ENDPOINTS to get
the new cluster topology. Thus, the topology updates with some delay. We
could watch this event somehow in the service proxy. But a direct service
topology version in the call responses should work faster if the service is
being requested. Or do you think this is not significant?


On 17.10.2023 11:13, Pavel Tupitsyn  wrote:

Hi Vladimir,

1. A topology of a deployed service can change only when the cluster
topology changes.
2. We already have a topology change flag in every server response.

Therefore, the client can request the topology once per service, and
refresh it when cluster topology changes, right?


On Mon, Oct 16, 2023 at 8:17 PM Vladimir Steshin wrote:

Hi Igniters! I propose to add the /service awareness feature to the thin
client/. I remember a couple of users asked for it. Looks nice to have
and simple to implement. Similar to the partition awareness.
Reason:
A service can be deployed only on one or a few nodes. Currently, the thin
client chooses one or a random node to invoke a service. Then, the
service call can be always or often redirected to another server node. I
think we would need:
- Bring a new feature to the thin client protocol (no protocol version change).
- Require the partition awareness flag enabled (it creates the required
connections to the cluster).
- Transfer the service topology in the service call response (the server
node /already holds/ the needed service topology).
- Keep the service topology in the client service proxy.

Re: [DISCUSSION] Service awareness for thin client.

2023-10-26 Thread Vladimir Steshin

    All right. Pavel, thank you.

IEP: 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-110+Thin+Client+Service+Awareness


Ticket: https://issues.apache.org/jira/browse/IGNITE-20656

On 25.10.2023 11:04, Pavel Tupitsyn wrote:

Looks good to me

On Tue, Oct 24, 2023 at 1:50 PM Vladimir Steshin  wrote:


  We've privately discussed with Mikhail Petrov and Alexey Plekhanov.
To us, #2 seems OK with the exception that a dedicated request would be
better for transferring the service topology. And it should be processed
by the client instead of every service proxy.

So, the suggested solution is:
1) Bring a new feature to the thin client protocol.
2) Require the partition awareness flag enabled.
3) Obtain service topology with a dedicated request by the client and
provide it to the service proxies.
4) Initiate the topology update with: first service invocation, cluster
topology change, some timeout (only if service is invoked).

Cons:
   - Some delay in obtaining the topology. Invocation redirects are
still possible when the service migrates.
   - No sign of service cancel/deploy on the client side. We have to
update by a timeout too.
   - The topology is probably kept by the client while it exists, even if it is
not in use any more.

If the suggestion looks reasonable, I'm ready to implement it and create an IEP.

On 17.10.2023 18:28, Vladimir Steshin wrote:

 They barely can guarantee it. If the client misses the service instance node,
the request is just redirected. But I'm talking about the most reliable way
to keep the actual service topology. If we watch the cluster topology change
event, we have to take into account cases like:

- The client requests a service and gets its topology.

- The service is canceled and redeployed to other nodes. No cluster
topology change, no sign of it on the client side.

- The client continues requesting the service and misses the instance node
forever or often.

If we provide, for example, a version or hash of the client's topology
in every service call request, we always get the actual service topology
just by comparing on the server side, independently of why and when the
service redeploys. Isn't that simple and safe?

On 17.10.2023 15:52, Pavel Tupitsyn wrote:

None of the described approaches provides a 100% guarantee of hitting the
primary node in all conditions.
And it is fine to miss a few requests. I don't see a reason to increase
complexity trying to optimize a rare use case.

On Tue, Oct 17, 2023 at 2:49 PM  wrote:


What if the topology change event precedes the service redeployment and
service mapping change? There is a possibility for the client to save a new
topology version before the services are actually redeployed. If we rely on
the actual change of the service mapping (redeployment), there is no such
problem.

On 17.10.2023 13:44, Pavel Tupitsyn  wrote:

I think if it's good enough for cache partition awareness, then it's good
enough for services. Topology changes are not that frequent.

On Tue, Oct 17, 2023 at 12:22 PM  wrote:


Hi, Pavel.

1. Correct.
2. Yes, the client watches the ClientFlag.AFFINITY_TOPOLOGY_CHANGED flag and
sends an additional ClientOperation.CLUSTER_GROUP_GET_NODE_ENDPOINTS to get
the new cluster topology. Thus, the topology updates with some delay. We
could watch this event somehow in the service proxy. But a direct service
topology version in the call responses should work faster if the service is
being requested. Or do you think this is not significant?


On 17.10.2023 11:13, Pavel Tupitsyn  wrote:

Hi Vladimir,

1. A topology of a deployed service can change only when the cluster
topology changes.
2. We already have a topology change flag in every server response.

Therefore, the client can request the topology once per service, and
refresh it when cluster topology changes, right?


On Mon, Oct 16, 2023 at 8:17 PM Vladimir Steshin wrote:

Hi Igniters! I propose to add the /service awareness feature to the thin
client/. I remember a couple of users asked for it. Looks nice to have
and simple to implement. Similar to the partition awareness.
Reason:
A service can be deployed only on one or a few nodes. Currently, the thin
client chooses one or a random node to invoke a service. Then, the
service call can be always or often redirected to another server node. I
think we would need:
- Bring a new feature to the thin client protocol (no protocol version change).
- Require the partition awareness flag enabled (it creates the required
connections to the cluster).
- Transfer the service topology in the service call response (the server
node /already holds/ the needed service topology).
- Keep the service topology in the client service proxy. If that is ok,
my question is /how to update the service topology on the client/?
I see the options:
1) Add a version to the service topology on the server node and on the
client service proxy. Add the actual service topology to the service call
response if actual > client.
/Pros/: Always the most actual service topology version.
/Cons/: Requires holding and syncing the topology version on server nodes
only for the thin clients.

Re: [DISCUSSION] Service awareness for thin client.

2023-10-24 Thread Vladimir Steshin
    We've privately discussed with Mikhail Petrov and Alexey Plekhanov. 
To us, #2 seems OK with the exception that a dedicated request would be
better for transferring the service topology. And it should be processed 
by the client instead of every service proxy.


So, the suggested solution is:
1) Bring a new feature to the thin client protocol.
2) Require the partition awareness flag enabled.
3) Obtain service topology with a dedicated request by the client and 
provide it to the service proxies.
4) Initiate the topology update with: first service invocation, cluster 
topology change, some timeout (only if service is invoked).


Cons:
 - Some delay in obtaining the topology. Invocation redirects are
still possible when the service migrates.
 - No sign of service cancel/deploy on the client side. We have to
update by a timeout too.
 - The topology is probably kept by the client while it exists, even if it is
not in use any more.


If the suggestion looks reasonable, I'm ready to implement it and create an IEP.
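For illustration, a hypothetical sketch of the client-side refresh triggers listed in 4) above. All names and the timeout value are illustrative, not part of the protocol:

import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side cache: refresh on first invocation, on cluster
// topology change, and by timeout. Names and constants are illustrative.
final class ServiceTopologyCache {
    private static final long TTL_MS = 30_000;

    private volatile UUID[] nodes;              // Last known service topology.
    private final AtomicLong updatedAt = new AtomicLong();

    boolean needsRefresh(boolean clusterTopologyChanged) {
        return nodes == null                                            // First invocation.
            || clusterTopologyChanged                                   // Cluster topology change.
            || System.currentTimeMillis() - updatedAt.get() > TTL_MS;   // Timeout.
    }

    void update(UUID[] freshNodes) {
        nodes = freshNodes;
        updatedAt.set(System.currentTimeMillis());
    }
}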

On 17.10.2023 18:28, Vladimir Steshin wrote:


    They barely can guarantee it. If the client misses the service instance node,
the request is just redirected. But I'm talking about the most reliable way
to keep the actual service topology. If we watch the cluster topology change
event, we have to take into account cases like:

- The client requests a service and gets its topology.

- The service is canceled and redeployed to other nodes. No cluster
topology change, no sign of it on the client side.

- The client continues requesting the service and misses the instance node
forever or often.

If we provide, for example, a version or hash of the client's topology
in every service call request, we always get the actual service topology
just by comparing on the server side, independently of why and when the
service redeploys. Isn't that simple and safe?


On 17.10.2023 15:52, Pavel Tupitsyn wrote:

None of the described approaches provides a 100% guarantee of hitting the
primary node in all conditions.
And it is fine to miss a few requests. I don't see a reason to increase
complexity trying to optimize a rare use case.

On Tue, Oct 17, 2023 at 2:49 PM  wrote:


What if the topology change event precedes the service redeployment and service
mapping change? There is a possibility for the client to save a new topology version
before the services are actually redeployed. If we rely on the actual change of the
service mapping (redeployment), there is no such problem.

On 17.10.2023 13:44, Pavel Tupitsyn  wrote:

I think if it's good enough for cache partition awareness, then it's good
enough for services. Topology changes are not that frequent.

On Tue, Oct 17, 2023 at 12:22 PM  wrote:


Hi, Pavel.

1. Correct.
2. Yes, the client watches the ClientFlag.AFFINITY_TOPOLOGY_CHANGED flag and
sends an additional ClientOperation.CLUSTER_GROUP_GET_NODE_ENDPOINTS to get
the new cluster topology. Thus, the topology updates with some delay. We
could watch this event somehow in the service proxy. But a direct service
topology version in the call responses should work faster if the service is
being requested. Or do you think this is not significant?


On 17.10.2023 11:13, Pavel Tupitsyn  wrote:

Hi Vladimir,

1. A topology of a deployed service can change only when the cluster
topology changes.
2. We already have a topology change flag in every server response.

Therefore, the client can request the topology once per service, and
refresh it when cluster topology changes, right?


On Mon, Oct 16, 2023 at 8:17 PM Vladimir Steshin wrote:

Hi Igniters! I propose to add the /service awareness feature to the thin
client/. I remember a couple of users asked for it. Looks nice to have
and simple to implement. Similar to the partition awareness.
Reason:
A service can be deployed only on one or a few nodes. Currently, the thin
client chooses one or a random node to invoke a service. Then, the
service call can be always or often redirected to another server node. I
think we would need:
- Bring a new feature to the thin client protocol (no protocol version change).
- Require the partition awareness flag enabled (it creates the required
connections to the cluster).
- Transfer the service topology in the service call response (the server
node /already holds/ the needed service topology).
- Keep the service topology in the client service proxy. If that is ok,
my question is /how to update the service topology on the client/?
I see the options:
1) Add a version to the service topology on the server node and on the
client service proxy. Add the actual service topology to the service call
response if actual > client.
/Pros/: Always the most actual service topology version.
/Cons/: Requires holding and syncing the topology version on server nodes
only for the thin clients.
2) Add the actual service topology to the service call response only if
the service is not deployed on the current node. The client invalidates
the received service topology every N invocations and/or every N seconds
(/code constants/).
/Pros/: Simple.
/Cons/: Actual topology delays. Not the best load balancing.

Re: [DISCUSSION] Service awareness for thin client.

2023-10-17 Thread Vladimir Steshin
    They barely can guarantee it. If the client misses the service instance node,
the request is just redirected. But I'm talking about the most reliable way
to keep the actual service topology. If we watch the cluster topology change
event, we have to take into account cases like:

- The client requests a service and gets its topology.

- The service is canceled and redeployed to other nodes. No cluster
topology change, no sign of it on the client side.

- The client continues requesting the service and misses the instance node
forever or often.

If we provide, for example, a version or hash of the client's topology
in every service call request, we always get the actual service topology
just by comparing on the server side, independently of why and when the
service redeploys. Isn't that simple and safe?


On 17.10.2023 15:52, Pavel Tupitsyn wrote:

None of the described approaches provides a 100% guarantee of hitting the
primary node in all conditions.
And it is fine to miss a few requests. I don't see a reason to increase
complexity trying to optimize a rare use case.

On Tue, Oct 17, 2023 at 2:49 PM  wrote:


What if the topology change event precedes the service redeployment and service
mapping change? There is a possibility for the client to save a new topology version
before the services are actually redeployed. If we rely on the actual change of the
service mapping (redeployment), there is no such problem.

On 17.10.2023 13:44, Pavel Tupitsyn  wrote:

I think if it's good enough for cache partition awareness, then it's good
enough for services. Topology changes are not that frequent.

On Tue, Oct 17, 2023 at 12:22 PM  wrote:


Hi, Pavel.

1. Correct.
2. Yes, the client watches the ClientFlag.AFFINITY_TOPOLOGY_CHANGED flag and
sends an additional ClientOperation.CLUSTER_GROUP_GET_NODE_ENDPOINTS to get
the new cluster topology. Thus, the topology updates with some delay. We
could watch this event somehow in the service proxy. But a direct service
topology version in the call responses should work faster if the service is
being requested. Or do you think this is not significant?


On 17.10.2023 11:13, Pavel Tupitsyn  wrote:

Hi Vladimir,

1. A topology of a deployed service can change only when the cluster
topology changes.
2. We already have a topology change flag in every server response.

Therefore, the client can request the topology once per service, and
refresh it when cluster topology changes, right?


On Mon, Oct 16, 2023 at 8:17 PM Vladimir Steshin wrote:

Hi Igniters! I propose to add the /service awareness feature to the thin
client/. I remember a couple of users asked for it. Looks nice to have
and simple to implement. Similar to the partition awareness.
Reason:
A service can be deployed only on one or a few nodes. Currently, the thin
client chooses one or a random node to invoke a service. Then, the
service call can be always or often redirected to another server node. I
think we would need:
- Bring a new feature to the thin client protocol (no protocol version change).
- Require the partition awareness flag enabled (it creates the required
connections to the cluster).
- Transfer the service topology in the service call response (the server
node /already holds/ the needed service topology).
- Keep the service topology in the client service proxy.
If that is ok, my question is /how to update the service topology on the client/?
I see the options:
1) Add a version to the service topology on the server node and on the
client service proxy. Add the actual service topology to the service call
response if actual > client.
/Pros/: Always the most actual service topology version.
/Cons/: Requires holding and syncing the topology version on server nodes
only for the thin clients.
2) Add the actual service topology to the service call response only if
the service is not deployed on the current node. The client invalidates
the received service topology every N invocations and/or every N seconds
(/code constants/).
/Pros/: Simple.
/Cons/: Actual topology delays. Not the best load balancing.
3) Send from the client a hash of the known service node UUIDs in every
service call request. Add the actual service topology to the service call
response if the server's hash is not equal.
/Pros/: Simple. Always the most actual service topology.
/Cons/: Costs some CPU sometimes.
WDYT?





[DISCUSSION] Service awareness for thin client.

2023-10-16 Thread Vladimir Steshin
Hi Igniters! I propose to add the /service awareness feature to the thin
client/. I remember a couple of users asked for it. Looks nice to have
and simple to implement. Similar to the partition awareness.

Reason:
A service can be deployed only on one or a few nodes. Currently, the thin
client chooses one or a random node to invoke a service. Then, the
service call can be always or often redirected to another server node. I
think we would need:
- Bring a new feature to the thin client protocol (no protocol version change).
- Require the partition awareness flag enabled (it creates the required
connections to the cluster).
- Transfer the service topology in the service call response (the server
node /already holds/ the needed service topology).
- Keep the service topology in the client service proxy.
If that is ok, my question is /how to update the service topology on the client/?
I see the options:
1) Add a version to the service topology on the server node and on the
client service proxy. Add the actual service topology to the service call
response if actual > client.

/Pros/: Always the most actual service topology version.
/Cons/: Requires holding and syncing the topology version on server nodes
only for the thin clients.
2) Add the actual service topology to the service call response only if
the service is not deployed on the current node. The client invalidates
the received service topology every N invocations and/or every N seconds
(/code constants/).

/Pros/: Simple.
/Cons/: Actual topology delays. Not the best load balancing.
3) Send from the client a hash of the known service node UUIDs in every
service call request. Add the actual service topology to the service call
response if the server's hash is not equal.

/Pros/: Simple. Always the most actual service topology.
/Cons/: Costs some CPU sometimes.
WDYT?
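For illustration, one possible shape of the hash in option 3 (a sketch, not a committed design): an order-independent combination of the service node IDs that the client sends with each call and the server recomputes.

import java.util.Collection;
import java.util.UUID;

// Sketch of option 3: an order-independent hash over the service topology.
final class ServiceTopologyHash {
    static long hash(Collection<UUID> serviceNodes) {
        long h = 0;

        // XOR keeps the result independent of iteration order.
        for (UUID id : serviceNodes)
            h ^= id.getMostSignificantBits() ^ id.getLeastSignificantBits();

        return h;
    }
}

The client would send hash(knownNodes) with each service call; the server compares it with hash(actualNodes) and attaches the fresh topology to the response only on mismatch.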


Re: [VOTE] Release Apache Ignite 2.15.0 RC0

2023-04-28 Thread Vladimir Steshin

+ 1

On 26.04.2023 22:51, Alex Plehanov wrote:

Dear Community,

The release candidate is ready.

I have uploaded release candidate to
https://dist.apache.org/repos/dist/dev/ignite/2.15.0-rc0/
https://dist.apache.org/repos/dist/dev/ignite/packages_2.15.0-rc0/

The following staging can be used for testing:
https://repository.apache.org/content/repositories/orgapacheignite-1558/

Tag name is 2.15.0-rc0:
https://gitbox.apache.org/repos/asf?p=ignite.git;a=tag;h=refs/tags/2.15.0-rc0

2.15.0 most important changes:
* Implemented incremental snapshots.
* Java thin client improvements (logging, connections balancing, events
listening, endpoints discovery)
* Calcite based SQL engine improvements (memory quotas, index scans
optimisations).
* Reworked permission management for system tasks.
* Removed deprecated functionality (daemon nodes, visorcmd, legacy JMX
beans).

RELEASE NOTES:
https://gitbox.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=RELEASE_NOTES.txt;hb=ignite-2.15

Complete list of resolved issues:
https://issues.apache.org/jira/issues/?jql=(project%20%3D%20'Ignite'%20AND%20fixVersion%20is%20not%20empty%20AND%20fixVersion%20in%20('2.15'))%20AND%20(component%20is%20EMPTY%20OR%20component%20not%20in%20(documentation))%20and%20status%20in%20('CLOSED'%2C%20'RESOLVED')%20AND%20resolution%20in(Fixed%2C%20Done%2C%20Implemented%2C%20Delivered)%20ORDER%20BY%20priority



DEVNOTES
https://gitbox.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=DEVNOTES.txt;hb=ignite-2.15

The vote is formal, see voting guidelines
https://www.apache.org/foundation/voting.html

+1 - to accept Apache Ignite 2.15.0-rc0
0 - don't care either way
-1 - DO NOT accept Apache Ignite 2.15.0-rc0 (explain why)

See notes on how to verify release here
https://www.apache.org/info/verification.html
and
https://cwiki.apache.org/confluence/display/IGNITE/Release+Process#ReleaseProcess-P5.VotingonReleaseandReleaseVerification

This vote will be open till Tue May 2, 2023, 07:00 UTC.
https://www.timeanddate.com/countdown/vote?iso=20230502T07=0=VOTE+on+the+Apache+Ignite+Release+2.15.0+RC0=sanserif



Re: Deprecated GridSslContextFactory removal

2023-04-26 Thread Vladimir Steshin

+1

On 24.04.2023 15:49, Николай Ижиков wrote:

+1


On 24 Apr 2023, at 14:56, Taras Ledkov wrote:

+1 for remove


Re: Apache Ignite 2.15 RELEASE [Time, Scope, Manager]

2023-03-29 Thread Vladimir Steshin

+1

On 29.03.2023 21:56, Alex Plehanov wrote:

Dear Ignite Community!

I suggest starting Apache Ignite 2.15 release activities.

We've accumulated more than two hundred resolved issues [1] with new
features and bug fixes which are waiting for release.
The major changes related to the proposed release:
- Incremental snapshots.
- Java thin client improvements (logging, connections balancing,
 events listening, endpoints discovery)
- Calcite based SQL engine improvements (memory quotas, index scans
optimisations).
- Reworked permission management for system tasks.
- Removed some deprecated functionality (daemon nodes, visorcmd, legacy JMX
beans)
etc.

I want to propose myself to be the release manager of the planning release.

I propose the following timeline:

Scope Freeze: April 08, 2023
Code Freeze: April 15, 2023
Voting Date: April 22, 2023
Release Date: April 29, 2023

[1].
https://issues.apache.org/jira/issues/?jql=(project%20%3D%20%27Ignite%27%20AND%20fixVersion%20in%20(2.15))%20AND%20(component%20is%20EMPTY%20OR%20component%20not%20in%20(documentation))%20and%20status%20in%20(%27CLOSED%27%2C%20%27RESOLVED%27)

WDYT?



Re: [VOTE] Release bug fix release pyignite-0.6.1-rc0

2023-02-15 Thread Vladimir Steshin

+ 1

On 15.02.2023 11:21, Ivan Daschinsky wrote:

Dear Igniters!

This is a patch release that contains an important fix for users of
pyignite

https://issues.apache.org/jira/browse/IGNITE-18788


Release candidate binaries for subj are uploaded and ready for vote
You can find them here:
https://dist.apache.org/repos/dist/dev/ignite/pyignite/0.6.1-rc0

If you follow the link above, you will find source packages (*.tar.gz and
*.zip)
and binary packages (wheels) for windows (amd64), linux (x86_64) and mac os
(x86_64)
for Python 3.7, 3.8, 3.9, 3.10 and 3.11. Also, there are sha512 and gpg
signatures.
Code signing keys can be found here -- https://downloads.apache.org/ignite/KEYS
Here you can find instructions how to verify packages
https://www.apache.org/info/verification.html

You can install a binary package for a specific version of Python using pip.
For example, do this on Linux for Python 3.8:

pip install pyignite-0.6.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

You can build and install package from source using this command:

pip install pyignite-0.6.1.zip

You can build wheel on your platform using this command:

pip wheel --no-deps pyignite-0.6.1.zip

For building C module, you should have python headers and C compiler
installed.
(i.e. for ubuntu sudo apt install build-essential python3-dev)
In Mac OS X xcode-tools and python from homebrew are the best option.

In order to check whether C module works, use following:

from pyignite import _cutils
print(_cutils.hashcode('test'))
3556498

You can find documentation here:
https://apache-ignite-binary-protocol-client.readthedocs.io/en/0.6.1.rc0/

You can find examples here (to check them, you should start ignite locally):
https://apache-ignite-binary-protocol-client.readthedocs.io/en/0.6.1.rc0/examples.html
Also, examples can be found in source archive in examples subfolder.
docker-compose.yml is supplied in order to start Ignite quickly. (Use
`docker-compose up -d` to start a 3-node cluster and `docker-compose
down` to shut it down.)

Release notes:
https://gitbox.apache.org/repos/asf?p=ignite-python-thin-client.git;a=blob;f=RELEASE_NOTES.txt;h=86448e9ce51d7223ac49cf4f95da70d3d365e8c1;hb=0d86f44e86270f4d578afbce41aa2d6c424d2615

Git release tag was created:
https://gitbox.apache.org/repos/asf?p=ignite-python-thin-client.git;a=tag;h=b0ce094d7a2db3fb07471be7b37ff9edab4180a8

The vote is formal, see voting guidelines
https://www.apache.org/foundation/voting.html

+1 - to accept pyignite-0.6.1-rc0
0 - don't care either way
-1 - DO NOT accept pyignite-0.6.1-rc0

The vote finishes at 02/17/2023 15:00 UTC



Re: [VOTE] Release pyignite 0.6.0.rc0

2022-11-11 Thread Vladimir Steshin

    Good job. +1

On 11.11.2022 16:40, Ivan Daschinsky wrote:

Dear Igniters!

Release candidate binaries for subj are uploaded and ready for vote
You can find them here:
https://dist.apache.org/repos/dist/dev/ignite/pyignite/0.6.0.rc0

If you follow the link above, you will find source packages (*.zip)
and binary packages (wheels) for windows (amd64), mac os x (amd64) and
linux (x86_64)
for Python 3.7, 3.8, 3.9, 3.10 and 3.11. Also, there are sha512 and gpg
signatures.
Code signing keys can be found here --
https://downloads.apache.org/ignite/KEYS
Here you can find instructions how to verify packages
https://www.apache.org/info/verification.html

You can install a binary package for a specific version of Python using pip.
For example, do this on Linux for Python 3.8:

pip install pyignite-0.6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

You can build and install package from source using this command:

pip install pyignite-0.6.0.zip

You can build wheel on your platform using this command:

pip wheel --no-deps pyignite-0.6.0.zip

For building C module, you should have python headers and C compiler
installed.
(i.e. for ubuntu sudo apt install build-essential python3-dev)
In Mac OS X xcode-tools and python from homebrew are the best option.

In order to check whether C module works, use following:

from pyignite import _cutils
print(_cutils.hashcode('test'))
3556498

You can find documentation here:
https://apache-ignite-binary-protocol-client.readthedocs.io/en/0.6.0.rc0/

You can find examples here (to check them, you should start ignite locally):
https://apache-ignite-binary-protocol-client.readthedocs.io/en/0.6.0.rc0/examples.html
Also, examples can be found in source archive in examples subfolder.
docker-compose.yml is supplied in order to start Ignite quickly. (Use
`docker-compose up -d` to start a 3-node cluster and `docker-compose
down` to shut it down.)

Release notes:
https://gitbox.apache.org/repos/asf?p=ignite-python-thin-client.git;a=blob_plain;f=RELEASE_NOTES.txt;hb=a0600047e7b29fc23350f77d4b087cfb55032d72

Git release tag was created:
https://gitbox.apache.org/repos/asf?p=ignite-python-thin-client.git;a=commit;h=a0600047e7b29fc23350f77d4b087cfb55032d72

The vote is formal, see voting guidelines
https://www.apache.org/foundation/voting.html

+1 - to accept pyignite-0.6.0.rc0
0 - don't care either way
-1 - DO NOT accept pyignite-0.6.0.rc0

The vote finishes at 11/15/2022 15:00 UTC



Re: [ANNOUNCE] Welcome Mikhail Petrov as a new Committer

2022-11-11 Thread Vladimir Steshin

Mikhail, congratulations!

On 11.11.2022 11:20, Maxim Muzafarov wrote:

The Project Management Committee (PMC) for Apache Ignite has invited
Mikhail Petrov to become a committer and we are pleased to announce
that they have accepted.

Mikhail Petrov is an active contributor and community member; he made
significant additions to the Ignite and Ignite Extensions code bases:
thin client support for Spring Data, Spring Transactions, tracing of SQL
queries, etc.

Being a committer enables easier contribution to the project since
there is no need to go via the patch submission process. This should
enable better productivity.

Please join in welcoming Mikhail Petrov, and congratulating him on the
new role in the Apache Ignite Community!


Best Regards,
Maxim Muzafarov
on behalf of Apache Ignite PMC

Re: [DISCUSSION] Add DataStreamer's default per-node-batch-setting for PDS.

2022-11-02 Thread Vladimir Steshin

    Hi, Stan.


    Thank you for the answer.

>>> "your data streamer queue size is something like"
You are right about the writes queue on the primary node. It has just some fixed
size, but based on the number of CPUs (x8). Even for my laptop I get
16x8=128 batches. I wonder why so many by default for persistence.


>>> "Can you check the heap dump in your tests to see what actually 
occupies most of the heap?"
The backup nodes collect `GridDhtAtomicSingleUpdateRequest`with key/data 
`byte[]`. That's where we don't wait for in this case.


    I thought we might slightly adjust the default setting, at least to
make a simple test more reliable. As a user, I wouldn't like to pick up
a tool/product just to try it out and have it fail quickly. But yes, the
user still has the related setting `perNodeParallelOperations()`.


WDYT?

On 30.10.2022 21:24, Stanislav Lukyanov wrote:

Hi Vladimir,

I think this is potentially an issue but I don't think this is about PDS at all.

The description is a bit vague, I have to say. AFAIU what you see is that when 
the caches are persistent the streamer writes data faster than the nodes 
(especially, backup nodes) process the writes.
Therefore, the nodes accumulate the writes in the queues, the queues grow, and 
then you might go OOM.

The solution of just having smaller queues when there is persistence (and therefore
it's more likely the queues will reach the max size) is not the best one, in my
opinion.
If the default max queue size is too large, it should always be smaller,
regardless of why the queues grow.

Furthermore, I have a feeling that what gives you OOM isn't the data streamer 
queue... AFAIR your data streamer queue size is something like (entrySize * 
bufferSize * perNodeParallelOperations),
which for 1 kb entries and 16 threads gives (1kb * 512 * 16 * 8) = 64mb which 
is usually peanuts for server Java.

Can you check the heap dump in your tests to see what actually occupies most of 
the heap?

Thanks,
Stan


On 28 Oct 2022, at 11:54, Vladimir Steshin  wrote:

 Hi Folks,

 I found that the DataStreamer may consume heap or use an increased heap amount
when loading into a persistent cache.
This may happen with the streamer's 'allowOverwrite'==true and the cache in
PRIMARY_SYNC mode.

 What I don't like here is that the case looks simple. Not the defaults,
but a user might meet the issue in just a trivial test, trying/researching the
streamer.

 The streamer has the related 'perNodeParallelOperations()' which helps. But an
additional DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER might be set for PDS.

 My questions are:
1) Is it an issue at all? Does it need a fix? Is it a minor one?
2) Should we bring an additional default DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER
for PDS because it reduces heap consumption?
3) A better solution is backpressure. But is it worth it for this case?

Ticket: https://issues.apache.org/jira/browse/IGNITE-17735
PR: https://github.com/apache/ignite/pull/10343

[DISCUSSION] Add DataStreamer's default per-node-batch-setting for PDS.

2022-10-28 Thread Vladimir Steshin

    Hi Folks,

    I found that the DataStreamer may consume heap or use an increased heap
amount when loading into a persistent cache.
This may happen with the streamer's 'allowOverwrite'==true and the cache is
in PRIMARY_SYNC mode.

    What I don't like here is that the case looks simple. Not the
defaults, but a user might meet the issue in just a trivial test,
trying/researching the streamer.

    The streamer has the related 'perNodeParallelOperations()' which helps. But an
additional DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER might be set for PDS.

    My questions are:
1) Is it an issue at all? Does it need a fix? Is it a minor one?
2) Should we bring an additional default
DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER for PDS because it reduces heap
consumption?
3) A better solution is backpressure. But is it worth it for this case?

Ticket: https://issues.apache.org/jira/browse/IGNITE-17735
PR: https://github.com/apache/ignite/pull/10343
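For illustration, a minimal Java sketch of the workaround available today, capping the streamer's per-node parallel operations explicitly. The cache name and constants are made-up examples:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

class StreamerTuning {
    // Sketch: the values are illustrative; tune them for your heap and cache mode.
    static void load(Ignite ignite) {
        try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("persistentCache")) {
            streamer.allowOverwrite(true);          // The mode that triggers the issue.
            streamer.perNodeParallelOperations(16); // Lower than the default CPUs * 8 batches.

            for (long k = 0; k < 1_000_000; k++)
                streamer.addData(k, new byte[1024]);
        }
    }
}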

Unmarshalable result from the MX beans

2022-02-03 Thread Vladimir Steshin

    Hi, Igniters.

    We've found that invocation results from some MX beans might not be
properly represented on clients like jconsole [1] in the exception
cases. We pass Ignite exceptions, or JMExceptions with Ignite exception
causes, to the client, and the client fails to unmarshal the unknown
exceptions instead of showing the failure reason.


Examples:

 * /IgniteMXBean.clusterState()/ can throw /IgniteException/
 * /IgniteMXBean.executeTask()/ can throw /JMException/ holding
   /IgniteException/
 * /IgniteClusterMXBean.tag()/ can throw /JMException/ holding
   /IgniteException/


Solutions might be:

1. For existing /JMExceptions/ let's keep only the error message and
   hold no cause which can't be deserialized.
2. #1 + Let's throw just /RuntimeException/ with the error message
   instead of /IgniteException/ (for example in
   /IgniteMXBean.clusterState()/).
3. #1 + Let's use and declare a similar /'throws JMException'/ on methods
   like /IgniteMXBean.clusterState()/. But user code that uses the
   mx-bean public API might not compile with this change.
4. Deprecate these exception-throwing methods and bring new ones
   returning a string result like "OK" or "failed: cause". But
   /IgniteMXBean/ already has several change-cluster-state methods,
   including deprecated ones.


My suggestion is #3. WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-16416
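For illustration, a minimal sketch of solution #1 (a hypothetical helper, not existing Ignite code): propagate only the message, never a cause class the remote JMX client can't deserialize.

import javax.management.JMException;

final class MxBeanErrors {
    // Sketch of solution #1: keep the error message, drop the cause so that
    // remote clients like jconsole don't fail to unmarshal unknown exception classes.
    static JMException toClientSafe(Exception original) {
        return new JMException(original.getMessage());
    }
}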



Re: Proxy (GridServiceProxy) for local services if required

2022-01-25 Thread Vladimir Steshin
    Similar question about .NET. I'm doing the sub-ticket with metrics
of the platform services. Should we return a proxy every time from
/IServices.GetServiceProxy()/ and probably deprecate
/IServices.GetService()/? Same problems with varying the behavior of
/GetServiceProxy()/ and statistics corruption (if enabled) with
/GetService()/.


On 24.01.2022 22:27, Valentin Kulichenko wrote:

Vladimir,

Agreed. My point is that the actual issue we want to fix is broken metrics.
Thus, we should have a single ticket/patch *about the broken metrics* that
incorporates all required changes, as well as relevant documentation
updates.

I don't think we need a vote. I will also change mine to +1 in the existing
thread.

-Val

On Mon, Jan 24, 2022 at 11:23 AM Vladimir Steshin wrote:


  Sure. This is a simple change. These 2 fixes could go together. And
the doc already has notes on the service statistics traits. Ok, might be
even better.

  I'm not sure whether we need extra voting for deprecating
'service()' and, as earlier, making 'serviceProxy()' return a proxy
every time. Several active contributors have already said their 'yes' for both
in the thread.

On 23.01.2022 03:50, Valentin Kulichenko wrote:

If we already agreed to deprecate the service() method, then I'm ok with
the change. But I think we should do both fixes together and also clearly
document that using service() can lead to incorrect metrics.

-Val

On Fri, Jan 21, 2022 at 1:13 AM Vladimir Steshin wrote:

   Valentin, there are 2 notable issues:

1) Varying behavior of `/serviceProxy()/` depending on a user setting: it
can return both a proxy and a direct reference.

2) Service metrics are corrupted by invocations through `/service()/`.


   How we can fix them:

1) Just return a proxy every time.

2) Deprecate `/service()/`. We've already discussed that here: [1].
Please see my previous message in the thread.



[1] https://www.mail-archive.com/dev@ignite.apache.org/msg44062.html


On 20.01.2022 22:48, Valentin Kulichenko wrote:

So the proposed change will not actually fix the issue with metrics,
because it's still possible to get a local instance via the service()
method. At the same time, the change removes an existing performance
optimization.

Let's figure out how to fix the actual problem. If the *only* way to have
metrics is to have a proxy, then this should be the only way to interact
with a service. In that case, we need to do something with the service()
method (deprecate it?). Or probably there are other ways to fix metrics?

-Val

On Thu, Jan 20, 2022 at 3:32 AM Vladimir Steshin wrote:

 Yes. Invocations from a direct reference are not measured. This is
noted in the method's javadoc:

/* NOTE: Statistics are collected only with service proxies
obtaining by methods like/

/* {@link IgniteServices#serviceProxy(String, Class, boolean)} and won't
work for direct reference of local/

/* services which you can get by, for example, {@link
IgniteServices#service(String)}./


On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client-side
mechanics affect server-side metrics (number of executions and their
durations). I feel that we might be fixing the wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin <timoninma...@apache.org> wrote:


Hi, guys!


this is not a good idea to change the behavior of serviceProxy()
depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this; there is a separate
method `service()` for that.

What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution.
Check this ticket [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.


On Tue, 28 Dec 2021 at 13:57, Pavel Pereslegin wrote:

Re: Proxy (GridServiceProxy) for local services if required

2022-01-24 Thread Vladimir Steshin
>>>Did I correctly understand that: we need to revert the previous 
patch, update it with the deprecation, and submit it again to master?


Not sure at all. Why? The patch could be improved, but it is
independent and brings new features with documented limitations. If we want
to deprecate something, let's do it. No need, imho, to revert anything.
The next version, like 2.13, would contain all these fixes combined.


On 24.01.2022 21:35, Maksim Timonin wrote:

Hi guys,


If we already agreed to deprecate the service() method, then I'm ok with

the change

As I can see in the previous discussion that Vladimir mentioned [1], there
was a proposal for deprecation by Vladimir and Denis Mekhanikov, but there
wasn't a final decision. So I tried to look at `#service()` from a
different angle and tried to be a user's advocate. But I found only a
few weak arguments for retaining this functionality, and also nobody
supported them :) So, if there are no objections, let's finally deprecate it.


But I think we should do both fixes together and also clearly document

that using service() can lead to incorrect metrics

Did I correctly understand that: we need to revert the previous patch,
update it with the deprecation, and submit it again to master?



On Sun, Jan 23, 2022 at 3:51 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


If we already agreed to deprecate the service() method, then I'm ok with
the change. But I think we should do both fixes together and also clearly
document that using service() can lead to incorrect metrics.

-Val

On Fri, Jan 21, 2022 at 1:13 AM Vladimir Steshin wrote:


  Valentin, there are 2 notable issues:

1) Varying behavior of `/serviceProxy()/` depending on a user setting: it
can return both a proxy and a direct reference.

2) Service metrics are corrupted by invocations through `/service()/`.


  How we can fix them:

1) Just return a proxy every time.

2) Deprecate `/service()/`. We've already discussed that here: [1].
Please see my previous message in the thread.



[1] https://www.mail-archive.com/dev@ignite.apache.org/msg44062.html


On 20.01.2022 22:48, Valentin Kulichenko wrote:

So the proposed change will not actually fix the issue with metrics,
because it's still possible to get a local instance via the service()
method. At the same time, the change removes an existing performance
optimization.

Let's figure out how to fix the actual problem. If the *only* way to have
metrics is to have a proxy, then this should be the only way to interact
with a service. In that case, we need to do something with the service()
method (deprecate it?). Or probably there are other ways to fix metrics?

-Val

On Thu, Jan 20, 2022 at 3:32 AM Vladimir Steshin wrote:

Yes. Invocations from a direct reference are not measured. This is
noted in the method's javadoc:

/* NOTE: Statistics are collected only with service proxies
obtaining by methods like/

/* {@link IgniteServices#serviceProxy(String, Class, boolean)} and won't
work for direct reference of local/

/* services which you can get by, for example, {@link
IgniteServices#service(String)}./


On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client-side
mechanics affect server-side metrics (number of executions and their
durations). I feel that we might be fixing the wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin <timoninma...@apache.org> wrote:


Hi, guys!


this is not a good idea to change the behavior of serviceProxy()
depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this; there is a separate
method `service()` for that.

What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution.
Check this ticket [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.


Re: Proxy (GridServiceProxy) for local services if required

2022-01-24 Thread Vladimir Steshin
    Sure. This is a simple change. These 2 fixes could go together. And
the doc already has notes on the service statistics traits. Ok, might be
even better.


    I'm not sure whether we need extra voting for deprecating
'service()' and, as earlier, making 'serviceProxy()' return a proxy
every time. Several active contributors have already said their 'yes' for both
in the thread.

On 23.01.2022 03:50, Valentin Kulichenko wrote:

If we already agreed to deprecate the service() method, then I'm ok with
the change. But I think we should do both fixes together and also clearly
document that using service() can lead to incorrect metrics.

-Val

On Fri, Jan 21, 2022 at 1:13 AM Vladimir Steshin  wrote:


  Valentin, there are 2 notable issues:

1) Varying behavior of `/serviceProxy()/` depending on a user setting: it
can return both a proxy and a direct reference.

2) Service metrics are corrupted by invocations through `/service()/`.


  How we can fix them:

1) Just return a proxy every time.

2) Deprecate `/service()/`. We've already discussed that here: [1].
Please see my previous message in the thread.



[1] https://www.mail-archive.com/dev@ignite.apache.org/msg44062.html


On 20.01.2022 22:48, Valentin Kulichenko wrote:

So the proposed change will not actually fix the issue with metrics,
because it's still possible to get a local instance via the service()
method. At the same time, the change removes an existing performance
optimization.

Let's figure out how to fix the actual problem. If the *only* way to have
metrics is to have a proxy, then this should be the only way to interact
with a service. In that case, we need to do something with the service()
method (deprecate it?). Or probably there are other ways to fix metrics?

-Val

On Thu, Jan 20, 2022 at 3:32 AM Vladimir Steshin wrote:

Yes. Invocations from a direct reference are not measured. This is
noted in the method's javadoc:

/* NOTE: Statistics are collected only with service proxies
obtaining by methods like/

/* {@link IgniteServices#serviceProxy(String, Class, boolean)} and won't
work for direct reference of local/

/* services which you can get by, for example, {@link
IgniteServices#service(String)}./


On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client

side

mechanics affect server side metrics (number of executions and their
durations). I feel that we might be fixing a wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin <timoninma...@apache.org>
wrote:


Hi, guys!


this is not a good idea to change the behavior of serviceProxy()
depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this, there is a separate
method `service()` for that.


What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution.
Check this ticket [1]

[1] https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to
me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.


Tue, Dec 28, 2021 at 13:57, Pavel Pereslegin:


Hi!

Agree with Maxim.

It seems to me quite normal to return a proxy for a local instance in
the case when the user has explicitly enabled statistics collection in
the service settings. I.e., by default, we do not change the behavior
and if the collection of metrics is not needed, a local instance will
be returned. And I also think the javadoc should be changed to reflect
the new behavior.

So, I'm for 1 + 3.

Tue, Dec 28, 2021 at 10:51, Maksim Timonin <timoninma...@apache.org>:

Hi!

I agree that users shouldn't expect a non-proxy when invoking the
`IgniteServices#serviceProxy()` method. I think it's up to Ignite to
return a non-proxy instance here as a possible optimisation. But users
have to use interfaces in any case. There is the `IgniteServices#service()`
method for explicit return of local instances.

Re: Proxy (GridServiceProxy) for local services if required

2022-01-21 Thread Vladimir Steshin

    Valentin, there are 2 notable issues:

1) Varying behavior of `/serviceProxy()/` depending on a user setting. It 
can return both a proxy and a direct reference.


2) Service metrics are corrupted by invocations through `/service()/`.


    How we can fix:

1) Just return proxy every time.

2) Deprecate `/service()/`. We've already discussed that here: [1]. 
Please see my previous message in the thread.




[1] https://www.mail-archive.com/dev@ignite.apache.org/msg44062.html


On 20.01.2022 22:48, Valentin Kulichenko wrote:

So the proposed change will not actually fix the issue with metrics,
because it's still possible to get a local instance via the service()
method. At the same time, the change removes an existing performance
optimization.

Let's figure out how to fix the actual problem. If the *only* way to have
metrics is to have a proxy, then this should be the only way to interact
with a service. In that case, we need to do something with the service()
method (deprecate it?). Or probably there are other ways to fix metrics?

-Val

On Thu, Jan 20, 2022 at 3:32 AM Vladimir Steshin  wrote:


   Yes. Invocations via a direct reference are not measured. This is
noted in the javadoc:


/* NOTE: Statistics are collected only with service proxies
obtaining by methods like/

/* {@link IgniteServices#serviceProxy(String, Class, boolean)} and won't
work for direct reference of local/

/* services which you can get by, for example, {@link
IgniteServices#service(String)}./


On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client side
mechanics affect server side metrics (number of executions and their
durations). I feel that we might be fixing a wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin
Hi, guys!


this is not a good idea to change the behavior of serviceProxy()

depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this, there is a separate
method
`service()` for that.


What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution.
Check this ticket [1]

[1] https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to
me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.


Tue, Dec 28, 2021 at 13:57, Pavel Pereslegin:


Hi!

Agree with Maxim.

It seems to me quite normal to return a proxy for a local instance in
the case when the user has explicitly enabled statistics collection in
the service settings. I.e., by default, we do not change the behavior
and if the collection of metrics is not needed, a local instance will
be returned. And I also think the javadoc should be changed to reflect
the new behavior.

So, I'm for 1 + 3.

Tue, Dec 28, 2021 at 10:51, Maksim Timonin <timoninma...@apache.org>:

Hi!

I agree that users shouldn't expect a non-proxy when invoking the
`IgniteServices#serviceProxy()` method. I think it's up to Ignite to
return a non-proxy instance here as a possible optimisation. But users
have to use interfaces in any case. There is the `IgniteServices#service()`
method for explicit return of local instances.

With enabling of metrics we can break users that explicitly
use `#serviceProxy` (proxy!), and then explicitly cast it to an
implementation class. In this case such users will get a runtime
exception.

I think we can write a good javadoc for
`ServiceConfiguration#setEnableMetrics()`; it should mention that it
works only with a proxy, and it doesn't collect metrics with non-proxy
usages with `IgniteService#service()`.

So, I propose to proceed with two solutions - 1 and 3: fix docs for
`#serviceProxy()` and provide detailed javadocs
for `ServiceConfiguration#setEnableMetrics()`.

If some users will enable metrics (even with such docs!) and will be
using casting a proxy(!) to an implementation, then they will get a
runtime exception. But I believe that it is an obvious failure, and it
should be fixed on the user side.


Re: Proxy (GridServiceProxy) for local services if required

2022-01-21 Thread Vladimir Steshin

    Maxim, hi

    I'd like to correct you slightly. Of course, I did benchmarks. There 
was no doubt the proxied invocation itself is slower. But is the job 
slower? As you can see, trivial operations like `cache.getAndPut()` take 
the same time with a proxy as with the direct reference. Is the noted 
optimization there at all? I doubt `service()` is useful. Maybe for 
tasks like counters.


So, I believe we could deprecate getting a service through `service()` and 
promote using `serviceProxy()`.


On 21.01.2022 11:33, Maksim Timonin wrote:

Hi guys,


because it's still possible to get a local instance via the service()

method

I think that there are some cases where a user doesn't want to have a proxy:
1. the user actually cares about proxy performance. (Vladimir measured it
here [1] and found that proxy actually affects some simple cases of
services usage. But do we actually need to measure such cases?)
2. the user wants to work with implementation classes over interfaces for
any reason. (Bad practice in any case, I think).

 From my point of view those cases are very rare. Am I missing some cases
here? There was a commit [2] that provides this optimization in
`serviceProxy()`. But I don't see any motivation in the commit message,
and the ticket is GG private. Which case does it cover?

If there are no obvious cases then it looks like we can safely deprecate
the `service()` method. But if there are some, then we can retain the
`service()` method for backward compatibility for users who actually
understand what they are doing. With a mention that it's impossible to
measure it.

[1] https://www.mail-archive.com/dev@ignite.apache.org/msg43793.html
[2] https://github.com/apache/ignite/commit/ef6b8aa3

On Thu, Jan 20, 2022 at 10:49 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


So the proposed change will not actually fix the issue with metrics,
because it's still possible to get a local instance via the service()
method. At the same time, the change removes an existing performance
optimization.

Let's figure out how to fix the actual problem. If the *only* way to have
metrics is to have a proxy, then this should be the only way to interact
with a service. In that case, we need to do something with the service()
method (deprecate it?). Or probably there are other ways to fix metrics?

-Val

On Thu, Jan 20, 2022 at 3:32 AM Vladimir Steshin 
wrote:


   Yes. Invocations via a direct reference are not measured. This is
noted in the javadoc:


/* NOTE: Statistics are collected only with service proxies
obtaining by methods like/

/* {@link IgniteServices#serviceProxy(String, Class, boolean)} and won't
work for direct reference of local/

/* services which you can get by, for example, {@link
IgniteServices#service(String)}./


On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client side
mechanics affect server side metrics (number of executions and their
durations). I feel that we might be fixing a wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin <timoninma...@apache.org>
wrote:


Hi, guys!


this is not a good idea to change the behavior of serviceProxy()
depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this, there is a separate
method `service()` for that.


What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution.
Check this ticket [1]

[1] https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to
me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.

Tue, Dec 28, 2021 at 13:57, Pavel Pereslegin:

Hi!

Agree with Maxim.

It seems to me quite normal to return a proxy for a local instance in
the case when the user has explicitly enabled statistics collection in
the service settings. I.e., by default, we do not change the behavior
and if the collection of metrics is not needed, a local instance will
be returned.

Re: Proxy (GridServiceProxy) for local services if required

2022-01-21 Thread Vladimir Steshin

    Valentin, hello again.

    The previous version of the ticket suggested deprecation of 
/IgniteServices#service()/. Reasons:


1) Doubt in the utility of the faster invocation. The proxy calls the 
local reference if possible. Technically it is slower, of course. But how 
does it affect real cases, not empty methods but real jobs? The 
benchmarks are in the old discussion [1].


2) With #1, a service is something that resides in the grid and might 
not exist on the local node at all. Why do we focus on a small 
optimization for the particular case of 'service is available locally'?


3) We need and already have the proxy.


[1] https://www.mail-archive.com/dev@ignite.apache.org/msg43793.html

On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client side
mechanics affect server side metrics (number of executions and their
durations). I feel that we might be fixing a wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin
wrote:


Hi, guys!


this is not a good idea to change the behavior of serviceProxy()

depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this, there is a separate
method
`service()` for that.


What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution. Check
this ticket [1]

[1]https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to
me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.


Tue, Dec 28, 2021 at 13:57, Pavel Pereslegin :


Hi!

Agree with Maxim.

It seems to me quite normal to return a proxy for a local instance in
the case when the user has explicitly enabled statistics collection in
the service settings. I.e., by default, we do not change the behavior
and if the collection of metrics is not needed, a local instance will
be returned. And I also think the javadoc should be changed to reflect
the new behavior.

So, I'm for 1 + 3.

Tue, Dec 28, 2021 at 10:51, Maksim Timonin <timoninma...@apache.org>:

Hi!

I agree that users shouldn't expect a non-proxy when invoking the
`IgniteServices#serviceProxy()` method. I think it's up to Ignite to
return a non-proxy instance here as possible optimisation. But users
have to use interfaces in any case. There is the `IgniteServices#service()`
method for explicit return of local instances.

With enabling of metrics we can break users that explicitly
use `#serviceProxy` (proxy!), and then explicitly cast it to an
implementation class. In this case such users will get a runtime

exception.

I think we can write a good javadoc for
`ServiceConfiguration#setEnableMetrics()`, it should mention that it
works only with proxy, and it doesn't collect metrics with non-proxy
usages with `IgniteService#service()`.

So, I propose to proceed with two solutions - 1 and 3: fix docs for
`#serviceProxy()` and provide detailed javadocs
for `ServiceConfiguration#setEnableMetrics()`.

If some users will enable metrics (even with such docs!) and will be
using casting proxy(!) to an implementation, then they will get a
runtime exception. But I believe that it is an obvious failure, and it
should be fixed on the user side.





On Mon, Dec 27, 2021 at 10:26 PM Vladimir Steshin <vlads...@gmail.com>
wrote:


Hi, Igniters.


I'd like to suggest modifying `/IgniteServices.serviceProxy(String name, 
Class svcItf, boolean sticky)/` so that it may return a proxy even for a 
local service. Motivation: service metrics [1]. To measure a method call 
we need to wrap the service somehow. Also, the method name says `proxy`. 
For a local one we return a direct instance. Misleading.

Solutions:

1) Let's return a proxy (`/GridServiceProxy/`) whenever we need one. 
Let's change the javadoc to say `@return Proxy over service'. Simply 
works.

Cons: invocation like `/MyServiceImpl svc = serviceProxy(«myservice», 
MyService.class)`/ would fail. Weird usage to me. But possible.

[VOTE] Service proxy for local service by 'IgniteServices#serviceProxy()'

2022-01-20 Thread Vladimir Steshin

    Hi, Igniters.


    Should we return a proxy even for local services by 
'IgniteServices#serviceProxy()'?


*I vote +1*, let's return proxy.


    This question has recently been raised again. Before the service 
metrics, we returned a direct instance for local services. With service 
metrics enabled, we return a proxy. With the metrics disabled, we return 
a direct reference.


Pros:
    1) Would match the method name - 'proxy'. Looks reasonable.
    2) Giving proxy every time, we won't change behavior depending on 
user setting like 'ServiceConfiguration#setStatisticsEnabled()'
    3) There is a dedicated method for direct reference of local 
service: 'IgniteServices#service()'


Cons:
    4) Will break declarations like `MyServiceImpl svc = 
serviceProxy(«myservice», IMyService.class);` without any change.
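
For illustration, a minimal sketch of the breaking case from item 4 (the
`ignite` instance, the service name and the classes are hypothetical):

    // Compiles because serviceProxy() is generic: T is inferred as
    // MyServiceImpl while the interface passed is IMyService.
    MyServiceImpl svc = ignite.services()
        .serviceProxy("myservice", IMyService.class, false);

    // Today this succeeds when the service is local, since a direct
    // reference is returned. Once a proxy is always returned, it fails
    // with ClassCastException: the JDK proxy implements IMyService but
    // is not a MyServiceImpl.
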

Re: Proxy (GridServiceProxy) for local services if required

2022-01-20 Thread Vladimir Steshin
 Yes. Invocations via a direct reference are not measured. This is noted 
in the javadoc:



/* NOTE: Statistics are collected only with service proxies 
obtaining by methods like/


/* {@link IgniteServices#serviceProxy(String, Class, boolean)} and won't 
work for direct reference of local/


/* services which you can get by, for example, {@link 
IgniteServices#service(String)}./
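
To make the javadoc note concrete, a minimal usage sketch (the `ignite`
instance, the service name and the MyService interface are illustrative):

    IgniteServices services = ignite.services();

    // Direct reference to the locally deployed instance: calls are not
    // wrapped, so no service metrics are collected for them.
    MyService direct = services.service("myService");

    // Proxy (over a local or remote instance): calls go through a
    // wrapper, so durations can be measured when statistics are enabled
    // in the ServiceConfiguration.
    MyService proxied = services.serviceProxy("myService", MyService.class, false);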



On 20.01.2022 00:20, Valentin Kulichenko wrote:

BTW, there is also the service() method that can only return an instance
and never returns a proxy. Does it corrupt the metrics as well?

-Val

On Wed, Jan 19, 2022 at 1:09 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


Maxim,

The reason I'm asking is that I don't really understand how client side
mechanics affect server side metrics (number of executions and their
durations). I feel that we might be fixing a wrong problem.

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

-Val

On Tue, Jan 18, 2022 at 11:32 PM Maksim Timonin
wrote:


Hi, guys!


this is not a good idea to change the behavior of serviceProxy()

depending on statistics

I think that the patch doesn't change the behavior of `serviceProxy()`.
This method promises a proxy and it actually returns it. The fact that
`serviceProxy()` can return non-proxy objects is an internal Ignite
optimization, and users should not rely on this, there is a separate
method
`service()` for that.


What are the metrics that are being affected by this?

Only service metrics, which calculate the duration of service execution. Check
this ticket [1]

[1] https://issues.apache.org/jira/browse/IGNITE-12464


On Wed, Jan 19, 2022 at 1:22 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:


What are the metrics that are being affected by this?

-Val

On Tue, Jan 18, 2022 at 3:31 AM Вячеслав Коптилин <
slava.kopti...@gmail.com>
wrote:


Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive

to

me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.


вт, 28 дек. 2021 г. в 13:57, Pavel Pereslegin:


Hi!

Agree with Maxim.

It seems to me quite normal to return a proxy for a local instance

in

the case when the user has explicitly enabled statistics collection

in

the service settings. Those. by default, we do not change the

behavior

and if the collection of metrics is not needed, a local instance

will

be returned. And I also think the javadoc should be changed to

reflect

the new behavior.

So, I'm for 1 + 3.

Tue, Dec 28, 2021 at 10:51, Maksim Timonin <timoninma...@apache.org>:

Hi!

I agree that users shouldn't expect a non-proxy when invoking the
`IgniteServices#serviceProxy()` method. I think it's up to Ignite to
return a non-proxy instance here as possible optimisation. But users
have to use interfaces in any case. There is the `IgniteServices#service()`
method for explicit return of local instances.

With enabling of metrics we can break users that explicitly
use `#serviceProxy` (proxy!), and then explicitly cast it to an
implementation class. In this case such users will get a runtime
exception.

I think we can write a good javadoc for
`ServiceConfiguration#setEnableMetrics()`, it should mention that it
works only with proxy, and it doesn't collect metrics with non-proxy
usages with `IgniteService#service()`.

So, I propose to proceed with two solutions - 1 and 3: fix docs for
`#serviceProxy()` and provide detailed javadocs
for `ServiceConfiguration#setEnableMetrics()`.

If some users will enable metrics (even with such docs!) and will be
using casting proxy(!) to an implementation, then they will get a
runtime exception. But I believe that it is an obvious failure, and it
should be fixed on the user side.





On Mon, Dec 27, 2021 at 10:26 PM Vladimir Steshin <vlads...@gmail.com>
wrote:


Hi, Igniters.


I'd like to suggest modifying `/IgniteServices.serviceProxy(String name, 
Class svcItf, boolean sticky)/` so that it may return a proxy even for a 
local service. Motivation: service metrics [1]. To measure a method call 
we need to wrap the service somehow. Also, the method name says `proxy`. 
For a local one we return a direct instance. Misleading.

Solutions:

1) Let's return a proxy (`/GridServiceProxy/`) whenever we need one. 
Let's change the javadoc to say `@return Proxy over service'. Simply 
works.

Cons: invocation like `/MyServiceImpl svc = serviceProxy(«myservice», 
MyService.class)`/ would fail. Weird usage to me. But possible.

2) Introduce a new method with a forced-proxy flag like 
`IgniteServices.serviceProxy(…, boolean forcedProxy)`. And add a warning 
to other service-obtaining methods like: «`/You enabled service metrics 
but it doesn't work for a local service instance. Use forced-proxy 
serviceProxy(…, boolean forcedProxy)/`».

Re: Proxy (GridServiceProxy) for local services if required

2022-01-20 Thread Vladimir Steshin

    Valentin, hi.

"Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?"

A local instance executes direct calls without any wrapping and doesn't 
measure itself. That's why we return a proxy from 'serviceProxy()' even 
for local services when statistics are enabled.
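
Conceptually, the measuring wrapper is just an invocation handler that
times each call. A simplified sketch with a plain JDK dynamic proxy (an
illustration only, not the actual GridServiceProxy code):

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    class TimingHandler implements InvocationHandler {
        private final Object delegate;

        TimingHandler(Object delegate) { this.delegate = delegate; }

        @Override public Object invoke(Object proxy, Method mtd, Object[] args) throws Throwable {
            long start = System.nanoTime();
            try {
                // Delegate to the real (possibly local) service instance.
                return mtd.invoke(delegate, args);
            }
            finally {
                long durationNs = System.nanoTime() - start;
                // The real implementation would record durationNs in a
                // histogram metric here.
            }
        }
    }

A direct reference bypasses such a handler entirely, which is why its
invocations cannot be measured.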


On 20.01.2022 00:09, Valentin Kulichenko wrote:

Could you elaborate on why we count metrics incorrectly when
the serviceProxy() returns an instance of a local service instead of an
actual proxy?

Re: Proxy (GridServiceProxy) for local services if required

2022-01-20 Thread Vladimir Steshin

    Slava, hi.

There are many 'give-service' methods. I doubt we need one more.

The user should not rely on the exact type of the service handle, 
instance or proxy. It's up to Ignite to decide.


There is a service interface for the user. But yes, we might just return 
a proxy every time.



On 18.01.2022 14:30, Вячеслав Коптилин wrote:

Hello Igniters,

IMHO, this is not a good idea to change the behavior of serviceProxy()
depending on statistics (enabled/disabled). It seems counterintuitive to me.
Perhaps, we need to introduce a new method that should always return a
proxy to the user service.

Thanks,
Slava.


Tue, Dec 28, 2021 at 13:57, Pavel Pereslegin :


Hi!

Agree with Maxim.

It seems to me quite normal to return a proxy for a local instance in
the case when the user has explicitly enabled statistics collection in
the service settings. I.e., by default, we do not change the behavior
and if the collection of metrics is not needed, a local instance will
be returned. And I also think the javadoc should be changed to reflect
the new behavior.

So, I'm for 1 + 3.

Tue, Dec 28, 2021 at 10:51, Maksim Timonin :

Hi!

I agree that users shouldn't expect a non-proxy when invoking the
`IgniteServices#serviceProxy()` method. I think it's up to Ignite to

return

a non-proxy instance here as possible optimisation. But users have to use
interfaces in any case. There is the `IgniteServices#service()` method

for

explicit return of local instances.

With enabling of metrics we can break users that explicitly
use `#serviceProxy` (proxy!), and then explicitly cast it to an
implementation class. In this case such users will get a runtime

exception.

I think we can write a good javadoc for
`ServiceConfiguration#setEnableMetrics()`, it should mention that it

works

only with proxy, and it doesn't collect metrics with non-proxy usages

with

`IgniteService#service()`.

So, I propose to proceed with two solutions - 1 and 3: fix docs for
`#serviceProxy()` and provide detailed javadocs
for `ServiceConfiguration#setEnableMetrics()`.

If some users will enable metrics (even with such docs!) and will be

using

casting proxy(!) to an implementation, then they will get a runtime
exception. But I believe that it is an obvious failure, and it should be
fixed on the user side.





On Mon, Dec 27, 2021 at 10:26 PM Vladimir Steshin 
wrote:


Hi, Igniters.


I'd like to suggest modifying `/IgniteServices.serviceProxy(String name,
Class svcItf, boolean sticky)/` so that it may return a proxy even for a
local service. Motivation: service metrics [1]. To measure a method call
we need to wrap the service somehow. Also, the method name says `proxy`.
For a local one we return a direct instance. Misleading.


Solutions:

1) Let's return a proxy (`/GridServiceProxy/`) whenever we need one. Let's
change the javadoc to say `@return Proxy over service'. Simply works.

Cons: invocation like `/MyServiceImpl svc = serviceProxy(«myservice»,
MyService.class)`/ would fail. Weird usage to me. But possible.

2) Introduce a new method with a forced-proxy flag like
`IgniteServices.serviceProxy(…, boolean forcedProxy)`. And add a warning
to other service-obtaining methods like: «`/You enabled service metrics
but it doesn't work for a local service instance. Use forced-proxy
serviceProxy(…, boolean forcedProxy)/`». Cons: we already have about 5-6
methods to get services in `/IgniteServices/`. I doubt we need one more.

3) Fix the documentation so that it tells that the service metrics work
only with a proxy. Cons: service metrics just won't work for local
services. Suddenly.


My vote is for #1: let's use a proxy. WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-12464




Re: [VOTE] @Nullable/@NotNull annotation usage in Ignite 3

2022-01-13 Thread Vladimir Steshin

+1 for option 2

On 13.01.2022 13:25, Alexander Polovtcev wrote:

Dear Igniters,

In this thread we've
discussed possible approaches to using null-related annotations. As the
result, the following approaches were proposed:

1. Use both @Nullable and @NotNull annotations everywhere;
2. Use only @Nullable;
3. Use only @NotNull;
4. Do not use @Nullable nor @NotNull.

I would like to propose a vote: please post the corresponding number of the
option you like. The voting will commence on Thursday, January 20 at 11:00
UTC.



Re: [VOTE] Release Apache Ignite 2.12.0 RC2

2022-01-10 Thread Vladimir Steshin

+1

On 10.01.2022 15:52, Nikita Amelchev wrote:

Dear Community,

The release candidate (2.12.0-rc2) is ready.

I have uploaded a release candidate to:
https://dist.apache.org/repos/dist/dev/ignite/2.12.0-rc2/
https://dist.apache.org/repos/dist/dev/ignite/packages_2.12.0-rc2/

The following staging can be used for testing:
https://repository.apache.org/content/repositories/orgapacheignite-1539

Tag name is 2.12.0-rc2:
https://gitbox.apache.org/repos/asf?p=ignite.git;a=commit;h=refs/tags/2.12.0-rc2

RELEASE_NOTES:
https://gitbox.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=RELEASE_NOTES.txt;hb=ignite-2.12

Complete list of resolved issues:
https://issues.apache.org/jira/issues/?jql=(project%20%3D%20%27Ignite%27%20AND%20fixVersion%20is%20not%20empty%20AND%20fixVersion%20in%20(%272.12%27))%20and%20status%20in%20(%27CLOSED%27%2C%20%27RESOLVED%27)%20ORDER%20BY%20priority

DEVNOTES:
https://gitbox.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=DEVNOTES.txt;hb=ignite-2.12

Additional checks have been performed (available for users included
into the release group on TeamCity).

TC [Check RC: Licenses, compile, chksum]
https://ci2.ignite.apache.org/buildConfiguration/ignite2_Release_ApacheIgniteReleaseJava8_PrepareVote4CheckRcLicensesChecksum/6266150?showRootCauses=false=true

TC [2] Compare w/ Previous Release
https://ci2.ignite.apache.org/buildConfiguration/ignite2_Release_ApacheIgniteReleaseJava8_IgniteRelease72CheckFileConsistency/6266148?showRootCauses=false=true

The vote is formal, see voting guidelines
https://www.apache.org/foundation/voting.html

+1 - to accept Apache Ignite 2.12.0-rc2
0 - don't care either way
-1 - DO NOT accept Apache Ignite Ignite 2.12.0-rc2 (explain why)

See notes on how to verify release here
https://www.apache.org/info/verification.html
and
https://cwiki.apache.org/confluence/display/IGNITE/Release+Process#ReleaseProcess-P5.VotingonReleaseandReleaseVerification

This vote will be open until Thu Jan 13, 2022, 16:00 UTC. Please,
write me down the thread if you need additional time to check the
release.
https://www.timeanddate.com/countdown/vote?iso=20220113T16=0=VOTE+on+the+Apache+Ignite+Release+2.12.0+RC2=sanserif



Proxy (GridServiceProxy) for local services if required

2021-12-27 Thread Vladimir Steshin

Hi, Igniters.


I'd like to suggest modifying `/IgniteServices.serviceProxy(String name, 
Class svcItf, boolean sticky)/` so that it may return a proxy even for a 
local service. Motivation: service metrics [1]. To measure a method call 
we need to wrap the service somehow. Also, the method name says `proxy`. 
For a local one we return a direct instance. Misleading.



Solutions:

1) Let's return a proxy (`/GridServiceProxy/`) whenever we need one. Let's 
change the javadoc to say `@return Proxy over service'. Simply works.

Cons: invocation like `/MyServiceImpl svc = serviceProxy(«myservice», 
MyService.class)`/ would fail. Weird usage to me. But possible.


2) Introduce a new method with a forced-proxy flag like 
`IgniteServices.serviceProxy(…, boolean forcedProxy)`. And add a warning 
to other service-obtaining methods like: «`/You enabled service metrics 
but it doesn't work for a local service instance. Use forced-proxy 
serviceProxy(…, boolean forcedProxy)/`». Cons: we already have about 5-6 
methods to get services in `/IgniteServices/`. I doubt we need one more.


3) Fix the documentation so that it tells that the service metrics work 
only with a proxy. Cons: service metrics just won't work for local 
services. Suddenly.



My vote is for #1: let's use a proxy. WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-12464



Re: Defrag?

2021-07-06 Thread Vladimir Steshin

            Hi, Ryan.

The current implementation doesn't share free pages among cache 
partitions when persistence is enabled. This happens because a page is 
bound to a certain partition file and cannot be 'moved' and reused in 
another partition. And there is no sharing of free pages among caches. 
Each has its own. Free pages are never deleted. Under intensive CRUD 
operations this can lead to allocating and keeping unused extra pages. 
Although a page is empty, it keeps its reserved position and space in 
the partition file.


The only way is to re-create the cache, or to stop/start the node 
clearing its persistent storage so that it is re-filled at start by the 
PME process.


Since 2.10 there is a maintenance mode for a node, which the 
defragmentation task goes with. Not documented yet, as far as I can see. 
It can be started from the command line or through JMX. But the node 
requires a restart in maintenance mode.
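
A rough sketch of the re-create workaround mentioned above (cache and
value class names are illustrative, and the copy loop assumes the data
set fits the downtime budget):

    IgniteCache<Integer, Person> old = ignite.cache("data");
    IgniteCache<Integer, Person> fresh = ignite.getOrCreateCache("data-new");

    // Copy live entries into freshly allocated partition files, then
    // drop the old cache together with its fragmented free pages.
    for (javax.cache.Cache.Entry<Integer, Person> e : old)
        fresh.put(e.getKey(), e.getValue());

    old.destroy();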



On 03.07.2021 17:10, Ryan Trollip wrote:

A rebuild of the cache reduced the size of the data dramatically.
Apparently ignite is not doing anything to rebalance or clean up pages.
I can't see how anyone using ignite native seriously will not have this
problem.

I wonder if this impacts the indexing also? And could be part of the lousy
performance we are having with ignite native.


On Wed, Jun 30, 2021, 8:27 AM Ryan Trollip  wrote:


Hey Ilya

It's the data tables that keep growing not the WAL.
We will try to rebuild the cache and see if that fixes the issue

On Mon, Jun 28, 2021 at 8:46 AM Ilya Kasnacheev 
wrote:


Hello!

Is it the WAL (wal/) that is growing or the checkpoint space (db/)? If
the latter, are any specific caches growing unbounded?

If the latter, you can try creating a new cache, moving the relevant data
to this new cache, switching to using it, and then dropping the old
cache; that should reclaim the space.

Regards,
--
Ilya Kasnacheev


Mon, Jun 28, 2021 at 17:34, Ryan Trollip :


Is this why the native disk storage just keeps growing and does not
reduce after we delete from ignite using SQL?
We are up to 80GB on disk now on some instances. We implemented a custom
archiving feature to move older data out of ignite cache to a PostgresSQL
database but when we delete that data from ignite instance, the disk data
size ignite is using stays the same, and then keeps growing, and
growing

On Thu, Jun 24, 2021 at 7:10 PM Denis Magda  wrote:


Ignite fellows,

I remember some of us worked on the persistence defragmentation
features. Has it been merged?

@Valentin Kulichenko  probably you know
the latest state.

-
Denis

On Thu, Jun 24, 2021 at 11:59 AM Ilya Kasnacheev <
ilya.kasnach...@gmail.com> wrote:


Hello!

You can probably drop the entire cache and then re-populate it via
loadCache(), etc.

Regards,
--
Ilya Kasnacheev


Wed, Jun 23, 2021 at 21:47, Ryan Trollip :


Thanks, Ilya, we may have to consider moving back to non-native
storage and caching more selectively as the performance degrades when there
is a lot of write/delete activity or tables with large amounts of rows.
This is with SQL with indexes and the use of query plans etc.

Is there any easy way to rebuild the entire native database after
hours? e.g. with a batch run on the weekends?

On Wed, Jun 23, 2021 at 7:39 AM Ilya Kasnacheev <
ilya.kasnach...@gmail.com> wrote:


Hello!

I don't think there's anything ready to use, but "killing
performance" from fragmentation is also not something reported too often.

Regards,
--
Ilya Kasnacheev


Wed, Jun 16, 2021 at 04:39, Ryan Trollip :
We see continual very large growth to data with ignite native. We
have a very chatty use case that's creating and deleting stuff often. The
data on disk just keeps growing at an explosive rate. So much so we ported
this to a DB to see the difference and the DB is much smaller. I was
searching to see if someone has the same issue. This is also killing
performance.

Found this:

https://cwiki.apache.org/confluence/display/IGNITE/IEP-47%3A+Native+persistence+defragmentation

Apparently, there is no auto-rebalancing of pages? or cleanup of
pages?

Has anyone implemented a workaround to rebuild the cache and
indexes say on a weekly basis to get it to behave reasonably?

Thanks



[jira] [Created] (IGNITE-14452) Add checking of the iptables settings applied.

2021-03-31 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14452:
-

 Summary: Add checking of the iptables settings applied.
 Key: IGNITE-14452
 URL: https://issues.apache.org/jira/browse/IGNITE-14452
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Sometimes, the iptables settings are missing for unknown reasons. Let's 
monitor this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14437) Adjust test params: exclude input net failures with disabled connRecovery

2021-03-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14437:
-

 Summary: Adjust test params: exclude input net failures with 
disabled connRecovery
 Key: IGNITE-14437
 URL: https://issues.apache.org/jira/browse/IGNITE-14437
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14378) Remove delay from node ping.

2021-03-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14378:
-

 Summary: Remove delay from node ping.
 Key: IGNITE-14378
 URL: https://issues.apache.org/jira/browse/IGNITE-14378
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Remove U.sleep(200) from the node ping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14377) Enhance log of node ping failure.

2021-03-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14377:
-

 Summary: Enhance log of node ping failure.
 Key: IGNITE-14377
 URL: https://issues.apache.org/jira/browse/IGNITE-14377
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The log of an unsuccessful ping during joining is insufficient. No failure 
reason is logged.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14096) Try to bring randomization in node waiting with TcpDiscoverySpi.reconnectDelay.

2021-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14096:
-

 Summary: Try to bring randomization in node waiting with 
TcpDiscoverySpi.reconnectDelay.
 Key: IGNITE-14096
 URL: https://issues.apache.org/jira/browse/IGNITE-14096
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


To speed up cluster start slightly, try to bring randomization into the node 
waiting with TcpDiscoverySpi.reconnectDelay. Check with the ducktape 
integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14095) Try to fasten cluster start in the ducktests by decreasing 'spi.reconnectDelay'

2021-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14095:
-

 Summary: Try to fasten cluster start in the ducktests by decreasing 
'spi.reconnectDelay'
 Key: IGNITE-14095
 URL: https://issues.apache.org/jira/browse/IGNITE-14095
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14068) Infinite node persistence in the ring while outgoing connections are lost

2021-01-26 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14068:
-

 Summary: Infinite node persistence in the ring while outgoing 
connections are lost
 Key: IGNITE-14068
 URL: https://issues.apache.org/jira/browse/IGNITE-14068
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14054) Improve discovery ducktest: add partial network drop.

2021-01-25 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14054:
-

 Summary: Improve discovery ducktest: add partial network drop.
 Key: IGNITE-14054
 URL: https://issues.apache.org/jira/browse/IGNITE-14054
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14053) Remove status check message at all.

2021-01-25 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14053:
-

 Summary: Remove status check message at all.
 Key: IGNITE-14053
 URL: https://issues.apache.org/jira/browse/IGNITE-14053
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14038) Separate JVM settings in the ducktests.

2021-01-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14038:
-

 Summary: Separate JVM settings in the ducktests.
 Key: IGNITE-14038
 URL: https://issues.apache.org/jira/browse/IGNITE-14038
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14037) Separate JVM settings in the ducktests.

2021-01-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14037:
-

 Summary: Separate JVM settings in the ducktests.
 Key: IGNITE-14037
 URL: https://issues.apache.org/jira/browse/IGNITE-14037
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13980) Remove duplicated ping: status check message.

2021-01-12 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13980:
-

 Summary: Remove duplicated ping: status check message.
 Key: IGNITE-13980
 URL: https://issues.apache.org/jira/browse/IGNITE-13980
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSSION] Java 11 for Ignite 3.0 development

2020-12-10 Thread Vladimir Steshin

My +1 for java 11

On 08.12.2020 15:00, Nikolay Izhikov wrote:

+1 for using java 11.


On Dec 8, 2020, at 13:18, ткаленко кирилл wrote:

+1

08.12.2020, 12:48, "Philipp Masharov" :

Hello!

Andrey's arguments are solid.

On Tue, Dec 8, 2020 at 12:23 PM Pavel Tupitsyn  wrote:


  +1, Java 11 seems to be the only right choice at the moment.

  On Tue, Dec 8, 2020 at 12:08 PM Alexey Zinoviev 
  wrote:

  > I totally support Java 11 for development. It's time to go forward
  >
  > Tue, Dec 8, 2020 at 11:40, Andrey Gura :
  >
  > > Igniters,
  > >
  > > We already had some discussion about using modern Java versions for
  > > Ignite 3.0 development [1] but we still don't have consensus.
  > > As I see from this discussion the strongest argument for Java 11 is
  > > the fact that Java 11 is the latest LTS release which will have
  > > premier support until September 2023. So I don't see any reason for
  > > preferring any other version of Java at this moment.
  > >
  > > The purpose of this thread is to gather opinions about using Java 11
  > > in the Ignite 3.0 project and, eventually, reach a consensus on this.
  > >
  > > I want to share my several arguments in favor of abandoning Java 8 and
  > > preferring Java 11:
  > >
  > > * Java 8 has gone through the End of Public Updates process for legacy
  > > releases. So it doesn't make sense to start new development on Java 8.
  > >
  > > * Java 9+ brings Jigsaw modularization which allows us to have more
  > > fine-grained structure of Ignite modules and APIs in the future.
  > >
  > > * Ignite actively uses Unsafe functionality which, firstly, isn't
  > > public, and secondly, leads to problems with running Ignite under Java
  > > 9+ (modularization which requires dozens of command-line options in
  > > order to forcibly export corresponding packages) and GraalVM. Such a
  > > situation could be described as bad user experience and we should fix
  > > it. Var handles [2] could be used for solving described problems.
  > >
  > > * Java 9+ introduces Flight Recorder API [3] which could be used in
  > > the Ignite project for lightweight profiling of internal processes.
  > >
  > > Please, share your opinions, objections and ideas about this topic. I
  > > hope we will not have serious disagreements and the consensus will be
  > > reached quickly.
  > >
  > >
  > > 1.
  > >
  >
  
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Ignite-3-0-development-approach-tp49922p50295.html
  > > 2.
  > >
  >
  https://docs.oracle.com/javase/9/docs/api/java/lang/invoke/VarHandle.html
  > > 3.
  > >
  >
  
https://docs.oracle.com/en/java/javase/11/docs/api/jdk.jfr/jdk/jfr/FlightRecorder.html
  > >
  >
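
As a rough illustration of the Unsafe-to-VarHandle point above (the
Counter class and its field are hypothetical):

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class Counter {
        private volatile long value;

        // A supported public replacement for Unsafe.objectFieldOffset()
        // and compareAndSwapLong() that works under the Java 9+ module
        // system without --add-exports flags.
        private static final VarHandle VALUE;

        static {
            try {
                VALUE = MethodHandles.lookup().findVarHandle(Counter.class, "value", long.class);
            }
            catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        boolean casValue(long exp, long upd) {
            return VALUE.compareAndSet(this, exp, upd);
        }
    }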


[jira] [Created] (IGNITE-13835) Improve discovery ducktape test to research small timeouts and behavior on large cluster.

2020-12-10 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13835:
-

 Summary: Improve discovery ducktape test to research small 
timeouts and behavior on large cluster.
 Key: IGNITE-13835
 URL: https://issues.apache.org/jira/browse/IGNITE-13835
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Improve the discovery ducktape test to research the cluster behavior with a 
bigger node number and smaller timeouts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13705) Fix middle node failure when both the next and previous nodes failed.

2020-11-13 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13705:
-

 Summary: Fix middle node failure when both the next and previous nodes failed.
 Key: IGNITE-13705
 URL: https://issues.apache.org/jira/browse/IGNITE-13705
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The discovery ducktape test has detected a failure of a third node in the 
middle of 2 simultaneously failed nodes. First research shows the trouble is 
in the backward connection checking: the next node has checked itself:

[2020-11-13 14:50:44,463][INFO ][tcp-disco-sock-reader-[47cc6f70 
10.53.125.224:35381]-#7-#79][org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi1]
 Connection check done 
[liveAddr=tkles-pprb00188.vm.esrt.cloud.sbrf.ru/10.53.125.160:47500, 
previousNode=TcpDiscoveryNode [id=8331a61c-ea93-4bf5-bc8c-b24c032068d0, 
consistentId=tkles-pprb00188.vm.esrt.cloud.sbrf.ru, addrs=ArrayList 
[10.53.125.160], sockAddrs=HashSet 
[tkles-pprb00188.vm.esrt.cloud.sbrf.ru/10.53.125.160:47500], discPort=47500, 
order=1, intOrder=1, lastExchangeTime=1605268203598, loc=false, 
ver=2.10.0#20201113-sha1:, isClient=false], 
addressesToCheck=[tkles-pprb00188.vm.esrt.cloud.sbrf.ru/10.53.125.160:47500], 
connectingNodeId=47cc6f70-9fe4-437d-b183-826f2687aac8]





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13704) Try failuredetectionTimeout==500 in ducktape integration test.

2020-11-13 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13704:
-

 Summary: Try failuredetectionTimeout==500 in ducktape integration 
test.
 Key: IGNITE-13704
 URL: https://issues.apache.org/jira/browse/IGNITE-13704
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Try failuredetectionTimeout==500 in ducktape integration test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13702) Fix description of soLinger for DiscoveryTcpSpi.

2020-11-12 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13702:
-

 Summary: Fix description of soLinger for DiscoveryTcpSpi.
 Key: IGNITE-13702
 URL: https://issues.apache.org/jira/browse/IGNITE-13702
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.10
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin
 Fix For: 2.10


Fix description of soLinger for DiscoveryTcpSpi.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13695) Move javadoc on the effect of several addresses on failure detection.

2020-11-11 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13695:
-

 Summary: Move javadoc on the effect of several node addresses on failure 
detection.
 Key: IGNITE-13695
 URL: https://issues.apache.org/jira/browse/IGNITE-13695
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The current javadoc on the effect of several node addresses on failure 
detection is located under `TcpDiscoverySpi.setIpFinder()`. The correct 
place is by `TcpDiscoverySpi.setLocalAddress()`.
Perhaps, the test might be slightly changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13663) Represent in the documentation the effect of several node addresses on failure detection v2.

2020-11-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13663:
-

 Summary: Represent in the documentation the effect of several node 
addresses on failure detection v2.
 Key: IGNITE-13663
 URL: https://issues.apache.org/jira/browse/IGNITE-13663
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.9
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin
 Fix For: 2.10






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13662) Describe soLinger setting in TCP Discovery and SSL issues.

2020-11-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13662:
-

 Summary: Describe soLinger setting in TCP Discovery and SSL issues.
 Key: IGNITE-13662
 URL: https://issues.apache.org/jira/browse/IGNITE-13662
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Describe soLinger setting in TCP Discovery and SSL issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13646) Discovery ducktape test might have setting for socket linger.

2020-10-30 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13646:
-

 Summary: Discovery ducktape test might have setting for socket 
linger.
 Key: IGNITE-13646
 URL: https://issues.apache.org/jira/browse/IGNITE-13646
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Since IGNITE-13643, the discovery ducktape test might have an additional 
setting for socket linger. This could unveil new issues with the linger and 
start fixing or tuning tcp discovery settings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13645) Discovery ducktape test should detect failed nodes by asking the cluster.

2020-10-30 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13645:
-

 Summary: Discovery ducktape test should detect failed nodes by 
asking the cluster.
 Key: IGNITE-13645
 URL: https://issues.apache.org/jira/browse/IGNITE-13645
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The discovery ducktape test should measure the detection time of failed 
nodes by asking the whole rest of the cluster. Currently, we measure by 
asking only one watching node.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13644) Close socket bravely.

2020-10-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13644:
-

 Summary: Close socket bravely.
 Key: IGNITE-13644
 URL: https://issues.apache.org/jira/browse/IGNITE-13644
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should not wait for socket closing once we have finished the logical 
connection and data exchange. This can violate configured timeouts and 
detection guarantees.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13643) Fix long closing of the socket in ServerImpl (TcpDiscoverySpi)

2020-10-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13643:
-

 Summary: Fix long closing of the socket in ServerImpl 
(TcpDiscoverySpi)
 Key: IGNITE-13643
 URL: https://issues.apache.org/jira/browse/IGNITE-13643
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The current IgniteUtils.closeQuiet(@Nullable Socket sock) takes about 5 sec 
to close a socket. Probably it is the default soTimeout. This violates node 
failure detection. Although we set failureDetectionTimeout == 1000, a node 
failure is detected within 6.5 secs on average. Logging shows a delay on 
socket closing in IgniteUtils.closeQuiet(@Nullable Socket sock).

Suggestion: use forced closing, set soLinger=0, do not wait for the rest of 
the socket IO. We close a socket in TcpDiscoverySpi when we have already 
waited for the target timeouts and consider the connection lost or invalid. 
We do not need to wait for any traffic on the socket any more.
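
A minimal sketch of the suggested forced close using the plain
java.net.Socket API (`sock` is the discovery socket being abandoned):

    // SO_LINGER with timeout 0 makes close() reset the connection and
    // return immediately instead of lingering on unsent data.
    try {
        sock.setSoLinger(true, 0);
    }
    catch (Exception ignored) {
        // The socket may already be closed; we give up on it anyway.
    }

    try {
        sock.close();
    }
    catch (Exception ignored) {
        // No-op: the connection is already considered lost.
    }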



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13641) More logs for debugging DiscoveryTcpSpi

2020-10-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13641:
-

 Summary: More logs for debugging DiscoveryTcpSpi
 Key: IGNITE-13641
 URL: https://issues.apache.org/jira/browse/IGNITE-13641
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Logs in DiscoveryTcp (ServerImpl) are insufficient. We do not see the 
actual timeouts passed to sockets. It's difficult to understand why the 
timeouts and waits are what they are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13638) Bring log config to ducktape tests

2020-10-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13638:
-

 Summary: Bring log config to ducktape tests
 Key: IGNITE-13638
 URL: https://issues.apache.org/jira/browse/IGNITE-13638
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13625) Make network timeout rely on failureDetectionTimeout in TcpDiscovery

2020-10-26 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13625:
-

 Summary: Make network timeout rely on failureDetectionTimeout in 
TcpDiscovery
 Key: IGNITE-13625
 URL: https://issues.apache.org/jira/browse/IGNITE-13625
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13620) Bind ignite node to 1 address in the ducktests

2020-10-23 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13620:
-

 Summary: Bind ignite node to 1 address in the ducktests
 Key: IGNITE-13620
 URL: https://issues.apache.org/jira/browse/IGNITE-13620
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13603) TcpDiscoverySpi seems to not drop network recovery state and its timer.

2020-10-21 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13603:
-

 Summary: TcpDiscoverySpi seems to not drop network recovery state 
and its timer.
 Key: IGNITE-13603
 URL: https://issues.apache.org/jira/browse/IGNITE-13603
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


ServerImpl keeps sndState (CrossRingMessageSendState) in its message send 
cycle. Once created with a failure recovery timer, it is not cleared or 
refreshed any more. This may cause an instant timeout on the next send 
failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13282) Fix TcpDiscoveryCoordinatorFailureTest.testClusterFailedNewCoordinatorInitialized()

2020-07-21 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13282:
-

 Summary: Fix 
TcpDiscoveryCoordinatorFailureTest.testClusterFailedNewCoordinatorInitialized()
 Key: IGNITE-13282
 URL: https://issues.apache.org/jira/browse/IGNITE-13282
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13208) Refactoring of IgniteSpiOperationTimeoutHelper

2020-07-02 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13208:
-

 Summary: Refactoring of IgniteSpiOperationTimeoutHelper
 Key: IGNITE-13208
 URL: https://issues.apache.org/jira/browse/IGNITE-13208
 Project: Ignite
  Issue Type: Task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


IgniteSpiOperationTimeoutHelper has many timeout fields. It looks like it 
could be simplified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13206) Represent in the doc the effect of several node addresses on failure detection.

2020-07-02 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13206:
-

 Summary: Represent in the doc the effect of several node addresses 
on failure detection.
 Key: IGNITE-13206
 URL: https://issues.apache.org/jira/browse/IGNITE-13206
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13205) Represent in logs, javadoc the effect of several node addresses on failure detection.

2020-07-02 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13205:
-

 Summary: Represent in logs, javadoc the effect of several node 
addresses on failure detection.
 Key: IGNITE-13205
 URL: https://issues.apache.org/jira/browse/IGNITE-13205
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Current TcpDiscoverySpi can prolong detection of a failed node that has several 
IP addresses. This happens because most of the timeouts, like 
failureDetectionTimeout, sockTimeout and ackTimeout, work per address, and the 
node addresses are tried serially. This effect on failure detection 
should be noted in the logs and javadocs.
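
To illustrate the scale of the effect (plain arithmetic, not Ignite code; the numbers are examples only):

{code:java}
long perAddressTimeout = 10_000; // e.g. failureDetectionTimeout, in ms
int addrCnt = 3;                 // the failed node published three addresses

// Addresses are tried serially, so the worst-case detection time is roughly:
long worstCaseMs = addrCnt * perAddressTimeout; // 30 s instead of the expected 10 s
{code}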



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13194) Fix testNodeWithIncompatibleMetadataIsProhibitedToJoinTheCluster()

2020-06-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13194:
-

 Summary: Fix 
testNodeWithIncompatibleMetadataIsProhibitedToJoinTheCluster()
 Key: IGNITE-13194
 URL: https://issues.apache.org/jira/browse/IGNITE-13194
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13134) Fix connection recovery timeout.

2020-06-08 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13134:
-

 Summary: Fix connection recovery timeout.
 Key: IGNITE-13134
 URL: https://issues.apache.org/jira/browse/IGNITE-13134
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.8.1
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


If a node experiences connection issues, it must establish a new connection or fail 
within failureDetectionTimeout + connectionRecoveryTimeout.
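
For illustration, the expected upper bound (a sketch; 10 s is the default failureDetectionTimeout, the connectionRecoveryTimeout value is only an example):

{code:java}
long failureDetectionTimeout = 10_000;   // default, ms
long connectionRecoveryTimeout = 10_000; // example value, ms

// The node must either re-establish a connection or fail within this budget:
long maxRecoveryMs = failureDetectionTimeout + connectionRecoveryTimeout; // 20 s
{code}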



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Question: network issues of single node.

2020-06-05 Thread Vladimir Steshin

Denis,

I have no nodes that I'm unable to interconnect. This case is simulated 
in IgniteDiscoveryMassiveNodeFailTest.testMassiveFailSelfKill()

Introduced in [1].

I'm asking whether it is a real or a supposed problem. Where was it met? What 
network configuration/issues could cause it?



[1] https://issues.apache.org/jira/browse/IGNITE-7163

On 05.06.2020 1:01, Denis Magda wrote:

Vladimir,

I'm suggesting to share the log files from the nodes that are unable to
interconnect so that the community can check them for potential issues.
Instead of sharing the logs from all the 5 nodes, try to start a two-nodes
cluster with the nodes that fail to discover each other and attach the logs
from those.

-
Denis


On Thu, Jun 4, 2020 at 1:57 PM Vladimir Steshin  wrote:


Denis, hi.

  Sorry, I didn't catch your idea. Are you saying this can happen and
suggesting an experiment? I'm not describing a probable case. It is already
done in [1]. I'm asking whether it is real and where it was met.


On 04.06.2020 23:33, Denis Magda wrote:

Vladimir,

Please do the following experiment. Start a 2-nodes cluster booting node 3
and, for instance, node 5. Those won't be able to interconnect according to
your description. Attach the log files from both nodes for analysis. This
should be a networking issue.

-
Denis


On Thu, Jun 4, 2020 at 1:24 PM Vladimir Steshin 

wrote:

   Hi, Igniters.


   I wanted to ask how one node may be unable to connect to another
whereas the rest of the cluster can. This is covered in [1]. In short: node
3 can't connect to nodes 4 and 5 but can to 1. At the same time, node 2
can connect to 4. Questions:

1) Is it a real case? Where did this problem come from?

2) If node 3 can't connect to 4 and 5, does it mean node 2 can't connect
to 4 (and 5) too?

Sergey, Dmitry, maybe you can bring light to this (I see you in [1])? I'm
participating in [2] and found this backward connection checking. An
answer would help us a lot.

Thanks!

[1] https://issues.apache.org/jira/browse/IGNITE-7163

[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up


Re: Question: network issues of single node.

2020-06-04 Thread Vladimir Steshin

Denis, hi.

    Sorry, I didn't catch your idea. Are you saying this can happen and 
suggesting an experiment? I'm not describing a probable case. It is already 
done in [1]. I'm asking whether it is real and where it was met.



On 04.06.2020 23:33, Denis Magda wrote:

Vladimir,

Please do the following experiment. Start a 2-nodes cluster booting node 3
and, for instance, node 5. Those won't be able to interconnect according to
your description. Attach the log files from both nodes for analysis. This
should be a networking issue.

-
Denis


On Thu, Jun 4, 2020 at 1:24 PM Vladimir Steshin  wrote:


  Hi, Igniters.


  I wanted to ask how one node may be unable to connect to another
whereas the rest of the cluster can. This is covered in [1]. In short: node
3 can't connect to nodes 4 and 5 but can to 1. At the same time, node 2
can connect to 4. Questions:

1) Is it a real case? Where did this problem come from?

2) If node 3 can't connect to 4 and 5, does it mean node 2 can't connect
to 4 (and 5) too?

Sergey, Dmitry, maybe you can bring light to this (I see you in [1])? I'm
participating in [2] and found this backward connection checking. An
answer would help us a lot.

Thanks!

[1] https://issues.apache.org/jira/browse/IGNITE-7163

[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up



Question: network issues of single node.

2020-06-04 Thread Vladimir Steshin

    Hi, Igniters.


    I wanted to ask how one node may be unable to connect to another 
whereas the rest of the cluster can. This is covered in [1]. In short: node 
3 can't connect to nodes 4 and 5 but can to 1. At the same time, node 2 
can connect to 4. Questions:


1) Is it a real case? Where did this problem come from?

2) If node 3 can't connect to 4 and 5, does it mean node 2 can't connect 
to 4 (and 5) too?


Sergey, Dmitry, maybe you can bring light to this (I see you in [1])? I'm 
participating in [2] and found this backward connection checking. An 
answer would help us a lot.


Thanks!

[1] 
https://issues.apache.org/jira/browse/IGNITE-7163


[2] 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up




[jira] [Created] (IGNITE-13111) Simplify backward checking of node connection.

2020-06-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13111:
-

 Summary: Simplify backward checking of node connection.
 Key: IGNITE-13111
 URL: https://issues.apache.org/jira/browse/IGNITE-13111
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should fix several drawbacks in the backward checking of a failed node. They 
prolong node failure detection up to: 
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout 
+ 300ms. 

See:
* ‘_NodeFailureResearch.patch_'. It creates the test 'FailureDetectionResearch', 
which emulates long answers on a failed node and measures failure detection 
delays.
* '_FailureDetectionResearch.txt_' - results of the test.
* '_FailureDetectionResearch_fixed.txt_' - results of the test after this fix.
* '_WostCaseStepByStep.txt_' - a description of how the worst case happens.


*Suggestion:*

1) We can simplify backward connection checking as we implement IGNITE-13012. 
Once we get a robust, predictable connection ping, we don't need to check the 
previous node, because we can see whether it sent a ping to the current node 
within the failure detection timeout. If not, the previous node can be 
considered lost.

Instead of:
{code:java}
// Node cannot connect to its next (for local node it's previous).
// Need to check connectivity to it.
long rcvdTime = lastRingMsgReceivedTime;
long now = U.currentTimeMillis();

// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + effectiveExchangeTimeout() >= now;

TcpDiscoveryNode previous = null;

if (ok) {
    // Check case when previous node suddenly died. This will speed up node failing.

    // ... checking connection to previous node ...
}
{code}
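
A sketch of what the check could be reduced to once IGNITE-13012 makes the ping period predictable (assuming lastRingMsgReceivedTime is updated on every message received from the previous node):

{code:java}
// The previous node is alive iff it sent us anything (a regular message
// or a connection-check ping) within the failure detection timeout:
private boolean previousNodeAlive(long lastRingMsgReceivedTime, long failureDetectionTimeout) {
    return U.currentTimeMillis() - lastRingMsgReceivedTime <= failureDetectionTimeout;
}
{code}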

2) Then, it seems, we can remove:
{code:java}
ServerImpl.SocketReader.isConnectionRefused(SocketAddress addr);
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13090) Add parameter of connection check period to TcpDiscoverySpi

2020-05-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13090:
-

 Summary: Add parameter of connection check period to 
TcpDiscoverySpi
 Key: IGNITE-13090
 URL: https://issues.apache.org/jira/browse/IGNITE-13090
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should add a connection check period parameter to TcpDiscoverySpi. If it 
isn't automatically set by IgniteConfiguration.setFailureDetectionTimeout(), 
the user should be able to adjust it. Similar params:


{code:java}
TcpDiscoverySpi.setReconnectCount()
TcpDiscoverySpi.setAckTimeout()
TcpDiscoverySpi.setSocketTimeout()
{code}
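
A hypothetical shape of the new parameter (the setter name and value are assumptions for illustration; the final API may differ):

{code:java}
TcpDiscoverySpi spi = new TcpDiscoverySpi()
    .setSocketTimeout(2_000)
    .setAckTimeout(2_000);

// The parameter proposed by this ticket (hypothetical name):
// spi.setConnectionCheckInterval(500);
{code}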





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13040) Remove unused parameter from TcpDiscoverySpi.writeToSocket()

2020-05-20 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13040:
-

 Summary: Remove unused parameter from 
TcpDiscoverySpi.writeToSocket()
 Key: IGNITE-13040
 URL: https://issues.apache.org/jira/browse/IGNITE-13040
 Project: Ignite
  Issue Type: Task
 Environment: Unused parameter {code:java}TcpDiscoveryAbstractMessage 
msg{code} should be removed from
{code:java}
TcpDiscoverySpi.writeToSocket(Socket sock, TcpDiscoveryAbstractMessage msg, byte[] data, long timeout){code}

This method seems to send raw data, not a message.
 
Reporter: Vladimir Steshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13018) Get rid of duplicated checking of failed node.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13018:
-

 Summary: Get rid of duplicated checking of failed node.
 Key: IGNITE-13018
 URL: https://issues.apache.org/jira/browse/IGNITE-13018
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Failed node checking should be simplified to one step: ping the node (send a 
message) from the previous one in the ring and wait for a response within 
IgniteConfiguration.failureDetectionTimeout. If the node doesn't respond, we should 
consider it failed. Extra steps of connection checking may seriously delay 
failure detection and bring confusion and weird behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13017) Remove delay of 200ms from re-marking failed node as alive.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13017:
-

 Summary: Remove delay of 200ms from re-marking failed node as 
alive.
 Key: IGNITE-13017
 URL: https://issues.apache.org/jira/browse/IGNITE-13017
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should remove hardcoded timeout from:

{code:java}
boolean ServerImpl.CrossRingMessageSendState.markLastFailedNodeAlive() {
    if (state == RingMessageSendState.FORWARD_PASS || state == RingMessageSendState.BACKWARD_PASS) {
        ...

        if (--failedNodes <= 0) {
            ...

            state = RingMessageSendState.STARTING_POINT;

            try {
                Thread.sleep(200);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        return true;
    }

    return false;
}
{code}

This can add an extra 200 ms to the duration of failed node detection.
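
A sketch of the same transition after the fix (only the hardcoded sleep goes away; the surrounding method is unchanged):

{code:java}
if (--failedNodes <= 0) {
    // ...

    state = RingMessageSendState.STARTING_POINT;

    // Thread.sleep(200) is simply removed: no artificial delay before
    // the node can be re-marked alive.
}
{code}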



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13016) Remove hardcoded values/timeouts from backward checking of failed node.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13016:
-

 Summary: Remove hardcoded values/timeouts from backward checking 
of failed node.
 Key: IGNITE-13016
 URL: https://issues.apache.org/jira/browse/IGNITE-13016
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Backward checking of a failed node relies on a hardcoded timeout of 100 ms:

{code:java}
private boolean ServerImpl.isConnectionRefused(SocketAddress addr) {
    try (Socket sock = new Socket()) {
        sock.connect(addr, 100);
    }
    catch (ConnectException e) {
        return true;
    }
    catch (IOException e) {
        return false;
    }

    return false;
}
{code}

We should make it bound to configurable parameters like 
IgniteConfiguration.failureDetectionTimeout.
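
A possible shape of the fix, as a sketch (the derivation of the connect timeout from failureDetectionTimeout is an assumption for illustration only):

{code:java}
private boolean isConnectionRefused(SocketAddress addr) {
    // Derived from the configured timeout instead of the hardcoded 100 ms;
    // the divisor is illustrative.
    int connTimeout = (int)Math.max(100, spi.failureDetectionTimeout() / 10);

    try (Socket sock = new Socket()) {
        sock.connect(addr, connTimeout);
    }
    catch (ConnectException e) {
        return true;
    }
    catch (IOException e) {
        return false;
    }

    return false;
}
{code}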




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13015) Use nano time instead of currentTimeMillis() in node failure detection.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13015:
-

 Summary: Use nano time instead of currentTimeMillis() in node failure 
detection.
 Key: IGNITE-13015
 URL: https://issues.apache.org/jira/browse/IGNITE-13015
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Make sure the following are not used in node failure detection:
{code:java}
System.currentTimeMillis()
and
IgniteUtils.currentTimeMillis()
{code}

Disadvantages:

1)  Current system time has no guarantee of strict forward movement. 
System time can be adjusted, for example synchronized by NTP. This can lead to 
incorrect and negative delays.

2)  IgniteUtils.currentTimeMillis() has a granularity of 10 ms.
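
A minimal illustration of the difference (plain Java, not Ignite code):

{code:java}
import java.util.concurrent.TimeUnit;

public class MonotonicTimeoutExample {
    public static void main(String[] args) throws InterruptedException {
        // Deadline computed on the monotonic clock: immune to wall-clock
        // adjustments (e.g. by NTP), unlike System.currentTimeMillis().
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(500);

        Thread.sleep(100);

        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());

        System.out.println("Remaining: ~" + remainingMs + " ms");
    }
}
{code}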



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13014) Remove long, double checking of node availability. Fix hardcoded values.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13014:
-

 Summary: Remove long, double checking of node availability. Fix 
hardcoded values.
 Key: IGNITE-13014
 URL: https://issues.apache.org/jira/browse/IGNITE-13014
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


At present, we have duplicated checking of node availability. This 
prolongs node failure detection and gives no additional benefit. There is a mess 
of hardcoded values in this routine.
Let's imagine node 2 doesn't answer any more. Node 1 becomes unable to ping 
node 2 and asks node 3 to establish a permanent connection instead of node 2. 
Although node 2 has already been pinged within the configured timeouts, node 3 
tries to connect to node 2 too. 
Disadvantages:
1)  Possible long detection of node failure, up to 
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout 
+ 300ms. See ‘WostCase.txt’.

2)  An unexpected, non-configurable decision to check availability of the previous 
node based on ‘2 * ServerImpl.CON_CHECK_INTERVAL‘:

// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now; 

If ‘ok == true’, node 3 checks node 2.

3)  Double node checking brings several non-configurable hardcoded delays. 
Node 3 checks node 2 with a hardcoded timeout of 100 ms in 
ServerImpl.isConnectionRefused():

sock.connect(addr, 100);

Checking availability of the previous node treats any exception except 
ConnectException (connection refused) as an existing connection, even a 
timeout. See ServerImpl.isConnectionRefused().
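
For scale, the worst case above with default settings (illustrative arithmetic; 10 s is the default failureDetectionTimeout):

{code:java}
long CON_CHECK_INTERVAL = 500;         // hardcoded, ms
long failureDetectionTimeout = 10_000; // default, ms

long worstCaseMs = CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300; // 20.8 s
{code}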



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13012) Make node connection checking rely on the configuration. Simplify node ping routine.

2020-05-14 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13012:
-

 Summary: Make node connection checking rely on the configuration. 
Simplify node ping routine.
 Key: IGNITE-13012
 URL: https://issues.apache.org/jira/browse/IGNITE-13012
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin



Current node-to-node connection checking has several drawbacks:
1)  The minimal connection checking interval is not bound to failure detection 
parameters: 
static int ServerImpl.CON_CHECK_INTERVAL = 500;
2)  Connection checking is implemented as periodical message sending 
(TcpDiscoveryConnectionCheckMessage). It is bound to its own time 
(ServerImpl.RingMessageWorker.lastTimeConnCheckMsgSent), not to the common time 
of the last sent message. This is weird, because any discovery message actually 
checks the connection, and TcpDiscoveryConnectionCheckMessage is just an 
addition for when the message queue is empty for a long time.
3)  The period of node-to-node connection checking can sometimes be shortened 
for a strange reason: if no sent or received message appears within 
failureDetectionTimeout. Here, despite having a minimal period of connection 
checking (ServerImpl.CON_CHECK_INTERVAL), we can also send 
TcpDiscoveryConnectionCheckMessage before this period is exhausted. Moreover, 
this premature node ping also relies on the time of the last received message. 
Imagine: if node 2 receives no message from node 1 within some time, it decides 
to do an extra ping of node 3 without waiting for the regular ping interval. 
Such behavior causes confusion and gives no additional guarantees.
4)  If #3 happens, the node writes in the log on INFO: “Local node seems to be 
disconnected from topology …” whereas it is not actually disconnected. The user 
can see this message if they set failureDetectionTimeout < 500 ms. I wouldn't 
like to see an INFO entry in a log saying a node might be disconnected. This 
sounds like some trouble arose in the network, not like everything is OK. 

Suggestions:
1)  Make the connection check interval based on failureDetectionTimeout or 
similar params.
2)  Make the connection check interval rely on the common time of the last sent 
message, not on a dedicated timer (a sketch follows below).
3)  Remove the additional, random, quickened connection checking.
4)  Do not worry the user with “Node disconnected” when everything is OK.
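
A sketch of suggestions 1 and 2 (illustrative, not the actual patch; the interval derivation is an assumption):

{code:java}
// Interval derived from the failure detection timeout instead of a constant:
long connCheckInterval = Math.max(100, failureDetectionTimeout / 3); // hypothetical derivation

// Bound to the common time of the last sent message: any discovery message
// counts as a connection check; the dedicated ping is sent only when the
// queue has been silent for the whole interval.
if (U.currentTimeMillis() - lastSentMsgTime >= connCheckInterval)
    sendMessageAcrossRing(new TcpDiscoveryConnectionCheckMessage(locNode));
{code}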



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Active nodes aliveness WatchDog

2020-04-08 Thread Vladimir Steshin

Hi everyone.

I think we should check the behavior of failure detection with tests, or find 
them if already written. I'll research this question and raise a ticket 
if a reproducer appears.




On 08.04.2020 12:19, Stephen Darlington wrote:

Yes. Nodes are always chatting to each other even if there are no requests 
coming in.

Here’s the status message: 
https://github.com/apache/ignite/blob/e9b3c4cebaecbeec9fa51bd6ec32a879fb89948a/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/messages/TcpDiscoveryStatusCheckMessage.java

Regards,
Stephen


On 8 Apr 2020, at 10:04, Anton Vinogradov  wrote:

It seems you're talking about Failure Detection (Timeouts).
Will it detect node failure on still cluster?

On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:


The configuration parameters that I’m aware of are here:


https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html

Other people would be better placed to discuss the internals.

Regards.
Stephen


On 8 Apr 2020, at 09:32, Anton Vinogradov  wrote:

Stephen,


Nodes check on their neighbours and notify the remaining nodes if one

disappears.
Could you explain how this works in detail?
How can I set/change check frequency?

On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:


This is one of the functions of the DiscoverySPI. Nodes check on their
neighbours and notify the remaining nodes if one disappears. When the
topology changes, it triggers a rebalance, which relocates primary
partitions to live nodes. This is entirely transparent to clients.

It gets more complex… like there’s the partition loss policy and
rebalancing doesn’t always happen (configurable, persistence, etc)… but
broadly it does as you expect.

Regards,
Stephen


On 8 Apr 2020, at 08:40, Anton Vinogradov  wrote:

Igniters,
Do we have some feature that allows checking node aliveness on a regular basis?

Scenario:
Precondition
The cluster has no load but some node's JVM crashed.

Expected actual
The user performs an operation (e.g. cache put) related to this node (via
another node) and waits for some timeout to learn it's dead.
The cluster starts the switch to relocate primary partitions to alive
nodes.
Now the user is able to retry the operation.

Desired
Some WatchDog checks nodes aliveness on a regular basis.
Once a failure detected, the cluster starts the switch.
Later, the user performs an operation on an already fixed cluster and
waits for nothing.

It would be good news if the "Desired" case is already Actual.
Can somebody point to the feature that performs this check?










Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-04-03 Thread Vladimir Steshin

Slava, hello.


All right. Since we have several occurrences of

 * Deactivation clears in-memory caches (without persistence) including
 * the system caches.

in the public API, we should change the internal

    @param forceDeactivation If {@code true}, cluster deactivation will be forced.

to the analogous

    @param forceDeactivation If {@code true}, cluster deactivation will be forced.
    Deactivation clears in-memory caches (without persistence) including the system caches.

We might include this fix in the last [1]. WDYT, can we proceed with [1] then?


[1] https://issues.apache.org/jira/browse/IGNITE-12779



On 02.04.2020 19:58, Вячеслав Коптилин wrote:

Hi Vladimir,


There are about 15 places in the inner logic with this description.
I propose a balance between code base size and comment completeness.

I agree with Ivan and I also think that this approach is not so good.
Perhaps we can add just a link to the one method which will provide a
comprehensive description, something like as follows
@param forceDeactivation {@code true} if cluster deactivation should be
forced. Please take a look at {@link IgniteCluster#state(ClusterState
newState, boolean force)} for the details.

What do you think?

Thanks,
Slava.

Thu, Apr 2, 2020 at 18:47, Vladimir Steshin :


Ivan, hello.

Thanks. I vote for keeping the comments as they are now :)

Igniters, it seems we have agreed to merge [1]. And the ticket is to be
reverted in the future with a newly designed solution for keeping in-memory data
after deactivation.

Right?


[1] https://issues.apache.org/jira/browse/IGNITE-12779


On 01.04.2020 20:20, Ivan Rakov wrote:

I don't think that making javadocs more descriptive can be considered as
harmful code base enlargement.
I'd recommend to extend the docs, but the last word is yours ;)

On Tue, Mar 31, 2020 at 2:44 PM Vladimir Steshin 

wrote:

Ivan, hi.

I absolutely agree that this particular description is not enough to see the
deactivation issue. I also vote for brief code.

There are about 15 places in the inner logic with this description. I
propose a balance between code base size and comment completeness.

Should we enlarge the code even if we already have several full descriptions?


On 30.03.2020 20:02, Ivan Rakov wrote:

Vladimir,

@param forceDeactivation If {@code true}, cluster deactivation will be forced.

It's true that it's possible to infer the semantics of forced deactivation from
other parts of the API. I just wanted to highlight that exactly this
description explains something that can be guessed by the parameter name.

I propose to shorten the lookup path and shed light on deactivation
semantics a bit:

@param forceDeactivation If {@code true}, cluster will be deactivated even
if running in-memory caches are present. All data in the corresponding
caches will vanish as a result.

Does this make sense?

On Fri, Mar 27, 2020 at 12:00 PM Vladimir Steshin 
wrote:


Ivan, hi.


1) >>> Is it correct? If we are on the same page, let's proceed this way

It is correct.

2) - In many places in the code I can see the following javadoc

   @param forceDeactivation If {@code true}, cluster deactivation will be forced.

In the internal params/flags. You can also find @see ClusterState#INACTIVE and a
full description with several public APIs (like Ignite.active(boolean)):

 * NOTE:
 * Deactivation clears in-memory caches (without persistence) including
 * the system caches.

Should be enough, shouldn't it?


On 27.03.2020 10:51, Ivan Rakov wrote:

Vladimir, Igniters,

Let's emphasize our final plan.

We are going to add --force flags that will be necessary to pass for a
deactivation if there are in-memory caches to:
1) Rest API (already implemented in [1])
2) Command line utility (already implemented in [1])
3) JMX bean (going to be implemented in [2])
We are *not* going to change IgniteCluster or any other thick Java API,
thus we are *not* going to merge [3].
We plan to *fully rollback* [1] and [2] once cache data survival after
the activation-deactivation cycle is implemented.

Is it correct? If we are on the same page, let's proceed this way.
I propose to:
- Create a JIRA issue for in-memory-data-safe deactivation (possibly,
without IEP and detailed design so far)
- Describe in the issue description what exact parts of API should be
removed under the issue scope.

Also, a few questions on already merged [1]:
- We have removed GridClientClusterState#state(ClusterState) from Java
client API. Is it a legitimate thing to do? Don't we have to support API
compatibility for thin clients as well?
- In many places in the code I can see the following javadoc

   @param forceDeactivation If {@code true}, cluster deactivation will be forced.

As for me, this javadoc doesn't clarify anything. I'd suggest to describe
in which cases deactivation won't happen unless it's forced and which
impact forced deactivation will bring on the system.

[1]: https://issues.apache.org/jira/browse/IGNITE-12701
[

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-04-02 Thread Vladimir Steshin

Ivan, hello.

Thanks. I vote for keeping the comments as they are now :)

Igniters, it seems we have agreed to merge [1]. And the ticket is to be 
reverted in the future with a newly designed solution for keeping in-memory data 
after deactivation.


Right?


[1] https://issues.apache.org/jira/browse/IGNITE-12779


On 01.04.2020 20:20, Ivan Rakov wrote:

I don't think that making javadocs more descriptive can be considered as
harmful code base enlargement.
I'd recommend to extend the docs, but the last word is yours ;)

On Tue, Mar 31, 2020 at 2:44 PM Vladimir Steshin  wrote:


Ivan, hi.

I absolutely agree that this particular description is not enough to see the
deactivation issue. I also vote for brief code.

There are about 15 places in the inner logic with this description. I
propose a balance between code base size and comment completeness.

Should we enlarge the code even if we already have several full descriptions?


On 30.03.2020 20:02, Ivan Rakov wrote:

Vladimir,

@param forceDeactivation If {@code true}, cluster deactivation will be forced.

It's true that it's possible to infer the semantics of forced deactivation from
other parts of the API. I just wanted to highlight that exactly this
description explains something that can be guessed by the parameter name.
I propose to shorten the lookup path and shed light on deactivation
semantics a bit:

@param forceDeactivation If {@code true}, cluster will be deactivated even
if running in-memory caches are present. All data in the corresponding
caches will vanish as a result.

Does this make sense?

On Fri, Mar 27, 2020 at 12:00 PM Vladimir Steshin 
wrote:


Ivan, hi.


1) >>> Is it correct? If we are on the same page, let's proceed this way

It is correct.


2) - In many places in the code I can see the following javadoc


@param forceDeactivation If {@code true}, cluster deactivation will be forced.

In the internal params/flags. You can also find @see ClusterState#INACTIVE and a
full description with several public APIs (like Ignite.active(boolean)):

 * NOTE:
 * Deactivation clears in-memory caches (without persistence) including
 * the system caches.

Should be enough, shouldn't it?


On 27.03.2020 10:51, Ivan Rakov wrote:

Vladimir, Igniters,

Let's emphasize our final plan.

We are going to add --force flags that will be necessary to pass for a
deactivation if there are in-memory caches to:
1) Rest API (already implemented in [1])
2) Command line utility (already implemented in [1])
3) JMX bean (going to be implemented in [2])
We are *not* going to change IgniteCluster or any other thick Java API,
thus we are *not* going to merge [3].
We plan to *fully rollback* [1] and [2] once cache data survival after
activation-deactivation cycle will be implemented.

Is it correct? If we are on the same page, let's proceed this way.
I propose to:
- Create a JIRA issue for in-memory-data-safe deactivation (possibly,
without IEP and detailed design so far)
- Describe in the issue description what exact parts of API should be
removed under the issue scope.

Also, a few questions on already merged [1]:
- We have removed GridClientClusterState#state(ClusterState) from Java
client API. Is it a legitimate thing to do? Don't we have to support

API

compatibility for thin clients as well?
- In many places in the code I can see the following javadoc


@param forceDeactivation If {@code true}, cluster deactivation will

be forced.

As for me, this javadoc doesn't clarify anything. I'd suggest to

describe

in which cases deactivation won't happen unless it's forced and which
impact forced deactivation will bring on the system.

[1]: https://issues.apache.org/jira/browse/IGNITE-12701
[2]: https://issues.apache.org/jira/browse/IGNITE-12779
[3]: https://issues.apache.org/jira/browse/IGNITE-12614

--
Ivan

On Tue, Mar 24, 2020 at 7:18 PM Vladimir Steshin 

wrote:

Hi, Igniters.

I'd like to remind you that the cluster can be deactivated by a user with 3
utilities: control.sh, *JMX and the REST*. The solution proposed in [1] is
not about control.sh only. It suggests the same approach regardless of the
utility the user executes. The task touches *only* the *API of the user calls*,
not the internal APIs.

The reasons why the “--yes” flag and the confirmation prompt weren't taken into
account for control.sh are:

- Various commands widely use “--yes” just to start. Even non-dangerous
ones require “--yes” to begin. “--force” is dedicated for *harmful actions*.

- Checking for probable data erasure works after the command starts, so
“--force” may not be required at all.

- There are also JMX and REST. They have no “--yes” but should work alike.

To make the deactivation safe I propose to merge the last ticket with
the JMX fixes [2]. In future releases, I believe, we should estimate the
work and fix the memory erasure in general. For now, let's prevent it.

WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-12614

[2] https://issues.apache.org/jira/browse/IGNITE-12779


On 24.03.2020 15:55, Вячеслав Коптилин wrote:

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-31 Thread Vladimir Steshin

Ivan, hi.

I absolutely agree that this particular description is not enough to see the 
deactivation issue. I also vote for brief code.


There are about 15 places in the inner logic with this description. I 
propose a balance between code base size and comment completeness.


Should we enlarge the code even if we already have several full descriptions?


On 30.03.2020 20:02, Ivan Rakov wrote:

Vladimir,

@param forceDeactivation If {@code true}, cluster deactivation will be forced.

It's true that it's possible to infer the semantics of forced deactivation from
other parts of the API. I just wanted to highlight that exactly this
description explains something that can be guessed by the parameter name.
I propose to shorten the lookup path and shed light on deactivation
semantics a bit:

@param forceDeactivation If {@code true}, cluster will be deactivated even
if running in-memory caches are present. All data in the corresponding
caches will vanish as a result.

Does this make sense?

On Fri, Mar 27, 2020 at 12:00 PM Vladimir Steshin 
wrote:


Ivan, hi.


1) >>> Is it correct? If we are on the same page, let's proceed this way

It is correct.


2) - In many places in the code I can see the following javadoc


   @param forceDeactivation If {@code true}, cluster deactivation will be forced.

In the internal params/flags. You can also find @see ClusterState#INACTIVE and a
full description with several public APIs (like Ignite.active(boolean)):

 * NOTE:
 * Deactivation clears in-memory caches (without persistence) including
 * the system caches.

Should be enough, shouldn't it?


On 27.03.2020 10:51, Ivan Rakov wrote:

Vladimir, Igniters,

Let's emphasize our final plan.

We are going to add --force flags that will be necessary to pass for a
deactivation if there are in-memory caches to:
1) Rest API (already implemented in [1])
2) Command line utility (already implemented in [1])
3) JMX bean (going to be implemented in [2])
We are *not* going to change IgniteCluster or any other thick Java API,
thus we are *not* going to merge [3].
We plan to *fully rollback* [1] and [2] once cache data survival after
activation-deactivation cycle will be implemented.

Is it correct? If we are on the same page, let's proceed this way.
I propose to:
- Create a JIRA issue for in-memory-data-safe deactivation (possibly,
without IEP and detailed design so far)
- Describe in the issue description what exact parts of API should be
removed under the issue scope.

Also, a few questions on already merged [1]:
- We have removed GridClientClusterState#state(ClusterState) from Java
client API. Is it a legitimate thing to do? Don't we have to support API
compatibility for thin clients as well?
- In many places in the code I can see the following javadoc


   @param forceDeactivation If {@code true}, cluster deactivation will

be forced.

As for me, this javadoc doesn't clarify anything. I'd suggest to

describe

in which cases deactivation won't happen unless it's forced and which
impact forced deactivation will bring on the system.

[1]: https://issues.apache.org/jira/browse/IGNITE-12701
[2]: https://issues.apache.org/jira/browse/IGNITE-12779
[3]: https://issues.apache.org/jira/browse/IGNITE-12614

--
Ivan

On Tue, Mar 24, 2020 at 7:18 PM Vladimir Steshin 

wrote:

Hi, Igniters.

I'd like to remind you that the cluster can be deactivated by a user with 3
utilities: control.sh, *JMX and the REST*. The solution proposed in [1] is
not about control.sh only. It suggests the same approach regardless of the
utility the user executes. The task touches *only* the *API of the user calls*,
not the internal APIs.

The reasons why the “--yes” flag and the confirmation prompt weren't taken into
account for control.sh are:

- Various commands widely use “--yes” just to start. Even non-dangerous
ones require “--yes” to begin. “--force” is dedicated for *harmful actions*.

- Checking for probable data erasure works after the command starts, so
“--force” may not be required at all.

- There are also JMX and REST. They have no “--yes” but should work alike.

   To make the deactivation safe I propose to merge the last ticket with
the JMX fixes [2]. In future releases, I believe, we should estimate the
work and fix the memory erasure in general. For now, let's prevent it. WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-12614

[2] https://issues.apache.org/jira/browse/IGNITE-12779


On 24.03.2020 15:55, Вячеслав Коптилин wrote:

Hello Nikolay,

I am talking about the interactive mode of the control utility, which
requires explicit confirmation from the user.
Please take a look at DeactivateCommand#prepareConfirmation and its usages.
It seems to me, this mode has the same aim as the forceDeactivation flag.
We can change the message returned by DeactivateCommand#confirmationPrompt
as follows:
   "Warning: the command will deactivate the cluster nnn and clear
in-memory caches (without persistence) including system caches."

What do you think?

Thanks,
S.

Tue, Mar 24, 2020 at 13:

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-27 Thread Vladimir Steshin

Ivan, hi.


1) >>> Is it correct? If we are on the same page, let's proceed this way

It is correct.


2) - In many places in the code I can see the following javadoc


  @param forceDeactivation If {@code true}, cluster deactivation will be forced.


In the internal params/flags. You can also find @see ClusterState#INACTIVE and a 
full description with several public APIs (like Ignite.active(boolean)):

 * NOTE:
 * Deactivation clears in-memory caches (without persistence) including
 * the system caches.


Should be enough, shouldn't it?


On 27.03.2020 10:51, Ivan Rakov wrote:

Vladimir, Igniters,

Let's emphasize our final plan.

We are going to add --force flags that will be necessary to pass for a
deactivation if there are in-memory caches to:
1) Rest API (already implemented in [1])
2) Command line utility (already implemented in [1])
3) JMX bean (going to be implemented in [2])
We are *not* going to change IgniteCluster or any other thick Java API,
thus we are *not* going to merge [3].
We plan to *fully rollback* [1] and [2] once cache data survival after
activation-deactivation cycle will be implemented.

Is it correct? If we are on the same page, let's proceed this way.
I propose to:
- Create a JIRA issue for in-memory-data-safe deactivation (possibly,
without IEP and detailed design so far)
- Describe in the issue description what exact parts of API should be
removed under the issue scope.

Also, a few questions on already merged [1]:
- We have removed GridClientClusterState#state(ClusterState) from Java
client API. Is it a legitimate thing to do? Don't we have to support API
compatibility for thin clients as well?
- In many places in the code I can see the following javadoc


  @param forceDeactivation If {@code true}, cluster deactivation will be forced.

As for me, this javadoc doesn't clarify anything. I'd suggest to describe

in which cases deactivation won't happen unless it's forced and which
impact forced deactivation will bring on the system.

[1]: https://issues.apache.org/jira/browse/IGNITE-12701
[2]: https://issues.apache.org/jira/browse/IGNITE-12779
[3]: https://issues.apache.org/jira/browse/IGNITE-12614

--
Ivan

On Tue, Mar 24, 2020 at 7:18 PM Vladimir Steshin  wrote:


Hi, Igniters.

I'd like to remind you that the cluster can be deactivated by a user with 3
utilities: control.sh, *JMX and the REST*. The solution proposed in [1] is
not about control.sh only. It suggests the same approach regardless of the
utility the user executes. The task touches *only* the *API of the user calls*,
not the internal APIs.

The reasons why the “--yes” flag and the confirmation prompt weren't taken into
account for control.sh are:

- Various commands widely use “--yes” just to start. Even non-dangerous
ones require “--yes” to begin. “--force” is dedicated for *harmful actions*.

- Checking for probable data erasure works after the command starts, so
“--force” may not be required at all.

- There are also JMX and REST. They have no “--yes” but should work alike.

  To make the deactivation safe I propose to merge the last ticket with
the JMX fixes [2]. In future releases, I believe, we should estimate the
work and fix the memory erasure in general. For now, let's prevent it. WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-12614

[2] https://issues.apache.org/jira/browse/IGNITE-12779


On 24.03.2020 15:55, Вячеслав Коптилин wrote:

Hello Nikolay,

I am talking about the interactive mode of the control utility, which
requires explicit confirmation from the user.
Please take a look at DeactivateCommand#prepareConfirmation and its

usages.

It seems to me, this mode has the same aim as the forceDeactivation flag.
We can change the message returned by

DeactivateCommand#confirmationPrompt

as follows:
  "Warning: the command will deactivate the cluster nnn and clear
in-memory caches (without persistence) including system caches."

What do you think?

Thanks,
S.

Tue, Mar 24, 2020 at 13:07, Nikolay Izhikov :


Hello, Slava.

Are you talking about this commit [1] (sorry for the commit message, it's due
to the GitHub issue)?

The message for this command is now

«Deactivation stopped. Deactivation clears in-memory caches (without
persistence) including the system caches.»

Is it clear enough?

[1]


https://github.com/apache/ignite/commit/4921fcf1fecbd8a1ab02099e09cc2adb0b3ff88a



On Mar 24, 2020, at 13:02, Вячеслав Коптилин wrote:

Hi Nikolay,


1. We should add —force flag to the command.sh deactivation command.

I just checked and it seems that the deactivation command
(control-utility.sh) already has a confirmation option.
Perhaps, we need to clearly state the consequences of using this command
with in-memory caches.

Thanks,
S.

Tue, Mar 24, 2020 at 12:51, Nikolay Izhikov :


Hello, Alexey.

I just repeat our agreement to be on the same page


The confirmation should only be present in the user-facing interfaces.

1. We should add —force flag to the command.sh deactivation command.
2. We should 

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-24 Thread Vladimir Steshin

Hi, Igniters.

I'd like to remind you that the cluster can be deactivated by a user with 3 
utilities: control.sh, *JMX and the REST*. The solution proposed in [1] is 
not about control.sh only. It suggests the same approach regardless of the 
utility the user executes. The task touches *only* the *API of the user calls*, 
not the internal APIs.


The reasons why the “--yes” flag and the confirmation prompt weren't taken into 
account for control.sh are:


- Various commands widely use “--yes” just to start. Even non-dangerous 
ones require “--yes” to begin. “--force” is dedicated for *harmful actions*.


- Checking for probable data erasure works after the command starts, so 
“--force” may not be required at all.


- There are also JMX and REST. They have no “--yes” but should work alike.

    To make the deactivation safe I propose to merge the last ticket with 
the JMX fixes [2]. In future releases, I believe, we should estimate the 
work and fix the memory erasure in general. For now, let's prevent it. WDYT?



[1] https://issues.apache.org/jira/browse/IGNITE-12614

[2] https://issues.apache.org/jira/browse/IGNITE-12779


On 24.03.2020 15:55, Вячеслав Коптилин wrote:

Hello Nikolay,

I am talking about the interactive mode of the control utility, which
requires explicit confirmation from the user.
Please take a look at DeactivateCommand#prepareConfirmation and its usages.
It seems to me, this mode has the same aim as the forceDeactivation flag.
We can change the message returned by DeactivateCommand#confirmationPrompt
as follows:
 "Warning: the command will deactivate the cluster nnn and clear
in-memory caches (without persistence) including system caches."

What do you think?

Thanks,
S.

Tue, Mar 24, 2020 at 13:07, Nikolay Izhikov :


Hello, Slava.

Are you talking about this commit [1] (sorry for the commit message, it's due
to the GitHub issue)?

The message for this command is now

«Deactivation stopped. Deactivation clears in-memory caches (without
persistence) including the system caches.»

Is it clear enough?

[1]
https://github.com/apache/ignite/commit/4921fcf1fecbd8a1ab02099e09cc2adb0b3ff88a



On Mar 24, 2020, at 13:02, Вячеслав Коптилин wrote:

Hi Nikolay,


1. We should add —force flag to the command.sh deactivation command.

I just checked and it seems that the deactivation command
(control-utility.sh) already has a confirmation option.
Perhaps, we need to clearly state the consequences of using this command
with in-memory caches.

Thanks,
S.

Tue, Mar 24, 2020 at 12:51, Nikolay Izhikov :


Hello, Alexey.

I just repeat our agreement to be on the same page


The confirmation should only be present in the user-facing interfaces.

1. We should add —force flag to the command.sh deactivation command.
2. We should throw the exception if cluster has in-memory caches and
—force=false.
3. We shouldn’t change Java API for deactivation.

Is it correct?


The DROP TABLE command does not have a "yes I am sure" clause in it

I think it's because the command itself has the «DROP» word in its name,
which clearly explains what will happen on its execution.

On the opposite, the «deactivation» command doesn't have any sign of a
destructive operation. That's why we should warn the user about its
consequences.



On Mar 24, 2020, at 12:38, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

Igniters, Ivan, Nikolay,

I am strongly against adding confirmation flags to any kind of APIs,
whether we change the deactivation behavior or not (even though I agree
that it makes sense to fix the deactivation to not clean up the in-memory
data). The confirmation should only be present in the user-facing
interfaces.

I cannot recall any software interface which has such a flag. None of the
syscalls which delete files (a very destructive operation) have this flag.
The DROP TABLE command does not have a "yes I am sure" clause in it. As I
already mentioned, when used programmatically, most users will likely
simply pass 'true' as the new flag because they already know the behavior.
This is a clear sign of a bad design choice.

On top of that, given that it is our intention to change the behavior of
deactivation to not lose the in-memory data, it does not make sense to me
to change the API.






Re: Reference of local service.

2020-03-24 Thread Vladimir Steshin

    Hi, folks.

 I'd like to arrive at the final decision. Is it OK to:

1) Make IgniteServices.serviceProxy() return a proxy for a locally deployed 
service too.


2) Make the proxy collect metrics of service method invocations without 
any additional conditions, interfaces or options.


3) Deprecate IgniteServices.service().

WDYT?
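
For illustration, the usage under point 1 could look like this (MyService and MyServiceImpl are hypothetical):

IgniteServices svcs = ignite.services();

svcs.deployNodeSingleton("myService", new MyServiceImpl());

// Returns a proxy even for the locally deployed instance, so invocation
// metrics can be collected transparently:
MyService svc = svcs.serviceProxy("myService", MyService.class, false);

svc.doWork(); // goes through the proxy and is measured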


On 17.03.2020 19:56, Vladimir Steshin wrote:

Andrey,

>>> Is it possible to return actual class instance instead of 
interface from serviceProxy() method?


No, it is not possible. IgniteServicesImpl doesn't allow that:

public <T> T serviceProxy(final String name, final Class<? super T> svcItf, final boolean sticky, final long timeout) {
    ...
    A.ensure(svcItf.isInterface(), "Service class must be an interface: " + svcItf);
    ...
}


On 17.03.2020 17:34, Andrey Gura wrote:

Vladimir,

Why not use IgniteServices.serviceProxy() for that? Since it 
requires an interface, it could return a proxy for a local service too and 
keep backward compatibility at the same time.

Is it possible to return actual class instance instead of interface
from serviceProxy() method? E.g. could I get ServiceImpl as result of
method call instead of ServiceItf?

On Tue, Mar 17, 2020 at 9:50 AM Vladimir Steshin  
wrote:

Andrey,

IgniteServices.service() method could return actual interface 
implementation instead of interface itself.


IgniteServices.service() always returns the actual local service 
instance, not a proxy; it might implement no interface at all except Service.



If yes then we can add new method IgniteService.serviceLocalProxy().
It will not break backward compatibility and will always return 
proxy.
Why not use IgniteServices.serviceProxy() for that? Since it 
requires an interface, it could return a proxy for a local service too and 
keep backward compatibility at the same time.

On 16.03.2020 20:21, Andrey Gura wrote:

Vladimir,


We won’t affect existing services
How exactly will we affect services without a special interface? Please see
the benchmarks in the previous email.

I talk about backward compatibility, not about performance. But it
doesn't matter because... see below.

My fault. From discussion I realized that services doesn't require
interface. But indeed it does require.

If I understand correctly, IgniteServices.service() method could
return actual interface implementation instead of interface itself.
Am I right?

If yes then we can add new method IgniteService.serviceLocalProxy().
It will not break backward compatibility and will always return proxy.

On Thu, Mar 12, 2020 at 2:25 PM Vladimir Steshin 
 wrote:

Andrey, hi.


We won’t affect existing services
How exactly will we affect services without a special interface? Please see
the benchmarks in the previous email.


what if we will generate a proxy that collects service's metrics
only if the service implements some special interface?


I don't like the idea that enabling/disabling metrics involves a code change
and recompilation. I believe it should be an external option, probably
available at runtime through JMX.


we can impose an additional requirement for services that want to use metrics
out of the box. … the service must have its own interface and only invocations
of methods of this interface will be taken into account for metrics collection.


Why one more interface? To work via a proxy with remote services, the user
already has to use an interface in addition to Service. If we introduce
a proxy for local services too (as suggested earlier), an interface will be
required. The current IgniteServices#serviceProxy() already requires an
interface even for a local service. I don't think we need one more special
interface.




user always can use own metrics framework.
Since we do not significantly affect services, the user can use both or
disable ours with an option.


With the discussion before and the benchmark I propose:


- Let IgniteService#serviceProxy() give GridServiceProxy for local 
services
too. It already requires to work via interface. So it’s safe for 
user code.



- Deprecate IgniteService#service()


- Make service metrics enabled by default for all services.


- Bring system param which disables metrics by default for all 
services.



- Bring parameter/method in MetricsMxBean which allows 
disabling/enabling

metrics for all services at run time.

Makes sense?

чт, 5 мар. 2020 г., 16:48 Andrey Gura :


Hi there,

what if we will generate proxy that collects service's metrics 
only if

service will implement some special interface? In such case:

- we won't affect existing services at all.
- we can impose additional requirement for services that want use
metrics out of box (i.e. service that implements our special 
interface
*must* also have own interface and only invocations of methods of 
this

interface will be taken into account for metrics collection).
- user always can use own metrics framework instead of our (just do
not implement this new special interface).

About metrics enabling/disabling. At present IGNITE-11927 doesn't
solve this pr

Re: Reference of local service.

2020-03-17 Thread Vladimir Steshin

Andrey,

>>> Is it possible to return actual class instance instead of interface 
from serviceProxy() method?


No, it is not possible. IgniteServicesImpl doesn't allow that:

public <T> T serviceProxy(final String name, final Class<? super T> svcItf, final boolean sticky, final long timeout) {
    ...
    A.ensure(svcItf.isInterface(), "Service class must be an interface: " + svcItf);
    ...
}


On 17.03.2020 17:34, Andrey Gura wrote:

Vladimir,


Why not using IgniteServices.serviceProxy() for that? Since it requires an 
interface, It could return proxy for local service too and
keep backward compatibility at the same time.

Is it possible to return actual class instance instead of interface
from serviceProxy() method? E.g. could I get ServiceImpl as result of
method call instead of ServiceItf?

On Tue, Mar 17, 2020 at 9:50 AM Vladimir Steshin  wrote:

Andrey,


IgniteServices.service() method could return actual interface implementation 
instead of interface itself.


IgniteServices.service() always returns the actual local service instance, not a 
proxy; it might implement no interface at all except Service.


If yes then we can add new method IgniteService.serviceLocalProxy().
It will not break backward compatibility and will always return proxy.

Why not use IgniteServices.serviceProxy() for that? Since it requires an 
interface, it could return a proxy for a local service too and
keep backward compatibility at the same time.

On 16.03.2020 20:21, Andrey Gura wrote:

Vladimir,


We won’t affect existing services

How exactly will we affect services without special interface? Please see
the benchmarks in previous email.

I talk about backward compatibility, not about performance. But it
doesn't matter because... see below.

My fault. From discussion I realized that services doesn't require
interface. But indeed it does require.

If I understand correctly, IgniteServices.service() method could
return actual interface implementation instead of interface itself.
Am I right?

If yes then we can add new method IgniteService.serviceLocalProxy().
It will not break backward compatibility and will always return proxy.

On Thu, Mar 12, 2020 at 2:25 PM Vladimir Steshin  wrote:

Andrey, hi.


We won’t affect existing services

How exactly will we affect services without special interface? Please see
the benchmarks in previous email.



what if we will generate a proxy that collects service's metrics
only if the service implements some special interface?


I don't like the idea that enabling/disabling metrics involves a code change
and recompilation. I believe it should be an external option, probably
available at runtime through JMX.



we can impose an additional requirement for services that want to use metrics
out of the box. … the service must have its own interface and only invocations
of methods of this interface will be taken into account for metrics collection.

Why one more interface? To work via a proxy with remote services, the user
already has to use an interface in addition to Service. If we introduce
a proxy for local services too (as suggested earlier), an interface will be
required. The current IgniteServices#serviceProxy() already requires an
interface even for a local service. I don't think we need one more special
interface.



user always can use own metrics framework.

Since we do not significantly affect services, the user can use both or
disable ours with an option.


With the discussion before and the benchmark I propose:


- Let IgniteService#serviceProxy() give GridServiceProxy for local services
too. It already requires to work via interface. So it’s safe for user code.


- Deprecate IgniteService#service()


- Make service metrics enabled by default for all services.


- Bring system param which disables metrics by default for all services.


- Bring parameter/method in MetricsMxBean which allows disabling/enabling
metrics for all services at run time.

Makes sense?

чт, 5 мар. 2020 г., 16:48 Andrey Gura :


Hi there,

what if we will generate proxy that collects service's metrics only if
service will implement some special interface? In such case:

- we won't affect existing services at all.
- we can impose additional requirement for services that want use
metrics out of box (i.e. service that implements our special interface
*must* also have own interface and only invocations of methods of this
interface will be taken into account for metrics collection).
- user always can use own metrics framework instead of our (just do
not implement this new special interface).

About metrics enabling/disabling. At present IGNITE-11927 doesn't
solve this problem. Just because there is no metrics implementation
for services :)
Anyway we should provide a way for configuring service metrics (in
sense of enabled/disabled) during service deploy. It's easy for cases
where deploy() methods have ServiceConfiguration as parameter. But
there are "short cut" methods like deployXxxSingleton(). I have ideas
how to solve this problem. For example we can 

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-17 Thread Vladimir Steshin
Nikolay, I think we should reconsider clearing at least the system caches 
when deactivating.


On 17.03.2020 14:18, Nikolay Izhikov wrote:

Hello, Vladimir.

I don’t get it.

What is your proposal?
What should we do?


On Mar 17, 2020, at 14:11, Vladimir Steshin wrote:

Nikolay, hi.


And should be covered with the  —force parameter we added.

As a fix for user cases, yes. My idea is to emphasize the overall possibility of 
losing various objects, not only data. This might be reconsidered in the future.


On 17.03.2020 13:49, Nikolay Izhikov wrote:

Hello, Vladimir.

If there is at least one persistent data region, then the system data region also 
becomes persistent.
Your example applies only to pure in-memory clusters.

And should be covered with the —force parameter we added.

What do you think?


On March 17, 2020, at 13:45, Vladimir Steshin wrote:

 Hi, all.

Fixes for control.sh and the REST have been merged. Could anyone take a look at
the previous email with the issue? Isn't this conduct very weird?



Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-17 Thread Vladimir Steshin

Nikolay, hi.


And should be covered with the --force parameter we added.


As a fix for user cases - yes. My idea is to emphasize the overall ability to
lose various objects, not only data. This might be reconsidered in the
future.



On 17.03.2020 13:49, Nikolay Izhikov wrote:

Hello, Vladimir.

If there is at least one persistent data region, then the system data region also
becomes persistent.
Your example applies only to pure in-memory clusters.

And should be covered with the --force parameter we added.

What do you think?


On March 17, 2020, at 13:45, Vladimir Steshin wrote:

 Hi, all.

Fixes for control.sh and the REST have been merged. Could anyone take a look at
the previous email with the issue? Isn't this conduct very weird?



Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-17 Thread Vladimir Steshin

    Hi, all.

Fixes for control.sh and the REST have been merged. Could anyone take a
look at the previous email with the issue? Isn't this conduct very weird?




Re: Reference of local service.

2020-03-17 Thread Vladimir Steshin

Andrey,


The IgniteServices.service() method could return the actual interface implementation
instead of the interface itself.



IgniteServices.service() always returns the actual local service instance, not a
proxy; the service might implement no interface at all other than Service.


If yes, then we can add a new method IgniteService.serviceLocalProxy().
It will not break backward compatibility and will always return a proxy.


Why not use IgniteServices.serviceProxy() for that? Since it already requires an
interface, it could return a proxy for local services too and
keep backward compatibility at the same time.
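
A minimal sketch of how that could look from user code, assuming a service
deployed under the name "myService" (IgniteServices#serviceProxy(String, Class,
boolean) is the existing API; the interface and the main() wiring are illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ServiceProxyUsage {
    /** User-defined service interface; the deployed service is assumed to implement it. */
    public interface MyService {
        void foo();
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // serviceProxy() already requires an interface, so returning a proxy
            // even for locally deployed instances would not break this code.
            MyService svc = ignite.services().serviceProxy("myService", MyService.class, false);

            svc.foo(); // Goes through the proxy whether the instance is local or remote.
        }
    }
}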

On 16.03.2020 20:21, Andrey Gura wrote:

Vladimir,


We won’t affect existing services

How exactly will we affect services without a special interface? Please see
the benchmarks in the previous email.

I'm talking about backward compatibility, not about performance. But it
doesn't matter because... see below.

My fault. From the discussion I got the impression that services don't require an
interface. But indeed they do.

If I understand correctly, the IgniteServices.service() method could
return the actual interface implementation instead of the interface itself.
Am I right?

If yes, then we can add a new method IgniteService.serviceLocalProxy().
It will not break backward compatibility and will always return a proxy.

On Thu, Mar 12, 2020 at 2:25 PM Vladimir Steshin  wrote:

Andrey, hi.


>> We won’t affect existing services

How exactly will we affect services without a special interface? Please see
the benchmarks in the previous email.



>>> what if we will generate a proxy that collects a service's metrics
only if the service implements some special interface?


I don’t like the idea that enabling/disabling metrics involves a code change
and recompilation. I believe it should be an external option, probably available
at runtime through JMX.



>> we can impose an additional requirement for services that want to use metrics
out of the box. … a service must have its own interface, and only invocations of
methods of this interface will be taken into account for metrics collection.

Why one more interface? To work via a proxy with remote services, the user
already has to use an interface in addition to Service. If we introduce a
proxy for local services too (as suggested earlier), an interface will be
required. The current IgniteService#serviceProxy() already requires an interface
even for a local service. I don’t think we need one more special interface.



>> the user can always use their own metrics framework.


Since we do not significantly affect services, the user can use both, or disable
ours with an option.


Based on the discussion above and the benchmark, I propose:


- Let IgniteService#serviceProxy() return a GridServiceProxy for local services
too. It already requires working via an interface, so it’s safe for user code.


- Deprecate IgniteService#service()


- Make service metrics enabled by default for all services.


- Introduce a system parameter which disables metrics by default for all services.


- Introduce a parameter/method in MetricsMxBean which allows disabling/enabling
metrics for all services at run time.

Makes sense?

Thu, Mar 5, 2020, 16:48 Andrey Gura :


Hi there,

what if we will generate a proxy that collects a service's metrics only if
the service implements some special interface? In such a case:

- we won't affect existing services at all.
- we can impose an additional requirement for services that want to use
metrics out of the box (i.e. a service that implements our special interface
*must* also have its own interface, and only invocations of methods of this
interface will be taken into account for metrics collection).
- the user can always use their own metrics framework instead of ours (just do
not implement this new special interface).

About metrics enabling/disabling. At present IGNITE-11927 doesn't
solve this problem, simply because there is no metrics implementation
for services :)
Anyway, we should provide a way to configure service metrics (in the
sense of enabled/disabled) during service deployment. It's easy for cases
where deploy() methods have ServiceConfiguration as a parameter. But
there are "short cut" methods like deployXxxSingleton(). I have ideas
on how to solve this problem. For example, we can introduce "short cut"
factory methods like nodeSingletonConfiguration(String name, Service
service) and clusterSingletonConfiguration(String name, Service
service). These methods will return a configuration which has parameters
for the given type of deployment and could be modified, e.g. metrics could
be enabled.
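
A minimal sketch of such factory methods, assuming they simply pre-fill a
ServiceConfiguration (the method names follow the proposal above; nothing here
is existing API):

import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceConfiguration;

public final class ServiceConfigurations {
    /** One service instance on every node (hypothetical shortcut). */
    public static ServiceConfiguration nodeSingletonConfiguration(String name, Service svc) {
        ServiceConfiguration cfg = new ServiceConfiguration();
        cfg.setName(name);
        cfg.setService(svc);
        cfg.setMaxPerNodeCount(1);
        return cfg;
    }

    /** Exactly one service instance in the whole cluster (hypothetical shortcut). */
    public static ServiceConfiguration clusterSingletonConfiguration(String name, Service svc) {
        ServiceConfiguration cfg = new ServiceConfiguration();
        cfg.setName(name);
        cfg.setService(svc);
        cfg.setTotalCount(1);
        cfg.setMaxPerNodeCount(1);
        return cfg;
    }
}

The returned configuration could then be tweaked, e.g. with a hypothetical
cfg.setMetricsEnabled(true), before being passed to
IgniteServices#deploy(ServiceConfiguration).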

WDYT?

On Wed, Mar 4, 2020 at 8:42 PM Vladimir Steshin 
wrote:

Vyacheslav, Denis, hi again.




I agree with the proposal to introduce a new method which returns a proxy,
including the case of locally deployed services.



I see that one is restricted to using an interface for a service in
IgniteServiceProcessor#serviceProxy(…):



A.ensure(svcItf.isInterface(), "Service class must be an interface: " +
svcItf);



What if we change IgniteService#serviceProxy(...) so that it will return a
proxy every time? That looks safe for user code. Doing so we might only
deprecate IgniteService#service(...).

Re: Reference of local service.

2020-03-12 Thread Vladimir Steshin
Andrey, hi.

>> We won’t affect existing services

How exactly will we affect services without a special interface? Please see
the benchmarks in the previous email.


>>> what if we will generate a proxy that collects a service's metrics
only if the service implements some special interface?


I don’t like the idea that enabling/disabling metrics involves a code change
and recompilation. I believe it should be an external option, probably available
at runtime through JMX.


>> we can impose an additional requirement for services that want to use metrics
out of the box. … a service must have its own interface, and only invocations of
methods of this interface will be taken into account for metrics collection.

Why one more interface? To work via a proxy with remote services, the user
already has to use an interface in addition to Service. If we introduce a
proxy for local services too (as suggested earlier), an interface will be
required. The current IgniteService#serviceProxy() already requires an interface
even for a local service. I don’t think we need one more special interface.


>> the user can always use their own metrics framework.


Since we do not significantly affect services, the user can use both, or disable
ours with an option.


Based on the discussion above and the benchmark, I propose:


- Let IgniteService#serviceProxy() return a GridServiceProxy for local services
too. It already requires working via an interface, so it’s safe for user code.


- Deprecate IgniteService#service()


- Make service metrics enabled by default for all services.


- Introduce a system parameter which disables metrics by default for all services.


- Introduce a parameter/method in MetricsMxBean which allows disabling/enabling
metrics for all services at run time.

Makes sense?
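
To make the last two points concrete, here is a hedged sketch of the proposed
toggles (the MXBean method and the property name are hypothetical, not existing API):

// Hypothetical MXBean extension for switching service metrics at run time:
public interface ServiceMetricsMxBean {
    /** Enables or disables invocation metrics for all deployed services. */
    void serviceMetricsEnabled(boolean enabled);
}

// Hypothetical system property changing the default (the name is illustrative):
// -DIGNITE_SERVICE_METRICS_ENABLED=false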

Thu, Mar 5, 2020, 16:48 Andrey Gura :

> Hi there,
>
> what if we will generate a proxy that collects a service's metrics only if
> the service implements some special interface? In such a case:
>
> - we won't affect existing services at all.
> - we can impose an additional requirement for services that want to use
> metrics out of the box (i.e. a service that implements our special interface
> *must* also have its own interface, and only invocations of methods of this
> interface will be taken into account for metrics collection).
> - the user can always use their own metrics framework instead of ours (just do
> not implement this new special interface).
>
> About metrics enabling/disabling. At present IGNITE-11927 doesn't
> solve this problem, simply because there is no metrics implementation
> for services :)
> Anyway, we should provide a way to configure service metrics (in the
> sense of enabled/disabled) during service deployment. It's easy for cases
> where deploy() methods have ServiceConfiguration as a parameter. But
> there are "short cut" methods like deployXxxSingleton(). I have ideas
> on how to solve this problem. For example, we can introduce "short cut"
> factory methods like nodeSingletonConfiguration(String name, Service
> service) and clusterSingletonConfiguration(String name, Service
> service). These methods will return a configuration which has parameters
> for the given type of deployment and could be modified, e.g. metrics could
> be enabled.
>
> WDYT?
>
> On Wed, Mar 4, 2020 at 8:42 PM Vladimir Steshin 
> wrote:
> >
> > Vyacheslav, Denis, hi again.
> >
> >
> >
> > >>> I agree with the proposal to introduce a new method which returns a proxy,
> > including the case of locally deployed services.
> >
> >
> >
> > I see that one is restricted to using an interface for a service in
> > IgniteServiceProcessor#serviceProxy(…):
> >
> >
> >
> > A.ensure(svcItf.isInterface(), "Service class must be an interface: " +
> > svcItf);
> >
> >
> >
> > What if we change IgniteService#serviceProxy(...) so that it will return a
> > proxy every time? That looks safe for user code. Doing so we might only
> > deprecate IgniteService#service(...).
> >
> >
> >
> > вт, 3 мар. 2020 г., 11:03 Vyacheslav Daradur :
> >
> > > Denis, finally I understood your arguments about interfaces check,
> thank
> > > you for the explanation.
> > >
> > > I agree with the proposal to introduce a new method which returns a proxy,
> > > including the case of locally deployed services.
> > >
> > > Also, such a method should be able to work in mode "local services
> > > preferred", perhaps with load-balancing (in case of multiple locally
> > > deployed instances). This would allow our end-users to achieve better
> > > performance.
> > >
> > >
> > >
> > > On Mon, Mar 2, 2020 at 7:51 PM Denis Mekhanikov  >
> > > wrote:
> > >
> > > > Vyaches

[jira] [Created] (IGNITE-12779) Split Ignite and IgniteMXBean, make different behavior of the active(boolean)

2020-03-12 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12779:
-

 Summary: Split Ignite and IgniteMXBean, make different behavior of 
the active(boolean)
 Key: IGNITE-12779
 URL: https://issues.apache.org/jira/browse/IGNITE-12779
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


To make cluster deactivation through JMX safe from sudden erasure of in-memory
data, we should:

1)  Add _IgniteMXBean#state(String state, boolean force)_.

2)  Let _IgniteMXBean#state(String state)_ and _IgniteMXBean#active(boolean
active)_ fail when deactivating a cluster with in-memory data.

3)  Separate the implementations of _Ignite_ and _IgniteMXBean_ from
_IgniteKernal_. They share the same method _void active(boolean active)_, which is
required to behave differently: in the case of _Ignite#active(boolean active)_ it
should not fail when deactivating a cluster with in-memory data.
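
A sketch of how the method from item 1 might look (the signature follows the
ticket text; the Javadoc wording is an assumption, not the final API):

public interface IgniteMXBean {
    /**
     * Changes the cluster state ("ACTIVE", "INACTIVE", ...).
     *
     * @param state New cluster state.
     * @param force If false, deactivation fails when it would erase
     *              in-memory (non-persistent) data; true forces it anyway.
     */
    void state(String state, boolean force);
}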




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12773) Reduce number of cluster deactivation methods in internal API.

2020-03-11 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12773:
-

 Summary: Reduce number of cluster deactivation methods in internal 
API.
 Key: IGNITE-12773
 URL: https://issues.apache.org/jira/browse/IGNITE-12773
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


To reduce the number of cluster deactivation methods in the internal API, we might:

1.  Remove
GridClientClusterState#active()

2.  Remove
GridClientClusterState#active(boolean active)

3.  Remove
IGridClusterStateProcessor#changeGlobalState(
boolean activate,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)

4.  Remove
GridClusterStateProcessor#changeGlobalState(
final boolean activate,
Collection baselineNodes,
boolean forceChangeBaselineTopology,
boolean isAutoAdjust
)

5.  Remove
GridClusterStateProcessor#changeGlobalState(
final boolean activate,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)

6.  Remove 
GridClusterStateProcessor#changeGlobalState(
ClusterState state,
boolean forceDeactivation,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)

7.  Add boolean isAutoAdjust to 
IGridClusterStateProcessor#changeGlobalState(
ClusterState state,
boolean forceDeactivation,
Collection baselineNodes,
boolean forceChangeBaselineTopology,
   /* here */ boolean isAutoAdjust /* here */
)

8.  Add @Override to 
/* here */ @Override /* here */
GridClusterStateProcessor#changeGlobalState(
ClusterState state,
boolean forceDeactivation,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)
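
Assembled from the items above, the single remaining entry point would look
roughly like this (the generic parameter of the baseline-node collection and
the return type were stripped by the archive and are assumed here):

IgniteInternalFuture<?> changeGlobalState(
    ClusterState state,
    boolean forceDeactivation,
    Collection<BaselineNode> baselineNodes,
    boolean forceChangeBaselineTopology,
    boolean isAutoAdjust
);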



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Reference of local service.

2020-03-10 Thread Vladimir Steshin
Hi, everyone.

I've done some benchmarks and wonder whether we need to disable service
metrics somehow. As Denis suggested, proxies might be involved for local
services. If I enable the current proxy for local services, I see the following
cases:

1) The service proxy is too slow on trivial operations to care about metrics
performance.
2) Service metrics do not hit performance significantly, or at all, on real
operations.

I doubt another implementation of the service proxy could bring serious changes
or would be worth it. Do we need to care about the performance hit of the service
metrics and think about how to disable them?


Without metrics:
- Test [3] : proxy [2] is 6.5 times slower than local [1] (11kk ops/s VS
72kk ops/s)
- Test [4] : proxy is 19% slower than local (990k ops/s VS 1182k ops/s)
- Test [5] : proxy is not slower (same 250k ops/s)

With metrics:
- Test [3] : proxy [2] is 7.5 times slower than local [1] (9.6kk ops/s VS
73kk ops/s)
- Test [4] : proxy is 21% slower than local (995k ops/s VS 1200k ops/s)
- Test [5] : proxy is 10% slower than local (244k ops/s VS 264k ops/s)


private TestService localService; // [1]
private TestService serviceProxy; // [2]

private IgniteEx ignite;
private IgniteCache cache;


interface TestService {
    Object handleVal(int value);
}

static class TestServiceImpl implements Service, TestService {
    …
    private IgniteCache cache;

    @Override public Object handleVal(int value) {
        // Alternative bodies, one per test case:
        /* [3] */ return randomInt(value);
        /* [4] */ return cache.get(value);
        /* [5] */ return cache.getAndPut(randomInt(value), value);
    }
}


@Benchmark
public void localService(Blackhole blackhole) throws Exception {
    blackhole.consume(localService.handleVal(1 + randomInt(MAX_VALUE)));
}

@Benchmark
public void proxiedService(Blackhole blackhole) throws Exception {
    blackhole.consume(serviceProxy.handleVal(1 + randomInt(MAX_VALUE)));
}


@Setup
public void setup() throws Exception {
    ignite = (IgniteEx)Ignition.start(configuration("grid0"));

    cache = ignite.createCache("cache");

    // The archive garbled the rest of this method; assumed: pre-fill the cache
    // and deploy the service before grabbing the local reference and the proxy.
    for (int i = 0; i < MAX_VALUE; i++)
        cache.put(i, i);

    ignite.services().deployNodeSingleton("srv", new TestServiceImpl());

    localService = ignite.services().service("srv");

    serviceProxy = new GridServiceProxy<TestService>(ignite.cluster(), "srv",
        TestService.class, true, 0, ignite.context()).proxy();
}






Thu, Mar 5, 2020, 16:48 Andrey Gura ag...@apache.org:

> Hi there,
>
> what if we will generate a proxy that collects a service's metrics only if
> the service implements some special interface? In such a case:
>
> - we won't affect existing services at all.
> - we can impose an additional requirement for services that want to use
> metrics out of the box (i.e. a service that implements our special interface
> *must* also have its own interface, and only invocations of methods of this
> interface will be taken into account for metrics collection).
> - the user can always use their own metrics framework instead of ours (just do
> not implement this new special interface).
>
> About metrics enabling/disabling. At present IGNITE-11927 doesn't
> solve this problem, simply because there is no metrics implementation
> for services :)
> Anyway, we should provide a way to configure service metrics (in the
> sense of enabled/disabled) during service deployment. It's easy for cases
> where deploy() methods have ServiceConfiguration as a parameter. But
> there are "short cut" methods like deployXxxSingleton(). I have ideas
> on how to solve this problem. For example, we can introduce "short cut"
> factory methods like nodeSingletonConfiguration(String name, Service
> service) and clusterSingletonConfiguration(String name, Service
> service). These methods will return a configuration which has parameters
> for the given type of deployment and could be modified, e.g. metrics could
> be enabled.
>
> WDYT?
>
> On Wed, Mar 4, 2020 at 8:42 PM Vladimir Steshin 
> wrote:
> >
> > Vyacheslav, Denis, hi again.
> >
> >
> >
> > >>> I agree with the proposal to introduce a new method which returns a proxy,
> > including the case of locally deployed services.
> >
> >
> >
> > I see that one is restricted to using an interface for a service in
> > IgniteServiceProcessor#serviceProxy(…):
> >
> >
> >
> > A.ensure(svcItf.isInterface(), "Service class must be an interface: " +
> > svcItf);
> >
> >
> >
> > What if we change IgniteService#serviceProxy(...) so that it will return a
> > proxy every time? That looks safe for user code. Doing so we might only
> > deprecate IgniteService#service(...).
> >
> >
> >
> > Tue, Mar 3, 2020, 11:03 Vyacheslav Daradur :
> >
> > > Denis, finally I understood your arguments about the interfaces check,
> > > thank you for the explanation.
> > >
> > > I agree with the proposal to introduce a new method which returns a proxy,
> > > including the case of locally deployed services.

Re: Data vanished from cluster after INACTIVE/ACTIVE switch

2020-03-09 Thread Vladimir Steshin

Hi, Igniters.

I've found another deactivation issue. Should we consider this behavior
a bug?


IgniteEx ignite = startGrids(3);

IgniteAtomicLong atomicLong = ignite.atomicLong("atomic", 10L, true);

IgniteLock lock = ignite.reentrantLock("lock", true, false, true);

lock.lock();

assertEquals(10L, atomicLong.get());
assertTrue(lock.isLocked());

lock.unlock();

ignite.active(false);
ignite.active(true);

// Failed: java.lang.NullPointerException at GridCacheLockImpl.java:496
assertFalse(lock.isLocked());

// Failed: org.apache.ignite.IgniteException: Failed to find atomic long: testAtomic
assertEquals(10L, atomicLong.get());



Re: Reference of local service.

2020-03-04 Thread Vladimir Steshin
Vyacheslav, Denis, hi again.



>>> I agree with the proposal to introduce a new method which returns a proxy,
including the case of locally deployed services.



I see that one is restricted to using an interface for a service in
IgniteServiceProcessor#serviceProxy(…):



A.ensure(svcItf.isInterface(), "Service class must be an interface: " +
svcItf);



What if we change IgniteService#serviceProxy(...) so that it will return a
proxy every time? That looks safe for user code. Doing so we might only
deprecate IgniteService#service(...).



Tue, Mar 3, 2020, 11:03 Vyacheslav Daradur :

> Denis, finally I understood your arguments about the interfaces check, thank
> you for the explanation.
>
> I agree with the proposal to introduce a new method which returns a proxy,
> including the case of locally deployed services.
>
> Also, such a method should be able to work in mode "local services
> preferred", perhaps with load-balancing (in case of multiple locally
> deployed instances). This would allow our end-users to achieve better performance.
>
>
>
> On Mon, Mar 2, 2020 at 7:51 PM Denis Mekhanikov 
> wrote:
>
> > Vyacheslav,
> >
> > You can't make service interfaces extend
> > *org.apache.ignite.services.Service*. Currently it works perfectly if
> > *org.apache.ignite.services.Service* and a user-defined interface are
> > independent. This is actually the case in our current examples:
> >
> >
> https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/servicegrid/SimpleMapService.java
> > I mentioned the *Serializable* interface just as an example of an
> interface
> > that can be present, but it's not the one that is going to be called by a
> > user.
> >
> > What I'm trying to say is that there is no way to say whether the service
> > is going to be used through a proxy only, or usage of a local instance is
> > also possible.
> >
> > Vladimir,
> >
> > I don't like the idea that enabling or disabling metrics will change
> > the behaviour of the component you collect the metrics for. Such
> > behaviour is far from obvious.
> >
> > Nikolay,
> >
> > I agree that such an approach is valid and makes total sense. But making
> > the *IgniteServices#serviceProxy()* method always return a proxy instead of a
> > local instance will change the public contract. The javadoc currently
> says
> > the following:
> >
> > > If service is available locally, then local instance is returned,
> > > otherwise, a remote proxy is dynamically created and provided for the
> > > specified service.
> >
> >
> > I propose introducing a new method that will always return a service
> proxy
> > regardless of local availability, and deprecating *serviceProxy()* and
> > *service()
> > *methods. What do you think?
> >
> > Denis
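
One possible shape for such a method, sketched as an addition to IgniteServices
(the name and signature are hypothetical, taken from the proposal above):

/** Always returns a proxy, even when an instance is deployed on the local node. */
<T> T serviceProxyAlways(String name, Class<? super T> svcItf, boolean sticky);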
> >
> > Mon, Mar 2, 2020 at 16:08, Nikolay Izhikov :
> >
> > > Hello, Vladimir.
> > >
> > > > What if we just provide an option to disable service metrics at all?
> > >
> > > I don't think we should create an explicit property for service
> > > metrics.
> > > We will implement a way to disable any metrics in the scope of
> > > IGNITE-11927 [1].
> > >
> > > > Usage of a proxy instead of service instances can lead to performance
> > > > degradation for local instances, which is another argument against
> > > > such a change.
> > >
> > > As far as I know, many modern frameworks use a proxy approach.
> > > Just to name one - the Spring framework works through proxies.
> > >
> > > We should measure the performance impact that the proxy + metrics bring,
> > > and after that make a decision on the local service metrics implementation.
> > > Vladimir, can you, as a contributor to this task, make this measurement?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-11927
> > >
> > > Mon, Mar 2, 2020 at 12:56, Vladimir Steshin :
> > >
> > > > Denis, Vyacheslav, hi.
> > > >
> > > > What if we just provide an option to disable service metrics at all? It
> > > > would keep direct references for local services. Also, we can make
> > > > service metrics disabled by default to keep current code working. A warning
> > > > about local service issues will be added along with the option.
> > > >
> > > > Mon, Mar 2, 2020 at 11:26, Vyacheslav Daradur :
> > > >
> > > > > >> Moreover, I don't see a way o

Re: Reference of local service.

2020-03-02 Thread Vladimir Steshin
Denis, Vyacheslav, hi.

What if we just provide an option to disable service metrics at all? It
would keep direct references for local services. Also, we can make service
metrics disabled by default to keep current code working. A warning about
local service issues will be added along with the option.

Mon, Mar 2, 2020 at 11:26, Vyacheslav Daradur :

> >> Moreover, I don't see a way of implementing such a check. Are you going
> to look just for any interface? What about Serializable? Will it do?
>
> The check should look for an interface which extends
> "org.apache.ignite.services.Service"; that covers the requirement to be
> Serializable.
>
> >> For now though the best thing we can do is to calculate remote
> invocations only, since all of them go through a proxy.
>
> Let's introduce a system property to manage local services monitoring:
> - local services monitoring will be disabled by default - to avoid any
> backward compatibility issues;
> - local services monitoring can be enabled at runtime, with a known limitation
> (for example, it would apply only to newly deployed services);
> Moreover, if we introduce such a feature flag in ServiceConfiguration, the new
> feature can be enabled per service separately, as sketched below.
>
> What do you think?
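
A minimal sketch of the per-service flag idea; setMetricsEnabled(boolean) is the
hypothetical ServiceConfiguration property discussed above, everything else is
existing API:

import org.apache.ignite.Ignite;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceConfiguration;
import org.apache.ignite.services.ServiceContext;

public class MetricsFlagSketch {
    /** Trivial stand-in for any service implementation. */
    static class NoopService implements Service {
        @Override public void init(ServiceContext ctx) { /* no-op */ }
        @Override public void execute(ServiceContext ctx) { /* no-op */ }
        @Override public void cancel(ServiceContext ctx) { /* no-op */ }
    }

    static void deployWithMetrics(Ignite ignite) {
        ServiceConfiguration cfg = new ServiceConfiguration();
        cfg.setName("myService");
        cfg.setService(new NoopService());
        cfg.setMaxPerNodeCount(1);
        // cfg.setMetricsEnabled(true); // the proposed per-service toggle (not existing API)

        ignite.services().deploy(cfg);
    }
}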
>
>
>
> On Mon, Mar 2, 2020 at 12:33 AM Denis Mekhanikov 
> wrote:
>
>> Vladimir, Slava,
>>
>> In general, I like the idea of abstracting the service deployment from
>> its usage, but there are some backward-compatibility considerations that
>> won't let us do so.
>>
>> Or we can declare usage of services without interfaces incorrect
>>
>>
>> I don't think we can introduce a requirement for all services to have an
>> interface, unfortunately. Such a change can potentially break existing code,
>> since such a requirement doesn't exist currently.
>> Moreover, I don't see a way of implementing such a check. Are you going
>> to look just for any interface? What about Serializable? Will it do?
>>
>> Usage of a proxy instead of service instances can lead to performance
>> degradation for local instances, which is another argument against such
>> a change.
>>
>> I think it will make sense to make all service invocations work through
>> a proxy in Ignite 3.
>> For now though the best thing we can do is to calculate remote
>> invocations only, since all of them go through a proxy.
>> Another option is to provide a simple way for users to count the
>> service invocations themselves.
>>
>> What do you guys think?
>>
>> Denis
>>
>>
>> Tue, Feb 25, 2020 at 16:50, Vyacheslav Daradur :
>>
>>> It is not a change of public API from my point of view.
>>>
>>> Also, there is a check that allows getting a proxy only for an interface, not
>>> an implementation.
>>>
>>> Denis, what do you think?
>>>
>>>
>>> Tue, Feb 25, 2020 at 16:28, Vladimir Steshin :
>>>
>>>> Vyacheslav, this is exactly what I found. I'm doing [1] (metrics for
>>>> services) and realized I have to wrap local calls in a proxy. Is it a
>>>> change of the public API that should only come with a major release? Or can we
>>>> declare usage of services without interfaces incorrect?
>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12464
>>>>
>>>> Tue, Feb 25, 2020 at 16:17, Vyacheslav Daradur :
>>>>
>>>>> {IgniteServices#service(String name)} returns a direct reference in the
>>>>> current implementation.
>>>>>
>>>>> So, class casting should work for your example:
>>>>> ((MyServiceImpl)ignite.services().service(“myService”)).bar();
>>>>>
>>>>> It is safer to use an interface instead of an implementation; there is
>>>>> no guarantee that a direct reference will be returned in future releases, as a
>>>>> service instance might be wrapped, for monitoring for example.
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2020 at 4:09 PM Vladimir Steshin 
>>>>> wrote:
>>>>>
>>>>>> Vyacheslav, Hi.
>>>>>>
>>>>>> I see. But can we consider a 'locally deployed service' to be a proxy too,
>>>>>> not a direct reference? What if I need to wrap it? This would be a local
>>>>>> service working via a proxy, or null.
>>>>>>
>>>>>> Tue, Feb 25, 2020 at 16:03, Vyacheslav Daradur :
>>>>>>
>>>>>>> Hi, Vladimir
>>>>>>>
>>>>>>> The 

Re: Reference of local service.

2020-02-25 Thread Vladimir Steshin
Vyacheslav, this is exactly what I found. I'm doing [1] (metrics for
services) and realized I have to wrap local calls in a proxy. Is it a
change of the public API that should only come with a major release? Or can we
declare usage of services without interfaces incorrect?
[1] https://issues.apache.org/jira/browse/IGNITE-12464

Tue, Feb 25, 2020 at 16:17, Vyacheslav Daradur :

> {IgniteServices#service(String name)} returns a direct reference in the
> current implementation.
>
> So, class casting should work for your example:
> ((MyServiceImpl)ignite.services().service(“myService”)).bar();
>
> It is safer to use an interface instead of an implementation; there is no
> guarantee that a direct reference will be returned in future releases, as a service
> instance might be wrapped, for monitoring for example.
>
>
> On Tue, Feb 25, 2020 at 4:09 PM Vladimir Steshin 
> wrote:
>
>> Vyacheslav, Hi.
>>
>> I see. But can we consider a 'locally deployed service' to be a proxy too, not
>> a direct reference? What if I need to wrap it? This would be a local service
>> working via a proxy, or null.
>>
>> Tue, Feb 25, 2020 at 16:03, Vyacheslav Daradur :
>>
>>> Hi, Vladimir
>>>
>>> The answer is in the API docs: "Gets *locally deployed service* with
>>> specified name." [1]
>>>
>>> That means {IgniteServices#service(String name)} returns only a locally
>>> deployed instance, or null.
>>>
>>> {IgniteServices#serviceProxy(…)} returns a proxy to call instances across
>>> the cluster. It might be used for load-balancing.
>>>
>>> [1]
>>> https://github.com/apache/ignite/blob/56975c266e7019f307bb9da42333a6db4e47365e/modules/core/src/main/java/org/apache/ignite/IgniteServices.java#L569
>>>
>>> On Tue, Feb 25, 2020 at 3:51 PM Vladimir Steshin 
>>> wrote:
>>>
>>>> Hello, Igniters.
>>>>
>>>> The previous e-mail had the wrong subject 'daradu...@gmail.com' :)
>>>>
>>>> I have a question about what exactly IgniteServices#service(String name) is
>>>> supposed to return: a reference to the object, or a proxy for some reason, like
>>>> IgniteServices#serviceProxy(…)? Vyacheslav D., can you tell me your
>>>> opinion?
>>>>
>>>> public interface MyService {
>>>>
>>>>public void foo();
>>>>
>>>> }
>>>>
>>>> public class MyServiceImpl implements Service, MyService {
>>>>
>>>>@Override public void foo(){ … }
>>>>
>>>>public void bar(){ … };
>>>>
>>>> }
>>>>
>>>>
>>>> // Is it required to support
>>>>
>>>> MyServiceImpl srvc = ignite.services().service(“myService”);
>>>>
>>>> srvc.foo();
>>>>
>>>> srvc.bar();
>>>>
>>>>
>>>>
>>>> // Or is the only correct way:
>>>>
>>>> MyService srvc = ignite.services().service(“myService”);
>>>>
>>>> srvc.foo();
>>>>
>>>>
>>>
>>> --
>>> Best Regards, Vyacheslav D.
>>>
>>
>
> --
> Best Regards, Vyacheslav D.
>


Re: Reference of local service.

2020-02-25 Thread Vladimir Steshin
Vyacheslav, Hi.

I see. But can we consider a 'locally deployed service' to be a proxy too, not
a direct reference? What if I need to wrap it? This would be a local service
working via a proxy, or null.

Tue, Feb 25, 2020 at 16:03, Vyacheslav Daradur :

> Hi, Vladimir
>
> The answer is in the API docs: "Gets *locally deployed service* with
> specified name." [1]
>
> That means {IgniteServices#service(String name)} returns only a locally
> deployed instance, or null.
>
> {IgniteServices#serviceProxy(…)} returns a proxy to call instances across
> the cluster. It might be used for load-balancing.
>
> [1]
> https://github.com/apache/ignite/blob/56975c266e7019f307bb9da42333a6db4e47365e/modules/core/src/main/java/org/apache/ignite/IgniteServices.java#L569
>
> On Tue, Feb 25, 2020 at 3:51 PM Vladimir Steshin 
> wrote:
>
>> Hello, Igniters.
>>
>> The previous e-mail had the wrong subject 'daradu...@gmail.com' :)
>>
>> I have a question about what exactly IgniteServices#service(String name) is
>> supposed to return: a reference to the object, or a proxy for some reason, like
>> IgniteServices#serviceProxy(…)? Vyacheslav D., can you tell me your opinion?
>>
>> public interface MyService {
>>
>>public void foo();
>>
>> }
>>
>> public class MyServiceImpl implements Service, MyService {
>>
>>@Override public void foo(){ … }
>>
>>public void bar(){ … };
>>
>> }
>>
>>
>> // Is it required to support
>>
>> MyServiceImpl srvc = ignite.services().service(“myService”);
>>
>> srvc.foo();
>>
>> srvc.bar();
>>
>>
>>
>> // Or is the only correct way:
>>
>> MyService srvc = ignite.services().service(“myService”);
>>
>> srvc.foo();
>>
>>
>
> --
> Best Regards, Vyacheslav D.
>

