Re: Ever increasing startup times as data grow in persistent storage

2021-01-13 Thread Naveen
Hi Raymond

It does block writes until the checkpoint is complete, but this only happens
when we restart our nodes. At that time, all the requests that piled up
during the shutdown get processed, and that is when bulk data ingestion
happens. For normal day-to-day real-time operations it does not really hurt
us, since we do not have any bulk writes.

Thanks




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Error:data-streamer-stripe....Timed out while waiting for schema update.

2021-01-13 Thread siva
Hi,
I have a .NET client/server Ignite application. I am using DataStreamer to
load data into Ignite caches.

While loading data, the client node stops with a stack overflow message, and
the server node prints a message like this on the console:
Error:data-streamer-stripeTimed out while waiting for schema update.


When I start the client node again, I get the following message on the
console:

Nodes started on local machine require more than 80% of physical RAM what
can lead to significant slowdown due to swapping (please decrease JVM heap
size, data region size or checkpoint buffer size) [required=78152MB,
available=65535MB]
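
As a rough check of the warning above, the threshold can be recomputed from the two figures it reports. The sketch below assumes that `available` is the machine's physical RAM and that the 80% limit is applied to it; the class and method names are illustrative and not part of Ignite.

```java
// Hypothetical helper: recomputes the 80% threshold from the figures in the warning.
final class RamBudgetCheck {
    /** Largest footprint (in MB) that stays within the given percentage of RAM. */
    static long safeLimitMb(long availableMb, int percent) {
        return availableMb * percent / 100; // integer arithmetic, rounds down
    }

    /** True when the configured footprint is above the percentage limit. */
    static boolean exceedsLimit(long requiredMb, long availableMb, int percent) {
        return requiredMb > safeLimitMb(availableMb, percent);
    }

    public static void main(String[] args) {
        long requiredMb = 78152;  // "required" value from the warning
        long availableMb = 65535; // "available" value from the warning
        long limitMb = safeLimitMb(availableMb, 80);
        System.out.println("limit=" + limitMb + "MB, exceeded="
            + exceedsLimit(requiredMb, availableMb, 80)
            + ", over by " + (requiredMb - limitMb) + "MB");
    }
}
```

With these figures the required memory is above not only the 80% limit but also the total RAM reported, so the combined JVM heap, data region and checkpoint buffer sizes named in the warning would have to be reduced substantially.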

What might be the issue, and how can I solve it?

Please let me know if any other information is needed.
Thanks.





Re: rebalancing and jobs

2021-01-13 Thread narges saleh
To clarify, I do not want to pause rebalancing. If a rebalancing job is in
progress, I want to delay my job; if no rebalancing job is in progress, I
want to delay any upcoming rebalancing job until my job is finished.
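
The first half of this (holding a job back while rebalancing is running) could be sketched as a simple polling gate. This is only an illustration under assumptions: `rebalanceInProgress` is a hypothetical probe that the application would have to back with a real signal, for example one of the JMX rebalancing metrics, and `RebalanceGate` is not an Ignite class. It does not address the second half, delaying an upcoming rebalance, and a rebalance could still start right after the check passes.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Ignite: waits until a probe reports "no rebalancing".
final class RebalanceGate {
    /**
     * Polls the probe until it reports that no rebalancing is in progress,
     * then runs the job. Returns false if the timeout elapses or the thread
     * is interrupted before the job could start.
     */
    static boolean runWhenIdle(BooleanSupplier rebalanceInProgress, Runnable job,
                               Duration pollInterval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (rebalanceInProgress.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still rebalancing when the timeout expired
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        job.run();
        return true;
    }

    public static void main(String[] args) {
        // Fake probe for demonstration: reports "rebalancing" for about 50 ms.
        long start = System.nanoTime();
        BooleanSupplier fakeProbe =
            () -> System.nanoTime() - start < Duration.ofMillis(50).toNanos();
        boolean ran = runWhenIdle(fakeProbe, () -> System.out.println("job started"),
            Duration.ofMillis(10), Duration.ofSeconds(5));
        System.out.println("ran=" + ran);
    }
}
```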

On Wed, Jan 13, 2021 at 5:04 PM narges saleh  wrote:

> Thanks Alex. I will study the links you provided. I need to deal with
> rebalancing programmatically.
>
> On Wed, Jan 13, 2021 at 4:33 PM akorensh  wrote:
>
>> Hi,
>>   You can monitor using JMX as described here:
>>
>> https://ignite.apache.org/docs/latest/monitoring-metrics/metrics#monitoring-rebalancing
>>
>> you can also visually monitor rebalance via a special widget in control
>> center:
>>
>> https://www.gridgain.com/docs/control-center/latest/monitoring/configuring-widgets#rebalance-widget
>> https://ignite.apache.org/docs/latest/tools/gg-control-center
>>
>>   There is no way to manually pause the rebalancing process as it is
>> performed automatically in response to specific events. You can, however,
>> use config settings to delay the start of the rebalance process in
>> response to nodes joining/leaving, change batch size, etc.
>> see: https://ignite.apache.org/docs/latest/data-rebalancing#throttling
>> https://ignite.apache.org/docs/latest/data-rebalancing#other-properties
>>
>>
>>   As an aside, this system view gives you a view into those properties:
>>
>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#caches
>>
>> Thanks, Alex
>>
>>
>>
>>
>


Re: rebalancing and jobs

2021-01-13 Thread narges saleh
Thanks Alex. I will study the links you provided. I need to deal with
rebalancing programmatically.

On Wed, Jan 13, 2021 at 4:33 PM akorensh  wrote:

> Hi,
>   You can monitor using JMX as described here:
>
> https://ignite.apache.org/docs/latest/monitoring-metrics/metrics#monitoring-rebalancing
>
> you can also visually monitor rebalance via a special widget in control
> center:
>
> https://www.gridgain.com/docs/control-center/latest/monitoring/configuring-widgets#rebalance-widget
> https://ignite.apache.org/docs/latest/tools/gg-control-center
>
>   There is no way to manually pause the rebalancing process as it is
> performed automatically in response to specific events. You can, however,
> use config settings to delay the start of the rebalance process in response
> to nodes joining/leaving, change batch size, etc.
> see: https://ignite.apache.org/docs/latest/data-rebalancing#throttling
> https://ignite.apache.org/docs/latest/data-rebalancing#other-properties
>
>
>   As an aside, this system view gives you a view into those properties:
>
> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#caches
>
> Thanks, Alex
>
>
>
>


Re: rebalancing and jobs

2021-01-13 Thread akorensh
Hi,
  You can monitor using JMX as described here:
https://ignite.apache.org/docs/latest/monitoring-metrics/metrics#monitoring-rebalancing

You can also visually monitor rebalancing via a special widget in Control
Center:
https://www.gridgain.com/docs/control-center/latest/monitoring/configuring-widgets#rebalance-widget
https://ignite.apache.org/docs/latest/tools/gg-control-center

   There is no way to manually pause the rebalancing process as it is
performed automatically in response to specific events. You can, however,
use config settings to delay the start of the rebalance process in response
to nodes joining/leaving, change batch size, etc.
see: https://ignite.apache.org/docs/latest/data-rebalancing#throttling
https://ignite.apache.org/docs/latest/data-rebalancing#other-properties


  As an aside, this system view gives you a view into those properties:
https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#caches

Thanks, Alex
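
For reading such a metric programmatically rather than through a JMX console, a minimal sketch with the standard `javax.management` API might look like the following. It assumes the code runs in the same JVM as the node (a remote node would need a JMX connector instead), and `JmxAttributeReader` is an illustrative name. The Ignite bean name pattern and attribute name are deliberately left to the caller, since they depend on the version and configuration and should be taken from the metrics page linked above; the demo reads a bean that every JVM exposes.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: reads one attribute from the first MBean matching a name pattern.
final class JmxAttributeReader {
    /** Returns the attribute value, or null when no MBean matches the pattern. */
    static Object readFirst(MBeanServer server, String namePattern, String attribute) {
        try {
            Set<ObjectName> names = server.queryNames(new ObjectName(namePattern), null);
            if (names.isEmpty()) {
                return null;
            }
            return server.getAttribute(names.iterator().next(), attribute);
        } catch (Exception e) {
            throw new IllegalStateException("JMX read failed for " + namePattern, e);
        }
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Shown on a standard JVM bean. For an Ignite node, pass the cache metrics
        // bean pattern and a rebalancing attribute taken from the documentation
        // (both would be assumptions here, so they are not hard-coded).
        System.out.println(readFirst(server, "java.lang:type=Runtime", "VmName"));
    }
}
```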





rebalancing and jobs

2021-01-13 Thread narges saleh
Hi All,

What is the best practice regarding partition rebalancing and jobs?
1) Before I start a job, how do I check whether a rebalancing process is in
progress?
2) If no rebalancing is in progress, how do I delay or pause upcoming
rebalancing processes until my job is finished, assuming I don't want
to/can't use affinityrun/call? Is setting rebalanceDelay sufficient? I
don't want to do manual rebalancing.

thanks.


Ignite rebalancing when a server is rebooted w/ persistence enabled.

2021-01-13 Thread maxi628
Hello everyone.

I have several Ignite clusters on version 2.7.6 with persistence enabled.
I have 3 caches on every cluster, with ~10M records each.

Sometimes when I reboot a node, it takes a long time to boot, sometimes
hours.

By rebooting I mean stopping the container that's running Ignite and
starting it again, without ever changing the baseline topology. Restarting
the container can take 2 minutes.
The node joins the topology just fine but takes a long time to start serving
traffic.

Checking the logs, I've found that there are several lines like these:



So for some reason, after booting, it starts a process called
PartitionsEvictManager, which can take a lot of time.
What is the intended functionality behind PartitionsEvictManager?
Is it something that we should expect?

This is a problem because a rolling restart of all nodes in a cluster can
take up to a day.

Thanks.






Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread sri hari kali charan Tummala
OK, thanks, but I am looking forward to the AWS steps.

On Wed, 13 Jan 2021 at 11:39, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi Sri,
>
> This will be available in future releases. Current alpha focuses on
> demonstrating how Ignite 3.0 will be delivered, and how you will run
> operations using the new CLI tool.
>
> -Val
>
> On Wed, Jan 13, 2021 at 11:21 AM sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> Sure few steps are missing.
>>
>> Creating a ignite cluster is missing (example 3 ec2 instances)
>> Creating a ignite cluster on Aws would be nice step to add
>>
>>
>> Thanks
>> Sri
>>
>> On Wed, 13 Jan 2021 at 04:14, Kseniya Romanova 
>> wrote:
>>
>>> Here's the link for the online gathering:
>>> https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/275722317/
>>>
>>>
>>> On Wed, Jan 13, 2021 at 13:47, Pavel Tupitsyn wrote:
>>>
>>>> Getting Started Guide:
>>>>
>>>> https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide
>>>>
>>>> On Wed, Jan 13, 2021 at 1:29 PM Stephen Darlington <
>>>> stephen.darling...@gridgain.com> wrote:
>>>>
>>>>> What is the link to the Getting Started Guide?
>>>>>
>>>>> On 13 Jan 2021, at 03:55, Valentin Kulichenko <
>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>
>>>>> Igniters,
>>>>>
>>>>> I'm excited to announce that the first alpha build of the Ignite 3 is
>>>>> out and available for download!
>>>>>
>>>>> Ignite 3 is the new project that was initiated by the Ignite community
>>>>> last year. Please refer to this page if you want to learn more:
>>>>> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>>>>>
>>>>> The just-released alpha build is a sneak peek into the future of
>>>>> Ignite. It doesn't represent a fully-functional product (no discovery,
>>>>> caches, compute, etc.), but demonstrates major mechanics of how you will
>>>>> interact with Ignite going forward.
>>>>>
>>>>> The main goal of the release is to gather feedback from the community
>>>>> so that we can adjust further development if needed. That said, I would
>>>>> ask and encourage everyone to go through the Getting Started Guide [1] and
>>>>> play with the build. If you have any questions, issues, concerns, wishes,
>>>>> requests, or any other thoughts, please reply directly to this thread. We
>>>>> will carefully accumulate all the feedback and make sure it is considered
>>>>> going forward.
>>>>>
>>>>> Another opportunity to share your feedback will come closer to the end
>>>>> of January when we will have a virtual meetup. I will present a quick demo
>>>>> of the alpha build, after which we will have an open discussion. Please
>>>>> stay tuned - I will send a message here when the meetup is scheduled.
>>>>>
>>>>> -Val
>>>>>
>>>>>
>>>>>
>>>>> --
>> Thanks & Regards
>> Sri Tummala
>>
>> --
Thanks & Regards
Sri Tummala


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Valentin Kulichenko
Hi Sri,

This will be available in future releases. Current alpha focuses on
demonstrating how Ignite 3.0 will be delivered, and how you will run
operations using the new CLI tool.

-Val

On Wed, Jan 13, 2021 at 11:21 AM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Sure few steps are missing.
>
> Creating a ignite cluster is missing (example 3 ec2 instances)
> Creating a ignite cluster on Aws would be nice step to add
>
>
> Thanks
> Sri
>
> On Wed, 13 Jan 2021 at 04:14, Kseniya Romanova 
> wrote:
>
>> Here's the link for the online gathering:
>> https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/275722317/
>>
>>
>> On Wed, Jan 13, 2021 at 13:47, Pavel Tupitsyn wrote:
>>
>>> Getting Started Guide:
>>>
>>> https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide
>>>
>>> On Wed, Jan 13, 2021 at 1:29 PM Stephen Darlington <
>>> stephen.darling...@gridgain.com> wrote:
>>>
>>>> What is the link to the Getting Started Guide?
>>>>
>>>> On 13 Jan 2021, at 03:55, Valentin Kulichenko <
>>>> valentin.kuliche...@gmail.com> wrote:
>>>>
>>>> Igniters,
>>>>
>>>> I'm excited to announce that the first alpha build of the Ignite 3 is
>>>> out and available for download!
>>>>
>>>> Ignite 3 is the new project that was initiated by the Ignite community
>>>> last year. Please refer to this page if you want to learn more:
>>>> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>>>>
>>>> The just-released alpha build is a sneak peek into the future of
>>>> Ignite. It doesn't represent a fully-functional product (no discovery,
>>>> caches, compute, etc.), but demonstrates major mechanics of how you will
>>>> interact with Ignite going forward.
>>>>
>>>> The main goal of the release is to gather feedback from the community
>>>> so that we can adjust further development if needed. That said, I would ask
>>>> and encourage everyone to go through the Getting Started Guide [1] and play
>>>> with the build. If you have any questions, issues, concerns, wishes,
>>>> requests, or any other thoughts, please reply directly to this thread. We
>>>> will carefully accumulate all the feedback and make sure it is considered
>>>> going forward.
>>>>
>>>> Another opportunity to share your feedback will come closer to the end
>>>> of January when we will have a virtual meetup. I will present a quick demo
>>>> of the alpha build, after which we will have an open discussion. Please
>>>> stay tuned - I will send a message here when the meetup is scheduled.
>>>>
>>>> -Val
>>>>
>>>>
>>>>
>>>> --
> Thanks & Regards
> Sri Tummala
>
>


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Valentin Kulichenko
Hi Wesley,

This is quite a long-term project, so it's hard to have exact predictions -
it all depends on how fast the code is contributed. But based on the scope,
I think we're looking towards around the end of this year - early next year.

-Val


On Tue, Jan 12, 2021 at 8:26 PM Wesley Peng  wrote:

> When will the stable version of 3.0 get released? thanks.
>
> Valentin Kulichenko wrote:
> > I'm excited to announce that the first alpha build of the Ignite 3 is
> > out and available for download!
> >
> > Ignite 3 is the new project that was initiated by the Ignite community
> > last year. Please refer to this page if you want to learn more:
> > https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
> > 
> >
> > The just-released alpha build is a sneak peek into the future of Ignite.
> > It doesn't represent a fully-functional product (no discovery, caches,
> > compute, etc.), but demonstrates major mechanics of how you will
> > interact with Ignite going forward.
>


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Valentin Kulichenko
The meetup has been scheduled, please RSVP here:
https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/275722317/

-Val

On Wed, Jan 13, 2021 at 11:21 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Link to the Getting Started Guide:
> https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide
>
> -Val
>
> On Tue, Jan 12, 2021 at 7:55 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
>> Igniters,
>>
>> I'm excited to announce that the first alpha build of the Ignite 3 is out
>> and available for download!
>>
>> Ignite 3 is the new project that was initiated by the Ignite community
>> last year. Please refer to this page if you want to learn more:
>> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>>
>> The just-released alpha build is a sneak peek into the future of Ignite.
>> It doesn't represent a fully-functional product (no discovery, caches,
>> compute, etc.), but demonstrates major mechanics of how you will interact
>> with Ignite going forward.
>>
>> The main goal of the release is to gather feedback from the community so
>> that we can adjust further development if needed. That said, I would ask
>> and encourage everyone to go through the Getting Started Guide [1] and play
>> with the build. If you have any questions, issues, concerns, wishes,
>> requests, or any other thoughts, please reply directly to this thread. We
>> will carefully accumulate all the feedback and make sure it is considered
>> going forward.
>>
>> Another opportunity to share your feedback will come closer to the end of
>> January when we will have a virtual meetup. I will present a quick demo of
>> the alpha build, after which we will have an open discussion. Please stay
>> tuned - I will send a message here when the meetup is scheduled.
>>
>> -Val
>>
>


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Valentin Kulichenko
Link to the Getting Started Guide:
https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide

-Val

On Tue, Jan 12, 2021 at 7:55 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Igniters,
>
> I'm excited to announce that the first alpha build of the Ignite 3 is out
> and available for download!
>
> Ignite 3 is the new project that was initiated by the Ignite community
> last year. Please refer to this page if you want to learn more:
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>
> The just-released alpha build is a sneak peek into the future of Ignite.
> It doesn't represent a fully-functional product (no discovery, caches,
> compute, etc.), but demonstrates major mechanics of how you will interact
> with Ignite going forward.
>
> The main goal of the release is to gather feedback from the community so
> that we can adjust further development if needed. That said, I would ask
> and encourage everyone to go through the Getting Started Guide [1] and play
> with the build. If you have any questions, issues, concerns, wishes,
> requests, or any other thoughts, please reply directly to this thread. We
> will carefully accumulate all the feedback and make sure it is considered
> going forward.
>
> Another opportunity to share your feedback will come closer to the end of
> January when we will have a virtual meetup. I will present a quick demo of
> the alpha build, after which we will have an open discussion. Please stay
> tuned - I will send a message here when the meetup is scheduled.
>
> -Val
>


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread sri hari kali charan Tummala
Sure, a few steps are missing.

Creating an Ignite cluster is missing (for example, 3 EC2 instances).
Creating an Ignite cluster on AWS would be a nice step to add.


Thanks
Sri

On Wed, 13 Jan 2021 at 04:14, Kseniya Romanova 
wrote:

> Here's the link for the online gathering:
> https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/275722317/
>
>
> On Wed, Jan 13, 2021 at 13:47, Pavel Tupitsyn wrote:
>
>> Getting Started Guide:
>>
>> https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide
>>
>> On Wed, Jan 13, 2021 at 1:29 PM Stephen Darlington <
>> stephen.darling...@gridgain.com> wrote:
>>
>>> What is the link to the Getting Started Guide?
>>>
>>> On 13 Jan 2021, at 03:55, Valentin Kulichenko <
>>> valentin.kuliche...@gmail.com> wrote:
>>>
>>> Igniters,
>>>
>>> I'm excited to announce that the first alpha build of the Ignite 3 is
>>> out and available for download!
>>>
>>> Ignite 3 is the new project that was initiated by the Ignite community
>>> last year. Please refer to this page if you want to learn more:
>>> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>>>
>>> The just-released alpha build is a sneak peek into the future of Ignite.
>>> It doesn't represent a fully-functional product (no discovery, caches,
>>> compute, etc.), but demonstrates major mechanics of how you will interact
>>> with Ignite going forward.
>>>
>>> The main goal of the release is to gather feedback from the community so
>>> that we can adjust further development if needed. That said, I would ask
>>> and encourage everyone to go through the Getting Started Guide [1] and play
>>> with the build. If you have any questions, issues, concerns, wishes,
>>> requests, or any other thoughts, please reply directly to this thread. We
>>> will carefully accumulate all the feedback and make sure it is considered
>>> going forward.
>>>
>>> Another opportunity to share your feedback will come closer to the end
>>> of January when we will have a virtual meetup. I will present a quick demo
>>> of the alpha build, after which we will have an open discussion. Please
>>> stay tuned - I will send a message here when the meetup is scheduled.
>>>
>>> -Val
>>>
>>>
>>>
>>> --
Thanks & Regards
Sri Tummala


RE: incorrect partition map exchange behaviour

2021-01-13 Thread tschauenberg
Sorry about mixing the terminology.  My post was meant to be about the PME
and the primary keys.

So the summary of my post and what it was trying to show was the PME was
only happening on cluster node leaves (server or visor) but not cluster node
joins (at least with previously joined nodes - haven't tested with joining a
brand new node for the first time such as expanding the cluster from 3 nodes
to 4 nodes).

The PME doc suggests the PME should happen on the joins but the logs and
visor/stats are showing that's not happening and it's only happening on the
leaves.

So what I am trying to identify is:
* is this a known bug and if so, which versions is this fixed in?
* what is the impact of the database state where one node has no designated
primaries?  
** This probably effectively reduces the get/put performance to n-1 nodes?
** Also, for things like compute tasks that operate on local data such as
those using SqlFieldsQuery.setLocal(true) the node with no primaries will do
nothing?





Re: incorrect partition map exchange behaviour

2021-01-13 Thread tschauenberg
I haven't tested on 2.9.1 as we don't have that database provisioned and
sadly won't for a while. When we do, though, I will post an update.





Re: Client stuck on startup

2021-01-13 Thread VeenaMithare
Let me debug this a bit more. I will also try to capture another set of
thread dumps when this occurs again. This issue occurs on my Windows machine
sometimes; it doesn't happen all the time. I have not seen it in my Linux
environment yet.





Re: Client stuck on startup

2021-01-13 Thread VeenaMithare
I don't think the server dumps were taken after the client disconnected. They
were taken while the client was in the stuck state.





Re: 2.8.1 : INFO org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi [] - Accepted incoming communication connection

2021-01-13 Thread VeenaMithare
Kindly confirm the following:

a. Are you on 2.8.1?
b. Is this code uncommented?
    ignite.active(true);
    addPersistentCacheConfiguration(ignite);
c. Are you using both the server and the client in the reproducer?

I get this in all environments (even in our Linux environment).





Re: 2.8.1 : INFO org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi [] - Accepted incoming communication connection

2021-01-13 Thread Ilya Kasnacheev
Hello!

Maybe it's some kind of Windows thing or otherwise depends on your
environment? I have tried to run your reproducer, but I never get these
exceptions. Instead, I will only see:

Jan 13, 2021 6:38:56 PM org.apache.ignite.logger.java.JavaLogger info
INFO: Accepted incoming communication connection
[locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:34890]

Once on the server node after client restart.

Regards,
-- 
Ilya Kasnacheev


On Tue, Jan 12, 2021 at 18:11, VeenaMithare wrote:

> This issue is also observed if two different clients exist on the same
> box.
> Steps to reproduce  :
>
> 1. Both the clients are not running
> 2. Start the client1
> 3. Stop the client 1 and start client 2 .
> 4. The huge set of logs are visible on client 2 logs.
>
> regards,
> Veena.
>
>
>
>


Re: Client stuck on startup

2021-01-13 Thread Ilya Kasnacheev
Hello!

But are they collected after the client node has already left? Not to
mention that server nodes too have an obscene number of threads, and as
such are vulnerable to the same problem.

Regards,
-- 
Ilya Kasnacheev


On Wed, Jan 13, 2021 at 18:25, VeenaMithare wrote:

> Hello,
>
> Thread dump from the server nodes has been provided in the original post,
>
> regards,
> Veena
>
>
>
>


Re: Client stuck on startup

2021-01-13 Thread VeenaMithare
Hello, 

Thread dump from the server nodes has been provided in the original post, 

regards,
Veena





RE: incorrect partition map exchange behaviour

2021-01-13 Thread Alexandr Shapkin
Hi,

As you correctly pointed to the PME implementation details webpage, this is a
process of exchanging information about partition holders, and it happens on
every topology change, cluster deactivation, etc. The process itself is not
about data rebalancing; it is about which node should store a particular
partition.

If you want to check whether a data rebalance happened, you need to find
something like:

[2020-01-15 15:46:57,042][INFO ][sys-#50][GridDhtPartitionDemander] Starting
rebalance routine [ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0], supplier=9e88a103-4465-4e5b-865f-4edaa909fee1,
fullPartitions=[0-99], histPartitions=[]]

It also depends on whether your cluster is under load during the rolling
upgrade. If there are no updates happening, then no data rebalance should
happen either.

I'm not quite sure about the metric and visor. Anyway, you can perform the
checks explicitly from code:

    ignite.cache("myCache").localSize(CachePeekMode.BACKUP);
    ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);

From: tschauenberg
Sent: Friday, January 8, 2021 3:59 AM
To: user@ignite.apache.org
Subject: incorrect partition map exchange behaviour

Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started
looking at the individual cache metrics for primary keys
org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount

In our configuration we have a replicated cache with 2 backups.  Our cluster
has 3 nodes in it so the primaries should be spread equally on the 3 nodes
and each node has backups from the other two nodes.  All these server nodes
are in the baseline.  Additionally we have some thick clients connected but
I don't think they are relevant to the discussion.

Whenever we do a rolling restart one node at a time, at the end after the
last node is restarted it always owns zero primaries and owns solely
backups.  The two nodes restarted earlier during the rolling restart own all
the primaries.

When our cluster is in this scenario, if we start and stop visor, when visor
leaves the cluster it triggers a PME where all keys get balanced on all
server nodes.  Looking at the visor cache stats between the start and stop
we can see a min of 0 keys on the nodes for our cache so visor and the jmx
metrics line up on that front.  After stopping visor, the jmx metrics show
the evenly distributed primaries and then starting visor a second time we
can confirm that again the min, average, max node keys are all evenly
distributed.

Every join and leave during the rolling restart and during visor start/stop
reflects a topology increment and node leave and join events in the
logs.

According to
https://cwiki.apache.org/confluence/display/IGNITE/%2528Partition+Map%2529+Exchange+-+under+the+hood
each leave and join should trigger the PME but we only see the keys changing
on the leaves.

Additionally, we tried waiting longer between the stop and start part of the
rolling restart to see if that had any effect.  We ensured we waited long
enough for a PME to do any moving but waiting longer didn't have any effect.
The stop always has the PME move the keys off that node and the start never
sees the PME move any primaries back.

Why are we only seeing the PME change keys when nodes (server or visor) stop
and never when they join?
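
To look for the "Starting rebalance routine" message across a node's log, a small scanner along these lines could be used. It is a sketch that assumes the log lines follow the layout of the sample quoted in the reply above, with the cache (or cache group) name as the first item inside the square brackets; `RebalanceLogScanner` is an illustrative name, not an Ignite utility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts cache names from "Starting rebalance routine" log lines.
final class RebalanceLogScanner {
    // Captures the first bracketed item after the message text, up to the first comma.
    private static final Pattern START =
        Pattern.compile("Starting rebalance routine \\[([^,\\]]+)");

    /** Returns the captured name for every line that reports a rebalance start. */
    static List<String> rebalancedCaches(List<String> logLines) {
        List<String> names = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = START.matcher(line);
            if (m.find()) {
                names.add(m.group(1).trim());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[2020-01-15 15:46:57,042][INFO ][sys-#50][GridDhtPartitionDemander] "
                + "Starting rebalance routine [ignite-sys-cache, "
                + "topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], "
                + "supplier=9e88a103-4465-4e5b-865f-4edaa909fee1, "
                + "fullPartitions=[0-99], histPartitions=[]]",
            "[2020-01-15 15:46:58,000][INFO ][main][Example] unrelated line");
        System.out.println(rebalancedCaches(lines));
    }
}
```

An empty result for the period of a rolling restart would be consistent with no data rebalance having been started, although it says nothing about how primaries were assigned.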


Re: incorrect partition map exchange behaviour

2021-01-13 Thread Ilya Kasnacheev
Hello!

Does it happen to work on 2.9.1, or does it fail too? I recommend checking,
since I vaguely remember some discussions about a fix for late affinity
assignment.

Regards,
-- 
Ilya Kasnacheev


On Sat, Jan 9, 2021 at 03:11, tschauenberg wrote:

> Here's my attempt to demonstrate and also provide logs
>
> Standup 3 node cluster and load with data
>
>
> Using a thick client, 250k devices are loaded into the device cache.  The
> thick client then leaves.  There's one other thick client connected the
> whole time for serving requests but I think that's irrelevant for the test
> but want to point it out in case someone notices there's still a client
> connected.
>
> Show topology from logs of the client leaving:
>
>
>
> [2021-01-08T23:08:05.012Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Node left topology: TcpDiscoveryNode
> [id=611e30ee-b7c6-4ead-a746-f609b206cfb4,
> consistentId=611e30ee-b7c6-4ead-a746-f609b206cfb4, addrs=ArrayList
> [127.0.0.1, 172.17.0.3], sockAddrs=HashSet [/127.0.0.1:0, /172.17.0.3:0],
> discPort=0, order=7, intOrder=6, lastExchangeTime=1610146373751, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=true]
>
> [2021-01-08T23:08:05.013Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Topology snapshot [ver=8, locNode=75e4ddea, servers=3, clients=1,
> state=ACTIVE, CPUs=7, offheap=3.0GB, heap=3.1GB]
>
> Start visor on one of the nodes
>
>
> Show topology from logs
>
>
> [2021-01-08T23:30:33.461Z][INFO][tcp-disco-msg-worker-[4ea8efe1
> 10.12.3.76:47500]-#2][TcpDiscoverySpi] New next node
> [newNext=TcpDiscoveryNode [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7,
> consistentId=10.12.2.110:47501, addrs=ArrayList [10.12.2.110],
> sockAddrs=HashSet [/10.12.2.110:47501], discPort=47501, order=0,
> intOrder=7,
> lastExchangeTime=1610148633458, loc=false,
> ver=2.8.1#20200521-sha1:86422096,
> isClient=false]]
>
> [2021-01-08T23:30:34.045Z][INFO][sys-#1011][GridDhtPartitionsExchangeFuture]
> Completed partition exchange
> [localNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
> [topVer=9, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
> [id=1cca94e3-f15f-4a8b-9f65-d9b9055a5fa7, consistentId=10.12.2.110:47501,
> addrs=ArrayList [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47501],
> discPort=47501, order=9, intOrder=7, lastExchangeTime=1610148633458,
> loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true,
> newCrdFut=null], topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0]]
>
> Show data balanced in visor
>
>
>
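> +--------------+-------------+---+-------------+-------------------------+-----------+-----------+-----------+------------+
> | Devices(@c2) | PARTITIONED | 3 | 25 (0 / 25) | min: 80315 (0 / 80315)  | min: 0    | min: 0    | min: 0    | min: 25    |
> |              |             |   |             | avg: 8.33 (0.00 / 8.33) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 25.00 |
> |              |             |   |             | max: 86968 (0 / 86968)  | max: 0    | max: 0    | max: 0    | max: 25    |
> +--------------+-------------+---+-------------+-------------------------+-----------+-----------+-----------+------------+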
>
> At this point the data is all relatively balanced and the topology
> increased
> when visor connected.
>
> Stop ignite on one node
>
>
> Show topology and PME from logs (from a different ignite node as the ignite
> process was stopped)
>
>
>
> [2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Node left topology: TcpDiscoveryNode
> [id=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
> consistentId=3a4a497f-5a89-4f2c-8531-b2b05f2ede22, addrs=ArrayList
> [10.12.2.110], sockAddrs=HashSet [/10.12.2.110:47500], discPort=47500,
> order=3, intOrder=3, lastExchangeTime=1610139164908, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=false]
>
> [2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
> Topology snapshot [ver=10, locNode=4ea8efe1, servers=2, clients=1,
> state=ACTIVE, CPUs=5, offheap=2.0GB, heap=2.1GB]
> [2021-01-08T23:35:39.333Z][INFO][disco-event-worker-#40][GridDiscoveryManager]
>
> ^-- Baseline [id=0, size=3, online=2, offline=1]
> [2021-01-08T23:35:39.335Z][INFO][exchange-worker-#41][time] Started
> exchange
> init [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], crd=true,
> evt=NODE_LEFT, evtNode=75e4ddea-1927-4e93-82e9-fdfbb7b58d1c,
> customEvt=null,
> allowMerge=false, exchangeFreeSwitch=true]
> [2021-01-08T23:35:39.338Z][INFO][sys-#1031][GridAffinityAssignmentCache]
> Local node affinity assignment distribution is not ideal [cache=Households,
> expectedPrimary=512.00, actualPrimary=548, expectedBackups=1024.00,
> actualBackups=476, warningThreshold=50.00%]
> 

Re: Client stuck on startup

2021-01-13 Thread Ilya Kasnacheev
Hello!

Please provide full thread dump from server nodes.

Of course, you will need to kill client JVM first.

Regards,
-- 
Ilya Kasnacheev


On Wed, Jan 13, 2021 at 17:09, VeenaMithare wrote:

>
> a. If you look at the thread dump, it shows these locked synchronizers:
>    Locked ownable synchronizers:
> - <0x0006da73f540> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> - <0x0006da73f690> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> - <0x0006da84add8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> - <0x0006da84aee0> (a
>
>
> If the shutdown is clean, we don't see these locked synchronizers.
>
> Looks like it has registered some state with the server before the shutdown
> hook was invoked. This registered state is now preventing further restarts.
>
> b. Even though the log says 'invoking shutdown hook', the JVM does not
> shut down - it is blocked on starting Ignite.
>
> I will try and see if I can debug more.
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Client stuck on startup

2021-01-13 Thread VeenaMithare


a. If you look at the thread dump, it shows these locked synchronizers:
   Locked ownable synchronizers:
- <0x0006da73f540> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
- <0x0006da73f690> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
- <0x0006da84add8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
- <0x0006da84aee0> (a


If the shutdown is clean, we don't see these locked synchronizers.

Looks like it has registered some state with the server before the shutdown
hook was invoked. This registered state is now preventing further restarts.

b. Even though the log says 'invoking shutdown hook', the JVM does not
shut down - it is blocked on starting Ignite.

I will try and see if I can debug more. 
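
For anyone who wants to capture the same "Locked ownable synchronizers" information from inside the process rather than from a jstack dump, here is a minimal sketch using the JDK's ThreadMXBean. The class and method names (LockedSynchronizers, snapshot) are made up for the illustration, and the sketch assumes a JVM that supports synchronizer monitoring (HotSpot does).

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: lists threads that currently own java.util.concurrent locks.
final class LockedSynchronizers {
    /** Returns one "thread name -> synchronizer" line per owned synchronizer. */
    static List<String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();

        // Arguments: lockedMonitors=false, lockedSynchronizers=true.
        for (ThreadInfo ti : mx.dumpAllThreads(false, true)) {
            for (LockInfo li : ti.getLockedSynchronizers())
                out.add(ti.getThreadName() + " -> " + li);
        }

        return out;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Hold the write lock so that this thread shows up in the snapshot.
        lock.writeLock().lock();
        try {
            snapshot().forEach(System.out::println);
        }
        finally {
            lock.writeLock().unlock();
        }
    }
}
```

Only exclusively held locks are reported this way, so entries of type ReentrantReadWriteLock$NonfairSync point at threads holding a write lock.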




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Client stuck on startup

2021-01-13 Thread Ilya Kasnacheev
Hello!

I'm not sure what happens on the server nodes. Restarting the JVM with this
client should be enough.

Regards,
-- 
Ilya Kasnacheev


Tue, 12 Jan 2021 at 19:14, VeenaMithare:

> Okay, thanks Ilya. After it hits this issue, the app doesn't start up until
> I restart my server nodes.
>
> Is there any way I can ensure a clean shutdown when I face issues like this?
> It looks like some Ignite state is not cleaned up.
>
> regards,
> Veena.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


RE: Critical Workers Health Check on client side

2021-01-13 Thread Alexandr Shapkin
Hi,

These are internal details, and it is not possible to catch *Critical Workers
HealthCheck* on the client side.

I think you might want to listen for Ignite events on your clients and apply
your custom logic accordingly. The most essential candidate is
EVT_NODE_FAILED.

You can check the other available events here:
https://ignite.apache.org/docs/latest/events/events
and here:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/events/EventType.html

From: ihalilaltun
Sent: Monday, January 4, 2021 2:52 PM
To: user@ignite.apache.org
Subject: Critical Workers Health Check on client side

Hi there,

I am curious whether we can somehow manage *Critical Workers HealthCheck* on
the client side. What I need to do is catch the critical workers health check
results on the client side. Can this be done by implementing a custom
StopNodeOrHaltFailureHandler on the client side?

We are on Ignite v2.7.6.

Thanks

-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
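
The event-based approach suggested in the reply above could look roughly like the Java sketch below. It is only an illustration: the class name NodeFailureListener and the reaction (printing to stderr) are made up, discovery settings are left at their defaults, and it needs the ignite-core dependency on the classpath. EVT_NODE_LEFT is included alongside EVT_NODE_FAILED as an assumption about what a client would usually care about.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

// Sketch only: a client node that reacts when another node fails or leaves.
public class NodeFailureListener {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientMode(true)
            // Listing the event types explicitly does no harm and makes the intent clear.
            .setIncludeEventTypes(EventType.EVT_NODE_FAILED, EventType.EVT_NODE_LEFT);

        Ignite ignite = Ignition.start(cfg);

        IgnitePredicate<DiscoveryEvent> lsnr = evt -> {
            // Hypothetical reaction: replace with your own custom logic.
            System.err.println("Node gone: " + evt.eventNode().id() + " (" + evt.name() + ")");

            return true; // Returning true keeps the listener subscribed.
        };

        ignite.events().localListen(lsnr, EventType.EVT_NODE_FAILED, EventType.EVT_NODE_LEFT);
    }
}
```

Note that this tells the client that a server node has gone away, not why; the reason (for example a blocked critical worker) stays in the server's own log.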


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Kseniya Romanova
Here's the link for the online gathering:
https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/275722317/


Wed, 13 Jan 2021 at 13:47, Pavel Tupitsyn:

> Getting Started Guide:
>
> https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide
>
> On Wed, Jan 13, 2021 at 1:29 PM Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
>
>> What is the link to the Getting Started Guide?
>>
>> On 13 Jan 2021, at 03:55, Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>>
>> Igniters,
>>
>> I'm excited to announce that the first alpha build of the Ignite 3 is out
>> and available for download!
>>
>> Ignite 3 is the new project that was initiated by the Ignite community
>> last year. Please refer to this page if you want to learn more:
>> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>>
>> The just-released alpha build is a sneak peek into the future of Ignite.
>> It doesn't represent a fully-functional product (no discovery, caches,
>> compute, etc.), but demonstrates major mechanics of how you will interact
>> with Ignite going forward.
>>
>> The main goal of the release is to gather feedback from the community so
>> that we can adjust further development if needed. That said, I would ask
>> and encourage everyone to go through the Getting Started Guide [1] and play
>> with the build. If you have any questions, issues, concerns, wishes,
>> requests, or any other thoughts, please reply directly to this thread. We
>> will carefully accumulate all the feedback and make sure it is considered
>> going forward.
>>
>> Another opportunity to share your feedback will come closer to the end of
>> January when we will have a virtual meetup. I will present a quick demo of
>> the alpha build, after which we will have an open discussion. Please stay
>> tuned - I will send a message here when the meetup is scheduled.
>>
>> -Val
>>
>>
>>
>>


Re: Ever increasing startup times as data grow in persistent storage

2021-01-13 Thread Pavel Tupitsyn
Raymond,

Please use ICluster.SetActive [1] instead; the API linked above is obsolete.


[1]
https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Cluster.ICluster.html?#Apache_Ignite_Core_Cluster_ICluster_SetActive_System_Boolean_

On Wed, Jan 13, 2021 at 11:54 AM Raymond Wilson 
wrote:

> Of course. Obvious! :)
>
> Sent from my iPhone
>
> On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky wrote:
>
> Is there an API version of the cluster deactivation?
>
> https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131
>
> On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky wrote:
>
> Hi Zhenya,
>
> Thanks for confirming performing checkpoints more often will help here.
>
> Hi Raymond!
>
> I have established this configuration, so I will experiment with the settings
> a little.
>
> On a related note, is there any way to automatically trigger a checkpoint,
> for instance as a pre-shutdown activity?
>
> If you shut down your cluster gracefully, i.e. with deactivation [1], the next
> start will not trigger WAL reading.
>
> [1]
> https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
>
>
> Checkpoints seem to be much faster than the process of applying WAL
> updates.
>
> Raymond.
>
> On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky wrote:
>
> We have noticed that startup time for our server nodes has been slowly
> increasing in time as the amount of data stored in the persistent store
> grows.
>
> This appears to be closely related to recovery of WAL changes that were
> not checkpointed at the time the node was stopped.
>
> After enabling debug logging we see that the WAL file is scanned, and for
> every cache, all partitions in the cache are examined, and if there are any
> uncommitted changes in the WAL file then the partition is updated (I assume
> this requires reading of the partition itself as a part of this process).
>
> We now have ~150 GB of data in our persistent store and we see WAL update
> times of 5-10 minutes, during which the node is
> unavailable.
>
> We use fairly large WAL files (512 MB) and 10 segments, with WAL
> archiving enabled.
>
> We anticipate data in persistent storage to grow to Terabytes, and if the
> startup time continues to grow as storage grows then this makes deploys and
> restarts difficult.
>
> Until now we have been using the default checkpoint interval of 3 minutes,
> which may mean we have significant uncheckpointed data in the WAL files. We
> are moving to a 1-minute checkpoint interval but don't yet know if this will
> improve startup times. We also use the default 1024 partitions per cache,
> though some partitions may be large.
>
> Can anyone confirm this is expected behaviour and offer recommendations for
> resolving it?
>
> Will reducing checkpointing intervals help?
>
>
> Yes, it will help. Check
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>
> Is the entire content of a partition read while applying WAL changes?
>
>
> I don't think so; maybe someone else can comment here?
>
> Does anyone else have this issue?
>
> Thanks,
> Raymond.
>
>
> --
> 
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Pavel Tupitsyn
Getting Started Guide:
https://ignite.apache.org/docs/3.0.0-alpha/quick-start/getting-started-guide

On Wed, Jan 13, 2021 at 1:29 PM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> What is the link to the Getting Started Guide?
>
> On 13 Jan 2021, at 03:55, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> Igniters,
>
> I'm excited to announce that the first alpha build of the Ignite 3 is out
> and available for download!
>
> Ignite 3 is the new project that was initiated by the Ignite community
> last year. Please refer to this page if you want to learn more:
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0
>
> The just-released alpha build is a sneak peek into the future of Ignite.
> It doesn't represent a fully-functional product (no discovery, caches,
> compute, etc.), but demonstrates major mechanics of how you will interact
> with Ignite going forward.
>
> The main goal of the release is to gather feedback from the community so
> that we can adjust further development if needed. That said, I would ask
> and encourage everyone to go through the Getting Started Guide [1] and play
> with the build. If you have any questions, issues, concerns, wishes,
> requests, or any other thoughts, please reply directly to this thread. We
> will carefully accumulate all the feedback and make sure it is considered
> going forward.
>
> Another opportunity to share your feedback will come closer to the end of
> January when we will have a virtual meetup. I will present a quick demo of
> the alpha build, after which we will have an open discussion. Please stay
> tuned - I will send a message here when the meetup is scheduled.
>
> -Val
>
>
>
>


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Stephen Darlington
What is the link to the Getting Started Guide?

> On 13 Jan 2021, at 03:55, Valentin Kulichenko  
> wrote:
> 
> Igniters,
> 
> I'm excited to announce that the first alpha build of the Ignite 3 is out and 
> available for download!
> 
> Ignite 3 is the new project that was initiated by the Ignite community last 
> year. Please refer to this page if you want to learn more: 
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0 
> 
> 
> The just-released alpha build is a sneak peek into the future of Ignite. It 
> doesn't represent a fully-functional product (no discovery, caches, compute, 
> etc.), but demonstrates major mechanics of how you will interact with Ignite 
> going forward.
> 
> The main goal of the release is to gather feedback from the community so that 
> we can adjust further development if needed. That said, I would ask and 
> encourage everyone to go through the Getting Started Guide [1] and play with 
> the build. If you have any questions, issues, concerns, wishes, requests, or 
> any other thoughts, please reply directly to this thread. We will carefully 
> accumulate all the feedback and make sure it is considered going forward.
> 
> Another opportunity to share your feedback will come closer to the end of 
> January when we will have a virtual meetup. I will present a quick demo of 
> the alpha build, after which we will have an open discussion. Please stay 
> tuned - I will send a message here when the meetup is scheduled.
> 
> -Val




Re: Ever increasing startup times as data grow in persistent storage

2021-01-13 Thread Raymond Wilson
Of course. Obvious! :)

Sent from my iPhone

On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky wrote:

Is there an API version of the cluster deactivation?

https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131

On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky wrote:

Hi Zhenya,

Thanks for confirming performing checkpoints more often will help here.

Hi Raymond!

I have established this configuration, so I will experiment with the settings
a little.

On a related note, is there any way to automatically trigger a checkpoint,
for instance as a pre-shutdown activity?

If you shut down your cluster gracefully, i.e. with deactivation [1], the next
start will not trigger WAL reading.

[1]
https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster


Checkpoints seem to be much faster than the process of applying WAL updates.

Raymond.

On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

We have noticed that startup time for our server nodes has been slowly
increasing in time as the amount of data stored in the persistent store
grows.

This appears to be closely related to recovery of WAL changes that were not
checkpointed at the time the node was stopped.

After enabling debug logging we see that the WAL file is scanned, and for
every cache, all partitions in the cache are examined, and if there are any
uncommitted changes in the WAL file then the partition is updated (I assume
this requires reading of the partition itself as a part of this process).

We now have ~150 GB of data in our persistent store and we see WAL update
times of 5-10 minutes, during which the node is
unavailable.

We use fairly large WAL files (512 MB) and 10 segments, with WAL
archiving enabled.

We anticipate data in persistent storage to grow to Terabytes, and if the
startup time continues to grow as storage grows then this makes deploys and
restarts difficult.

Until now we have been using the default checkpoint interval of 3 minutes,
which may mean we have significant uncheckpointed data in the WAL files. We
are moving to a 1-minute checkpoint interval but don't yet know if this will
improve startup times. We also use the default 1024 partitions per cache,
though some partitions may be large.

Can anyone confirm this is expected behaviour and offer recommendations for
resolving it?

Will reducing checkpointing intervals help?


Yes, it will help. Check
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood

Is the entire content of a partition read while applying WAL changes?


I don't think so; maybe someone else can comment here?

Does anyone else have this issue?

Thanks,
Raymond.


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com


Ignite best practice for restarting k8s pod

2021-01-13 Thread vbm
Hi,

I had raised this ticket:
https://issues.apache.org/jira/browse/IGNITE-13974


Currently I do not see any cleanup functions getting called when we do a
'kubectl delete pod'.

May I know what the best practice is for restarting a k8s Ignite pod?
How do we handle the scenario where we need to scale down Ignite pods? I think
internally when we do kubectl scale down it calls kubectl delete pod.
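
For context on where cleanup can hook in: 'kubectl delete pod' makes the kubelet send SIGTERM to the container and, after the termination grace period (30 seconds by default), SIGKILL. A JVM that actually receives the SIGTERM runs its shutdown hooks. The Java sketch below shows only that mechanism. GracefulStop and register are hypothetical names, the cleanup body is a placeholder, and it assumes the JVM is the process that gets the signal (a wrapper shell script in front of it may swallow it).

```java
// Sketch only: run cleanup when the JVM is asked to stop (SIGTERM or normal exit).
final class GracefulStop {
    /** Registers the cleanup as a JVM shutdown hook and returns the hook thread. */
    static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-stop-hook");

        Runtime.getRuntime().addShutdownHook(hook);

        return hook;
    }

    public static void main(String[] args) {
        register(() -> {
            // Placeholder: a real node would stop its services here and must
            // finish before the pod's termination grace period runs out.
            System.out.println("Shutdown requested, running cleanup");
        });

        System.out.println("Working...");
        // When main returns (or SIGTERM arrives), the hook above runs.
    }
}
```

If the hook never runs in the pod, two things worth checking are whether the JVM is the process receiving SIGTERM and whether the cleanup needs longer than the grace period.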


Regards,
Vishwas



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Looking for feedback on the Ignite 3.0.0 Alpha

2021-01-13 Thread Wesley Peng

When will the stable version of 3.0 be released? Thanks.

Valentin Kulichenko wrote:
I'm excited to announce that the first alpha build of the Ignite 3 is 
out and available for download!


Ignite 3 is the new project that was initiated by the Ignite community 
last year. Please refer to this page if you want to learn more: 
https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0 



The just-released alpha build is a sneak peek into the future of Ignite. 
It doesn't represent a fully-functional product (no discovery, caches, 
compute, etc.), but demonstrates major mechanics of how you will 
interact with Ignite going forward.


Re[4]: Ever increasing startup times as data grow in persistent storage

2021-01-13 Thread Zhenya Stanilovsky




 
>Is there an API version of the cluster deactivation?
 
https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131
 
>On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>wrote:
>>
>>
>> 
>>>Hi Zhenya,
>>> 
>>>Thanks for confirming performing checkpoints more often will help here.
>>Hi Raymond !
>>> 
>>>I have established this configuration, so I will experiment with the settings 
>>>a little.
>>> 
>>>On a related note, is there any way to automatically trigger a checkpoint, 
>>>for instance as a pre-shutdown activity?
>> 
>>If you shut down your cluster gracefully, i.e. with deactivation [1], the next 
>>start will not trigger WAL reading.
>> 
>>[1]  
>>https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
>> 
>>>Checkpoints seem to be much faster than the process of applying WAL updates.
>>> 
>>>Raymond.  
>>>On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>>>wrote:



 
>We have noticed that startup time for our server nodes has been slowly 
>increasing in time as the amount of data stored in the persistent store 
>grows.
> 
>This appears to be closely related to recovery of WAL changes that were 
>not checkpointed at the time the node was stopped.
> 
>After enabling debug logging we see that the WAL file is scanned, and for 
>every cache, all partitions in the cache are examined, and if there are 
>any uncommitted changes in the WAL file then the partition is updated (I 
>assume this requires reading of the partition itself as a part of this 
>process).
> 
>We now have ~150 GB of data in our persistent store and we see WAL update 
>times of 5-10 minutes, during which the node is 
>unavailable.
> 
>We use fairly large WAL files (512 MB) and 10 segments, with WAL 
>archiving enabled.
> 
>We anticipate data in persistent storage to grow to Terabytes, and if the 
>startup time continues to grow as storage grows then this makes deploys 
>and restarts difficult.
> 
>Until now we have been using the default checkpoint interval of 3 minutes, 
>which may mean we have significant uncheckpointed data in the WAL files. 
>We are moving to a 1-minute checkpoint interval but don't yet know if this 
>will improve startup times. We also use the default 1024 partitions per 
>cache, though some partitions may be large. 
> 
>Can anyone confirm this is expected behaviour and offer recommendations for 
>resolving it?
> 
>Will reducing checkpointing intervals help?
 
Yes, it will help. Check
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
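
A back-of-the-envelope sketch of why a shorter checkpoint interval should shorten recovery. The segment count, segment size and the two intervals come from the question above; the 20 MB/s write rate is a made-up figure for illustration only. It also assumes, as a simplification, that the WAL to be replayed on restart is bounded by what was written since the last checkpoint. The class and method names are hypothetical.

```java
// Sketch only: rough WAL sizing arithmetic, not a model of Ignite internals.
final class WalBacklogEstimate {
    static final long MB = 1024L * 1024L;

    /** Size of the WAL work directory: number of segments times segment size. */
    static long walWorkBytes(int segments, long segmentBytes) {
        return segments * segmentBytes;
    }

    /** Rough upper bound on WAL written during one checkpoint interval. */
    static long backlogBytes(long writeBytesPerSec, long checkpointIntervalSec) {
        return writeBytesPerSec * checkpointIntervalSec;
    }

    public static void main(String[] args) {
        long rate = 20 * MB; // Hypothetical sustained write rate, 20 MB/s.

        // 10 segments of 512 MB each.
        System.out.println("WAL work dir, MB:     " + walWorkBytes(10, 512 * MB) / MB); // 5120
        // Default 3-minute interval against the planned 1-minute interval.
        System.out.println("Backlog at 3 min, MB: " + backlogBytes(rate, 180) / MB);    // 3600
        System.out.println("Backlog at 1 min, MB: " + backlogBytes(rate, 60) / MB);     // 1200
    }
}
```

With those assumptions, going from 3 minutes to 1 minute cuts the worst-case amount of WAL to replay to a third, whatever the real write rate turns out to be.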
>Is the entire content of a partition read while applying WAL changes?
 
I don't think so; maybe someone else can comment here?
>Does anyone else have this issue?
> 
>Thanks,
>Raymond.
> 
>  --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive |  Christchurch, New Zealand
>raymond_wil...@trimble.com