Alexey
I'm OK with the suggested way in [1].
1. https://issues.apache.org/jira/browse/IGNITE-12263
On Tue, Oct 8, 2019 at 9:59 PM Denis Magda wrote:
Anton,
Seems like we have a name for the defragmentation mode with downtime -
Rolling Defrag )
-
Denis
On Mon, Oct 7, 2019 at 11:04 PM Anton Vinogradov wrote:
Denis,
I like the idea that defragmentation is just an additional step on a node
(re)start, like we perform PDS recovery now.
We may just use a special key to specify that a node should defragment its
persistence on (re)start.
Defragmentation can be part of a Rolling Upgrade in this case :)
It seems to be
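To make the idea concrete, here is a minimal sketch of such a start-up hook; the system property name and the defragmentation call are invented for illustration, not an existing Ignite API:

// Hypothetical sketch: defragmentation as an extra step before the node
// joins the topology, analogous to how PDS recovery runs on start.
// The property name and runLocalDefragmentation() are invented.
public class DefragOnStart {
    public static void main(String[] args) {
        // The "special key" could be a JVM system property, e.g.
        // -Dignite.defrag.on.start=true (assumed name).
        if (Boolean.getBoolean("ignite.defrag.on.start"))
            runLocalDefragmentation();

        // Normal start continues after the maintenance step.
        org.apache.ignite.Ignition.start("config/ignite.xml");
    }

    private static void runLocalDefragmentation() {
        // Placeholder: rewrite partition files compactly, then swap them in.
    }
}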
Alex, thanks for the summary and proposal. Anton, Ivan and others who took
part in this discussion, what are your thoughts? I see this
rolling-upgrades-based approach as a reasonable solution. Even though a
node shutdown is expected, the procedure doesn't lead to a cluster outage,
meaning it can
Created a ticket for the first stage of this improvement. This can be a
first change towards the online mode suggested by Sergey and Anton.
https://issues.apache.org/jira/browse/IGNITE-12263
On Fri, Oct 4, 2019 at 19:38, Alexey Goncharuk wrote:
Hello!
I think that a good, robust approach is to start a background thread which
will try to compact pages and remove unneeded ones. It should only be
active when the system is reasonably idle, or if there's a severe
fragmentation problem.
However, I am aware that implementing such a heuristic cleaner is
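For what it's worth, a minimal sketch of what such a cleaner loop could look like; all names and thresholds here are made up, and the real page bookkeeping would have to come from the page store:

// Hypothetical background compactor: active only while the node is idle
// or fragmentation is severe. Names and thresholds are illustrative.
public class BackgroundCompactor implements Runnable {
    private static final double SEVERE_FRAGMENTATION = 0.5;

    @Override public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                if (systemIsIdle() || fragmentationRatio() > SEVERE_FRAGMENTATION)
                    compactSomePages(); // move rows off sparse pages, free them

                Thread.sleep(1_000);    // back off between passes
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private boolean systemIsIdle() { return false; /* e.g. low CPU/IO load */ }
    private double fragmentationRatio() { return 0.0; /* free vs. allocated */ }
    private void compactSomePages() { /* relocate entries, release empty pages */ }
}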
Maxim,
Having a cluster-wide lock for a cache does not improve the availability of
the solution. A user cannot defragment a cache if the cache is involved in
a mission-critical operation, so having a lock on such a cache is
equivalent to a whole-cluster shutdown.
We should decide between either a
Igniters,
This thread seems to be endless, but what if some kind of cache group
distributed write lock (exclusive for some of the internal Ignite
processes) were introduced? I think it would help to solve a batch of
problems, like:
1. defragmentation of all cache group partitions on the local node
Hi,
I'm not sure that taking a node offline is the best way to do that.
Cons:
- different caches may have different fragmentation levels, but we force
the whole node to stop
- taking a node offline is a maintenance operation that will require adding
+1 backup to reduce the risk of data loss
- baseline auto adjustment?
- impact
Alexey,
As for me, it does not matter whether it will be an IEP, an umbrella
ticket, or a single issue.
The most important thing is the Assignee :)
On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk wrote:
Anton, do you think we should file a single ticket for this or should we go
with an IEP? As of now, the change does not look big enough for an IEP for
me.
On Thu, Oct 3, 2019 at 11:18, Anton Vinogradov wrote:
Alexey,
Sounds good to me.
On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk wrote:
Anton,
Switching a partition to and from the SHRINKING state will require
intricate synchronization in order to properly determine the start
position for historical rebalance without PME.
I would still go with an offline-node approach, but instead of cleaning the
persistence, we can do
Alexei,
>> stopping the fragmented node and removing its partition data, then
starting it again
That's exactly what we're doing to solve the fragmentation issue.
The problem here is that we have to perform N/B restart-rebalance
operations (N is the cluster size, B the backup count), and it takes a lot
of time.
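To put numbers on that (purely illustrative): with N = 30 nodes and B = 2
backups, that is 30 / 2 = 15 sequential restart-rebalance rounds, and each
round has to wait for a full rebalance to finish before the next set of
nodes can safely be taken down.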
Probably this should be allowed through a public API; actually, this is the
same as manual rebalancing.
On Fri, Sep 27, 2019 at 17:40, Alexei Scherbakov <
alexey.scherbak...@gmail.com> wrote:
The poor man's solution for the problem would be stopping the fragmented
node and removing its partition data, then starting it again, allowing a
full state transfer, this time without the deletes.
Rinse and repeat for all owners.
Anton Vinogradov, would this work for you as a workaround?
On Thu, Sep 19, 2019 at
Alexey,
Let's combine your and Ivan's proposals.
>> vacuum command, which acquires an exclusive table lock, so no concurrent
activities on the table are possible.
and
>> Could the problem be solved by stopping a node which needs to be
defragmented, clearing persistence files and restarting the
Anton,
> >> The solution which Anton suggested does not look easy because it will
> most likely significantly hurt performance
> Mostly agree here, but what drop do we expect? What price are we ready to
> pay?
> Not sure, but it seems some vendors are ready to pay, for example, a 5%
> drop for this.
5%
Alexey,
>> The solution which Anton suggested does not look easy because it will
most likely significantly hurt performance
Mostly agree here, but what drop do we expect? What price are we ready to
pay?
Not sure, but it seems some vendors are ready to pay, for example, a 5%
drop for this.
>> it is hard
Denis,
It's not fundamental, but quite complex. In Postgres, for example, this is
not maintained automatically, and store compaction is performed using the
full vacuum command, which acquires an exclusive table lock, so no
concurrent activities on the table are possible.
The solution which Anton
The issue is starting to hit others who deploy Ignite persistence in
production:
https://issues.apache.org/jira/browse/IGNITE-12152
Alex, I'm curious whether this is a fundamental problem. I asked the same
question in JIRA but, probably, this discussion is a better place to get to
the bottom of it first:
Dmitriy,
This does not look like a production-ready case :)
How about:
1) Once you need to write an entry, you have to choose not a random "page
from free-list with enough space",
but the "page from free-list with enough space closest to the beginning of
the file".
2) Once you remove an entry, you have to
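A tiny sketch of the selection policy from (1), assuming a simplified free
list of page descriptors (real free lists are bucketed by free space, so
this illustrates the policy only):

import java.util.List;

// Hypothetical policy: among pages with enough free space, prefer the one
// closest to the beginning of the partition file.
public class LowestOffsetPagePicker {
    static final class FreePage {
        final long fileOffset; // where the page sits in the partition file
        final int freeBytes;   // remaining space on the page

        FreePage(long fileOffset, int freeBytes) {
            this.fileOffset = fileOffset;
            this.freeBytes = freeBytes;
        }
    }

    /** Returns the lowest-offset page that fits the entry, or null. */
    static FreePage pick(List<FreePage> freeList, int entrySize) {
        FreePage best = null;

        for (FreePage p : freeList) {
            if (p.freeBytes >= entrySize
                && (best == null || p.fileOffset < best.fileOffset))
                best = p;
        }

        return best;
    }
}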
In the TC Bot, I used to create a second cache named CacheV2 and migrate
the needed data from CacheV1 to V2.
After CacheV1.destroy(), the files are removed and the disk space is freed.
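In code, that workaround could look roughly like this (the cache names and
the naive copy loop are illustrative; a real migration would likely use
streaming and handle concurrent updates):

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;

public class CacheMigration {
    // Copy live data into a freshly allocated cache, then drop the old,
    // fragmented one; destroying CacheV1 deletes its partition files.
    static void migrate(Ignite ignite) {
        IgniteCache<Object, Object> v1 = ignite.cache("CacheV1");
        IgniteCache<Object, Object> v2 = ignite.getOrCreateCache("CacheV2");

        for (Cache.Entry<Object, Object> e : v1)
            v2.put(e.getKey(), e.getValue());

        v1.destroy(); // frees the disk space occupied by CacheV1
    }
}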
On Wed, Jan 9, 2019 at 12:04, Ivan Pavlukhin wrote:
Vyacheslav,
Have you investigated how other vendors (Oracle, Postgres) tackle this
problem?
I have one wild idea. Could the problem be solved by stopping the node
which needs to be defragmented, clearing its persistence files and
restarting the node? After rebalance, the node will receive all its data
back
Yes, it's about Page Memory defragmentation.
Pages in partition files are stored sequentially; possibly, it makes sense
to defragment pages first to avoid inter-page gaps, since we use page
offsets to manage them.
I filed an issue [1]; I hope we will be able to find resources to solve
the issue
I suppose it is about Ignite Page Memory pages defragmentation.
We can get 100 allocated pages, each of which becomes only, e.g., 50%
filled after the removal of some entries. But they will still occupy space
for 100 pages on the hard drive.
On Fri, Dec 28, 2018 at 20:45, Denis Magda wrote:
Shouldn't the OS take care of defragmentation? What we need to do is give a
way to remove stale data and "release" the allocated space somehow through
tools, MBeans or API methods.
--
Denis
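Purely as an illustration of the kind of management hook meant here, a
hypothetical MBean (the interface and its methods are invented, not an
existing Ignite API):

// Invented management interface, shown only to make the suggestion concrete.
public interface DefragmentationMXBean {
    /** Kicks off compaction of the given cache's partition files. */
    void defragmentCache(String cacheName);

    /** Estimated bytes that could be returned to the OS for the cache. */
    long reclaimableBytes(String cacheName);
}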
On Fri, Dec 28, 2018 at 6:24 AM Vladimir Ozerov wrote:
> Hi Vyacheslav,
>
> AFAIK this is not
Igniters, we have faced the following problem on one of our deployments.
Let's imagine that we have used an IgniteCache with PDS enabled over time:
- hardware disk space was occupied as the amount of data grew, e.g. 100GB;
- then, we removed stale data, e.g. 50GB,