[jira] [Commented] (KAFKA-1712) Excessive storage usage on newly added node

2017-02-28 Thread Alan Braithwaite (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889137#comment-15889137 ]

Alan Braithwaite commented on KAFKA-1712:
-

Has this been looked at recently? We've found it's an issue for nodes that 
are rejoining a cluster as well.

> Excessive storage usage on newly added node
> ---
>
> Key: KAFKA-1712
> URL: https://issues.apache.org/jira/browse/KAFKA-1712
> Project: Kafka
>  Issue Type: Bug
>Reporter: Oleg Golovin
>
> When a new node is added to the cluster, data starts replicating to it. The 
> mtime of each newly created segment is set when the last message is written 
> to it. Although replication is a prolonged process, let's assume (for 
> simplicity of explanation) that the segments' mtime is very close to the time 
> when the new node was added.
> After the replication is done, new data starts to flow into this new node. 
> After `log.retention.hours` the amount of data will be 2 * 
> daily_amount_of_data_in_kafka_node: the first half is the data replicated 
> from the other nodes when the node was added (let us call that time `t1`), 
> and the second half is the data written between `t1` and 
> `t1 + log.retention.hours`. So by that time the node will hold twice as much 
> data as the other nodes.
> This poses a big problem for us, as our storage is sized for the normal 
> amount of data (not twice that amount).
> In our particular case it poses another problem. We have an emergency segment 
> cleaner which runs when storage is nearly full (>90%). We try to balance the 
> amount of data so that it does not have to run and we can rely solely on 
> Kafka's internal log deletion, but sometimes the emergency cleaner does run.
> It works this way:
> - it gets all Kafka segments on the volume
> - it filters out the last segment of each partition (just to avoid 
> unnecessary recreation of the last, small segments)
> - it sorts the remaining segments by mtime
> - it changes the mtime of the first N segments (those with the lowest mtime) 
> to 1, so they become really, really old. N is chosen so as to free a 
> specified percentage of the volume (3% in our case). Kafka deletes these 
> segments later (as they are very old).
> The emergency cleaner works very well, except when data has been replicated 
> to a newly added node. 
> In that case a segment's mtime is the time the segment was replicated, and it 
> does not reflect the real creation time of the original data stored in the 
> segment. So the emergency cleaner will delete the segments with the lowest 
> mtime, which may hold data that is much more recent than the data in other 
> segments.
> This is not a big problem unless we delete data that hasn't been fully 
> consumed. In that case we lose data, and that makes it a big problem.
> Is it possible to retain the segment mtime during the initial replication to 
> a new node?
> That would keep the new node from accumulating twice as much data as the 
> other nodes.
> Or maybe there are other ways to sort segments by data creation time (or 
> something close to it)? For example, if 
> https://issues.apache.org/jira/browse/KAFKA-1403 is implemented, we could 
> take the time of the first message from the .index file. In our case that 
> would let the emergency cleaner delete the genuinely oldest data.
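
For concreteness, below is a minimal Python sketch of an emergency cleaner 
like the one described in the quoted report above. The log directory path, 
the `*.log` segment naming, and the 3% target are illustrative assumptions, 
not details of the reporter's actual tool.

#!/usr/bin/env python3
"""Illustrative sketch of the emergency segment cleaner described above.
Paths, naming, and the 3% target are assumptions, not the reporter's tool."""
import os
from pathlib import Path

LOG_DIR = Path("/var/lib/kafka/data")  # assumed Kafka log volume
FREE_TARGET_FRACTION = 0.03            # free 3% of the volume, as in the report

def candidate_segments(log_dir):
    """Yield every *.log segment except the last (active) one per partition."""
    for partition_dir in sorted(p for p in log_dir.iterdir() if p.is_dir()):
        segments = sorted(partition_dir.glob("*.log"))
        # Skipping the active segment avoids recreating small, still-written files.
        yield from segments[:-1]

def bytes_to_free(log_dir):
    """Target number of bytes to release, as a fraction of the whole volume."""
    stats = os.statvfs(log_dir)
    return int(stats.f_frsize * stats.f_blocks * FREE_TARGET_FRACTION)

def mark_oldest(log_dir):
    # Sorting by mtime is exactly the step that breaks on a newly added node,
    # because there mtime reflects replication time, not original write time.
    candidates = sorted(candidate_segments(log_dir), key=lambda p: p.stat().st_mtime)
    target, freed = bytes_to_free(log_dir), 0
    for seg in candidates:
        if freed >= target:
            break
        freed += seg.stat().st_size
        # mtime (and atime) = 1 second after the epoch: the segment now looks
        # ancient, so Kafka's retention deletes it on its next cleanup pass.
        os.utime(seg, (1, 1))

if __name__ == "__main__":
    mark_oldest(LOG_DIR)

If segments could instead be ordered by the timestamp of their first message 
(the KAFKA-1403 idea mentioned above), the sort key in mark_oldest would no 
longer be fooled by replication to a new node.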





[jira] [Commented] (KAFKA-1712) Excessive storage usage on newly added node

2014-10-20 Thread Jun Rao (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177500#comment-14177500 ]

Jun Rao commented on KAFKA-1712:


This is being discussed in 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Enriched+Message+Metadata



