[jira] [Commented] (KAFKA-1712) Excessive storage usage on newly added node

2017-02-28 Thread Alan Braithwaite (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889137#comment-15889137 ]

Alan Braithwaite commented on KAFKA-1712:
-

Has this been looked at recently? We've found it's an issue for nodes that 
are rejoining a cluster as well.

> Excessive storage usage on newly added node
> ---
>
> Key: KAFKA-1712
> URL: https://issues.apache.org/jira/browse/KAFKA-1712
> Project: Kafka
>  Issue Type: Bug
>Reporter: Oleg Golovin
>
> When a new node is added to the cluster, data starts replicating to it. The 
> mtime of each newly created segment is set when the last message is written 
> to it. Although replication is a prolonged process, let's assume (for 
> simplicity of explanation) that the segments' mtime is very close to the time 
> when the new node was added.
> After the replication is done, new data starts to flow into this new node. 
> After `log.retention.hours` the amount of data will be 2 * 
> daily_amount_of_data_in_kafka_node: the first half is the data replicated 
> from the other nodes when the node was added (let us call that time `t1`), 
> and the second half is the data written between `t1` and 
> `t1 + log.retention.hours`. So by that time the node will hold twice as much 
> data as the other nodes.
> This poses a big problem for us, as our storage is sized for the normal 
> amount of data (not twice that amount).
> In our particular case it poses another problem. We have an emergency segment 
> cleaner which runs when storage is nearly full (>90%). We try to balance the 
> amount of data so that it does not have to run and we can rely solely on 
> Kafka's internal log deletion, but sometimes the emergency cleaner does run.
> It works this way:
> - it gets all Kafka segments on the volume
> - it filters out the last segment of each partition (just to avoid 
> unnecessary recreation of the last, small segments)
> - it sorts the remaining segments by mtime
> - it changes the mtime of the first N segments (those with the lowest mtime) 
> to 1, so they become really, really old. N is chosen so as to free a 
> specified percentage of the volume (3% in our case). Kafka deletes these 
> segments later (as they are very old).
> The emergency cleaner works very well, except when data has been replicated 
> to a newly added node. 
> In that case a segment's mtime is the time the segment was replicated, and it 
> does not reflect the real creation time of the original data stored in the 
> segment. So the emergency cleaner will delete the segments with the lowest 
> mtime, which may hold data that is much more recent than the data in other 
> segments.
> This is not a big problem unless we delete data that hasn't been fully 
> consumed. In that case we lose data, and that makes it a big problem.
> Is it possible to retain the segment mtime during the initial replication to 
> a new node?
> That would keep the new node from accumulating twice as much data as the 
> other nodes.
> Or maybe there are other ways to sort segments by data creation time (or 
> something close to it)? For example, if 
> https://issues.apache.org/jira/browse/KAFKA-1403 is implemented, we could 
> take the time of the first message from the .index file. In our case that 
> would let the emergency cleaner delete the genuinely oldest data.
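
For concreteness, below is a minimal Python sketch of an emergency cleaner 
like the one described in the quoted report above. The log directory path, 
the `*.log` segment naming, and the 3% target are illustrative assumptions, 
not details of the reporter's actual tool.

#!/usr/bin/env python3
"""Illustrative sketch of the emergency segment cleaner described above.
Paths, naming, and the 3% target are assumptions, not the reporter's tool."""
import os
from pathlib import Path

LOG_DIR = Path("/var/lib/kafka/data")  # assumed Kafka log volume
FREE_TARGET_FRACTION = 0.03            # free 3% of the volume, as in the report

def candidate_segments(log_dir):
    """Yield every *.log segment except the last (active) one per partition."""
    for partition_dir in sorted(p for p in log_dir.iterdir() if p.is_dir()):
        segments = sorted(partition_dir.glob("*.log"))
        # Skipping the active segment avoids recreating small, still-written files.
        yield from segments[:-1]

def bytes_to_free(log_dir):
    """Target number of bytes to release, as a fraction of the whole volume."""
    stats = os.statvfs(log_dir)
    return int(stats.f_frsize * stats.f_blocks * FREE_TARGET_FRACTION)

def mark_oldest(log_dir):
    # Sorting by mtime is exactly the step that breaks on a newly added node,
    # because there mtime reflects replication time, not original write time.
    candidates = sorted(candidate_segments(log_dir), key=lambda p: p.stat().st_mtime)
    target, freed = bytes_to_free(log_dir), 0
    for seg in candidates:
        if freed >= target:
            break
        freed += seg.stat().st_size
        # mtime (and atime) = 1 second after the epoch: the segment now looks
        # ancient, so Kafka's retention deletes it on its next cleanup pass.
        os.utime(seg, (1, 1))

if __name__ == "__main__":
    mark_oldest(LOG_DIR)

If segments could instead be ordered by the timestamp of their first message 
(the KAFKA-1403 idea mentioned above), the sort key in mark_oldest would no 
longer be fooled by replication to a new node.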





[jira] [Commented] (KAFKA-1712) Excessive storage usage on newly added node

2014-10-20 Thread Jun Rao (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177500#comment-14177500 ]

Jun Rao commented on KAFKA-1712:


This is being discussed in 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Enriched+Message+Metadata



