Re: deleting data automatically
Thank you!
Re: deleting data automatically
If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins -- is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt throughput? Thanks.
Re: deleting data automatically
As I mentioned, adjusting settings such that files are small enough that you don't get the benefits of append-only writes, or such that file creation/deletion becomes a bottleneck, might affect performance. It looks like the default setting for log.segment.bytes is 1GB, so given fast enough cleanup of old logs, you may not need to adjust that setting -- assuming you have a reasonable amount of storage, you'll easily fit many dozen log files of that size.

-Ewen
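If you'd rather not change broker-wide defaults while benchmarking, topic-level overrides are another option. This is a minimal sketch using the 0.8.x-era CLI of this thread; the topic name and values are assumptions for illustration, not recommendations:

    # Per-topic overrides (0.8.x tooling); topic name and values are assumed
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic perf-test \
      --config retention.ms=10000 \
      --config segment.bytes=104857600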
Re: deleting data automatically
Thank you! What performance impact will there be if I change log.segment.bytes? Thanks.
Re: deleting data automatically
I think log.cleanup.interval.mins was removed in the first 0.8 release; it sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html

As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck.

-Ewen
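For reference, the two names side by side. The 0.7-era setting no longer exists in 0.8+; the interval values shown here are illustrative assumptions, not prescriptions:

    # Kafka 0.7 (removed in 0.8):
    # log.cleanup.interval.mins=10
    # Kafka 0.8+ replacement (milliseconds):
    log.retention.check.interval.ms=300000    # e.g., check every 5 minutes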
deleting data automatically
Hi, I am testing Kafka producer performance, so I created a queue and am writing a large amount of data to it. Is there a way to delete the data automatically after some time -- say, whenever the data size reaches 50GB or the retention time exceeds 10 seconds -- so that my disk won't fill up and block new writes? Thanks!
Re: deleting data automatically
You can configure that in the broker configs by setting the log retention options: http://kafka.apache.org/07/configuration.html

Thanks,
Mayuresh
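For the size-based half of the original question (the 50GB bound), the relevant broker setting is log.retention.bytes. One caveat: it applies per partition, not per topic, so a sketch like the following (the value is an illustrative assumption) caps each partition at roughly 50GB:

    # server.properties -- sketch; value is an illustrative assumption
    # Note: log.retention.bytes applies per partition, so a topic with N
    # partitions can still use up to ~N * 50GB of disk
    log.retention.bytes=53687091200    # ~50GB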
Re: deleting data automatically
You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms.

Thanks,
Ewen
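For illustration, the settings Ewen lists would combine in a broker's server.properties along these lines. This is a minimal sketch assuming time-based cleanup on the order of seconds; the exact values are assumptions, and since Kafka only deletes whole closed segments when the retention checker runs, actual deletion will lag the nominal 10 seconds:

    # server.properties -- sketch; values are illustrative assumptions
    log.retention.ms=10000                  # delete data older than ~10s
    log.roll.ms=10000                       # roll segments quickly so old data
                                            # lands in closed, deletable segments
    log.segment.bytes=104857600             # smaller segments (~100MB) so each
                                            # covers a short time window
    log.retention.check.interval.ms=5000    # run the retention checker often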